
Date: 2022-06-10 10:46

A Power Calculation

Suppose, for illustration, that we are interested in testing the hypothesis

$H_0: \theta_1 = \theta_2$ vs. $H_A: \theta_1 \neq \theta_2$

Suppose, also for illustration, that the test statistic associated with this test has the form

It will be useful to define the notion of a rejection region $R$: all values of the observed test statistic $t$ that would lead to the rejection of $H_0$:

$R = \{t \mid H_0 \text{ is rejected}\}$

– If $t \in R$, we reject $H_0$

– If $t \in R^c$, we do not reject $H_0$

Defining Type I and Type II error rates in terms of a rejection region is also useful:

$\alpha = \Pr(\text{Type I Error}) = \Pr(\text{Reject } H_0 \mid H_0 \text{ is true}) = \Pr(T \in R \mid H_0 \text{ is true})$

$\beta = \Pr(\text{Type II Error}) = \Pr(\text{Do Not Reject } H_0 \mid H_0 \text{ is false}) = \Pr(T \in R^c \mid H_0 \text{ is false})$
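As a minimal numerical sketch of these definitions, suppose (purely as an assumption, since the form of the test statistic is not given here) that $T$ is standard normal under $H_0$, normal with mean `shift` and unit variance under one particular alternative, and that the rejection region is two-sided, $R = \{t : |t| > c\}$. The function name `error_rates` and its parameters are hypothetical.

```python
from statistics import NormalDist


def error_rates(critical_value, shift):
    """Return (alpha, beta) for the rejection region R = {t : |t| > critical_value}.

    Assumption of this sketch: T ~ N(0, 1) under H0 and T ~ N(shift, 1)
    under the alternative being considered.
    """
    null = NormalDist(0.0, 1.0)
    alt = NormalDist(shift, 1.0)
    # alpha = Pr(T in R | H0 is true): probability in both tails under the null.
    alpha = null.cdf(-critical_value) + (1.0 - null.cdf(critical_value))
    # beta = Pr(T in R^c | H0 is false): probability between the cut-offs
    # under the alternative.
    beta = alt.cdf(critical_value) - alt.cdf(-critical_value)
    return alpha, beta


alpha, beta = error_rates(critical_value=1.96, shift=3.0)
power = 1.0 - beta  # probability of rejecting H0 when this alternative holds
```

Under these assumptions, widening the rejection region (a smaller `critical_value`) raises $\alpha$ and lowers $\beta$, which is the trade-off a power calculation has to balance.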


Permutation and Randomization Tests

All of the previous tests have made some kind of distributional assumption for the response measurements.

It would be preferable to have a test that does not rely on such distributional assumptions.

This is precisely the purpose of permutation and randomization tests.

– These tests are nonparametric and rely on resampling.

– The motivation is that if $H_0: \theta_1 = \theta_2$ is true, any random rearrangement of the data is equally likely to have been observed.

– With $n_1$ and $n_2$ units in each condition, there are $\binom{n_1 + n_2}{n_1}$ arrangements of the $n_1 + n_2$ observations into two groups of size $n_1$ and $n_2$, respectively.


A true permutation test considers all possible rearrangements of the original data

– The test statistic t is calculated on the original data and on every one of its rearrangements

– This collection of test statistic values generates the empirical null distribution

A randomization test is carried out similarly, except that we do not consider all possible rearrangements

– We just consider a large number N of them
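To make the distinction concrete, here is a small sketch of a true permutation test that enumerates every arrangement. It assumes the test statistic is the difference in sample means and uses a two-sided notion of "extreme"; both are choices made for this example, and the data are made up. The function name `permutation_test` is hypothetical.

```python
from itertools import combinations
from math import comb


def mean(values):
    return sum(values) / len(values)


def permutation_test(group1, group2):
    """Exact two-sided permutation test on the difference in sample means."""
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    observed = mean(group1) - mean(group2)
    null_stats = []
    # Enumerate every way of choosing which n1 pooled units form "Condition 1".
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in chosen]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        null_stats.append(mean(g1) - mean(g2))
    # Share of arrangements at least as extreme as the observed statistic
    # (small tolerance guards against floating-point ties).
    extreme = sum(abs(s) >= abs(observed) - 1e-12 for s in null_stats)
    n_arrangements = comb(len(pooled), n1)  # size of the full null distribution
    return observed, n_arrangements, extreme / len(null_stats)


# Made-up illustrative data, not taken from the notes.
a = [12.0, 15.0, 11.0, 14.0]
b = [9.0, 8.0, 10.0, 7.0]
t_obs, n_arrangements, p_value = permutation_test(a, b)
```

With four units per group there are only 70 arrangements, so full enumeration is cheap. The count grows very quickly with the group sizes, which is why a randomization test samples arrangements instead.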

Randomization Test Algorithm

1. Collect response observations in each condition.

2. Calculate the test statistic $t$ on the original data.

3. Pool all of the observations together and randomly sample (without replacement) $n_1$ observations to assign to “Condition 1”; the remaining $n_2$ observations are assigned to “Condition 2”. Repeat this $N$ times.

4. Calculate the test statistic $t^*_k$ on each of the “shuffled” datasets, $k = 1, 2, \ldots, N$.

5. Compare $t$ to $\{t^*_1, t^*_2, \ldots, t^*_N\}$, the empirical null distribution, and calculate the p-value:

$\text{p-value} = \dfrac{\#\text{ of } t^*\text{'s that are at least as extreme as } t}{N}$
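The five steps above can be sketched in a few lines of Python. This is one possible reading, not a definitive implementation: it assumes the test statistic is the difference in sample means, treats "at least as extreme" as two-sided, and uses made-up data. The names `randomization_test`, `n_shuffles`, and `seed` are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomization_test(group1, group2, n_shuffles=10_000, seed=0):
    """Two-sided randomization test on the difference in sample means."""
    rng = random.Random(seed)          # seeded so the sketch is reproducible
    n1 = len(group1)
    pooled = list(group1) + list(group2)           # step 3: pool the observations
    t_obs = mean(group1) - mean(group2)            # step 2: statistic on original data
    extreme = 0
    for _ in range(n_shuffles):                    # step 3: repeat N times
        # Shuffling and splitting is equivalent to sampling n1 units
        # without replacement for "Condition 1".
        rng.shuffle(pooled)
        t_star = mean(pooled[:n1]) - mean(pooled[n1:])   # step 4
        if abs(t_star) >= abs(t_obs) - 1e-12:            # step 5: as extreme as t?
            extreme += 1
    return t_obs, extreme / n_shuffles


# Made-up illustrative data, not taken from the notes.
condition1 = [12.0, 15.0, 11.0, 14.0]
condition2 = [9.0, 8.0, 10.0, 7.0]
t_obs, p_value = randomization_test(condition1, condition2)
```

Any other statistic (a difference in medians, a ratio of variances) could be substituted for the difference in means without changing the structure of the loop.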

Example: Pokémon Go

Suppose that Niantic is experimenting with two different promotions within Pokémon Go:

– Condition 1: Give users nothing

– Condition 2: Give users 200 free Pokécoins

– Condition 3: Give users a 50% discount on Shop purchases

In a small pilot experiment n1 = n2 = n3 = 100 users are randomized to each condition

For each user, the amount of real money (in USD) they spend in the 30 days following the experiment

is recorded

The data summaries are:

– $\bar{y}_1$ = 10.74 USD, $Q_1(0.5)$ = 9 USD

– $\bar{y}_2$ = 9.53 USD, $Q_2(0.5)$ = 8 USD

– $\bar{y}_3$ = 13.41 USD, $Q_3(0.5)$ = 10 USD


3 Experiments with More than Two Conditions

3.1 Anatomy of an A/B/m Test

We now consider the design and analysis of an experiment consisting of more than two experimental

conditions – or what many data scientists broadly refer to as “A/B/m Testing”.

– Canonical A/B/m test:

Figure 1: Button-Colour Experiment

Other, more tangible, examples:

– Netflix

– Etsy

Typically the goal of such an experiment is to decide which condition is optimal with respect to some

metric of interest. This could be a

– mean

– proportion

– variance

– quantile

– technically any statistic that can be calculated from sample data

From a design standpoint, such an experiment is very similar to a two-condition experiment:

1. Choose a metric of interest $\theta$ which addresses the question you are trying to answer.

2. Determine the response variable $y$ that must be measured on each unit in order to estimate $\hat{\theta}$.

3. Choose the design factor $x$ and the $m$ levels you will experiment with.

4. Choose $n_1, n_2, \ldots, n_m$ and assign units to conditions at random.

5. Collect the data and estimate the metric of interest in each condition:

$\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m$
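Steps 4 and 5 can be sketched as follows. The helper names `assign_units` and `estimate_by_condition` are hypothetical, the responses are simulated only so the example runs end to end, and the sample mean stands in for whichever metric is actually of interest.

```python
import random
from statistics import mean


def assign_units(unit_ids, group_sizes, seed=0):
    """Randomly assign units to conditions 1..m with the given group sizes."""
    if sum(group_sizes) != len(unit_ids):
        raise ValueError("group sizes must add up to the number of units")
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)                     # random order, then cut into blocks
    assignment = {}
    start = 0
    for condition, size in enumerate(group_sizes, start=1):
        for unit in shuffled[start:start + size]:
            assignment[unit] = condition
        start += size
    return assignment


def estimate_by_condition(assignment, responses, statistic=mean):
    """Apply the chosen statistic to the responses observed in each condition."""
    by_condition = {}
    for unit, condition in assignment.items():
        by_condition.setdefault(condition, []).append(responses[unit])
    return {c: statistic(ys) for c, ys in sorted(by_condition.items())}


units = list(range(12))
assignment = assign_units(units, group_sizes=[4, 4, 4])

# Hypothetical responses, simulated purely for illustration.
rng = random.Random(1)
responses = {u: rng.gauss(10.0, 2.0) for u in units}
theta_hat = estimate_by_condition(assignment, responses)
```

Passing a different `statistic` (for example `statistics.median`) estimates a different metric from the same assignment.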


Determining which condition is optimal typically involves a series of pairwise comparisons

But it is useful to begin such an investigation with a gatekeeper test which serves to determine whether there is any difference between the $m$ experimental conditions. Formally, such a question is phrased as the following statistical hypothesis.

$H_0: \theta_1 = \theta_2 = \cdots = \theta_m$ versus $H_A: \theta_j \neq \theta_k$ for some $j \neq k$ (1)

3.2 Comparing Multiple Means with an F-test

We assume that our response variable follows a normal distribution and we assume that the mean of

the distribution depends on the condition in which the measurements were taken, and that the variance

is the same across all conditions.

The “gatekeeper” test for means is carried out using an F-test.

In particular, we use the F-test for overall significance in an appropriately defined linear regression model:

– The appropriately defined linear regression model in this situation is one in which the response variable depends on $m - 1$ indicator variables:

$x_{ij} = \begin{cases} 1 & \text{if unit } i \text{ is in condition } j \\ 0 & \text{otherwise} \end{cases}$

for $j = 1, 2, \ldots, m - 1$.

– For a particular unit $i$, we adopt the model

$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{m-1} x_{i,m-1} + \varepsilon_i$


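– In this model the $\beta$’s are unknown parameters and may be interpreted in the context of the following expectations:

$E[Y_i \mid x_{i1} = x_{i2} = \cdots = x_{i,m-1} = 0] = \beta_0$

$E[Y_i \mid x_{ij} = 1] = \beta_0 + \beta_j$

That is, $\beta_0$ is the mean response in the reference condition (the one with all indicators equal to zero), and $\beta_j$ is the difference between the mean in condition $j$ and the mean in the reference condition.

– Based on these assumptions, $H_0$ in (1) is true if and only if $\beta_1 = \beta_2 = \cdots = \beta_{m-1} = 0$. Thus testing (1) is equivalent to testing

$H_0: \beta_1 = \beta_2 = \cdots = \beta_{m-1} = 0$ vs. $H_A: \beta_j \neq 0$ for some $j$

– This hypothesis corresponds, as noted, to the F-test for overall significance in the model.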
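The interpretation of the coefficients can be checked numerically. The sketch below builds the $m - 1$ indicator variables with condition $m$ as the reference, fits the model by solving the normal equations, and uses made-up data with $m = 3$. The helper names `indicator_row`, `solve`, and `fit_ols` are hypothetical, and the small hand-written solver is only there to keep the example free of external libraries.

```python
def indicator_row(condition, m):
    """Return [1, x_1, ..., x_{m-1}] for one unit; condition m is the reference."""
    return [1.0] + [1.0 if condition == j else 0.0 for j in range(1, m)]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(conditions, y, m):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    X = [indicator_row(c, m) for c in conditions]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    return solve(xtx, xty)


# Made-up data for m = 3 conditions (three units each).
conditions = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 2.0, 3.0, 4.0]
beta = fit_ols(conditions, y, m=3)
# beta[0] estimates the reference-condition mean (condition 3);
# beta[j] estimates the mean of condition j minus the reference mean.
```

Here the three sample means are 6, 9, and 3, so the fitted intercept matches the reference mean and the two slopes match the differences from it.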

In regression parlance, the test statistic is defined to be the ratio of the regression mean squares (MSR)

to the mean squared error (MSE) in a standard regression-based analysis of variance (ANOVA):

$t = \dfrac{MSR}{MSE}$

In our setting we can more intuitively think of the test statistic as comparing the response variability

between conditions to the response variability within conditions:
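Written out in one common notation, the ratio takes the familiar one-way ANOVA form. The symbols are assumptions of this sketch: $y_{ij}$ is the response of unit $i$ in condition $j$, $\bar{y}_j$ is the sample mean in condition $j$, $\bar{y}$ is the overall sample mean, and $N = n_1 + n_2 + \cdots + n_m$.

```latex
t = \frac{MSR}{MSE}
  = \frac{\displaystyle \sum_{j=1}^{m} n_j \left(\bar{y}_j - \bar{y}\right)^2 \Big/ (m - 1)}
         {\displaystyle \sum_{j=1}^{m} \sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}_j\right)^2 \Big/ (N - m)}
```

The numerator grows when the condition means are far apart, and the denominator grows when responses are noisy within each condition, so large values of $t$ are evidence against $H_0$.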


The null distribution for this test is $F(m-1, N-m)$, where $N = n_1 + n_2 + \cdots + n_m$ is the total number of units.

The p-value for this test is calculated by

$\text{p-value} = P(T \geq t)$

where $T \sim F(m-1, N-m)$
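A minimal sketch of the calculation, using only the standard library and made-up data: it computes the test statistic as between-condition variability over within-condition variability, together with the two degrees of freedom. The function name `f_statistic` is hypothetical. Python's standard library has no F distribution, so the p-value is left out; if SciPy is available, `scipy.stats.f.sf(f_value, df1, df2)` gives the upper-tail probability $P(T \geq t)$.

```python
def f_statistic(groups):
    """Return (F, df_between, df_within) for a list of per-condition samples."""
    m = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-condition sum of squares: spread of condition means around the grand mean.
    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    # Within-condition sum of squares: spread of responses around their own condition mean.
    ss_within = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g)
    msr = ss_between / (m - 1)
    mse = ss_within / (n_total - m)
    return msr / mse, m - 1, n_total - m


# Made-up illustrative data for m = 3 conditions.
groups = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0], [2.0, 3.0, 4.0]]
f_value, df1, df2 = f_statistic(groups)
```

For these data the condition means are 6, 9, and 3 while every within-condition spread is small, so the statistic is large, as expected when the conditions clearly differ.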

Example: Candy Crush Boosters

– Candy Crush is experimenting with three different versions of in-game “boosters”: the lollipop hammer, the jelly fish, and the color bomb.

Figure 2: Candy Crush Experiment

– Users are randomized to one of these three conditions ($n_1 = 121$, $n_2 = 135$, $n_3 = 117$) and they receive (for free) 5 boosters corresponding to their condition. Interest lies in evaluating the effect of these different boosters on the length of time a user plays the game.

– Let $\mu_j$ represent the average length of game play (in minutes) associated with booster condition $j = 1, 2, 3$. While interest lies in finding the condition associated with the longest average length of game play, here we first rule out the possibility that booster type does not influence the length of game play (i.e., $\mu_1 = \mu_2 = \mu_3$).

– In order to do this we fit the linear regression model

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

where $x_1$ and $x_2$ are indicator variables indicating whether a particular value of the response was observed in the jelly fish or color bomb conditions, respectively. The lollipop hammer is therefore the reference condition.


Optional Exercises:

Calculations: 2, 7

Proofs: 1, 5, 6, 9, 10, 14, 17, 18

R Analysis: 2, 5, 6, 8, 13(g), 17 (not g,h), 22(h), 23(a-f)

