
Date: 2019-12-04 09:57

Fall 2019

PPHA 31002

Homework 8

Due: Wednesday, December 4th, 2019 at 11:59pm

Your write-up should include the figures you generate in part 1, answers to the questions in both parts, any code

used to answer the questions, and the full and detailed work behind your answers in part 2.

Randomized Control Trial: De-Worming in Rural Kenya

We will be “replicating” a key null result in Miguel, Edward, and Kremer, Michael. 2004. “Worms: Identifying

Impacts on Education and Health in the Presence of Treatment Externalities.” Econometrica 72 (1): 159–217. [1]

While they found that the treatment increased attendance, they failed to find an effect on test scores.

This is a very famous paper in development economics for several reasons. It was an important demonstration of

how to run a Randomized Control Trial (RCT), especially in the presence of spillovers from treated to non-treated

individuals. It also played an important part in a debate regarding external validity of the results of deworming

interventions, and of RCTs more broadly. We will focus on just the key components of the paper.

Here is a short summary (from the replication files they prepared): “Hookworm, roundworm, whipworm, and

schistosomiasis infect more than one in four people worldwide and are particularly prevalent among school-age

children in developing countries. The former three worms are transmitted through ingestion of or contact with

infected fecal matter, while schistosomiasis parasites are carried by snails in water where children swim or bathe.

Because these worms do not reproduce in their human hosts, most infected individuals have minor cases with

few if any symptoms. However, severe helminth infestations, resulting from repeated infection, can cause iron

deficiency anemia, protein energy malnutrition, stunting, wasting, listlessness, and abdominal pain. In addition to

the negative health and nutritional consequences, worm infections often result in impaired cognitive ability, poor

academic performance, reduced school attendance, and high drop out rates.” [2]

1. Read the first two pages of the paper (the first one and a half should be enough as well) and summarize in

five to seven sentences what problem the paper discusses, what the research question is, what

intervention the authors run as an experiment, and what their main findings are. (2 points)

2. Load the data file deworming kenya data.csv into R.

3. You have the following variables in the data: [3]

• pupil id: a numeric identifier for each student.

• pupil birth year: the year the student was born in.

• school id: a numeric identifier for the school of the student.

• treatment group: a categorical variable denoting when the students in the school received treatment:

– =1: Began receiving treatment in 1998.

– =2: Began receiving treatment in 1999.

– =3: Began receiving treatment in 2001.

• ics98, ics99: ICS Exam Score (normalized), for 1998 and 1999, respectively.

4. For each treatment group (there are three), and for each test score (there are two), plot a histogram of the

test score. Also include a table with the mean and sd of the test score. Ideally, try to have three rows (one

for each treatment group) and four columns (mean and standard deviation, for each test score). Note that to

calculate these, you will also have to drop NAs. (3 points)

5. Papers that use an RCT often produce what is called a randomization table. The idea is to compare different

“observables” (variables we can observe) and see if there are systematic differences in those variables between

the treatment groups. If the randomization process worked, then there shouldn’t be any big differences across

observables, which supports the assumption (but does not prove it!) that there are no systematic differences

between unobservables as well. In our case, we are going to look at only one such observable characteristic

(birth year). You will often see papers use ten or more variables to try to convince you that randomization

worked. Before we “test for randomization,” there is (at least) one value in pupil birth year which does not

make sense and should be dropped. Which is it (are they)? [4] (1 point)

[1] We write “replicating” because we will be performing a simplified version of their analysis. Still, it will be very close to the essence of what they do in the paper.

[2] You can access the full data and user guides at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/28038

[3] This is a version of the data we cleaned and simplified. The real raw data have a much more complex structure.

6. What are the mean and sd for birth year in each treatment group? Notice this is asking to calculate the mean

and sd for the variable pupil birth year separately for each group, but is not asking anything about the test

scores yet. (1 point)

7. Run a t-test comparing the mean birth year in the first group to the second group. Repeat this comparison

for the first to third group, and for the second to third group. Can you reject the null that the birth year

is the same across any of these groups? Throughout the exercise, use a significance level of 0.05. (2 points)

8. We will now compare the two test scores we have within each treatment group, separately. The

tests were conducted in 1998 and 1999, and de-worming treatment was assigned to each group according to the

definition provided above. Given this, where would you expect to see an impact on test scores? Why? Discuss briefly

(five sentences). (2 points)

9. Use paired t-tests to compare the two test scores separately for each group. Meaning, you should run one

paired t-test for each treatment group. Can you reject the null of no difference in any of those cases? If so,

how do you interpret the results of your test? Explain whether you chose to run a two-tailed or a one-tailed

test (or both). Either can be valid under a reasonable justification. (4 points)

10. Run a two-tailed t-test for two samples, using the test score for 1998, comparing group 1 to group 2. Calculate

the standard deviation according to the equal variances method of the two-sample t-test. How do you interpret

your result? (2 points)

11. Run a two-tailed t-test for two samples, using the test score for 1999, comparing group 2 to group 3. Calculate

the standard deviation according to the equal variances method of the two-sample t-test. How do you interpret

your result? (2 points)

12. What do you conclude from the results of the tests you’ve run regarding the effect that deworming treatment

had on test scores in Kenya? (1 point)

13. What can you say about the general efficacy of deworming treatments and their effect on test scores? Will

they always be effective? Will they never be effective? (2 points)
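
The mechanics behind questions 4, 9, and 10 can be sketched in R as follows. This is only an illustration on simulated data, not a solution: the data frame `kenya_sim`, the underscore in `treatment_group`, and every number in it are assumptions made for the example, so nothing it prints says anything about the real deworming data.

```r
# Simulated stand-in for the homework data: every value below is made up.
# The column name treatment_group (with an underscore) is an assumption.
set.seed(1)
n <- 300
kenya_sim <- data.frame(
  treatment_group = rep(1:3, each = n / 3),
  ics98 = rnorm(n),
  ics99 = rnorm(n)
)
kenya_sim$ics98[sample(n, 20)] <- NA  # inject some missing scores

# Question 4 style table: mean and sd per group, dropping NAs.
mean_sd <- function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
by_group <- function(score) {
  t(sapply(split(score, kenya_sim$treatment_group), mean_sd))
}
score_table <- cbind(by_group(kenya_sim$ics98), by_group(kenya_sim$ics99))
colnames(score_table) <- c("mean_ics98", "sd_ics98", "mean_ics99", "sd_ics99")
print(round(score_table, 3))

# Question 9 style paired t-test within group 1.
# With paired = TRUE, t.test() keeps only complete pairs.
g1 <- kenya_sim[kenya_sim$treatment_group == 1, ]
paired_g1 <- t.test(g1$ics99, g1$ics98, paired = TRUE)

# Question 10 style two-sample t-test with the equal-variances (pooled) sd.
x <- kenya_sim$ics98[kenya_sim$treatment_group == 1]
y <- kenya_sim$ics98[kenya_sim$treatment_group == 2]
x <- x[!is.na(x)]
y <- y[!is.na(y)]
pooled <- t.test(x, y, var.equal = TRUE)

# The same statistic by hand, to make the pooled formula explicit.
nx <- length(x)
ny <- length(y)
sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
t_manual <- (mean(x) - mean(y)) / (sp * sqrt(1 / nx + 1 / ny))
p_manual <- 2 * pt(-abs(t_manual), df = nx + ny - 2)
```

Comparing `pooled$statistic` with `t_manual` is a quick way to confirm that `var.equal = TRUE` is what selects the pooled standard deviation; without that argument, `t.test()` defaults to the unequal-variances (Welch) version.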

The Bootstrap

The file nsw.csv contains data from the National Supported Work experiment. [5] The data have three variables:

whether the individual got treated (treated), whether the individual worked after treatment in 1978 (work78), and

the earnings of the individual in 1978 (earn78). We wish to discover whether treatment increased the earnings

of participants and increased the likelihood they worked. The treatment variable (treated) is equal to one if the

participant was assigned to the treatment group and is equal to zero if assigned to the control group.

1. Load the nsw.csv data in R.

2. Plot one histogram of work78 for observations with treated=1, and one histogram for treated=0. Repeat this

for earn78 (2 points).

3. Test the null hypothesis, with α = 0.05 (two-tailed), that the mean earnings of the treatment group and

control group are equal. Use both a t-test and a z-test. Can you reject H0 with the t-test? Can you reject

H0 with the z-test? For the t-test, calculate the standard deviation according to the equal variances method of

the two-sample t-test. Contrast the significance with a “z-test,” treating the standard deviation as given and

evaluating the significance with pnorm(). [6] (5 points)

[4] Hint: Offered as a gentle reminder to always, always, always check and plot your data.

[5] More details about the program can be read at https://www.ncjrs.gov/pdffiles1/Digitization/59202NCJRS.pdf

[6] In the z-test, we need to assume that the standard deviation we estimated is the known population standard deviation.


4. Test the null hypothesis that earnings for the treatment group are the same as earnings for the control group

using either the bootstrap percentile-t method or the percentile method that we covered in class and TA

sessions. The alternative hypothesis is that they are not equal. (For the percentile-t test, use the unequal

variance formulation.) Include those individuals with zero earnings in your test. Set the number of bootstrap

replications to 10,000. (6 points)

5. What feature of the data causes the p-values across the three different tests to differ so much (focus on the

difference between the t- and z-tests on the one hand and the bootstrap test on the other)? (2 points)
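
A rough R sketch of the three tests in questions 3 and 4, again on simulated data. The helper `sim_earn`, the group sizes, and the earnings distribution (a spike at zero plus a right-skewed positive part) are assumptions chosen for illustration. The percentile-t step shown here is one common formulation, which may differ in detail from the version covered in class.

```r
# Simulated earnings: all values are made up and unrelated to nsw.csv.
set.seed(42)
sim_earn <- function(n, p_work, meanlog) {
  # Zero earnings with probability 1 - p_work, log-normal otherwise.
  ifelse(runif(n) < p_work, rlnorm(n, meanlog = meanlog, sdlog = 1), 0)
}
treat <- sim_earn(200, 0.75, 8.5)
ctrl <- sim_earn(200, 0.70, 8.4)
n_t <- length(treat)
n_c <- length(ctrl)
obs_diff <- mean(treat) - mean(ctrl)

# t-test and "z-test" sharing the same pooled (equal-variances) statistic.
sp <- sqrt(((n_t - 1) * var(treat) + (n_c - 1) * var(ctrl)) / (n_t + n_c - 2))
stat_pooled <- obs_diff / (sp * sqrt(1 / n_t + 1 / n_c))
p_t <- 2 * pt(-abs(stat_pooled), df = n_t + n_c - 2)
p_z <- 2 * pnorm(-abs(stat_pooled))  # treats sp as the known population sd

# Bootstrap: resample each group with replacement, B times.
B <- 10000
welch_se <- function(a, b) sqrt(var(a) / length(a) + var(b) / length(b))
t_obs <- obs_diff / welch_se(treat, ctrl)
boot_diff <- numeric(B)
boot_t <- numeric(B)
for (i in seq_len(B)) {
  tb <- sample(treat, replace = TRUE)
  cb <- sample(ctrl, replace = TRUE)
  boot_diff[i] <- mean(tb) - mean(cb)
  # Center on the observed difference so boot_t mimics the null distribution.
  boot_t[i] <- (boot_diff[i] - obs_diff) / welch_se(tb, cb)
}

# Percentile method: reject H0 at the 5% level if the interval excludes zero.
ci_percentile <- quantile(boot_diff, c(0.025, 0.975))
# Percentile-t method: two-sided p-value from the bootstrap t distribution.
p_boot_t <- mean(abs(boot_t) >= abs(t_obs))
```

Because the t- and z-tests here share one statistic, `p_t` and `p_z` differ only through the reference distribution, whereas the bootstrap makes no normality assumption about the sampling distribution of the statistic.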

Take a moment to think and reflect about how many of the things you completed during this assignment were

known or even understandable just ten weeks ago. You have covered a lot of material. Congratulations!


