Date: 2020-04-14 10:49

ECO220Y5Y: Quantitative Methods in Economics

Final Assignment

Replacement for Exam Assessment on Regression

1 Interactive Regression Exercise

1.1 Motivation

Econometrics is best understood by doing rather than by reading about what someone else has done. There are difficult choices and many pitfalls in arriving at the ‘correct’ model. Sometimes the existing theory underlying the relationships in your model seems a bit off and you could build a much ‘better’ model by including a different set of variables or transforming them (this gets at the internal validity of the model). Sometimes choosing the model with the best fit means making decisions that are ideal only for your sample and would not apply well to data outside your sample time period or group of individuals (this gets at the external validity of the model). As a researcher, you need to trade off internal validity against external validity. As a result, econometrics can sometimes feel more like an art than a science. However, you will be asked to follow a scientific approach to making these model decisions and to justify them in a scientific way. This assignment requires that you make independent choices on specification, analyse the consequences of these choices, and adjust your choices to narrow in on a final model. You will be asked to justify the model you have selected and then provide some feedback on its economic implications.

1.2 Overview & Data

The dependent variable for this interactive assignment is the Provincial Achievement Test (PAT) score earned by students in an Alberta high school. The data set contains 70 observations, measuring PAT scores and a number of possible causal factors, randomly drawn from a pool of approximately 750 students over approximately one decade. The literature on PAT scores indicates that scores are determined not only by ability and training but also by various socio-economic factors. Please see the attached article by James Fallows, ‘The Tests and the Brightest: How Fair Are the College Boards.’ for a summary of views in the literature on how SAT performance in the USA might be affected by various socio-economic factors (PAT scores and SAT scores should be similarly determined). The measures of ability and training included here are the cumulative high school grade point average (GPA) and participation in advanced placement math and English courses (APMATH and APENG). Advanced placement courses may help students perform better on the PAT. The data set also includes a number of dummy variables measuring qualitative socio-economic factors such as a student’s gender (MALE), ethnicity (WHITE), and native language (ENG), as well as a dummy variable indicating whether or not a student has attended a PAT preparation class (PREP). A further variable (YEAR) indicates the year in which the student’s PAT score and other information were recorded. Finally, there are several variables created as the product of two other variables.

Here is a detailed description of all variables in this assignment:

• $PAT_i$ = the Provincial Achievement Test score of the $i$th student on a scale from 0 to 100
• $GPA_i$ = the grade point average of the $i$th student on a scale from 0 to 5
• $APMATH_i$ = a dummy variable equal to 1 if the $i$th student has taken AP Math, 0 otherwise
• $APENG_i$ = a dummy variable equal to 1 if the $i$th student has taken AP English, 0 otherwise
• $AP_i$ = a dummy variable equal to 1 if the $i$th student has taken either AP Math and/or AP English, 0 otherwise
• $MALE_i$ = a dummy variable equal to 1 if the $i$th student is Male, 0 if Female
• $WHITE_i$ = a dummy variable equal to 1 if the $i$th student is Caucasian, 0 otherwise
• $ENG_i$ = a dummy variable equal to 1 if the $i$th student’s first language is English, 0 otherwise
• $PREP_i$ = a dummy variable equal to 1 if the $i$th student has attended a PAT preparation course, 0 otherwise
• $YEAR_i$ = the year the Provincial Achievement Test was taken for the $i$th student, recorded from 2007 to 2018
• $GPAMALE_i = (GPA_i)(MALE_i)$
• $GPAWHITE_i = (GPA_i)(WHITE_i)$
• $GPAENG_i = (GPA_i)(ENG_i)$
• $WHITEMALE_i = (WHITE_i)(MALE_i)$
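The last four variables are simple products of columns already in the data. As a minimal sketch of that construction, the Python below builds them for a few student records. The records and their values are invented for illustration and are not taken from the assignment data; `add_interactions` is a hypothetical helper name.

```python
# Hypothetical student records: the values are made up, not assignment data.
students = [
    {"GPA": 3.2, "MALE": 1, "WHITE": 0, "ENG": 1},
    {"GPA": 4.1, "MALE": 0, "WHITE": 1, "ENG": 1},
    {"GPA": 2.5, "MALE": 1, "WHITE": 1, "ENG": 0},
]


def add_interactions(row):
    """Return a copy of the record with the four product variables added."""
    out = dict(row)
    out["GPAMALE"] = row["GPA"] * row["MALE"]      # GPA for males, 0 for females
    out["GPAWHITE"] = row["GPA"] * row["WHITE"]    # GPA for white students, else 0
    out["GPAENG"] = row["GPA"] * row["ENG"]        # GPA for native English speakers, else 0
    out["WHITEMALE"] = row["WHITE"] * row["MALE"]  # 1 only for white males
    return out


augmented = [add_interactions(s) for s in students]
```

Because each dummy is 0 or 1, a product such as GPAMALE equals GPA for the group coded 1 and zero for everyone else, which is what lets a regression estimate a different GPA slope for that group.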

1.3 Summary Statistics

Included below are the means, standard deviations, and correlation coefficients for the variables in this assignment.

Means and Standard Deviations:

Correlation Coefficients:

2 Section A: Building a Model of PAT Scores

2.1 Choosing the best specification

In this section you will choose the specification you’d like to estimate from the list below, find the regression number of that specification, and then look at the regression results for your chosen specification in the appendix at the end. You can base your initial decision on the literature provided regarding potential discrimination in standardised testing design and on the summary statistics and correlation coefficients for the variables. You should then decide, based on the results, whether you are satisfied with your model selection. If you are not satisfied, you can use the information from the regression results to decide how to adjust the specification. You can repeat this process until you settle on a final selection of the ‘best’ specification. Once you have decided on your preferred specification, you will answer the questions found below the regression model options.

Regression Models:

1. Model 1: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + i

2. Model 2: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4ENGi + i

3. Model 3: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4MALEi + i

4. Model 4: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4P REPi + i

5. Model 5: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4W HIT Ei + i

6. Model 6: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4ENGi + β5MALEi + i

7. Model 7: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4ENGi + β5P REPi + i

8. Model 8: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4ENGi + β5W HIT Ei + i

9. Model 9: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4MALEi + β5P REPi + i

10. Model 10: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4MALEi + β5W HIT Ei + i

11. Model 11: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4P REPi + β5W HIT Ei + i

12. Model 12: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4ENGi + β5MALEi + β6P REPi + i

13. Model 13: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4ENGi + β5MALEi + β6W HIT Ei + i

14. Model 14: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4ENGi + β5P REPi + β6W HIT Ei + i

15. Model 15: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4MALEi + β5P REPi + β6W HIT Ei + i

16. Model 16: P ATi = β0 + β1GP Ai + β2APMAT Hi + β3AP ENGi + β4ENGi + β5MALEi + β6P REPi

+ β7W HIT Ei + i

17. Model 17: P ATi = β0 + β1GP Ai + β2APi + β3P REPi + i

18. Model 18: P ATi = β0 + β1GP Ai + β2APi + β3P REPi + β4W HIT Ei + i

19. Model 19: P ATi = β0 + β1GP Ai + β2APi + β3ENGi + β4P REPi + β5W HIT Ei + i

20. Model 20: P ATi = β0 + β1GP Ai + β2APi + i

Section A Questions:

1. Write out the estimated model for your preferred specification, including coefficients and standard errors.
2. Evaluate your estimation results with respect to their economic meaning, overall model fit, and the signs and significance of the individual coefficients.
3. What specification problems (omitted variables, irrelevant variables, multicollinearity) might your regression have? Why?
4. Do you have any suggestions to improve the model that you were not able to choose from the models provided?
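When judging overall model fit across specifications with different numbers of regressors, one common yardstick is the adjusted R-squared, which penalises each added variable. The sketch below computes it from an ordinary R-squared; the sample size of 70 comes from the data description, while the R-squared values are invented placeholders and `adjusted_r2` is a hypothetical helper name.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k slope coefficients and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


N_OBS = 70  # number of observations stated in the data description

# Hypothetical R-squared values, NOT taken from the appendix results.
small_model = adjusted_r2(0.50, N_OBS, k=3)  # e.g. a three-regressor specification
large_model = adjusted_r2(0.51, N_OBS, k=7)  # e.g. a seven-regressor specification

# A tiny gain in raw fit can turn into a loss once the extra regressors are penalised.
prefers_small = small_model > large_model
```

Adjusted R-squared is only one input to the choice: theory, coefficient signs, and significance still matter when deciding whether a variable belongs in the model.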

3 Section B: Correcting a Model of PAT Scores

3.1 Understanding and correcting issues

In this section you will assess the model you selected in the last section for heteroskedasticity and serial correlation, and determine the desired approach to interpret and correct for these issues. Using the residual plot for your chosen Section A model (given in the appendix for that section) and the scatter plots in the Section B appendix, answer the questions below. Provide a few sentences to justify your answers.

Section B Questions:

1. Do you believe there might be a problem of heteroskedasticity in your chosen model? Do you believe it is pure or impure?
2. Do you believe there might be a problem of serial correlation in your chosen model? Do you believe it is pure or impure?
3. Based on the answers you gave to the two questions above, what would you suggest you do to improve the estimated model and why?

4 Section C: Interpreting a Model of PAT Scores

4.1 Deciding what you can learn from the model

In this section you will assume that a professional econometrician ran two models (Models A and B) and determined, based on underlying theory, that the best specification is Model B. It is not your job in this case to question the model but rather to interpret the results. Based on the regression results for Model B, answer all of the questions below, providing your rough work in calculations and at least a few sentences to support your argument. Note that LNPAT is the natural log of PAT scores. Both models are given in the appendix under Section C.

Section C Questions:

1. Calculate the 98% two-sided confidence interval for the coefficient on MALE. Interpret this coefficient and explain what the confidence interval you calculated implies for your interpretation.
2. Test whether the absolute value of the coefficient on GPAWHITE is greater than the absolute value of the coefficient on GPAENG. Explain the meaning of this test result in terms of PAT scores.
3. Draw the estimated models (lines of best fit) relating GPA to the natural log of PAT scores for white males vs. non-white females, conditional on them having taken Advanced Placement classes and speaking English as their first language, and indicate the slope and intercept of each. Interpret the two estimated lines in words.
4. Solve for the impact on PAT scores of a student having a GPA of 2 rather than a GPA of 0, given they did not take AP courses, are non-white, male and do not speak English as a first language. Show all your work in this calculation.
5. Based on inference using the results from Model B but also taking into account both models, do you believe there is potential evidence of discrimination/bias in the way PATs are designed or administered?

5 Section D: Working on a Model of PAT Scores in Stata

5.1 Show you can generate your own results using code

In this section you will indicate the code you would plan to use in Stata to achieve some basic tasks. This will draw on the sort of knowledge contained in labs, lectures, the data project and the help session you have received, with Stata code that you can refer back to. For each question below you should provide some basic Stata code that could be run and would achieve the results requested. There are often multiple correct ways to approach the coding, some more efficient than others, but the only consideration will be whether the desired outcome is actually achieved. Note that you do not need to run the code on a data set; just indicate what you believe to be a correct approach. You can assume you already have the variables indicated in this assignment loaded and ready in your Stata program.

Section D Questions:

1. Transform the GPA variable into a new variable measuring the natural log of GPA called LNGPA
2. Run a regression of LNPAT on LNGPA
3. Scatter LNPAT against LNGPA and display the line of best fit (linear regression line) for the model you just estimated
4. Calculate the residuals and create a new variable for them called RES
5. Calculate the fitted values and create a new variable for them called YHAT
6. Scatter the residuals (RES) against the fitted values (YHAT) to check for any issues
7. Run a new regression of LNPAT on LNGPA, AP, MALE, ENG
8. At the 1% level of significance, test whether the true coefficient on MALE could be equal to the true coefficient on ENG
9. Test for specification error in the regression you ran
10. Test for heteroskedasticity in the regression you ran
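The assignment asks for Stata code, which is not reproduced here. Purely to illustrate the arithmetic behind tasks 1, 2, 4 and 5, the Python sketch below log-transforms the variables, fits a one-regressor least-squares line, and derives fitted values and residuals. The data points are invented and `ols_simple` is a hypothetical helper; this is not a substitute for the Stata commands the questions require.

```python
import math

# Hypothetical (GPA, PAT) observations: made-up values, not the assignment data.
gpa = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
pat = [55.0, 60.0, 68.0, 70.0, 78.0, 85.0]

# Task 1 (and the matching transform of PAT): natural logs of each variable.
lngpa = [math.log(g) for g in gpa]
lnpat = [math.log(p) for p in pat]


def ols_simple(x, y):
    """Least-squares intercept and slope for a regression of y on a single x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Task 2: regress LNPAT on LNGPA.
b0, b1 = ols_simple(lngpa, lnpat)

# Task 5: fitted values; Task 4: residuals (actual minus fitted).
yhat = [b0 + b1 * x for x in lngpa]
res = [y - f for y, f in zip(lnpat, yhat)]
```

With an intercept in the model, the residuals sum to zero and are uncorrelated with the regressor by construction, so a residual-versus-fitted plot is informative about patterns in the spread rather than about the average level.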

6 Appendix:

6.1 Section A Estimated Models

Regression Model 1:

6.3 Section C Additional Model Estimations

Regression Model A:

Regression Model B:

