INTRODUCTORY ECONOMETRICS
Background
You are interested in estimating the effect of education on earnings. The data file cps4_small.dta
contains 1,000 observations on hourly wage rates, education, and other variables from the 2008 Current
Population Survey (CPS):
wage: earnings per hour
educ: years of education
exper: years of post-education experience
hrswk: working hours per week
married: dummy for married
female: dummy for female
metro, midwest, south, west: location dummies
black: dummy for black
asian: dummy for Asian
Submission of your report
Your report must be single-spaced and in 12-point font. Answer each of the
following questions in a format similar to the solutions to the tutorial problem sets. When you
are required to use R, you must show your R commands and R output (screenshots or figures generated
from R). You will lose 2 points whenever you fail to provide R commands and outputs. For each
question, when you are asked to discuss or interpret, your answer has to be brief and compact. You
will lose 2 points if your answer is needlessly wordy. In addition, you may lose marks for any of the
following: failing to grasp or address core concepts and ideas; poor or ineffective structure; unclear or
illogical flow of ideas; fluffy or unclear arguments; and weak or badly composed arguments. You must
upload your assignment on the course webpage (Blackboard) in PDF format. (Do not submit a hard
copy.)
Research tasks
1. (20 points) Load and explore the main variables.
(a) (7 points) You are given a dataset in the .dta format. Figure out how to load this dataset
in R. Provide your R-commands to load the data. In particular, be clear about which
R-package you install and use. (Hint: use the Internet.)
(b) (13 points) Obtain summary statistics and histograms for the variables wage and educ. For
the histograms, give informative titles and variable names instead of just using the default
titles and variable names. For example, you could use Years of Education in place of
educ. Discuss the data characteristics.
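
The loading and exploration steps in Question 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not a model answer: it assumes the haven package as one possible reader for .dta files and assumes the file is in the working directory. If either is missing, it falls back to synthetic data (not the CPS sample) so that the sketch still runs.

```r
# Sketch for Question 1: load a Stata .dta file and explore wage and educ.
# Assumptions: haven is one possible reader (install.packages("haven")),
# and "cps4_small.dta" sits in the working directory.
path <- "cps4_small.dta"

if (requireNamespace("haven", quietly = TRUE) && file.exists(path)) {
  cps <- as.data.frame(haven::read_dta(path))
} else {
  # Fallback so the sketch still runs: SYNTHETIC data, not the CPS sample.
  set.seed(1)
  n <- 1000
  educ <- sample(8:21, n, replace = TRUE)
  wage <- exp(1.6 + 0.09 * educ + rnorm(n, sd = 0.5))
  cps <- data.frame(wage = wage, educ = educ)
}

# Summary statistics for the two variables of interest
print(summary(cps[, c("wage", "educ")]))

# Histograms with informative titles and axis labels instead of the defaults
hist(as.numeric(cps$wage), main = "Distribution of Hourly Wages",
     xlab = "Hourly Wage", col = "grey80")
hist(as.numeric(cps$educ), main = "Distribution of Years of Education",
     xlab = "Years of Education", col = "grey80")
```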
2. (25 points) Estimate the linear regression
ln(wage_i) = β_1 + β_2 educ_i + e_i,
where e_i is the error and β_1 and β_2 are the unknown population coefficients.
(a) (5 points) Report the estimation results in the common form introduced in Lecture note
3. For example, see page 9 of Lecture note 3, where the estimates are presented in equation
form, along with standard errors and some measures of model fit.
(b) (5 points) Construct a scatter diagram of educ and ln(wage) and plot the estimated
regression equation in (a) on the scatter diagram. Give an informative title and axis labels
for the variables; i.e., do not use the default title and labels.
(c) (4 points) Assuming that E[e|educ] = 0, interpret the estimated coefficient on educ (2
points) and test whether or not the population coefficient is zero at the 1% significance level
(2 points).
(d) (6 points) You suspect that the hourly wage could depend on working hours per week.
Discuss under what condition(s) the estimated coefficients in (a) would be biased due to the
omission of the weekly working hours (2 points). Give a reasonable and intuitive story on
why omission of the weekly working hours would cause omitted variable bias in the regression
in (a) (2 points). Under your story, explain whether the estimated coefficient on educ in (a)
would be overestimated or underestimated (2 points). See pages 4 and 5 of Lecture note 4.
(e) (5 points) The variable hrswk is the average weekly working hours for each individual in the
data. Regress ln(wage) on educ and hrswk. Discuss the estimation results. In particular,
how would you revise your answer in (c)? Are the estimates statistically significant?
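
The mechanics of Question 2 (fitting the log-linear regression, drawing the fitted line on a scatter diagram, and testing the slope at the 1% level) might look like the sketch below. It runs on synthetic data with made-up coefficients, so the numbers it produces say nothing about the CPS sample; the names dat, fit, t_stat and t_crit are illustrative. In a log-linear model, 100 times the slope is approximately the percentage change in wage per additional year of education.

```r
# SYNTHETIC data standing in for the CPS sample (assumed coefficients 1.6 and 0.09)
set.seed(42)
n <- 1000
educ <- sample(8:21, n, replace = TRUE)
lwage <- 1.6 + 0.09 * educ + rnorm(n, sd = 0.5)
dat <- data.frame(educ = educ, lwage = lwage)

# Simple regression of ln(wage) on educ
fit <- lm(lwage ~ educ, data = dat)
est <- coef(summary(fit))   # estimates, standard errors, t values, p-values
print(est)
r2 <- summary(fit)$r.squared

# Scatter diagram with the estimated regression line
plot(dat$educ, dat$lwage,
     main = "Log Hourly Wage and Years of Education",
     xlab = "Years of Education", ylab = "ln(Hourly Wage)")
abline(fit, lwd = 2)

# Two-sided t-test of H0: beta_2 = 0 at the 1% significance level
t_stat <- est["educ", "Estimate"] / est["educ", "Std. Error"]
t_crit <- qt(1 - 0.01 / 2, df = df.residual(fit))
reject <- abs(t_stat) > t_crit
```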
3. (40 points) You are concerned about omitted variable bias in the regressions of Question 2. For
that reason, you decide to regress ln(wage) on all other variables in the dataset and use this model
as a benchmark.
(a) (11 points) Report a 95% confidence interval for the slope parameter on educ
(3 points), explain the relationship between confidence intervals and hypothesis testing (4
points), and test the hypothesis that one year of additional education would increase hourly
wage by 12% (4 points).
(b) (7 points) Assuming there is no omitted variable bias, discuss the estimated coefficient on
female in the benchmark model. In particular, explain what the estimated coefficient on
female implies for hourly wage (3 points), compare the effect of being female on hourly
wage with the effect of one additional year of education on hourly wage (2 points),
and discuss whether the effect of being female on hourly wage is significantly different from
zero (2 points).
(c) (5 points) Using the estimation results of the benchmark model, test the hypothesis that
the hourly wage is not affected by the geographic location. Explain how you reach your
conclusion. (Hint: use package car.)
(d) (5 points) Using the estimation results of the benchmark model, test the hypothesis that the
wage differential associated with being African American is equal to the wage differential associated
with being Asian American. Explain how you reach your conclusion. (Hint: use package car.)
(e) (7 points) How would you modify the benchmark model to estimate the effect on hourly
wage of one additional year of education separately for each gender? (4 points) How do the
effects of education differ between the genders, and is the difference statistically significant?
(3 points)
(f) (5 points) Keoka is an African American woman, working in a metropolitan area. After she
obtained her high school diploma, she got a job and started working instead of getting a
higher education. She has never been married. Now she has five years of experience in the
industry and is working full time (40 hours a week). Using the benchmark model, predict
her hourly wage.
Be careful: the left-hand side variable is ln(wage), but you should predict Keoka’s wage.
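
The tools that Question 3 relies on (a confidence interval, a joint F-test, a test of equality between two coefficients, and a prediction in wage levels from a log model) can be sketched in base R as below. The data are synthetic with made-up coefficients, and the region dummies of the hypothetical worker are set to zero purely as an assumption, since the text does not state a region. The comments show the equivalent calls with car::linearHypothesis. Two level predictions are computed: the simple exp() of the predicted log wage, and a version multiplied by exp(sigma^2 / 2), which applies when the errors are assumed to be normal.

```r
# SYNTHETIC data; every coefficient below is made up for illustration
set.seed(7)
n <- 1000
region <- sample(0:3, n, replace = TRUE)  # 0 = omitted base region
race <- sample(0:2, n, replace = TRUE, prob = c(0.8, 0.1, 0.1))
dat <- data.frame(
  educ = sample(8:21, n, replace = TRUE),
  exper = sample(0:40, n, replace = TRUE),
  hrswk = sample(20:60, n, replace = TRUE),
  married = rbinom(n, 1, 0.6),
  female = rbinom(n, 1, 0.5),
  metro = rbinom(n, 1, 0.8),
  midwest = as.integer(region == 1),
  south = as.integer(region == 2),
  west = as.integer(region == 3),
  black = as.integer(race == 1),
  asian = as.integer(race == 2)
)
dat$lwage <- with(dat, 1.2 + 0.1 * educ + 0.01 * exper + 0.005 * hrswk -
                    0.2 * female + 0.1 * metro - 0.1 * black +
                    rnorm(n, sd = 0.45))

# Benchmark model: ln(wage) on all other variables
bench <- lm(lwage ~ educ + exper + hrswk + married + female + metro +
              midwest + south + west + black + asian, data = dat)

# 95% confidence interval for the coefficient on educ
ci_educ <- confint(bench, "educ", level = 0.95)

# Joint F-test that all location dummies are zero, via a restricted model.
# Equivalent with car (if installed):
#   car::linearHypothesis(bench, c("metro = 0", "midwest = 0",
#                                  "south = 0", "west = 0"))
restricted <- update(bench, . ~ . - metro - midwest - south - west)
f_loc <- anova(restricted, bench)

# t-test of H0: beta_black = beta_asian from the estimated covariance matrix.
# Equivalent with car: car::linearHypothesis(bench, "black = asian")
b <- coef(bench)
V <- vcov(bench)
t_race <- (b["black"] - b["asian"]) /
  sqrt(V["black", "black"] + V["asian", "asian"] - 2 * V["black", "asian"])
p_race <- 2 * pt(-abs(t_race), df = df.residual(bench))

# Predicted wage LEVEL for a hypothetical worker
new <- data.frame(educ = 12, exper = 5, hrswk = 40, married = 0, female = 1,
                  metro = 1, midwest = 0, south = 0, west = 0,  # region: assumption
                  black = 1, asian = 0)
lw_hat <- predict(bench, newdata = new)
wage_natural <- exp(lw_hat)                                  # simple back-transform
wage_corrected <- exp(lw_hat + summary(bench)$sigma^2 / 2)   # log-normal correction
```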
4. (15 points) It may be more useful to estimate the effect of education on earnings by using the
highest diploma/degree rather than years of schooling. Define four dummy variables to indicate
educational achievement:
lt_hs = 1 if educ < 12
hs = 1 if educ = 12
col = 1 if educ ≥ 16
some_col = 1 for all other values of educ.
(a) (6 points) Create the dummy variables (lt_hs, hs, col, some_col) as defined above (3
points) and compute the sample mean of hourly wage for each of the four education
categories (3 points).
(b) (9 points) Regress wage on the four dummies (lt_hs, hs, col, some_col). You will face a
problem. What is the problem here? Under what circumstances would you face the problem?
(4 points) To avoid it, you now regress wage on three dummies (lt_hs, col, some_col),
excluding hs. Interpret the estimated coefficients and compare the estimation results with
the findings in (a) (5 points).
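
The dummy-variable construction in Question 4 can be sketched as follows, again on synthetic data rather than the CPS sample. The sketch shows that with an intercept and all four mutually exclusive dummies, lm() cannot estimate every coefficient and reports NA for one of them. Once hs is excluded, the intercept equals the hs group mean and each remaining coefficient equals the difference between that category's mean and the hs mean.

```r
# SYNTHETIC data (assumed wage process), not the CPS sample
set.seed(3)
n <- 1000
educ <- sample(8:21, n, replace = TRUE)
wage <- exp(1.6 + 0.09 * educ + rnorm(n, sd = 0.5))
dat <- data.frame(wage = wage, educ = educ)

# Education-category dummies as defined in the question
dat$lt_hs <- as.integer(dat$educ < 12)
dat$hs <- as.integer(dat$educ == 12)
dat$col <- as.integer(dat$educ >= 16)
dat$some_col <- 1L - dat$lt_hs - dat$hs - dat$col  # all other values of educ

# Each observation belongs to exactly one category
stopifnot(all(rowSums(dat[, c("lt_hs", "hs", "col", "some_col")]) == 1))

# Sample mean of hourly wage by category
group <- with(dat, ifelse(lt_hs == 1, "lt_hs",
                   ifelse(hs == 1, "hs",
                   ifelse(col == 1, "col", "some_col"))))
means <- tapply(dat$wage, group, mean)
print(means)

# Intercept plus all four dummies: perfectly collinear regressors,
# so lm() returns NA for one coefficient
fit_all <- lm(wage ~ lt_hs + hs + col + some_col, data = dat)
print(coef(fit_all))

# Excluding hs makes it the reference category
fit_ref <- lm(wage ~ lt_hs + col + some_col, data = dat)
print(coef(fit_ref))
```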