Department of Economics – University of Victoria
Economics 345 “Applied Econometrics” (Summer 2019)
Assignment 4
Due on Monday, July 29th, 4 pm in Department Dropbox
You are encouraged to work in groups on this assignment, but every student must submit their own
version of the assignment with their own write-up. Please indicate who was in your group by writing
the names and V-numbers of all group members at the top of your submitted assignment. Please
ensure you write legibly – an illegible assignment may lose marks.
Note that copying from existing solution manuals or solutions found online constitutes plagiarism
under the University’s guidelines.
Dummy Variables
1. (5 marks) The following equations were estimated using the data in “bwght”:
bwght_i = 4.682 - 0.005 cigs_i + 0.155 parity_i + 0.026 male_i + 0.062 white_i
         (0.016) (0.0008)       (0.006)          (0.010)        (0.012)
n = 1,388, R² = 0.045, adjusted R² = 0.043
and
bwght_i = 4.668 - 0.005 cigs_i + 0.171 parity_i + 0.033 male_i + 0.049 white_i - 0.002 motheduc_i + 0.004 fatheduc_i
         (0.036) (0.001)        (0.006)          (0.011)        (0.015)         (0.003)            (0.003)
n = 1,191, R² = 0.048, adjusted R² = 0.043
The variable definitions can be found at:
http://fmwww.bc.edu/ec-p/data/wooldridge/bwght.des.
i (2 marks) Considering the second regression results, comment on the estimated effect and
statistical significance of fatheduc. (i.e., comment on the practical and statistical significance
of fatheduc’s partial effect separately)
ii (3 marks) Recall the content we learned in Chapter 4 (4-5f Testing General Linear
Restrictions). If two linear models are nested, the restricted model is obtained
from the full model by imposing constraints on the parameters. In this question, the first
regression is the restricted model: it is obtained by imposing βmotheduc = 0 and βfatheduc = 0 in
the full model. We can usually compare nested models with an F test to see whether the
restrictions are valid. In this question, that means computing the F statistic for the joint
significance of motheduc and fatheduc using formula 4.37 in the textbook.
However, why are you unable to compute the F statistic from the given information? What
would you have to do to compute the F statistic?
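As a point of reference, the SSR form of the F statistic for q exclusion restrictions is F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)], where r and ur denote the restricted and unrestricted models. The sketch below computes it in base R on simulated data; the variable names (x1, x2, x3) and parameter values are hypothetical and are not taken from the bwght data, and whether this form matches the textbook's numbering is not checked here.

```r
# Simulated data; names and values are illustrative only.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.2 * x2 + rnorm(n)

# Both models are fit to the same n simulated observations.
unrestricted <- lm(y ~ x1 + x2 + x3)
restricted   <- lm(y ~ x1)           # imposes zero coefficients on x2 and x3

ssr_ur <- sum(resid(unrestricted)^2)
ssr_r  <- sum(resid(restricted)^2)
q      <- 2                           # number of restrictions
df_ur  <- df.residual(unrestricted)   # n - k - 1

f_stat  <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
p_value <- pf(f_stat, q, df_ur, lower.tail = FALSE)

# Cross-check against R's built-in comparison of nested models.
f_check <- anova(restricted, unrestricted)$F[2]

print(c(F = f_stat, p_value = p_value, F_anova = f_check))
```

The hand-computed statistic and the one reported by anova() should agree, which is a quick way to confirm the formula was applied correctly.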
2. (11 marks) Use the data in “fertil2” to answer this question. The variable definitions can be found
at: http://fmwww.bc.edu/ec-p/data/wooldridge/fertil2.des. Lab 9 provides examples of the
code that will be used in this question.
i (3 marks) Estimate the following model
[model equation for children not recoverable from the source]
and report the results with both usual and heteroskedasticity-robust standard errors. Are the
robust standard errors always bigger than the nonrobust ones?
ii (4 marks) Add the two religious dummy variables (protest, catholic). Suppose
heteroskedasticity is present in the equations from parts (i) and (ii). Can we use the F test here to
test whether the coefficients on protest and catholic are jointly significant? Why or why not?
If we cannot use the F test, what test should we apply? What is the p-value obtained from the
joint test of the coefficients on protest and catholic?
iii (3 marks) Now return to the regression in part (i). Choose one of the methods we
learned in Lab 10 to test for heteroskedasticity. Explain the method you choose. Draw a
conclusion about whether heteroskedasticity is present in the equation for children.
iv (1 mark) If you find heteroskedasticity in part (iii), would you say the heteroskedasticity is
practically important?
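The following base-R sketch illustrates, on simulated data, how usual and heteroskedasticity-robust standard errors can be compared and how a Breusch-Pagan LM test can be computed by hand. It is only an illustration under stated assumptions: the data are simulated (not fertil2), the robust variance is the basic White (HC0) form, and Breusch-Pagan is just one common test that may or may not be the method covered in the labs. Software used in the labs may apply small-sample corrections, so numbers from other routines can differ slightly.

```r
# Simulated data with error variance that grows with x (illustrative only).
set.seed(2)
n <- 500
x <- runif(n, 1, 5)
y <- 2 + 0.5 * x + rnorm(n, sd = x)

fit <- lm(y ~ x)
X   <- model.matrix(fit)
e   <- resid(fit)

# Usual OLS standard errors.
se_usual <- sqrt(diag(vcov(fit)))

# White (HC0) robust standard errors: (X'X)^-1 X' diag(e^2) X (X'X)^-1.
bread     <- solve(crossprod(X))
meat      <- crossprod(X * e)        # equals X' diag(e^2) X
se_robust <- sqrt(diag(bread %*% meat %*% bread))

# Breusch-Pagan LM test: regress squared residuals on the regressors,
# then compare n * R^2 with a chi-squared distribution (df = number of slopes).
aux     <- lm(I(e^2) ~ x)
lm_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(lm_stat, df = 1, lower.tail = FALSE)

print(cbind(se_usual, se_robust))
print(c(LM = lm_stat, p_value = bp_p))
```

Comparing the two columns of standard errors coefficient by coefficient is the kind of check part (i) asks for; the auxiliary regression shows what the LM statistic is built from.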
3. (9 marks) Use the data in “loanapp” for this exercise. The binary variable to be explained
is approve, which is equal to one if a mortgage loan to an individual was approved. The key
explanatory variable is black, a dummy variable equal to one if the applicant was black. The other
applicants in the data set are white and Hispanic. The variable definitions can be found at:
http://fmwww.bc.edu/ec-p/data/wooldridge/loanapp.des.
To test for discrimination in the mortgage loan market, a linear probability model can be used:
approve_i = β0 + β1 black_i + control variables + u_i
i (1 mark) If there is discrimination against blacks, and the appropriate factors have been
controlled for, what is the sign of β1?
ii (3 marks) As controls, add the variables hrat, obrat, loanprc, unem, male, married, dep, sch,
cosign, chist, pubrec, mortlat1, mortlat2, vr, and black*obrat. Attach the estimated regression
results from R (no need to write down the regression equation). Is there evidence of
discrimination against blacks?
iii (2 marks) Using the model from part (ii), what is the effect of being black on the probability
of approval when obrat=32, which is roughly the mean value in the sample? Obtain a 95%
confidence interval for this effect. (Hint: replace black*obrat with black*(obrat-32). Run the
regression to obtain the estimated coefficient and the corresponding standard error of the new
interaction term to construct the confidence interval.)
iv (3 marks) Show the histogram of the predicted values of approve from the model in part (iii).
Are there any predicted values outside of the [0,1] range? Why should we be concerned about
this? (Hint: use hist(variablename) to plot the histogram. Replace variablename with the
name of the variable you generated to store the predicted values of approve.)
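The centering device in the hint to part (iii) can be checked on simulated data. In the sketch below, the variables black, obrat and approve are simulated stand-ins that only borrow the names from the question (they are not the loanapp data), and the other controls are omitted for brevity. It shows that, after replacing the interaction with black*(obrat-32), the coefficient on black is the estimated effect at obrat = 32, so its standard error can be used directly for a confidence interval.

```r
# Simulated stand-in data (not the loanapp data set).
set.seed(3)
n     <- 400
black <- rbinom(n, 1, 0.3)
obrat <- rnorm(n, mean = 32, sd = 8)
p     <- pmin(pmax(0.95 - 0.10 * black - 0.003 * obrat, 0), 1)
approve <- rbinom(n, 1, p)

# Uncentered interaction: the effect of black at obrat = 32 is a combination
# of two coefficients.
fit_raw <- lm(approve ~ black + obrat + black:obrat)
b <- coef(fit_raw)
effect_at_32 <- b["black"] + 32 * b["black:obrat"]

# Centered interaction: the same effect is now the coefficient on black.
obrat_c <- obrat - 32
fit_ctr <- lm(approve ~ black + obrat_c + black:obrat_c)
est <- coef(summary(fit_ctr))["black", "Estimate"]
se  <- coef(summary(fit_ctr))["black", "Std. Error"]

# 95% confidence interval from the estimate and its standard error.
ci <- est + c(-1, 1) * qt(0.975, df.residual(fit_ctr)) * se

print(c(effect_raw = unname(effect_at_32), effect_centered = est))
print(ci)
```

The fitted values themselves are unchanged by centering; only the interpretation of the coefficient on black changes.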
Time Series
4. (10 marks) Consider a simple model of a time series y_t as a function of its past (using lagged
values):
y_t = β0 + β1 y_{t-1} + u_t    (1)
Assume that y_t is what we refer to as ‘stationary’: its distribution does not change over time, i.e.
E[y_t] = y* for all t.
i Interpret the model: what does β1 capture?
ii Now consider the model including an additional variable x_t:
y_t = β0 + β1 y_{t-1} + β2 x_t + u_t    (2)
Find E[y_t] and explain what this means for the interpretation of the effect of x_t on y_t. Is
the marginal effect of x_t on the expected value of y_t still equal to β2?
iii Consider again the model in (1). Suppose you are interested in forecasting y for periods t+1,
t+2, …. Show that the predicted deviation of y from its expected value in period t+2 is
β1² (y_t - y*). {Hint: first show that (1) can be written in deviations from equilibrium as
(y_t - y*) = β1 (y_{t-1} - y*) + u_t. Then consider the predicted deviation at t+1, t+2, t+3, … and use
substitution to show the required result.}
iv Suppose |β1| < 1. What does your result from iii) tell you about the predicted deviation from
the model’s estimated equilibrium as the forecasting period increases?
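One way to build intuition before working through the algebra is to iterate a forecast recursion numerically. The sketch below assumes a first-order autoregression of the form y_t = b0 + b1*y_{t-1} + u_t; the names b0, b1, y_star and all numerical values are hypothetical choices made only for illustration, and future errors are replaced by their mean of zero when forecasting.

```r
# Hypothetical parameter values, chosen only for illustration.
b0 <- 2
b1 <- 0.6
y_star <- b0 / (1 - b1)   # value at which y reproduces itself when u = 0

y_t   <- 8                # last observed value (hypothetical)
h_max <- 10

# Iterate the forecast forward one period at a time.
forecast <- numeric(h_max)
prev <- y_t
for (h in seq_len(h_max)) {
  forecast[h] <- b0 + b1 * prev   # future errors set to their mean, 0
  prev <- forecast[h]
}

# Predicted deviation from y_star at horizons 1, ..., h_max.
deviation <- forecast - y_star
print(round(deviation, 4))
```

Rerunning the loop with different values of b1 is a simple way to see how the pattern of predicted deviations depends on that coefficient.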
5. (10 marks) Use the data set “consump” for this question.
i Estimate a simple regression model relating the logarithm of real per capita consumption
(log(c)) to the logarithm of real per capita disposable income (log(y)). Report the results in the
usual form (including the standard errors, the number of observations, and the values of R² and
adjusted R²). Interpret the equation and discuss statistical significance.
ii Add a lag of the logarithm of real per capita disposable income to the equation from part (i).
Report the results in the usual form. What do you conclude about adjustment lags in
consumption growth? (Hint: you can use “lylag <- log(dplyr::lag(consump$y, n = 1))” to
generate a lag of log(y) and then include lylag in the regression.)
iii Add the real interest rate (r3) to the equation in part (i). Report the results in the usual form.
Does r3 affect consumption growth?
iv If we plot the time series of real consumption, the trend is flat from 1979 to 1982
before going up again. What is the estimated model if we also include a dummy variable
equal to 1 for years after 1979? Report the results in the usual form and comment.
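For readers who prefer to avoid a package dependency, a one-period lag can also be built in base R. The sketch below uses a simulated log-income series (the names ly, lc and ly_lag and all numbers are hypothetical, not the consump data) and shows that the first observation is lost once the lag enters the regression.

```r
# Simulated annual series; values are illustrative only.
set.seed(5)
n_years <- 40
ly <- 9 + cumsum(rnorm(n_years, mean = 0.02, sd = 0.02))   # log income
lc <- -0.3 + 0.6 * ly + 0.3 * c(ly[1], head(ly, -1)) +
  rnorm(n_years, sd = 0.01)                                # log consumption

# Base-R one-period lag: shift the series down by one and pad with NA.
ly_lag <- c(NA, head(ly, -1))

# lm() drops the first row because its lagged value is missing.
fit <- lm(lc ~ ly + ly_lag)
print(summary(fit)$coefficients)
print(nobs(fit))
```

Checking nobs() after adding a lag is a useful habit, since the reported number of observations should fall by one for each period of lag.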
6. Forecasting Competition (55%)
This part of the assignment will require substantial studying on your own, as we have not covered much
of forecasting in lectures. However, this task will be highly relevant for real-world applications of
econometrics. Each of you must provide your own write-up of your estimation practice and competition
report. Labs 8 and 10 provide the syntax and examples of the code that will be used in this project.
We will have a forecasting competition in which each of you (or your team) will create forecasts of the
number of cyclists on the Galloping Goose Trail in Victoria (http://www.ecopublic.com/public2/?id=100117730).
The city makes daily cyclist counts publicly available. You will build a forecasting model and then
predict the number of cyclists on the trail. There will be a prize for the best (most accurate)
forecasts.
The following tasks in part 6.1) will walk you through creating a simple forecasting model. You can then
build your own model and refine your predictions in part 6.2).
6.1) Estimation and Practice (20%)
i) Download the estimation data from CourseSpaces (“goose_a4.csv”) on the number of cyclists on
the Galloping Goose. Plot the variable ‘bikes_count’ as a line plot with time on the x-axis.
ii) Estimate a simple regression model, modelling the number of cyclists as a function of weekend and
month dummy variables, and report the results in a well-formatted table.
iii) What can you tell from the results in part (ii)?
iv) On May 28th, 2018 the new cycle lane across the bridge in Victoria was opened. Construct a dummy
variable that is equal to one following the opening of this cycle lane, and zero otherwise. Estimate and
interpret the effect of the opening of this cycle lane on the number of daily cyclists.
v) Create so-called ‘hind-casts’: estimate your model up until the end of the estimation sample
(Dec. 21st, 2018), and then predict the number of cyclists for the next h=10 periods. Discuss and
illustrate how your forecasts compare to the observed numbers of cyclists, which are shown in column
‘bikes_pred’ in the dataset.
vi) Compute the root-mean-squared forecast error of your forecasts.
Do this by:
a) Computing the square of the difference between the observed number of cyclists (bikes_pred) and your
predicted number of cyclists.
b) Computing the mean of the squared differences from part (a), and then taking the square
root.
vii) How does the root-mean-squared forecast error of this model compare to one that omits the
weekend dummy variables?
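The steps in this part (date-based dummies, a hold-out forecast, and the root-mean-squared forecast error) can be sketched end to end in base R. Everything below is simulated: the data frame dat, the column bikes and all coefficients are hypothetical and do not come from goose_a4.csv. Treating the opening day itself as part of the "after" period is an assumption made here, and the RMSE is computed as the square root of the mean squared difference between observed and predicted counts.

```r
# Simulated daily data for 2018 (illustrative only).
set.seed(6)
dates <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
n     <- length(dates)

# Dummy variables built from the date alone.
weekend <- as.integer(format(dates, "%u") %in% c("6", "7"))  # Sat = 6, Sun = 7
month   <- factor(format(dates, "%m"))
lane    <- as.integer(dates >= as.Date("2018-05-28"))  # assumption: opening day counts as 1

doy   <- as.numeric(format(dates, "%j"))
bikes <- round(600 - 150 * weekend + 40 * lane +
                 100 * sin(2 * pi * doy / 365) + rnorm(n, sd = 30))
dat <- data.frame(date = dates, bikes, weekend, month, lane)

# Hold out the last h = 10 days, estimate on the rest, then predict them.
h     <- 10
train <- dat[seq_len(n - h), ]
test  <- dat[(n - h + 1):n, ]

fit  <- lm(bikes ~ weekend + month + lane, data = train)
pred <- predict(fit, newdata = test)

# Root-mean-squared forecast error over the hold-out window.
rmse <- sqrt(mean((test$bikes - pred)^2))

# Same exercise without the weekend dummy, for comparison.
fit_no_wk  <- lm(bikes ~ month + lane, data = train)
rmse_no_wk <- sqrt(mean((test$bikes - predict(fit_no_wk, newdata = test))^2))

print(c(rmse = rmse, rmse_no_weekend = rmse_no_wk))
```

Using format(dates, "%u") rather than weekday names keeps the weekend dummy independent of the computer's language settings.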
6.2) Forecasting Competition (35%)
i) Download the forecasting data: “goose_competition.csv” from CourseSpaces. It includes daily
observations of the number of cyclists up until July 11th, 2019.
ii) Think of a general model of the number of cyclists on the Galloping Goose (variable ‘bikes’ in the
dataset). What regressors would you include? (These would be observable and perhaps unobservable
factors that affect the number of cyclists each day, such as days of the week, month of the year, etc.)
Discuss how you could estimate your model, which variables are available, and what the
limitations of data availability are.
ii) Estimate and describe your best prediction model for the number of cyclists on the Galloping
Goose, and predict the number of cyclists 20 days into the future (up until and including July 31st).
The most accurate forecasts will win.
This model could include trends, dummies for weekdays, interaction terms, autoregressive lags, etc.
Perhaps you only want to estimate the model on a shorter sample, or use the full sample. It’s up to
you! There are quite a few variables already in the dataset, but you can also build your own. The latest
date for the actual time-series data used in your forecast model is July 11th, 2019. If you (or your
team) use any actual time-series data dated after July 11th, 2019 in your forecast, your forecast will be
disqualified from the competition (estimated data, and trend and dummy variables generated from the
date, are allowed).
Plot your forecasts and describe your forecasting model carefully.
iii) Upload three files to CourseSpaces (under ‘Entry for Forecasting Competition’):
(1) Your data file as a “.csv” file. If you worked in a team, save the csv file as “team_name_data.csv”
replacing “team_name” with the name you have come up with (keep it simple please). Otherwise save
the file as “firstname_lastname_data.csv”, replacing “firstname” with your first name and “lastname”
with your last name.
(2) Your forecast results as a “.csv” file. The forecasts must be in a csv file and must take the form
shown below – the first column has to be the date (labelled as date), the second column the number of
predicted cyclists (labelled cyclists) from 2019-07-12 to 2019-07-31, and the third column must contain
all V-numbers of you and your team members. Similarly, if you worked in a team, save the csv file as
“team_name.csv” replacing “team_name” with the name you have come up with (keep it simple please).
Otherwise save the file as “firstname_lastname.csv”, replacing “firstname” with your first name and
“lastname” with your last name.
(3) Your R Markdown file. The R Markdown file should run without error messages using
your uploaded csv data file. Again, if you worked in a team, save the Rmd file as “team_name.Rmd”
replacing “team_name” with the name you have come up with (keep it simple please). Otherwise save
the file as “firstname_lastname.Rmd”, replacing “firstname” with your first name and “lastname” with
your last name.
If you worked in a team, please upload one Rmd file, one csv data file and one csv result file for each
team. Again, each of you must submit your own write-up of your report.
The most accurate forecast will be awarded a prize, but forecast accuracy will not affect your assignment
grade. Forecast accuracy will be judged by the lowest RMSE over July 12th to July 31st. Good luck!
Format of csv file:
date cyclists v_numbers
2019-07-12 254 (your prediction here) “V...”
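A file with this layout could be written from R roughly as follows. This is only a sketch: the file name, the missing-value placeholders in the cyclists column and the “V...” entry are stand-ins to be replaced with your own forecasts and V-numbers, and the file is written to a temporary folder here purely so the example is self-contained.

```r
# Dates covered by the competition forecasts.
forecast_dates <- seq(as.Date("2019-07-12"), as.Date("2019-07-31"), by = "day")

# Placeholder values: replace NA with your predictions and "V..." with
# the V-numbers of all team members.
submission <- data.frame(
  date      = format(forecast_dates, "%Y-%m-%d"),
  cyclists  = rep(NA_real_, length(forecast_dates)),
  v_numbers = "V..."
)

# Hypothetical file name following the naming rule for individual entries.
out_file <- file.path(tempdir(), "firstname_lastname.csv")
write.csv(submission, out_file, row.names = FALSE)

# Read the file back to confirm the column names and number of rows.
check <- read.csv(out_file, stringsAsFactors = FALSE)
print(names(check))
print(nrow(check))
```

Setting row.names = FALSE keeps the first column as the date rather than an automatically generated row index.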