MAST30025: Linear Statistical Models
Assignment 2, 2019
Due: 5pm Friday, May 3 (week 8)
This assignment is worth 7% of your total mark.
You may use R for this assignment, including the lm function unless otherwise specified. If you do,
include your R commands and output.
Your assignment must be submitted to Turnitin on the LMS as a single PDF document
only. You may choose to either typeset your assignment or handwrite and scan it to
produce an electronic version. Turnitin will not accept late submissions.
Turnitin gives you an option to preview your work prior to submission. Please check this
preview carefully to ensure you are submitting the correct document. After a successful
submission to Turnitin, you will see a submission ID. This confirmation will also be sent
to your University email address. If you do not see a submission ID, you should assume
that your assignment has not been submitted successfully. Either try to submit again or
contact the tutor co-ordinator (Rheanna Mainzer) immediately to arrange an alternative
means of submission. Issues with Turnitin are not a valid excuse for submitting a late
assignment or an incorrect version of an assignment.
(1 mark) Your assignment must clearly show your name and student ID number, your
tutor’s name and the time and day of your tutorial class. Your assignment must be
submitted in the correct format and the correct orientation. Your answers must be
clearly numbered and in the same order as the assignment questions.
1. Prove Theorem 4.8: show that the maximum likelihood estimator of the error variance.
2. An experiment is conducted to estimate the annual demand for cars, based on their cost, the
current unemployment rate, and the current interest rate. A survey is conducted and the following
measurements are obtained:
Cars sold (×10^3)   Cost ($k)   Unemployment rate (%)   Interest rate (%)
5.5                 7.2          8.7                    5.5
5.9                10.0          9.4                    4.4
6.5                 9.0         10.0                    4.0
5.9                 5.5          9.0                    7.0
8.0                 9.0         12.0                    5.0
9.0                 9.8         11.0                    6.2
10.0               14.5         12.0                    5.8
10.8                8.0         13.7                    3.9
For this question, you may NOT use the lm function in R.
(a) Fit a linear model to the data and estimate the parameters and the error variance.
(b) Which two parameters have estimators with the largest covariance (in magnitude)?
(c) Find a 99% confidence interval for the average number of $8,000 cars sold in a year with an
unemployment rate of 9% and an interest rate of 5%.
(d) A prediction interval for the number of cars sold in such a year is calculated to be (4012, 7087).
Find the confidence level used.
(e) Test for model relevance using a corrected sum of squares.
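The matrix computations behind parts (a), (b), (c) and (e) can be done without lm. Below is a minimal NumPy sketch, not a model solution. It assumes a design matrix with an intercept column followed by cost, unemployment rate and interest rate. It uses $b = (X^TX)^{-1}X^Ty$, $s^2 = SS_{Res}/(n - p)$ and the estimated covariance matrix $s^2(X^TX)^{-1}$. The point $x_0 = (1, 8, 9, 5)$ encodes an $8,000 car (cost is in $k) in a year with 9% unemployment and 5% interest.

```python
import numpy as np

# Data from Q2: cars sold (thousands), cost ($k), unemployment rate (%), interest rate (%).
y = np.array([5.5, 5.9, 6.5, 5.9, 8.0, 9.0, 10.0, 10.8])
cost = np.array([7.2, 10.0, 9.0, 5.5, 9.0, 9.8, 14.5, 8.0])
unemp = np.array([8.7, 9.4, 10.0, 9.0, 12.0, 11.0, 12.0, 13.7])
rate = np.array([5.5, 4.4, 4.0, 7.0, 5.0, 6.2, 5.8, 3.9])
names = ["intercept", "cost", "unemployment", "interest"]

# Design matrix with an intercept column (assumed full rank).
X = np.column_stack([np.ones_like(y), cost, unemp, rate])
n, p = X.shape

# Least squares estimate b = (X^T X)^{-1} X^T y, solved without forming the inverse.
XtX = X.T @ X
b = np.linalg.solve(XtX, X.T @ y)

# Unbiased error variance estimate s^2 = SSRes / (n - p).
resid = y - X @ b
ss_res = resid @ resid
s2 = ss_res / (n - p)

# Estimated covariance matrix of b: s^2 (X^T X)^{-1}.
cov_b = s2 * np.linalg.inv(XtX)

# Part (b): largest off-diagonal covariance in magnitude.
off = np.abs(cov_b - np.diag(np.diag(cov_b)))
i, j = np.unravel_index(np.argmax(off), off.shape)

# Part (c): standard error of the estimated mean response at x0.
x0 = np.array([1.0, 8.0, 9.0, 5.0])
se_mean = np.sqrt(s2 * (x0 @ np.linalg.solve(XtX, x0)))

# Part (e): F statistic for model relevance using corrected sums of squares.
ss_total_corr = np.sum((y - y.mean()) ** 2)
ss_reg_corr = ss_total_corr - ss_res
F = (ss_reg_corr / (p - 1)) / s2

print("b =", b)
print("s^2 =", s2)
print("largest |cov| pair:", names[i], names[j])
print("point estimate at x0:", x0 @ b, "se:", se_mean)
print("F =", F, "on", p - 1, "and", n - p, "df")
```

The 99% interval in (c) would then be $x_0^Tb \pm t_{0.995,\,n-p}\,\mathrm{se}$, taking the $t$ quantile from tables or from R's qt function. For a prediction interval, replace $x_0^T(X^TX)^{-1}x_0$ by $1 + x_0^T(X^TX)^{-1}x_0$. Because the response is in thousands of cars, rescale before comparing with the interval given in (d).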
3. Consider two full-rank linear models $y = X_1\gamma_1 + \varepsilon_1$ and $y = X\beta + \varepsilon_2$, where all
predictors in the first model ($\gamma_1$) are also contained in the second model ($\beta$). Show that the
$SS_{Res}$ for the first model is at least the $SS_{Res}$ for the second model.
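The claim in Q3 can be checked numerically before it is proved. This is a sanity check, not a proof. The NumPy sketch below uses hypothetical simulated data in which the larger design matrix contains every column of the smaller one. It compares the two residual sums of squares.

```python
import numpy as np

def ss_res(X, y):
    """SSRes = ||y - Xb||^2 for the least squares fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r)

# Hypothetical simulated data: X holds an intercept and three predictors,
# and the nested model X1 keeps only the intercept and the first predictor.
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
X1 = X[:, :2]
y = rng.normal(size=n)

print("SSRes, smaller model:", ss_res(X1, y))
print("SSRes, larger model: ", ss_res(X, y))
```

Any seed should give a smaller model SSRes that is no less than the larger model's, up to rounding error.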
4. In this question, we study a dataset of 50 US states. This dataset contains the variables:
Population: population estimate as of July 1, 1975
Income: per capita income (1974)
Illiteracy: illiteracy (1970, percent of population)
Life.Exp: life expectancy in years (1969–71)
Murder: murder and non-negligent manslaughter rate per 100,000 population (1976)
HS.Grad: percentage of high-school graduates (1970)
Frost: mean number of days with minimum temperature below freezing (1931–1960) in capital
or large city
Area: land area in square miles
The dataset is distributed with R. Open it with the following commands:
> data(state)
> statedata <- data.frame(state.x77, row.names=state.abb, check.names=TRUE)
We wish to use a linear model to model the murder rate in terms of the other variables.
(a) Plot the data and comment. Should we consider any variable transformations?
(b) Perform model selection using forward selection, including any variable transformations that may
be relevant.
(c) Starting from the full model, perform model selection using stepwise selection with the AIC.
(d) Write down your final fitted model (including any variable transformations used).
(e) Produce diagnostic plots for your final model and comment.
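Forward selection can be expressed as a greedy loop. The NumPy sketch below illustrates one variant, not the course's prescribed procedure. It scores each model with $n\ln(SS_{Res}/n) + 2p$, the same form as the AIC in Q5(c), with $p$ the number of columns of the design matrix. It also runs on hypothetical simulated data rather than the state dataset, which ships with R.

```python
import numpy as np

def aic(X, y):
    """AIC of the form n ln(SSRes/n) + 2p for a least squares fit (p = columns of X)."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return n * np.log(r @ r / n) + 2 * p

def forward_select(Z, y, names):
    """Greedy forward selection: add the predictor that lowers AIC most; stop when none does."""
    n = len(y)
    X = np.ones((n, 1))  # start from the intercept-only model
    best = aic(X, y)
    chosen = []
    remaining = list(range(Z.shape[1]))
    while remaining:
        # Score every one-variable extension of the current model.
        scores = [(aic(np.column_stack([X, Z[:, j]]), y), j) for j in remaining]
        score, j = min(scores)
        if score >= best:
            break  # no candidate improves the AIC
        best = score
        X = np.column_stack([X, Z[:, j]])
        chosen.append(names[j])
        remaining.remove(j)
    return chosen, best

# Hypothetical synthetic data: only x0 and x2 actually affect y.
rng = np.random.default_rng(1)
n = 200
Z = rng.normal(size=(n, 4))
y = 1.0 + 3.0 * Z[:, 0] - 2.0 * Z[:, 2] + rng.normal(size=n)
chosen, best = forward_select(Z, y, ["x0", "x1", "x2", "x3"])
print(chosen, best)
```

Stepwise selection as in (c) differs in that each step may also drop a term already in the model. In R, the step function performs this kind of AIC-based search.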
5. For ridge regression, we choose parameter estimators $b$ which minimise
$$(y - Xb)^T(y - Xb) + \lambda b^Tb,$$
where $\lambda$ is a constant penalty parameter.
(a) Show that these estimators are given by
$$b = (X^TX + \lambda I)^{-1}X^Ty.$$
(b) Calculate the ridge regression estimates for the data from Q2 with penalty parameter λ = 0.5.
In order to avoid penalising some parameters unfairly, we must first scale every predictor
variable so that it is standardised (mean 0, variance 1), and centre the response variable
(mean 0), in which case no intercept parameter is used. (Hint: this can be done with the
scale function.)
(c) One way to calculate the optimal value for the penalty parameter is to minimise the AIC.
Since the number of parameters p does not change, we use a slightly modified version:
$$\mathrm{AIC} = n \ln\left(\frac{SS_{Res}}{n}\right) + 2\,\mathrm{df},$$
where df is the “effective degrees of freedom” defined by
$$\mathrm{df} = \mathrm{tr}(H) = \mathrm{tr}\left(X(X^TX + \lambda I)^{-1}X^T\right).$$
For the data from Q2, construct a plot of λ against AIC, and use it to find the optimal value
of λ.
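The ridge estimator, the effective degrees of freedom and the modified AIC in Q5 can be put together in a short NumPy sketch. It follows the preprocessing described in (b): predictors are standardised with the sample standard deviation, as R's scale function does, the response is centred, and no intercept is used. The λ grid from 0 to 10 is an arbitrary assumption; widen it if the minimum lands at an endpoint.

```python
import numpy as np

# Data from Q2: cars sold (thousands), cost ($k), unemployment rate (%), interest rate (%).
y = np.array([5.5, 5.9, 6.5, 5.9, 8.0, 9.0, 10.0, 10.8])
Z = np.column_stack([
    [7.2, 10.0, 9.0, 5.5, 9.0, 9.8, 14.5, 8.0],
    [8.7, 9.4, 10.0, 9.0, 12.0, 11.0, 12.0, 13.7],
    [5.5, 4.4, 4.0, 7.0, 5.0, 6.2, 5.8, 3.9],
])

# Standardise predictors (mean 0, sample variance 1) and centre the response.
Xs = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
yc = y - y.mean()

def ridge(X, y, lam):
    """Ridge estimate b = (X^T X + lam I)^{-1} X^T y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

def ridge_df(X, lam):
    """Effective degrees of freedom tr(X (X^T X + lam I)^{-1} X^T)."""
    k = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(k), X.T)
    return np.trace(H)

def ridge_aic(X, y, lam):
    """Modified AIC = n ln(SSRes/n) + 2 df."""
    n = X.shape[0]
    r = y - X @ ridge(X, y, lam)
    return n * np.log(r @ r / n) + 2 * ridge_df(X, lam)

b_half = ridge(Xs, yc, 0.5)  # part (b): lambda = 0.5

# Part (c): evaluate the AIC on a grid of lambda values and take the minimiser.
lams = np.linspace(0.0, 10.0, 1001)
aics = np.array([ridge_aic(Xs, yc, lam) for lam in lams])
lam_opt = lams[np.argmin(aics)]

print("ridge estimates at lambda = 0.5:", b_half)
print("grid minimiser of AIC:", lam_opt)
```

The arrays lams and aics can be passed to any plotting routine to draw the curve asked for in (c). At λ = 0 the estimates reduce to ordinary least squares and the effective degrees of freedom equal the number of predictors.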