Date: 2019-05-07 11:12

MAST30025: Linear Statistical Models

Assignment 2, 2019

Due: 5pm Friday, May 3 (week 8)

This assignment is worth 7% of your total mark.

You may use R for this assignment, including the lm function unless specified. If you do, include your R commands and output.

Your assignment must be submitted to Turnitin on the LMS as a single PDF document only. You may choose to either typeset your assignment or handwrite and scan it to produce an electronic version. Turnitin will not accept late submissions.

Turnitin gives you an option to preview your work prior to submission. Please check this preview carefully to ensure you are submitting the correct document. After a successful submission to Turnitin, you will see a submission ID. This confirmation will also be sent to your University email address. If you do not see a submission ID, you should assume that your assignment has not been submitted successfully. Either try to submit again or contact the tutor co-ordinator (Rheanna Mainzer) immediately to arrange an alternate means of submission. Issues with Turnitin are not a valid excuse for submitting a late assignment or an incorrect version of an assignment.

(1 mark) Your assignment must clearly show your name and student ID number, your tutor’s name and the time and day of your tutorial class. Your assignment must be submitted in the correct format and the correct orientation. Your answers must be clearly numbered and in the same order as the assignment questions.

1. Prove Theorem 4.8: show that the maximum likelihood estimator of the error variance σ^2 is SSRes/n.

2. An experiment is conducted to estimate the annual demand for cars, based on their cost, the current unemployment rate, and the current interest rate. A survey is conducted and the following measurements obtained:

Cars sold (×10^3)   Cost ($k)   Unemployment rate (%)   Interest rate (%)
 5.5                 7.2          8.7                    5.5
 5.9                10.0          9.4                    4.4
 6.5                 9.0         10.0                    4.0
 5.9                 5.5          9.0                    7.0
 8.0                 9.0         12.0                    5.0
 9.0                 9.8         11.0                    6.2
10.0                14.5         12.0                    5.8
10.8                 8.0         13.7                    3.9

For this question, you may NOT use the lm function in R.

(a) Fit a linear model to the data and estimate the parameters and variance.

(b) Which two of the parameters have the highest (in magnitude) covariance in their estimators?

(c) Find a 99% confidence interval for the average number of $8,000 cars sold in a year which has unemployment rate 9% and interest rate 5%.

(d) A prediction interval for the number of cars sold in such a year is calculated to be (4012, 7087). Find the confidence level used.

(e) Test for model relevance using a corrected sum of squares.
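
As a minimal sketch of fitting by the normal equations without lm, the R code below enters the data from the table and computes the least squares estimates, the variance estimate, the estimated covariance matrix and a confidence interval for a mean response. The column names (sold, cost, unemp, interest) and object names are illustrative choices, not part of the assignment.

```r
# Data from the table in the question; column names are illustrative.
cars <- data.frame(
  sold     = c(5.5, 5.9, 6.5, 5.9, 8.0, 9.0, 10.0, 10.8),   # cars sold (thousands)
  cost     = c(7.2, 10.0, 9.0, 5.5, 9.0, 9.8, 14.5, 8.0),   # cost ($k)
  unemp    = c(8.7, 9.4, 10.0, 9.0, 12.0, 11.0, 12.0, 13.7), # unemployment rate (%)
  interest = c(5.5, 4.4, 4.0, 7.0, 5.0, 6.2, 5.8, 3.9)      # interest rate (%)
)

y <- cars$sold
X <- cbind(1, cars$cost, cars$unemp, cars$interest)  # design matrix with intercept
n <- nrow(X)
p <- ncol(X)

XtX_inv <- solve(t(X) %*% X)
b       <- XtX_inv %*% t(X) %*% y      # least squares estimates
ss_res  <- sum((y - X %*% b)^2)        # residual sum of squares
s2      <- ss_res / (n - p)            # unbiased estimate of the error variance
cov_b   <- s2 * XtX_inv                # estimated covariance matrix of b

# 99% confidence interval for the mean response at a new point x0
x0   <- c(1, 8, 9, 5)                  # intercept, cost $8k, unemployment 9%, interest 5%
fit0 <- sum(x0 * b)
se0  <- sqrt(s2 * drop(t(x0) %*% XtX_inv %*% x0))
ci   <- fit0 + c(-1, 1) * qt(0.995, df = n - p) * se0
```

The response is recorded in thousands of cars, so fit0 and ci are on that scale too.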

3. Consider two full rank linear models y = X_1 γ_1 + ε_1 and y = Xβ + ε_2, where all predictors in the first model (γ_1) are also contained in the second model (β). Show that the SSRes for the first model is at least the SSRes for the second model.
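
The claim can be checked numerically before proving it. The sketch below uses simulated data (the seed, sample size and coefficients are arbitrary assumptions) and a hypothetical helper ss_res to compare the residual sums of squares of two nested models. It illustrates the inequality and is not a proof.

```r
set.seed(1)                              # arbitrary seed, for reproducibility only
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)              # x2 plays no role in generating y

# Residual sum of squares of the least squares fit of y on the columns of X
ss_res <- function(X, y) {
  b <- solve(t(X) %*% X, t(X) %*% y)     # solve the normal equations
  sum((y - X %*% b)^2)
}

X_small <- cbind(1, x1)                  # first (smaller) model
X_big   <- cbind(1, x1, x2)              # second model, containing the first
c(small = ss_res(X_small, y), big = ss_res(X_big, y))
```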

4. In this question, we study a dataset of 50 US states. This dataset contains the variables:

Population: population estimate as of July 1, 1975

Income: per capita income (1974)

Illiteracy: illiteracy (1970, percent of population)

Life.Exp: life expectancy in years (1969–71)

Murder: murder and non-negligent manslaughter rate per 100,000 population (1976)

HS.Grad: percentage of high-school graduates (1970)

Frost: mean number of days with minimum temperature below freezing (1931–1960) in capital or large city

Area: land area in square miles

The dataset is distributed with R. Open it with the following commands:

> data(state)

> statedata <- data.frame(state.x77, row.names=state.abb, check.names=TRUE)

We wish to use a linear model to model the murder rate in terms of the other variables.

(a) Plot the data and comment. Should we consider any variable transformations?

(b) Perform model selection using forward selection, using all variable transformations which may be relevant.

(c) Starting from the full model, perform model selection using stepwise selection with the AIC.

(d) Write down your final fitted model (including any variable transformations used).

(e) Produce diagnostic plots for your final model and comment.
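
The selection steps above might be started as in the following sketch, which uses the base R functions step and add1 on the untransformed variables. Any transformations, and the choice of the F test for forward selection, are left as assumptions to revisit. The object names full, null and by_aic are illustrative.

```r
data(state)
statedata <- data.frame(state.x77, row.names = state.abb, check.names = TRUE)

# Stepwise selection by AIC, starting from the model with every predictor
full   <- lm(Murder ~ ., data = statedata)
by_aic <- step(full, scope = ~ ., direction = "both", trace = 0)

# One round of forward selection from the intercept-only model:
# add1() reports an F test for each candidate term in the scope
null <- lm(Murder ~ 1, data = statedata)
add1(null, scope = formula(full), test = "F")

formula(by_aic)                          # terms retained by the stepwise search
```

Forward selection would continue by adding the most significant term with update and calling add1 again until no candidate is significant. The usual diagnostic plots for the chosen model are available through plot(by_aic).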

5. For ridge regression, we choose parameter estimators b which minimise

(y - Xb)^T (y - Xb) + λ b^T b,

where λ is a constant penalty parameter.

(a) Show that these estimators are given by

b = (X^T X + λI)^{-1} X^T y.

(b) Calculate the ridge regression estimates for the data from Q2 with penalty parameter λ = 0.5. In order to avoid penalising some parameters unfairly, we must first scale every predictor variable so that it is standardised (mean 0, variance 1), and centre the response variable (mean 0), in which case an intercept parameter is not used. (Hint: This can be done with the scale function).

(c) One way to calculate the optimal value for the penalty parameter is to minimise the AIC. Since the number of parameters p does not change, we use a slightly modified version:

AIC = n ln(SSRes / n) + 2 df,

where df is the “effective degrees of freedom” defined by

df = tr(H) = tr(X (X^T X + λI)^{-1} X^T).

For the data from Q2, construct a plot of λ against AIC. Thereby find the optimal value for λ.
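
A minimal sketch of the ridge calculations in R follows, assuming the Q2 data are re-entered with illustrative column names. The helper ridge and the search grid for λ are assumptions. If the minimum falls on the edge of the grid, the range should be widened.

```r
# Data from Q2; column names are illustrative.
cars <- data.frame(
  sold     = c(5.5, 5.9, 6.5, 5.9, 8.0, 9.0, 10.0, 10.8),
  cost     = c(7.2, 10.0, 9.0, 5.5, 9.0, 9.8, 14.5, 8.0),
  unemp    = c(8.7, 9.4, 10.0, 9.0, 12.0, 11.0, 12.0, 13.7),
  interest = c(5.5, 4.4, 4.0, 7.0, 5.0, 6.2, 5.8, 3.9)
)

X <- scale(as.matrix(cars[, c("cost", "unemp", "interest")]))  # mean 0, variance 1
y <- cars$sold - mean(cars$sold)                               # centred response
n <- nrow(X)

# Ridge estimates, effective degrees of freedom and modified AIC for one lambda
ridge <- function(lambda) {
  A      <- solve(t(X) %*% X + lambda * diag(ncol(X)))  # (X^T X + lambda I)^{-1}
  b      <- A %*% t(X) %*% y
  ss_res <- sum((y - X %*% b)^2)
  df     <- sum(diag(X %*% A %*% t(X)))                 # tr(H)
  list(b = drop(b), df = df, aic = n * log(ss_res / n) + 2 * df)
}

ridge(0.5)$b                                            # estimates for lambda = 0.5

lambdas <- seq(0, 1, by = 0.001)                        # assumed search grid
aics    <- sapply(lambdas, function(l) ridge(l)$aic)
if (interactive()) plot(lambdas, aics, type = "l", xlab = "lambda", ylab = "AIC")
lambdas[which.min(aics)]                                # grid value with the smallest AIC
```

Note that scale standardises with the sample standard deviation (divisor n - 1), which matches the "variance 1" requirement in the usual sample sense.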
