
Date: 2018-11-02 10:44

Statistics 512: Problem Set 4

Due Thursday, November 1, 2018 11:59 PM

Important Note – Every graph or plot you create should have your name printed as a subtitle. Any graph without your name will result in a 20% point deduction. Also, please attach your code at the end; any homework submitted without code will result in a 50% point deduction on the entire assignment.

1. Based on the following small data set, construct the design matrix $X$, its transpose $X'$, and the matrices $X'X$, $(X'X)^{-1}$, $X'Y$, and $b = (X'X)^{-1}X'Y$.

X: 0, 1, 2, 3, 4, 5

Y: 1, 4, 7, 9, 10, 8
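The assignment expects this to be done by hand or in R. As a cross-check of the matrix arithmetic, here is a minimal Python sketch using exact fractions; the helper names (`transpose`, `matmul`, `inverse_2x2`) are illustrative and not from any library.

```python
from fractions import Fraction


def transpose(M):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse_2x2(M):
    """Closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


x = [0, 1, 2, 3, 4, 5]
y = [1, 4, 7, 9, 10, 8]

# Design matrix: a column of ones for the intercept, then the predictor.
X = [[Fraction(1), Fraction(xi)] for xi in x]
Y = [[Fraction(yi)] for yi in y]

Xt = transpose(X)            # X'
XtX = matmul(Xt, X)          # X'X
XtX_inv = inverse_2x2(XtX)   # (X'X)^{-1}
XtY = matmul(Xt, Y)          # X'Y
b = matmul(XtX_inv, XtY)     # b = (X'X)^{-1} X'Y, a 2x1 column (b0 above b1)
```

Exact fractions keep every intermediate matrix in the same form you would write by hand, which makes it easy to compare entry by entry.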

For the following 5 problems, consider the data given in the file CH06PR18.DAT, which describes a data set ($n = 24$) used to evaluate the relation between intermediate- and senior-level annual salaries of bachelor's- and master's-level mathematicians ($Y$, in thousand dollars) and an index of work quality ($X_1$), number of years of experience ($X_2$), and an index of publication success ($X_3$).

2. Run the multiple linear regression with quality, experience, and publications as the explanatory variables and salary as the response variable. Summarize the regression results by giving the fitted regression equation, the value of $R^2$, and the results of the significance test for the null hypothesis that the three regression coefficients for the explanatory variables are all zero (give null and alternative hypotheses, test statistic with degrees of freedom, p-value, and brief conclusion in words).
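In R, `summary(lm(...))` reports all of these quantities. To make their definitions concrete, here is a minimal Python sketch (standard library only, helper names hypothetical) that fits the model through the normal equations and forms the overall statistic $F^* = (SSR/p)/(SSE/(n-p-1))$ with $(p,\ n-p-1)$ degrees of freedom, where $p$ is the number of predictors. The p-value is left out because the standard library has no F distribution; in R it would be `pf(F, p, n - p - 1, lower.tail = FALSE)`.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for c in range(n):
        # Move the row with the largest pivot into position c.
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        # Eliminate column c from every other row.
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[j] * row[m] for row in X) for m in range(k)] for j in range(k)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(k)]
    b = solve(XtX, Xty)  # b = (X'X)^{-1} X'y without forming the inverse
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    return b, fitted


def overall_f_test(columns, y):
    """Return (b, R^2, F*, (df1, df2)) for H0: all slope coefficients are zero."""
    n, p = len(y), len(columns)
    b, fitted = ols(columns, y)
    ybar = sum(y) / n
    ssr = sum((f - ybar) ** 2 for f in fitted)             # model SS
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))   # error SS
    f_stat = (ssr / p) / (sse / (n - p - 1))
    return b, ssr / (ssr + sse), f_stat, (p, n - p - 1)
```

With the salary data read into lists (say `quality`, `experience`, `publications`, `salary`, all hypothetical names), `overall_f_test([quality, experience, publications], salary)` would return the coefficients, $R^2$, $F^*$, and its degrees of freedom.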

3. Give 85% confidence intervals for the regression coefficients of quality, experience, and publications based on the multiple regression. Describe the results of the hypothesis tests for the individual regression coefficients (give null and alternative hypotheses, test statistic with degrees of freedom, p-value, and a brief conclusion in words). What is the relationship between these results and the confidence intervals?
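The connection between the intervals and the tests can be read off the formulas. With $s\{b_k\} = \sqrt{MSE \cdot [(X'X)^{-1}]_{kk}}$, the interval is $b_k \pm t_c\, s\{b_k\}$ and the test statistic is $t^* = b_k / s\{b_k\}$. The 85% interval therefore excludes 0 exactly when the two-sided test rejects at $\alpha = 0.15$. A minimal sketch; the critical value $t_c$ is passed in (for an 85% interval it is the 0.925 quantile of $t$ with $n - p = 24 - 4 = 20$ df, `qt(0.925, 20)` in R) because the standard library has no t quantile function.

```python
import math


def coef_inference(b_k, c_kk, mse, t_crit):
    """t statistic and confidence interval for one regression coefficient.

    c_kk is the k-th diagonal entry of (X'X)^{-1}; t_crit is the quantile
    t(1 - alpha/2; n - p), supplied by the caller.
    """
    se = math.sqrt(mse * c_kk)          # s{b_k}
    t_star = b_k / se                   # test statistic for H0: beta_k = 0
    ci = (b_k - t_crit * se, b_k + t_crit * se)
    # Duality: 0 lies outside the CI exactly when |t*| exceeds t_crit,
    # i.e. when the two-sided test at the matching alpha rejects H0.
    rejects = abs(t_star) > t_crit
    return t_star, ci, rejects
```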

4. Plot the residuals versus the predicted salary and versus each of the explanatory variables (i.e., 4 residual plots). Are there any unusual patterns?

5. Examine the assumption of normality for the residuals using a QQ plot and a histogram. State your conclusions.

6. Predict the salary for a mathematician with a quality index of 5.2, 15 years of experience, and a publication index of 6.5. Provide an 85% prediction interval with your prediction.
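For a new observation at $x_h = (1, 5.2, 15, 6.5)'$ the estimated prediction variance is $MSE\,(1 + x_h'(X'X)^{-1}x_h)$; dropping the leading 1 gives the variance used for the mean response instead. A minimal sketch with hypothetical helper names, where `b`, `xtx_inv`, `mse`, and `t_crit` come from the fitted model. In R, `predict(fit, newdata, interval = "prediction", level = 0.85)` performs the same computation.

```python
import math


def quad_form(x, A):
    """x' A x for a vector x and a square matrix A given as lists of lists."""
    return sum(x[i] * A[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


def interval_at(x_h, b, xtx_inv, mse, t_crit, new_obs=True):
    """Point estimate and interval at x_h (x_h includes the leading 1).

    new_obs=True  -> prediction interval for a single new response
    new_obs=False -> confidence interval for the mean response
    """
    y_hat = sum(bj * xj for bj, xj in zip(b, x_h))
    h = quad_form(x_h, xtx_inv)
    se = math.sqrt(mse * ((1.0 if new_obs else 0.0) + h))
    return y_hat, (y_hat - t_crit * se, y_hat + t_crit * se)
```

For this problem the call would be `interval_at([1, 5.2, 15, 6.5], b, xtx_inv, mse, t_crit)` with `t_crit` taken from the t distribution with 20 df. The extra 1 inside the square root is why a prediction interval is always wider than the matching confidence interval for the mean.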


For the following problems, use the computer science data that we have been discussing in class. You can get a copy of the data set csdata.dat from the class website. The variables are: id, a numerical identifier for each student; GPA, the grade point average after three semesters; HSM, HSS, HSE, SATM, and SATV, which were all explained in class; and GENDER, coded as 1 for men and 2 for women.

7. In this exercise you will illustrate some of the ideas related to extra sums of squares.

(a) Create a new variable called SAT, equal to SATM + SATV, and run the following two regressions:

i. predict GPA using HSM, HSS, and HSE;

ii. predict GPA using SAT, HSM, HSS, and HSE.

Calculate the extra sum of squares for the comparison of these two analyses. Use it to construct the F-statistic (that is, the general linear test statistic) for testing the null hypothesis that the coefficient of the SAT variable is zero in the model with all four predictors. What are the degrees of freedom for this test statistic?

(b) Compare the test statistic and p-value from the test in part (a) with the individual t-test for the coefficient of the SAT variable in the full model. Explain the relationship.
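The general linear test compares the reduced model (i) with the full model (ii) through $F^* = \dfrac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F/df_F}$. A minimal Python sketch that takes the two error sums of squares and their degrees of freedom as inputs; in R these could come from `deviance()` on each fit, or `anova(fit_reduced, fit_full)` runs the whole test. The illustration below reuses the Problem 1 data (intercept-only versus straight line) only because it is small; with a single dropped coefficient, $F^*$ and the squared t statistic coincide.

```python
import math


def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """General linear test statistic F* with its (numerator, denominator) df."""
    extra_ss = sse_reduced - sse_full      # extra sum of squares
    df_num = df_reduced - df_full          # number of coefficients set to zero under H0
    f_star = (extra_ss / df_num) / (sse_full / df_full)
    return extra_ss, f_star, (df_num, df_full)


# Small illustration with the Problem 1 data: intercept-only model (reduced)
# versus the straight-line model (full), i.e. testing H0: beta_1 = 0.
x = [0, 1, 2, 3, 4, 5]
y = [1, 4, 7, 9, 10, 8]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
sse_reduced = sum((yi - ybar) ** 2 for yi in y)    # SSE of the intercept-only model
sse_full = sse_reduced - b1 * sxy                  # SSE of the straight-line model
extra, f_star, df = general_linear_f(sse_reduced, n - 1, sse_full, n - 2)
t_b1 = b1 / math.sqrt((sse_full / (n - 2)) / sxx)  # usual t statistic for b1
# With one coefficient dropped, f_star equals t_b1 ** 2 up to rounding.
```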

8. Run the regression to predict GPA using SATM, SATV, HSM, HSE, and HSS. Put the variables in the order given above and calculate the Type I and Type II SS using R.

(a) Add the Type I sums of squares for the five predictor variables. Do the same for the Type II sums of squares. Does either of these sum to the model sum of squares? Are there any predictors for which the two sums of squares (Type I and Type II) are the same? Explain why.

(b) Verify (by running additional regressions and doing some arithmetic with the results) that the Type I sum of squares for the variable SATV is the difference in the model sum of squares (or error sum of squares) for the following two analyses:

i. predict GPA using SATM and SATV;

ii. predict GPA using SATM.
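The two definitions can be written directly in terms of error sums of squares: the Type I (sequential) SS of the $j$-th variable is the drop in SSE when it is added after the variables listed before it, and the Type II SS is the drop in SSE when it is added last, after all the others (for a model with no interactions). A minimal standard-library sketch with hypothetical helper names; in R, `anova(fit)` gives the sequential SS and `drop1(fit)` gives the each-variable-last SS.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def sse(columns, y):
    """Error SS after regressing y on an intercept plus the given columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[j] * row[m] for row in X) for m in range(k)] for j in range(k)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(k)]
    b = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))


def type_i_ii_ss(columns, y):
    """Sequential (Type I) and each-variable-last (Type II) sums of squares."""
    sse_full = sse(columns, y)
    type_i = [sse(columns[:j], y) - sse(columns[:j + 1], y) for j in range(len(columns))]
    type_ii = [sse(columns[:j] + columns[j + 1:], y) - sse_full for j in range(len(columns))]
    return type_i, type_ii
```

Called as `type_i_ii_ss([satm, satv, hsm, hse, hss], gpa)` (hypothetical list names), the order of `columns` matters for Type I but not for Type II, which mirrors the instruction to keep the variables in the stated order.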

9. Create an additional variable called HS that is the sum of the three high school scores (HSE + HSS + HSM). Run regressions to predict GPA using a variety of explanatory variables, including HS and SAT, as described below. Summarize the results by making a table giving the percentage of variation explained ($R^2$) by each of the following models:

(a) SATM as the explanatory variable

(b) SATV as the explanatory variable

(c) HSM as the explanatory variable

(d) HSS as the explanatory variable

(e) HSE as the explanatory variable

(f) SATM and SATV as the explanatory variables

(g) SAT (=SATM+SATV) as the explanatory variable

(h) HSM, HSS, and HSE as the explanatory variables

(i) HS (=HSM+HSS+HSE) as the explanatory variable


(j) SATM, SATV, HSM, HSS, and HSE as the explanatory variables

(k) SAT and HS as the explanatory variables

(Please do not include the R output for all these models; only the $R^2$ values are needed. Note that you can copy and paste lm() calls in R to save typing.)
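For the one-predictor models in the list, $R^2$ is simply the squared Pearson correlation between GPA and that predictor, and the combined variables are plain row-wise sums. A small standard-library sketch with hypothetical function names; the multi-predictor models still need a full regression (in R, `summary(fit)$r.squared`).

```python
def r_squared_simple(x, y):
    """R^2 for the simple linear regression of y on x, computed as r^2."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def row_sum(*columns):
    """Element-wise sum of columns, e.g. SAT = SATM + SATV or HS = HSM + HSS + HSE."""
    return [sum(values) for values in zip(*columns)]
```

Because SAT forces a common coefficient on SATM and SATV, model (g) is a restricted version of model (f), so its $R^2$ cannot exceed that of (f); the same holds for (i) versus (h) and for (k) versus (j).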

10. A data set contains 50 observations. There are 4 explanatory variables: A, B, C, and D. Use the following results:

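$$
(X'X)^{-1} = \begin{pmatrix}
1.3972 & 1.8892\times10^{-3} & 3.6060\times10^{-3} & 1.3523\times10^{-3} & -2.9728\times10^{-2} \\
1.8892\times10^{-3} & 5.0363\times10^{-5} & 6.8773\times10^{-6} & 3.9875\times10^{-6} & 5.0387\times10^{-6} \\
3.6060\times10^{-3} & 6.8773\times10^{-6} & 4.9685\times10^{-5} & 8.6113\times10^{-6} & 5.4578\times10^{-5} \\
1.3523\times10^{-3} & 3.9857\times10^{-6} & 8.6113\times10^{-6} & 4.7933\times10^{-5} & 1.3931\times10^{-5} \\
-2.9728\times10^{-2} & 5.0387\times10^{-6} & 5.4578\times10^{-5} & 1.3931\times10^{-5} & 8.0975\times10^{-4}
\end{pmatrix}
$$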

469.7658

2.4148

3.3341

4.3285

MSE = 1963.48714

(a) Obtain an 85% confidence interval for $\beta_1$ (the coefficient for A).

(b) You wish to test $H_0: \beta_4 = 0$ vs. $H_a: \beta_4 \neq 0$. That is, you wish to determine whether variable D provides significant explanatory power for Y when variables A, B, and C are already in the model. Obtain the test statistic for this hypothesis test and determine whether you would accept or reject the null hypothesis ($\alpha = 0.05$). You should give either a critical value or a p-value to support your conclusion.

(c) Obtain an 85% confidence interval for the mean (expected) response when A = 40, B = 20, C = 50, and D = 30.

(d) Obtain an 85% prediction interval for a single response when A = 40, B = 20, C = 50, and D = 30.
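All four parts reduce to standard formulas in $(X'X)^{-1}$, $b$, and MSE. A sketch in the usual notation, assuming the $5 \times 5$ matrix given above is $(X'X)^{-1}$ with the intercept first, so that $c_{kk}$ (its diagonal entry for coefficient $b_k$, counting from $k = 0$ for the intercept) pairs with $b_k$; with $n = 50$ and $p = 5$ parameters the error degrees of freedom are $n - p = 45$.

```latex
% n = 50 observations, p = 5 parameters (intercept + A, B, C, D), so df_E = n - p = 45.
% c_{kk} denotes the diagonal entry of (X'X)^{-1} belonging to b_k (k = 0 is the intercept).
\begin{align*}
\text{(a)}\quad & b_1 \pm t_{0.925;\,45}\,\sqrt{MSE\, c_{11}} \\
\text{(b)}\quad & t^* = \frac{b_4}{\sqrt{MSE\, c_{44}}}, \qquad \text{reject } H_0 \text{ if } |t^*| > t_{0.975;\,45} \\
\text{(c)}\quad & \hat{Y}_h \pm t_{0.925;\,45}\,\sqrt{MSE\; x_h'(X'X)^{-1}x_h},
  \qquad x_h = (1, 40, 20, 50, 30)', \quad \hat{Y}_h = x_h' b \\
\text{(d)}\quad & \hat{Y}_h \pm t_{0.925;\,45}\,\sqrt{MSE\,\bigl(1 + x_h'(X'X)^{-1}x_h\bigr)}
\end{align*}
```

The only difference between (c) and (d) is the extra 1 inside the square root, which accounts for the variability of a single new observation around its mean.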

11. In R, create a new variable GENDERW that has value 1 for women and 0 for men (use arithmetic on the original variable GENDER). Run a regression to predict GPA using the explanatory variables HSM, HSS, HSE, SATM, SATV, and GENDERW. (Do not include any interaction terms.)

(a) Give the equation of the fitted regression line using all six explanatory variables.

(b) Give the fitted regression line for women (use part a).

(c) Give the fitted regression line for men (use part a).

DO NOT run the lm function on a subset of the data to answer this question.
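Since GENDER is coded 1 for men and 2 for women, one arithmetic recoding is GENDERW = GENDER - 1. In the additive model the indicator only shifts the intercept: the fitted intercept is $b_0 + b_{\text{GENDERW}}$ for women and $b_0$ for men, and all slopes are shared. A minimal sketch of that bookkeeping with hypothetical names; the coefficient dictionary is assumed to use the same term names as R's `coef(fit)`, including `"(Intercept)"`.

```python
def recode_genderw(gender):
    """GENDER (1 = men, 2 = women) -> GENDERW (1 = women, 0 = men)."""
    return [g - 1 for g in gender]


def equations_by_group(coefs, indicator="GENDERW"):
    """Split one fitted additive model into the women's and men's equations.

    coefs maps term names to estimates, with the intercept under "(Intercept)".
    Returns (women, men), each a dict holding an intercept plus the shared slopes.
    """
    slopes = {k: v for k, v in coefs.items() if k not in ("(Intercept)", indicator)}
    women = {"(Intercept)": coefs["(Intercept)"] + coefs[indicator], **slopes}
    men = {"(Intercept)": coefs["(Intercept)"], **slopes}
    return women, men
```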

12. Use the $C_p$ criterion to select the best subset of variables for this problem. Use only the original six explanatory variables, not HS or SAT, and use either GENDER or GENDERW, not both. Summarize the results and explain your choice of the best model.
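Mallows' criterion for a subset model with $p$ parameters (intercept included) is $C_p = SSE_p / MSE_{\text{full}} - (n - 2p)$, where $MSE_{\text{full}}$ comes from the model with all candidate predictors; subsets with small $C_p$ close to $p$ are preferred. A sketch that scores every non-empty subset, assuming a hypothetical callback `sse_of(subset)` that returns the error SS for an intercept plus those predictors (in R, `leaps::leaps(..., method = "Cp")` performs this search directly).

```python
from itertools import combinations


def mallows_cp(sse_p, p, mse_full, n):
    """Mallows' C_p for a subset model with p parameters (intercept included)."""
    return sse_p / mse_full - (n - 2 * p)


def cp_table(names, sse_of, n):
    """Score every non-empty subset of `names`; returns (C_p, p, subset) sorted by C_p.

    sse_of(subset) must return the error SS of the model with an intercept
    plus the predictors in `subset` (a tuple of names).
    """
    k = len(names)
    mse_full = sse_of(tuple(names)) / (n - k - 1)
    rows = []
    for size in range(1, k + 1):
        for subset in combinations(names, size):
            p = size + 1  # slopes plus the intercept
            rows.append((mallows_cp(sse_of(subset), p, mse_full, n), p, subset))
    return sorted(rows)
```

By construction the full model always has $C_p = p$ exactly, so the informative comparisons are among the smaller subsets.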

13. Check the assumptions of this "best" model using all the usual plots (you know what they are by now). Explain in detail whether or not each assumption appears to be substantially violated.

