Final Exam for Statistical Learning
Summer 2019
Name____________ Student ID#_____________
Part I. (10 pts) True/False? In the following problems, determine whether the statement is true or
false. If it is false, please correct it.
1. In the ANOVA table for a linear regression model, the F statistic checks the
significance of the model. The F statistic follows the F distribution with
degrees of freedom K and n-K-1, where n is the sample size and K equals
the number of independent variables. _______
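For reference, the overall F statistic in an ANOVA table is commonly written in the form below. The symbols SSR (regression sum of squares) and SSE (error sum of squares) are notation assumed here, not taken from the exam, and the distributional statement assumes normally distributed errors.

```latex
F = \frac{\mathrm{SSR}/K}{\mathrm{SSE}/(n-K-1)}
\;\sim\; F_{K,\,n-K-1}
\quad \text{under } H_0:\ \beta_1 = \beta_2 = \cdots = \beta_K = 0
```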
2. An application of the linear regression model with an intercept and 9
independent variables generated the following results involving the F test of the
overall regression model (in the ANOVA table): p-value = .03, $R^2$ = .67, s = .076. Thus, the null hypothesis, $H_0: \beta_1 = \beta_2 = \cdots = \beta_9 = 0$, should be rejected
at the .05 level of significance. ______
Part II. (10 pts) Multiple choice questions. There is only one best answer among the
alternatives. You must choose the best answer to each of the questions and circle it.
1. In a multiple regression model with a large sample of size 200 and 4
independent variables, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon$, one wishes to investigate whether two of the
independent variables are useful for explaining the remaining variation of the response
after taking into account the effect of the other two. One uses an F statistic
to test the null hypothesis that the coefficients of the two variables in question are both zero
against the alternative that at least one of them is nonzero. What are the degrees of freedom of the test statistic? ____
a. 2, 196
b. 4, 196
c. 2, 200
d. 2, 195
e. 4, 195
f. 1, 195
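As a reference for this kind of nested-model comparison, a partial F statistic can be written as below. The notation is assumed here rather than taken from the exam: SSE_R and SSE_F denote the error sums of squares of the reduced and full models, q is the number of coefficients set to zero under the null hypothesis, K is the number of independent variables in the full model, and the errors are taken to be normally distributed.

```latex
F = \frac{(\mathrm{SSE}_R - \mathrm{SSE}_F)/q}{\mathrm{SSE}_F/(n-K-1)}
\;\sim\; F_{q,\,n-K-1}
\quad \text{under } H_0
```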
2. For comparison of any two linear regression models (Model 1 and Model 2), the
following regression outputs are obtained:
(i) For Model 1: , F=13.5
(ii) For Model 2:
Which of the following is true? _______
a. Model 1 is more flexible than Model 2.
b. Model 2 is more flexible than Model 1.
c. There is not enough evidence to decide which model is more flexible.
Part III. (80 pts)
2. Bootstrap (25 pts). Given an iid random sample $\{(x_i, y_i)\}_{i=1}^{n}$ from the regression model $y = f(x) + \varepsilon$, one uses
KNN with K=5 to estimate the mean function $f(x)$. Let $\hat{f}(x)$ be the resulting estimator. How can the bootstrap method be used to get a 95% confidence interval for $f(x)$? Write down the
algorithm for calculating the interval estimate.
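One possible way to organize such a computation is a percentile bootstrap that resamples the observed pairs, sketched below in R. Everything named here is an assumption made for illustration and not part of the exam: the helper functions `knn_fit` and `boot_ci_knn`, the scalar predictor, the evaluation point `x0`, the number of replicates `n_boot`, and the simulated data. Other bootstrap intervals, such as residual-based or bias-corrected ones, are equally possible.

```r
# Hypothetical helper: KNN regression estimate at a single point x0
# (a scalar predictor is assumed for simplicity).
knn_fit <- function(x, y, x0, k = 5) {
  nearest <- order(abs(x - x0))[1:k]  # indices of the k closest x values
  mean(y[nearest])                    # average their responses
}

# Hypothetical helper: percentile bootstrap interval for the KNN estimate at x0.
boot_ci_knn <- function(x, y, x0, k = 5, n_boot = 1000, level = 0.95) {
  n <- length(y)
  est <- numeric(n_boot)
  for (b in seq_len(n_boot)) {
    idx <- sample.int(n, n, replace = TRUE)    # resample (x_i, y_i) pairs
    est[b] <- knn_fit(x[idx], y[idx], x0, k)   # refit KNN on the resample
  }
  alpha <- 1 - level
  # Lower and upper empirical quantiles of the bootstrap estimates
  quantile(est, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Simulated data, used only to show the call.
set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
ci <- boot_ci_knn(x, y, x0 = 0.5)
print(ci)
```

The interval describes the sampling variability of the KNN estimate at `x0`. Because KNN is a biased estimator, it is best read as an approximate interval for the mean function at that point.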
3. (25 pts) Solve the following problems:
4. Explain each statement in the following R code (30 pts)
set.seed(1)
p = 20
n = 1000
x = matrix(rnorm(n * p), n, p)
B = rnorm(p)
B[3] = 0
B[4] = 0
B[9] = 0
B[19] = 0
B[10] = 0
eps = rnorm(n)
y = x %*% B + eps
train = sample(seq(1000), 100, replace = FALSE)
y.train = y[train, ]
y.test = y[-train, ]
x.train = x[train, ]
x.test = x[-train, ]
library(leaps)
regfit.full = regsubsets(y ~ ., data = data.frame(x = x.train, y = y.train),
    nvmax = p)
val.errors = rep(NA, p)
x_cols = colnames(x, do.NULL = FALSE, prefix = "x.")
for (i in 1:p) {
    coefi = coef(regfit.full, id = i)
    pred = as.matrix(x.train[, x_cols %in% names(coefi)]) %*%
        coefi[names(coefi) %in% x_cols]
    val.errors[i] = mean((y.train - pred)^2)
}
plot(val.errors, ylab = "Training MSE", pch = 19, type = "b")
(10 pts)