2.6 Exercises
Conceptual Exercises
2.1-2.7 True or False? Each of the statements in Exercises 2.1-2.7 is either true or false.
For each statement, indicate whether it is true or false and, if it is false, give a reason.
2.1 If dataset A has a larger correlation between Y and X than dataset B, then the slope between
Y and X for dataset A will be larger than the slope between Y and X for dataset B.
2.2 The degrees of freedom for the model is always 1 for a simple linear regression model.
2.3 The magnitude of the critical value (t*) used to compute a confidence interval for the slope
of a regression model decreases as the sample size increases.
2.4 The variability due to error (SSE) is always smaller than the variation explained by the
model (SSModel).
2.5 If the size of the typical error increases, then the prediction interval for a new observation
becomes narrower.
2.6 For the same value of the predictor, the 95% prediction interval for a new observation is
always wider than the 95% confidence interval for the mean response.
2.7 If the correlation between X1 and Y is greater (in magnitude) than the correlation between X2
and Y, then the coefficient of determination for regressing Y on X1 is greater than the coefficient
of determination for regressing Y on X2.
2.8 Using correlation. A regression equation was fit to a set of data for which the correlation,
r, between X and Y was 0.6. Which of the following must be true?
a. The slope of the regression line is 0.6.
b. The regression model explains 60% of the variability in Y.
c. The regression model explains 36% of the variability in Y.
d. At least half of the residuals are smaller than 0.6 in absolute value.
2.9 Interpreting the size of r^2.
a. Does a high value of r^2, say, 0.90 or 0.95, indicate that a linear relationship is the best possible
model for the data? Explain.
b. Does a low value of r^2, say, 0.20 or 0.30, indicate that some relationship other than linear
would be the best model for the data? Explain.
3.9 Exercises
Conceptual Exercises
3.1 Predicting a statistics final exam grade. A statistics professor assigned various grades
during the semester including a midterm exam (out of 100 points) and a logistic regression project
(out of 30 points). The prediction equation below was fit, using data from 24 students in the class,
to predict the final exam score (out of 100 points) based on the midterm and project grades:
$\widehat{Final} = 11.0 + 0.53 \cdot Midterm + 1.20 \cdot Project$.
a. What would this tell you about a student who got perfect scores on the midterm and project?
b. Michael got a grade of 87 on his midterm, 21 on the project, and an 80 on the final. Compute
his residual and write a sentence to explain what that value means in Michael's case.
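The arithmetic behind a fitted value and a residual can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this sketch, and the student scores below are hypothetical values, not data from the exercise. A residual is taken to be the observed value minus the predicted value.

```python
def predict_final(midterm, project):
    """Predicted final exam score from the fitted equation in Exercise 3.1."""
    return 11.0 + 0.53 * midterm + 1.20 * project


def residual(actual, predicted):
    """Residual = observed value minus predicted value."""
    return actual - predicted


# Hypothetical student (NOT from the exercise): midterm 70, project 25, final 75.
predicted = predict_final(midterm=70, project=25)
resid = residual(actual=75, predicted=predicted)
print(f"predicted = {predicted:.1f}, residual = {resid:.1f}")
```

A negative residual means the student scored below what the model predicts for those midterm and project grades; a positive residual means the student scored above the prediction.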
3.2 Predicting a statistics final exam grade (continued). Does the prediction equation for
final exam scores in Exercise 3.1 suggest that the project score has a stronger relationship with the
final exam than the midterm exam? Explain why or why not.
3.3 Breakfast cereals. A regression model was fit to a sample of breakfast cereals. The response
variable Y is calories per serving. The predictor variables are X1, grams of sugar per serving, and
X2, grams of fiber per serving. The fitted regression model is
$\hat{Y} = 109.3 + 1.0 \cdot X_1 - 3.7 \cdot X_2$
In the context of this setting, interpret -3.7, the coefficient of X2. That is, describe how fiber is
related to calories per serving, in the presence of the sugar variable.
3.4 Adjusting R^2. Decide if the following statements are true or false, and explain why:
a. For a multiple regression problem, the adjusted coefficient of determination, R^2_adj, will always
be smaller than the regular, unadjusted R^2.
b. If we fit a multiple regression model and then add a new predictor to the model, the adjusted
coefficient of determination, R^2_adj, will always increase.
3.5 Body measurements. Suppose that you are interested in predicting the percentage of body
fat (BodyFat) on a man using the explanatory variables waist size (Waist) and Height.
a. Do you think that BodyFat and Waist are positively correlated? Explain.
b. For a fixed waist size (say, 38 inches), would you expect BodyFat to be positively or negatively
correlated with a man's Height? Explain why.
c. Predict the amount of titanium (Titanium) in a well based on a possible quadratic relationship
with the distance (Miles) from a mining site.
d. Predict the amount of sulfide (Sulfide) in a well based on Year, distance (Miles) from a
mining site, depth (Depth) of the well, and any interactions of pairs of explanatory variables.
3.9 Degrees of freedom for well water models. Suppose that the environmental expert in
Exercise 3.8 gives you data from 198 wells. Identify the degrees of freedom for error in each of the
models from the previous exercise.
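As a reminder of the bookkeeping involved, the error degrees of freedom for a regression model can be sketched as below. This assumes the usual convention that the error degrees of freedom equal the sample size minus the number of predictor terms in the model, minus one for the intercept. The function names and the example numbers are hypothetical and are not taken from the well water exercises.

```python
from math import comb


def error_df(n, n_terms):
    """Error degrees of freedom: n minus the number of predictor terms, minus 1 for the intercept."""
    return n - n_terms - 1


def terms_with_pairwise_interactions(k):
    """Number of terms when a model has k predictors plus every two-way interaction."""
    return k + comb(k, 2)


# Hypothetical example (NOT the well water data): 50 cases, 3 predictors,
# plus all pairwise interactions among them.
n_terms = terms_with_pairwise_interactions(3)
print(n_terms, error_df(n=50, n_terms=n_terms))
```

A quadratic term counts as one additional term, so a model with a single predictor and its square has two terms in this accounting.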
3.10 Predicting faculty salaries. A dean at a small liberal arts college is interested in fitting
a multiple regression model to try to predict salaries for faculty members. If the residuals are
unusually large for any individual faculty member, then adjustments in the person's annual salary
are considered.
a. Identify the model for predicting Salary from age of the faculty member (Age), years of
experience (Seniority), number of publications (Pub), and an indicator variable for gender
(IGender). The dean wants this initial model to include only pairwise interaction terms.
b. Do you think that Age and Seniority will be correlated? Explain.
c. Do you think that Seniority and Pub will be correlated? Explain.
d. Do you think that the dean will be happy if the coefficient for IGender is significantly different
from zero? Explain.
Guided Exercises
3.11 Active pulse rates. The computer output below comes from a study to model Active
pulse rates (after climbing several flights of stairs) based on resting pulse rate (Rest in beats per
minute), weight (Wgt in pounds), and amount of Exercise (in hours per week). The data were
obtained from 232 students taking Stat2 courses in past semesters.
The regression equation is Active = 11.8 + 1.12 Rest + 0.0342 Wgt - 1.09 Exercise
Predictor    Coef       SE Coef    T       P
Constant     11.84      11.95      0.99    0.323
Rest         1.1194     0.1192     9.39    0.000
Wgt          0.03420    0.03173    1.08    0.282
Exercise     -1.085     1.600      -0.68   0.498
S = 15.0452   R-Sq = 36.9%   R-Sq(adj) = 36.1%
a. Test the hypotheses that β2 = 0 versus β2 ≠ 0 and interpret the result in the context of this
problem. You may assume that the conditions for a linear model are satisfied for these data.
b. Construct and interpret a 90% confidence interval for the coefficient β2 in this model.
c. What active pulse rate would this model predict for a 200-pound student who exercises 7 hours
per week and has a resting pulse rate of 76 beats per minute?
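The T column in regression output of this kind is each coefficient divided by its standard error. The short Python sketch below recomputes that column from the Coef and SE Coef values reported for Exercise 3.11. The dictionary layout and function name are choices made for this sketch only, and the error degrees of freedom are computed assuming the usual n - k - 1 convention for a model with k predictors.

```python
# Coefficient and standard error pairs copied from the output in Exercise 3.11.
output = {
    "Constant": (11.84, 11.95),
    "Rest": (1.1194, 0.1192),
    "Wgt": (0.03420, 0.03173),
    "Exercise": (-1.085, 1.600),
}


def t_statistic(coef, se):
    """t statistic for testing that a coefficient is zero: estimate / standard error."""
    return coef / se


n, k = 232, 3          # 232 students, 3 predictors
df_error = n - k - 1   # degrees of freedom for the t distribution

for name, (coef, se) in output.items():
    print(f"{name:<9} t = {t_statistic(coef, se):6.2f}")
print("error df =", df_error)
```

The same degrees of freedom would be used to look up the critical value t* for a confidence interval of the form estimate ± t* · SE.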
3.12 Major League Baseball winning percentage. In Example 3.1, we considered a model for
the winning percentages of football teams based on measures of offensive (PointsFor) and defensive
(PointsAgainst) ability. The file MLB2007Standings contains similar data on many variables
for Major League Baseball (MLB) teams from the 2007 regular season. The winning percentages
are in the variable WinPct and scoring variables include Runs (scored by a team for the season)
and ERA (essentially the average runs against a team per game).
a. Fit a multiple regression model to predict WinPct based on Runs and ERA. Write down the
prediction equation.
b. The Boston Red Sox had a winning percentage of 0.593 for the 2007 season. They scored 867
runs and had an ERA of 3.87. Use this information and the fitted model to find the residual
for the Red Sox.
c. Comment on the effectiveness of each of the two predictors in this model. Would you recommend
dropping one or the other (or both) from the model? Explain why or why not.
d. Does this model for team winning percentages in baseball appear to be more or less effective
than the model for football teams in Example 3.1 on page 95? Give a numerical justification
for your answer.
3.13 Enrollments in mathematics courses. In Exercise 2.23 on page 85, we consider a model
to predict spring enrollment in mathematics courses based on the fall enrollment. The residuals for
that model showed a pattern of growing over the years in the data. Thus, it might be beneficial
to add the academic year variable AYear to our model and fit a multiple regression. The data are
provided in the file MathEnrollment.
a. Fit a multiple regression model for predicting spring enrollment (Spring) from fall enrollment
(Fall) and academic year (AYear), after removing the data from 2003 that had special
circumstances. Report the fitted prediction equation.
b. Prepare appropriate residual plots and comment on the conditions for inference. Did the
slight problems with the residual plots (e.g., increasing residuals over time) that we noticed
for the simple linear model disappear?
3.14 Enrollments in mathematics courses (continued). Refer to the model in Exercise 3.13
to predict Spring mathematics enrollments with a two-predictor model based on Fall enrollments
and academic year (AYear) for the data in MathEnrollment.
a. What percent of the variability in spring enrollment is explained by the multiple regression
model based on fall enrollment and academic year?
b. What is the size of the typical error for this multiple regression model?
c. Provide the ANOVA table for partitioning the total variability in spring enrollment based on
this model and interpret the associated F-test.
d. Are the regression coefficients for both explanatory variables significantly different from zero?
Provide appropriate hypotheses, test statistics, and p-values in order to make your conclusion.
3.15 More breakfast cereal. The regression model in Exercise 3.3 was fit to a sample of 36
breakfast cereals with calories per serving as the response variable. The two predictors were grams
of sugar per serving and grams of fiber per serving. The partition of the sums of squares for this
model is
SSTotal = SSModel + SSE
17190 = 9350 + 7840
a. Calculate R^2 for this model and interpret the value in the context of this setting.
b. Calculate the regression standard error of this multiple regression model.
c. Calculate the F-ratio for testing the null hypothesis that neither sugar nor fiber is related to
the calorie content of cereals.
d. Assuming the regression conditions hold, the p-value for the F-ratio in (c) is about 0.000002.
Interpret what this tells you about the variables in this situation.
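The quantities asked for in an exercise like this all follow from the sums of squares, the sample size n, and the number of predictors k. The Python sketch below shows the standard formulas under the usual assumption that the error degrees of freedom are n - k - 1. The function name and the input numbers are hypothetical and deliberately differ from the cereal data.

```python
from math import sqrt


def anova_summary(ss_model, sse, n, k):
    """Summaries from an ANOVA partition for a regression with k predictors and n cases."""
    ss_total = ss_model + sse
    df_error = n - k - 1
    mse = sse / df_error
    return {
        "r_squared": ss_model / ss_total,   # proportion of variability explained
        "std_error": sqrt(mse),             # regression standard error
        "f_ratio": (ss_model / k) / mse,    # MSModel / MSE
        "df": (k, df_error),                # numerator and denominator df for F
    }


# Hypothetical values (NOT the cereal data): SSModel = 600, SSE = 400, n = 23, k = 2.
summary = anova_summary(ss_model=600, sse=400, n=23, k=2)
for name, value in summary.items():
    print(name, value)
```

A large F-ratio relative to the F distribution with (k, n - k - 1) degrees of freedom is evidence that at least one predictor is related to the response.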
3.16 Combining explanatory variables. Suppose that X1 and X2 are positively related with
X1 = 2X2 - 4. Let Y = 0.5X1 + 5 summarize a positive linear relationship between Y and X1.
a. Substitute the first equation into the second to show a linear relationship between Y and X2.
Comment on the direction of the association between Y and X2 in the new equation.
b. Now add the original two equations and rearrange terms to give an equation in the form
Y = aX1 + bX2 + c. Are the coefficients of X1 and X2 both in the direction you would
expect based on the signs in the separate equations? Combining explanatory variables that
are related to each other can produce surprising results.