Instructions and Recommendations:
Please start your document with the following statement: “On my honor, I have not had any form of
communication about this exam with any other individual (including other students, teaching assistants,
instructors, etc.)”.
Your answers should be presented in the first 2 pages of your submission paper. You are strongly
encouraged to have succinct answers. Lengthy answers are neither required nor desired (and they are
prone to contain incorrect statements that would be penalized).
There is no unique correct solution, and it is unlikely that two students who have not worked together
will arrive at exactly the same intermediate steps and final solution. Consequently, your grade is not based
on the final solution; it is based on
– a proper and adequate use of the techniques/methods learned in class.
– appropriate arguments made when making decisions, whether or not those decisions required the
use of statistical methods.
– appropriate supporting material (syntax/charts/output and any lengthy explanations after the
first two pages) that supports your work and conclusions. There is no space limit for supporting
material.
You are advised to avoid methods not covered in class. If, however, you believe such a method is necessary
(for example, because it would lead you to select a very different model that is clearly more appropriate),
then you need to attach a short explanation of the external method somewhere in your document and
provide meaningful references (papers, manuscripts, or textbooks are valid references; Wikipedia or
Stack Overflow are not).
Submit your solutions to Canvas as a single PDF file by Monday, December 10th. Do not wait until
the very last minute since late submissions may not be graded.
Questions
Use the data set BGSall. These are the research questions we want to answer:
a. Could information about weight and height from an individual’s earlier years help determine their body
mass index at age 18, BMI18?
b. Assume that the data are representative of a given society. Assume that we are interested in avoiding
obesity problems in the adult population, but funding to improve nutrition can be provided to only one
of the following groups in that society: (i) infants up to 2 years of age, (ii) children older than 2 but
younger than 9, or (iii) children older than 9 but younger than 18. Is there evidence, based on these
data only, that it would be better to provide funding at earlier stages (i), intermediate stages (ii),
or later stages (iii)?
1. Use help(BGSall) to get familiar with the data at hand.
– Why would it not be appropriate to use height at 18, HT18, and weight at 18, WT18, to study this
problem? Explain.
– Would it be appropriate to separate boys and girls into two different data sets, or to analyze them
jointly, including a factor to account for this difference? Justify your answer and, if you choose the
former, answer the following questions twice (once for each group).
– Is there any non-statistical argument, based on the research questions or your own knowledge, that
you could use to select a subset of regressors from your entire data set? If so, use it to simplify the
set of predictors before you start your data analysis. If not, use all the predictors in the data set.
2. Use graphical methods and statistical techniques learned in class to study your data and come up with
a final model. Show your proficiency in checking for transformations, interactions, polynomial terms, etc.,
even if it is clear that you may not use some of them later on.
3. Based on your work in question 2, come up with an “almost final” full model, and use any valid
arguments (statistical or non-statistical) to compare it with one (or a few) reduced model(s), and conclude
with a candidate model.
4. Using your final candidate model, check whether there are any clear violations of the assumptions of
the model. Also check whether there is evidence of collinearity. Make appropriate corrections, if any, and
briefly interpret key components of the summary output of your model.
5. Based on your final model in question 4, answer research questions (a) and (b). Question (a) is
straightforward. For question (b), choose only one of the following options:
– Your results help answer this question. Explain how.
– Your results are not enough to answer this question, but with a few additional considerations you
could use them. Give an example of what those considerations could be, and provide an answer based
on them and your model.
– Your results are not useful at all to answer this question. Explain why.
6. Based on your final model in question 4, perform an influence analysis. In particular,
– using Cook’s distance, determine which observations are the most influential (select up to
four);
– determine which observations, if any, could be considered outliers;
– compare the coefficient estimates obtained with and without the selected influential observations,
and briefly comment on your results;
– are there any conclusions in question 5 that you would change due to the results obtained here?
Explain.