STAT802 – Assignment 1, Part A. 1


STAT802: Advanced Topics in Analytics - Semester 1 2023

STAT802 Assignment 1 – Part A Due: 5pm on Friday 24 March 2023

Outline: Assignment 1 – Part A comprises three questions worth 15% of your final grade.

Total: 50 marks.


Only documents in Portable Document Format (PDF) will be accepted. You can use, e.g., Word, knitr

or Sweave to create your report, and RStudio as the editor of the source files.

Formats other than PDF will be ignored and the author will be asked to resubmit the

assignment within 24 hours after the due date and time, at a cost of 5% of the total marks.

If the assignment is not resubmitted within this time frame, it will be assigned a mark

of zero and deemed a non-submission.

Any SAS code required to complete this assignment, especially the code to support your

conclusions & answers, must be self-explanatory and must be embedded in the corresponding

answer as text (not as an image). SAS code submitted in separate files will be ignored and

not considered for marking.

Optionally, you may submit only your answers and avoid copying & pasting each question

in the PDF document. If this is the case, then just make reference to each question, e.g.,

Answer Question 1 (a), Answer Question 1 (b), ... , etc.

Read carefully – Answer all the questions as requested. Any material or information

unrelated to the correct answer may result in a significant reduction of marks for that

question.

Several questions will come to light while solving these tasks. You may need to visit

the SAS–support website for additional information about specific statements/steps to

complete them.

Finally, fill in and sign the cover sheet which must be the very first page in the PDF. Use,

e.g., Adobe Acrobat Pro on Uni computers. Do not submit the cover sheet separately.

If you need an extension because your performance has been impacted by extenuating,

unexpected circumstances, you can submit an SCA along with relevant evidence

using the submission link from our STAT802 Home page. Bear in mind that SCA processing

may take up to 5 working days. If you have questions, contact victor.miranda@aut.ac.nz.

Question 1. The file binary.csv contains information on 400 students who applied to graduate

school last year. The file can be downloaded from the Week-2 Lab Canvas webpage.

There are four variables, as follows:

admit, which is equal to 1 if the individual was admitted to graduate school, and 0

otherwise,

gre, the student's GRE score when the application was submitted,

gpa, the student's GPA when the application was submitted, and

rank, which takes the values 1 through 4 and indicates the prestige of the institution from

which the student obtained their bachelor's degree. Institutions with a rank of 1 have the

highest prestige, while those with a rank of 4 have the lowest.

Using regression models, your manager (Cathy) wants to explore gre, gpa, and institution

rank as factors that may influence the chance of students being admitted to graduate

school. Specifically, she believes that gpa has the strongest influence on predicting the

admission (and non-admission) of these students to graduate school. Cathy also believes

that the differences in admission chances across levels of institution prestige

depend on the gre scores. Is your manager correct in both

assumptions? These results will be used in the next Executive Board meeting.

a) (1 mark - model + 4 marks - justification = 5 marks) Propose and EXPLAIN

an appropriate modelling framework to address your manager's concerns. Name the

model (e.g., ordinary regression, logistic regression, etc.).

b) (3 marks) Write down the full (theoretical) model. Derive the reduced models, if

any. If no reduced models are to be considered, then write down a short paragraph

explaining this point.

c) You should be aware by now of the very large difference between the ranges of GPA and

GRE. While GPA ranges from 1 to 5 points, GRE's minimum is 220 units. Interpreting

regression output with GRE or GPA as response and the other as predictor

may be hardly intuitive. Before going through d) - f), you are required to re-scale

GRE or GPA in a suitable and appropriate way. To complete this task, read the

following report:

https://scc.ms.unimelb.edu.au/resources/reporting-statistical-inference/rescaling-explanatory-variables-in-linear-regression

(5 marks) Write down 2-3 sentences outlining the approach you have adopted to deal

with this matter. Don't go through d) - f) with this issue still unresolved.

Hint: You can re-scale GRE to, say, ‘tens’ or ‘hundreds’.
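
As a rough illustration of what the hint implies, the sketch below (in Python rather than SAS, purely for illustration) fits a simple least-squares line before and after dividing GRE by 100. The data values and the helper name ols_slope_intercept are made up for this example and are not taken from binary.csv. The point is only that re-scaling a predictor changes the unit of its coefficient, not the fit; the same holds for the linear predictor of other regression models.

```python
def ols_slope_intercept(x, y):
    """Closed-form simple least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values, NOT taken from binary.csv.
gre = [380.0, 520.0, 600.0, 660.0, 800.0]
gpa = [2.9, 3.2, 3.4, 3.7, 3.9]

# Re-scale GRE to 'hundreds', as suggested in the hint.
gre_hundreds = [g / 100.0 for g in gre]

slope_raw, intercept_raw = ols_slope_intercept(gre, gpa)
slope_scaled, intercept_scaled = ols_slope_intercept(gre_hundreds, gpa)

# Fitted values under each parameterisation.
fitted_raw = [intercept_raw + slope_raw * g for g in gre]
fitted_scaled = [intercept_scaled + slope_scaled * g for g in gre_hundreds]
```

After re-scaling, the slope is read as the change per 100 GRE points instead of per single point, while the intercept and the fitted values stay the same.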

d) (3 marks) Generate SAS code to estimate your model, AND appropriately address

any issue related to OVERDISPERSION, if any.

e) (6 marks) For the following students, your manager wants to know how likely (or

unlikely) it is for them to be admitted to graduate school. See Slide 12 (predicted

probabilities) from the Week 2-Lecture Slides Part II deck!

Teresa: gre = 680, gpa = 3.5, and rank = 2.

Johanna: gre = 530, gpa = 4.18, and rank = 3.

Tim: gre = 600, gpa = 4.34, and rank = 4.
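
For orientation only, the sketch below (Python, not SAS) shows how a predicted probability is obtained from a logistic-type model once coefficients are available: the linear predictor is passed through the inverse logit. Everything here is an assumption made for illustration: a main-effects-only model, rank treated as categorical with rank 1 as reference, GRE in hundreds, and the coefficient values, which are arbitrary placeholders and not estimates from binary.csv. The model you propose may differ.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(coefs, gre_hundreds, gpa, rank):
    """Probability of admission under a hypothetical main-effects model."""
    eta = (
        coefs["intercept"]
        + coefs["gre"] * gre_hundreds
        + coefs["gpa"] * gpa
        + coefs["rank"][rank]  # offset relative to the reference rank
    )
    return inverse_logit(eta)


# ARBITRARY placeholder coefficients, NOT estimated from any data.
placeholder_coefs = {
    "intercept": -1.0,
    "gre": 0.1,   # per 100 GRE points
    "gpa": 0.2,
    "rank": {1: 0.0, 2: -0.1, 3: -0.2, 4: -0.3},
}

# Teresa's covariates from the question: gre = 680, gpa = 3.5, rank = 2.
p_teresa = predicted_probability(placeholder_coefs, 680 / 100.0, 3.5, 2)
```

With real estimates substituted for the placeholders, the same calculation gives the predicted probability for each listed student.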

f) (8 marks) Write down an executive summary (avoid technical jargon). Focus on the

question "Is your manager correct with both assumptions?" Include a short

discussion on Part d) - Overdispersion and Part e).

NOTE: Present the output relevant to this question in an Appendix, correctly cited and

with captions!

Question 2. The data set testScores.sas7bdat contains data from 200 high school students.

These are scores on various tests, including science, math, reading and social studies. The

variable female is coded as 1 if the student was female and 0 otherwise.

Your client claims (beyond all doubt) that the ‘math’ scores are a good predictor of the

student’s results in their ‘science’ test. Moreover, your client is convinced that they can

find segregated modelling frameworks for this purpose based on the variable female.

a) (0 marks) Run a PROC CONTENTS on this data set and carefully look at the

attributes and labels for each variable. Then, read and understand the regression

analysis conducted on this data presented at

https://stats.oarc.ucla.edu/sas/output/regression-analysis/

Make sure you understand the inputs from the Anova Table: Source, DF, etc.

b) (5 marks) Write 5-7 sentences describing the variables you will use in this question.

Use, e.g., PROC BOXPLOT or PROC SUMMARY. Present the output in an

Appendix, correctly labelled and cited.

c) (5 marks) Propose and EXPLAIN a suitable regression model to look into your

client's claims. Write down the full and reduced models, if applicable.

NOTE: The model shown on the website in a) is just an example and may be completely

different from the model that must be proposed and used in this question.

d) (10 marks) Write down an executive summary. Using plain English, you are required

to make use of goodness of fit metrics. These include, but are not limited to, the 'F test'

(F-value), the adjusted R-squared and the estimated coefficients. Present the output in

an Appendix, correctly labelled and cited.
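
As a reminder of how the goodness of fit metrics named in d) relate to the Anova Table, the sketch below (Python, for illustration only) computes R-squared, adjusted R-squared and the overall F-value from the total and error sums of squares. The function name fit_metrics and the numbers passed to it are made up; only n = 200 matches the number of students in the data set, and p = 3 predictors is an arbitrary assumption.

```python
def fit_metrics(ss_total, ss_error, n, p):
    """Return (R-squared, adjusted R-squared, F-value).

    n is the number of observations and p the number of predictors,
    excluding the intercept.
    """
    ss_model = ss_total - ss_error
    df_model = p
    df_error = n - p - 1
    r_squared = ss_model / ss_total
    adj_r_squared = 1.0 - (1.0 - r_squared) * (n - 1) / df_error
    f_value = (ss_model / df_model) / (ss_error / df_error)
    return r_squared, adj_r_squared, f_value


# Made-up sums of squares, NOT taken from testScores.sas7bdat.
r2, adj_r2, f_value = fit_metrics(ss_total=100.0, ss_error=40.0, n=200, p=3)
```

Adjusted R-squared is never larger than R-squared because it penalises each additional predictor, which is why it is the more useful of the two when comparing full and reduced models.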

** END OF ASSIGNMENT 1A **

Would you like to increase your chances of successfully completing this assignment?

Read the following online documents:

a) A toy problem (interaction):

https://www.theanalysisfactor.com/interaction-dummy-variables-in-linear-regression/

b) Section 11.2 (you will also need to read Section 11.1) of

https://book.stat420.org/categorical-predictors-and-interactions.html

Edited by Victor Miranda; March 2023.

