Date: 2018-04-27 02:11

• INSTRUCTIONS:

– This assignment is worth 15% of your overall marks for this course (for all students enrolled in

STAT2008, STAT4038 or STAT6038).

– If you wish, you may work together with another student (one other) in doing the analyses and

present a single (joint) report. If you choose to do this then both of you will be awarded the same

total mark. Students enrolled under different course codes may work together. You may NOT

work in groups of more than two students and the usual ANU examination rules

on plagiarism still apply with respect to people not in your group. This means you

should not discuss the assignment (questions, solutions, code, etc.) with your classmates or any

other individuals if they are not in your group. You can discuss the assignment with me (Anton

Westveld) or your tutors.

– Please submit your assignment on Wattle. As a group you should only submit one assignment.

Make sure to place the names and IDs of the individuals in your group on

the front page of your assignment. When uploading to Wattle you will submit:

1. Your assignment/report.

2. An ‘.R’ file containing the R code you used for the assignment.

– Assignments should be typed. Your assignment may include some carefully edited computer

output (e.g. graphs, tables) showing the results of your data analysis and a discussion of those

results, as well as some carefully selected code. Please be selective about what you present and

only include as many pages and as much computer output as necessary to justify your solution.

It is important to be concise in your discussion of the results. Clearly label each part of your

report with the part of the question that it refers to.

– Unless otherwise advised, use a significance level of 5%.

– Marks may be deducted if these instructions are not strictly adhered to, and marks will certainly

be deducted if the total report is of an unreasonable length, i.e. more than 10 pages including

graphs and tables. You may include an appendix that is in addition to the above page limits;

however the appendix will generally not be marked, only checked if there is some question about

what you have actually done.

– Assignments will be marked by your tutor (or one of your two tutors, for joint assignments). You

may ask any of the tutors or me (Anton Westveld) questions about this assignment up to 5 pm

on Wednesday 28 March 2018.

– Late assignments will NOT be accepted after the deadline without an extension. Extensions will

usually be granted on medical or compassionate grounds on production of appropriate evidence,

but must have my permission by no later than 12 noon on Wednesday 28 March 2018. Even with

an extension, all assignments must be submitted reasonably close to the original deadline to allow

time for the marking to be completed.

1. (100 points) You will explore some of the techniques you have learned thus far by examining data

on housing prices in the Seattle area in 2015. The data have been placed on Wattle. While

there are a number of variables available, for this assignment you will only consider the following:

– id: an id number for the house. Note: some houses have been sold more than once.

– price: the price at which the house was sold, in USD

– bedrooms: the number of bedrooms in the house

– bathrooms: the number of bathrooms in the house

– sqft.living: square footage of total living space

(a) (15 points) Conduct an exploratory data analysis, where price is the response (y) and the

variables which may affect price are: bedrooms, bathrooms, and sqft.living. In doing your

analysis make sure to identify any unusual points and discuss why they are unusual.

(b) (10 points) Is there a statistically significant correlation between price and sqft.living? Use

the cor.test() function to conduct a suitable hypothesis test. Clearly specify the hypotheses

you are testing and present and interpret the results.

(c) (10 points) Experiment with applying natural log transformations (to the base e, which is the

default for the log() function in R) and square root transformations to one or both of price

and sqft.living, and repeat the analysis in parts (a) and (b). Do NOT show all of your results,

just pick whichever one you think is the best choice of scale for the two variables and show

and discuss the results for your chosen combination.

(d) (20 points) Fit a simple linear regression (SLR) model with your chosen transformation of

price as the response variable and your chosen transformation of sqft.living as the predictor.

Construct a plot of the residuals against the fitted values, a normal Q-Q plot of the residuals,

a bar plot of the leverages for each observation and a bar plot of Cook’s distances for each

observation. Use these plots (and other means) to comment on the model assumptions and

on any unusual data points.

(e) (10 points) Produce the ANOVA (Analysis of Variance) table for the SLR model in part (d)

and interpret the results of the F test. What is the coefficient of determination for this model

and how should you interpret this summary measure?

(f) (15 points) What are the estimated coefficients of the SLR model in part (d) and the standard

errors associated with these coefficients? Interpret the values of these estimated coefficients

and perform t-tests to test whether or not these coefficients differ significantly from zero.

What do you conclude as a result of these t-tests?

(g) (10 points) Consider two other simple linear regressions. One where x = bedrooms and one

where x = bathrooms. Use the same transformation for the response as you did in part (d) [if

you decided to use one]. Interpret these two models. How do these models compare to the

one in part (d)?

(h) (10 points) Construct the following covariate in R which examines the number of bathrooms

and bedrooms per square foot of living space:

$$x_i = \frac{\text{sqft.living}_i}{\text{bedrooms}_i + \text{bathrooms}_i + 1}$$

Fit an SLR using x. Use the same transformation for the response as you did in part (d) [if

you decided to use one]. Interpret the model. How does this model compare to the ones in parts

(d) and (g)?
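
The R functions that parts (b) to (h) call for can be sketched as follows. This is only a minimal illustration, not a model answer. It runs on simulated stand-in data (the data frame `houses` and every number in it are made up here, not taken from the Wattle file), and the log-log scale is just one of the combinations part (c) asks you to compare. Only the column names follow the variable list above.

```r
# Simulated stand-in data: NOT the Wattle housing data. All values are invented
# for illustration; only the column names match the assignment.
set.seed(1)
n <- 200
houses <- data.frame(
  bedrooms    = sample(1:6, n, replace = TRUE),
  bathrooms   = sample(c(1, 1.5, 2, 2.5, 3), n, replace = TRUE),
  sqft.living = round(runif(n, 500, 5000))
)
houses$price <- exp(8 + 0.6 * log(houses$sqft.living) + rnorm(n, sd = 0.3))

# (b) Pearson correlation test of H0: rho = 0 against H1: rho != 0.
ct <- cor.test(houses$price, houses$sqft.living)
print(ct)

# (c), (d) One possible choice of scale (an assumption): log of both variables.
fit <- lm(log(price) ~ log(sqft.living), data = houses)

# (d) The four diagnostic plots requested, drawn in a 2 x 2 layout.
op <- par(mfrow = c(2, 2))
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(residuals(fit))
qqline(residuals(fit))
barplot(hatvalues(fit), main = "Leverage", names.arg = NA)
barplot(cooks.distance(fit), main = "Cook's distance", names.arg = NA)
par(op)

# (e) ANOVA table (F test) and coefficient of determination.
print(anova(fit))
print(summary(fit)$r.squared)

# (f) Estimates, standard errors, t statistics and p-values.
print(summary(fit)$coefficients)

# (h) Covariate from the formula above, then an SLR on it with the same
# response transformation as in (d).
houses$x <- houses$sqft.living / (houses$bedrooms + houses$bathrooms + 1)
fit.x <- lm(log(price) ~ x, data = houses)
print(summary(fit.x)$coefficients)
```

With the real data, replace the simulated `houses` with the file from Wattle (for example via `read.csv()`), and choose the scale from your own work in part (c).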

