Date: 2019-08-31 11:00

STATS 201/8 Data Analysis

Assignment 2, Second Semester, 2019

Due: 3pm Thursday 29th August

Instructions concerning this assignment:

We are providing you with an R Markdown document called STATS20x_2019_S2_A2.Rmd (available on Canvas), which already has some answers filled in. You will need to fill in and complete the rest of the document. The data files you will use for the assignment are described in the questions and are available from Canvas. Make sure you put these datasets in the same folder as the R Markdown document, because it will look for them there.

The first change you need to make to the R Markdown document is to put your name and ID number at the top.

Notes:

This assignment is worth 7% of your final mark and requires a substantial amount of work. Do not leave it until the last few days.

Late assignments are not accepted unless an extension has been granted for a good reason (usually a medical reason, supported by a medical certificate).

The total mark for this assignment is 55 (including 6 marks for presentation and communication), which will be converted to a mark out of 10 for recording. Most of the marks for assignments tend to be for interpretation.

There are 6 Presentation and Communication marks for this assignment, as follows:

- Coversheet. Using and filling in the correct coversheet.
- Name and ID number at the top of the R Markdown document.
- Space saving and printing the assignment 2-up. Not printing unnecessary output (listing data sets or showing erroneous R output), and printing the assignment in "2-up" layout, which prints 2 pages side by side, reduced onto one page.
- Readability. This is for your general communication ability in the assignment, including sentences clearly conveying the correct idea, sentences making sense, comments that are neither excessively long nor excessively short, and conclusions following logically from previous statements.
- Use of Natural Language in Executive Summaries. In executive summaries, this is for discussing the analysis in context, not using variable names, using units when known, and rounding sensibly.
- Keeping to the Point in Executive Summaries. In executive summaries, this is for not going into far more detail than required.

It is your responsibility to back up your computer files. If you are using your own computer, it is your responsibility to ensure that you can access the data and run R and RStudio well ahead of the assignment due date. Technical problems outside our control are not accepted as excuses for submitting coursework late.

Question 1. [16 Marks]

Researchers were interested in whether male and female students have the same level of cholesterol intake. A study was conducted in schools in Michigan. A random sample of students was surveyed, and each student's daily cholesterol intake was estimated from a standard food frequency questionnaire. The researchers want to compare both the mean and the median cholesterol intake of male and female students.

The dataset is stored in chol.txt and includes the variables:

chol: cholesterol consumed per day by the student (mg).

sex: gender of the student (F = Female, M = Male).

Look at the plots and summary statistics of the data and comment on them.

Fit a model to the data to compare the means. Check the model assumptions.

Fit a model to the data to compare the medians. Check the model assumptions.

Note: Use linear models for both of the above. DO NOT use Welch tests.
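As an illustration of the two models, here is a minimal Python sketch using only the standard library (in R the equivalent fits would use lm(); the function names and toy numbers are hypothetical and are not from chol.txt). With a 0/1 indicator for male students, which mirrors R's default coding with F as the baseline level, least squares gives an intercept equal to the female mean and a slope equal to the male minus female difference in means. The model uses one pooled residual variance, which is what separates it from a Welch test. Refitting on log(chol) and back-transforming the slope with exp() gives the ratio of group geometric means, which estimates the ratio of medians if the log-scale errors are roughly symmetric.

```python
import math
from statistics import mean

def fit_indicator_model(y_f, y_m):
    """Least-squares fit of y = b0 + b1 * male, with male = 0 for F and 1 for M.

    With a single 0/1 predictor, b0 is the female mean and b1 is the
    difference in means (male minus female).
    """
    x = [0] * len(y_f) + [1] * len(y_m)
    y = list(y_f) + list(y_m)
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Pooled residual variance: one sigma^2 shared by both groups (not Welch).
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (len(y) - 2)
    se_b1 = math.sqrt(s2 / sxx)
    return b0, b1, se_b1

def median_ratio(y_f, y_m):
    """Fit the same model to log(y); exp(slope) is the ratio of geometric means (M/F)."""
    _, b1, _ = fit_indicator_model([math.log(v) for v in y_f],
                                   [math.log(v) for v in y_m])
    return math.exp(b1)

# Hypothetical toy values in mg/day, for illustration only.
female = [180.0, 220.0, 250.0, 310.0]
male = [240.0, 300.0, 330.0, 410.0]
b0, b1, se = fit_indicator_model(female, male)
print(f"female mean = {b0:.1f}, male - female = {b1:.1f} (SE {se:.1f})")
print(f"estimated median ratio (M/F) = {median_ratio(female, male):.3f}")
```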

Write a Methods and Assumption Checks section.

Note: this will be a slightly different Methods and Assumption Checks section from usual, as you will effectively be doing everything twice because you are fitting two different models to your data.

Write an Executive Summary. (See Assignment 1 notes for tips on writing this section.)

NOTE: When writing Executive Summaries, remember the Questions of Interest/Goals.

Question 2. [12 Marks]

A jeweller prices diamonds based on quality and colour. It is believed that the typical price of a diamond can be modelled as:

$\text{Price} = \alpha \times \text{Colour}^{\beta}$

A sample of 25 diamonds weighing between 1.0 and 1.5 carats is examined to test this relationship. The jeweller wants to know whether this power relationship holds. In particular, she wants to estimate how much a 50% increase in colour score affects the price of a diamond.
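Taking logs of both sides of the power model gives a straight line in log(Colour), which is why the model is fitted with all variables logged. Comparing a colour score C with 1.5C shows that α and C cancel, so a 50% increase in colour score multiplies the typical price by 1.5^β whatever the starting colour:

```latex
\begin{aligned}
\log(\text{Price}) &= \log(\alpha) + \beta \log(\text{Colour}) \\
\frac{\alpha \, (1.5\,C)^{\beta}}{\alpha \, C^{\beta}} &= 1.5^{\beta}
\end{aligned}
```

The corresponding percentage change in the typical price is 100(1.5^β − 1)%.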

The dataset is stored in Diamonds.csv and includes the variables:

Price: the price per carat (in hundreds of dollars).

Colour: colour score of the diamond, on a scale from 1 to 10 (1 being yellow and 10 being pure white, so higher is better).

Look at the initial plot of the data and comment on it.

The hypothesized model for these data is a power model, so fit a power model to the data with ALL variables logged. Check the model assumptions.
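A minimal sketch of the log-log fit in Python, using only the standard library (the function names are hypothetical, and the data are generated from an exact power law rather than taken from Diamonds.csv). Regressing log(Price) on log(Colour) estimates log(α) as the intercept and β as the slope. The effect of a 50% colour increase is the multiplier 1.5^β, and because this multiplier increases with β, a confidence interval for β from the model output can be transformed endpoint by endpoint.

```python
import math
from statistics import mean

def fit_power_model(colour, price):
    """Fit log(price) = log(alpha) + beta * log(colour) by least squares.

    Returns (alpha, beta) on the original scale; alpha = exp(intercept).
    """
    lx = [math.log(c) for c in colour]
    ly = [math.log(p) for p in price]
    x_bar, y_bar = mean(lx), mean(ly)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(lx, ly))
    sxx = sum((x - x_bar) ** 2 for x in lx)
    beta = sxy / sxx
    alpha = math.exp(y_bar - beta * x_bar)
    return alpha, beta

def effect_of_increase(beta, factor=1.5):
    """Multiplier on the typical price when colour is multiplied by `factor`."""
    return factor ** beta

def effect_interval(beta_lower, beta_upper, factor=1.5):
    """Carry a confidence interval for beta over to factor**beta.

    Endpoint order is preserved because factor > 1 makes factor**beta increasing.
    """
    return factor ** beta_lower, factor ** beta_upper

# Hypothetical noise-free check: data generated exactly from price = 2 * colour^0.5.
colour = [2.0, 4.0, 6.0, 8.0, 9.0]
price = [2.0 * c ** 0.5 for c in colour]
alpha, beta = fit_power_model(colour, price)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
print(f"50% colour increase multiplies price by {effect_of_increase(beta):.3f}")
```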

Generate inference output required from the final model.

Write a Methods and Assumption Checks section.

Write an Executive Summary.

Question 3. [21 Marks]

Dementia is a group of neurodegenerative disorders characterized by memory impairment and cognitive decline. Depending on the underlying mechanisms, dementia can be further categorized into Alzheimer's disease (AD), Lewy body dementia, frontotemporal dementia, and vascular dementia. AD accounts for 60% to 80% of dementia cases, and its incidence increases with age. Traditionally, AD was diagnosed mainly from the clinical manifestation of symptoms. However, the pathophysiology of AD is believed to begin years before clinical symptoms appear, and most available treatments can only slow the progression of the disease. New tools are therefore needed to detect AD earlier than conventional methods can.

18F-FDG-PET, which has been used for tumour imaging, is a promising neuroimaging tool for diagnosing early AD, as it reflects resting-state cerebral metabolic rates of glucose, an indicator of neuronal activity. In a case-control study, both patients with AD and healthy individuals were scanned; an FDG score was calculated for each subject and their age was recorded. Researchers are interested in whether scanning results differ between healthy individuals and AD patients. If there is a difference, the researchers are also interested in whether this difference depends on age.

The data used in this question are an independent sub-sample from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The data are stored in "ADNI.txt" and contain the variables:

FDG: 18F-FDG-PET scanning result of the subject (numeric).

Age: age of the subject (years).

Status: AD status of the subject (either AD for Alzheimer's disease, or CN for healthy individuals, the control group).

In the R Markdown file, most of the analysis has been done for you, as we want you to answer some specific questions.

Look at the initial plot of the data and comment on it.

Look at the plot of the model with the fitted lines superimposed. Then look back at the Residuals versus Fitted Values plot. Explain why there are two clusters of points in the Residuals versus Fitted Values plot.

Write a Methods and Assumption Checks section.

In terms of slopes and/or intercepts, explain what the coefficient of Age:StatusCN is estimating.
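This sketch assumes the R Markdown file fits Status and Age with an interaction (for example lm(FDG ~ Age * Status); the exact formula is an assumption here) and keeps R's default treatment coding. Under that coding AD is the baseline level, because levels are sorted alphabetically, and the fitted model is two straight lines, one per group. The Python sketch below uses only the standard library, and its function names and toy data are hypothetical. It relies on the fact that such a model produces the same fitted lines as a separate least-squares line for each group, and shows how those two lines map onto the four coefficients R reports.

```python
from statistics import mean

def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope

def interaction_coefficients(age_ad, fdg_ad, age_cn, fdg_cn):
    """Rewrite the two group lines as treatment-coded coefficients, AD as baseline."""
    a_ad, b_ad = fit_line(age_ad, fdg_ad)
    a_cn, b_cn = fit_line(age_cn, fdg_cn)
    return {
        "(Intercept)": a_ad,          # AD line evaluated at Age = 0
        "Age": b_ad,                  # slope of the AD line
        "StatusCN": a_cn - a_ad,      # CN intercept minus AD intercept (at Age = 0)
        "Age:StatusCN": b_cn - b_ad,  # CN slope minus AD slope
    }

# Hypothetical, noise-free toy data (not ADNI values):
# AD line is 2.0 + 0.05 * Age, CN line is 5.0 + 0.02 * Age.
age_ad = [60, 70, 80]
age_cn = [60, 70, 80]
fdg_ad = [2.0 + 0.05 * a for a in age_ad]
fdg_cn = [5.0 + 0.02 * a for a in age_cn]
for name, value in interaction_coefficients(age_ad, fdg_ad, age_cn, fdg_cn).items():
    print(f"{name:>13}: {value:.3f}")
```

Separate per-group fits are used only because they need no matrix algebra. They reproduce the interaction model's point estimates, but not its standard errors, which pool the residual variance across both groups.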

For each of the following, either write a sentence interpreting a confidence interval that estimates the requested information, or state why we cannot answer this from the R output given:

- in general, the difference in size of FDG scores between healthy people and those with Alzheimer's disease.
- the effect on the FDG scores of healthy people for each additional year of age.
- the effect on the FDG scores of people with Alzheimer's disease for each additional year of age.

Looking at the plot with the model superimposed, describe what seems to be happening in 2-3 sentences.

Look at the final plot, which shows prediction intervals for FDG score plotted against age for both the AD and CN groups. A goal of the study is to be able to look at an FDG score and predict Alzheimer's disease earlier. Based on this plot, discuss whether this seems plausible. Justify your answer.

