MATH20811 Practical Statistics: Coursework 2

The marks awarded for this coursework constitute 30% of the total assessment for the module. It is envisaged that it will take the average student about 15 hours to complete.

The submission deadline is 10.00 am on Monday 2 December 2019.

Late Submission of Work: Any student’s work that is submitted after the given deadline will be classed as late, unless an extension has already been agreed via mitigating circumstances or a DASS extension. The following rules for the application of penalties for late submission are quoted from the University’s guidance document on late submission, version 1.3 (dated July 2019).

“Any work submitted at any time within the first 24 hours following the published submission deadline will receive a penalty of 10% of the maximum amount of marks available. Any work submitted at any time between 24 hours and up to 48 hours late will receive a deduction of 20% of the marks available, and so on, at the rate of an additional 10% of available marks deducted per 24 hours, until the assignment is submitted or no marks remain.”

Your solutions should all be submitted in a single document prepared using LaTeX.

For each question, explain how you completed what is required, show your working, and comment on computational results where applicable.

When you include a plot, be sure to give it a title and label the axes correctly.

If you have written or used R code to answer any part, list this code after the written answer to which it applies. This may be the code for a function you have written and/or code you have used to produce numerical results and plots.

In LaTeX, you can include R code and R output using the verbatim environment, i.e. type

\begin{verbatim}
copy and paste lines of text from R here
\end{verbatim}


Your file should be submitted through the module site on Blackboard to the Turnitin assessment entitled “MATH20811 CW2” by the above time and date. The work will be marked anonymously on Blackboard, so please ensure that your filename is clear but does not contain your name or student ID number. Similarly, do not include your name or ID number in the document itself.

Turnitin will generate a similarity report for your submitted document and indicate matches to other sources, including billions of internet documents (both live and archived), a subscription repository of periodicals, journals and publications, as well as submissions from other students. Please ensure that the document you upload represents your own work and is written in your own words. The Turnitin report will be available for you to see shortly after the due date.

This coursework should help to reinforce some of the methodology you have been studying, as well as the R skills you have been developing in the module so far.


1. The following table gives the numbers of road casualties in Greater London during 2013, categorised by severity as either “fatal”, “serious” or “slight” and grouped by five modes of transport.

                         Casualty Severity
Mode of Transport       Fatal  Serious   Slight      Sum
Pedestrian                 65      773     4343     5181
Pedal Cycle                14      475     4134     4623
Powered 2 Wheeler          22      488     3992     4502
Car                        25      310     9850    10185
Other Vehicles              6      146     2556     2708
Sum                       132     2192    24875    27199

The question of interest is whether the five modes of transport differ in their probabilities of the three levels of casualty severity. You should regard the row sums as fixed quantities here.

(i) Given the description of the data, write down a suitable probability model for this matrix of counts.

[2 marks]
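A sketch of one standard model, assuming the five rows are independent and that each row total $n_{i\cdot}$ is fixed, as the question states. The notation $Y_{ij}$ (count for mode $i$ and severity $j$, with $j = 1, 2, 3$ for fatal, serious, slight) and $\pi_{ij}$ (the corresponding probability) is introduced here for illustration only. Each row is then a multinomial sample, giving a product-multinomial model:

```latex
\[
(Y_{i1}, Y_{i2}, Y_{i3}) \sim \mathrm{Multinomial}\bigl(n_{i\cdot};\ \pi_{i1}, \pi_{i2}, \pi_{i3}\bigr),
\qquad i = 1, \dots, 5, \ \text{independently},
\]
\[
P(Y_{i1} = y_{i1}, Y_{i2} = y_{i2}, Y_{i3} = y_{i3})
  = \frac{n_{i\cdot}!}{y_{i1}!\, y_{i2}!\, y_{i3}!}\,
    \pi_{i1}^{y_{i1}} \pi_{i2}^{y_{i2}} \pi_{i3}^{y_{i3}},
\qquad \sum_{j=1}^{3} \pi_{ij} = 1 .
\]
```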

(ii) Read the data into R as a matrix and label the two dimensions appropriately. Calculate appropriate proportions and comment informally on the question of interest given above.

[5 marks]
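One reading of “appropriate proportions”: with row totals fixed, the natural estimates are the row proportions $y_{ij}/n_{i\cdot}$, which can be compared informally with the pooled column proportions. The coursework asks for R. The following is only a minimal Python sketch of the same calculation, and the names `counts`, `row_proportions`, `props` and `pooled` are hypothetical.

```python
# Hypothetical Python sketch of the calculation in part (ii); the coursework
# itself asks for R, and Python is used here only for illustration.
severities = ["Fatal", "Serious", "Slight"]
counts = {
    "Pedestrian": [65, 773, 4343],
    "Pedal Cycle": [14, 475, 4134],
    "Powered 2 Wheeler": [22, 488, 3992],
    "Car": [25, 310, 9850],
    "Other Vehicles": [6, 146, 2556],
}

def row_proportions(table):
    """Estimate pi_ij by y_ij / n_i. (each count divided by its fixed row total)."""
    return {mode: [y / sum(row) for y in row] for mode, row in table.items()}

props = row_proportions(counts)

# Pooled column proportions: the common estimate if all modes behaved alike.
col_tot = [sum(col) for col in zip(*counts.values())]
pooled = [c / sum(col_tot) for c in col_tot]

print(f"{'Mode':<20}" + "".join(f"{s:>9}" for s in severities))
for mode, p in props.items():
    print(f"{mode:<20}" + "".join(f"{x:>9.4f}" for x in p))
print(f"{'All modes':<20}" + "".join(f"{x:>9.4f}" for x in pooled))
```

Each row of the printed table sums to 1, so differences between modes show up directly as differences between rows.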

(iii) Present the proportions data graphically and comment on the resulting plot.

[5 marks]

(iv) State the relevant statistical hypotheses and test them at significance level α = 0.05, using a critical value from the asymptotic null distribution of your test statistic. (You should clearly state what this distribution is.) State your conclusions.

[3 marks]
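A minimal sketch, assuming Pearson’s $X^2$ statistic is used (the likelihood-ratio statistic $G^2$ has the same asymptotic null distribution). The homogeneity hypothesis is $H_0: \pi_{1j} = \pi_{2j} = \dots = \pi_{5j}$ for $j = 1, 2, 3$, against the alternative that at least one mode has a different severity profile. Under $H_0$, with expected counts $E_{ij} = n_{i\cdot} n_{\cdot j} / n$, $X^2$ is asymptotically $\chi^2$ with $(5-1)(3-1) = 8$ degrees of freedom, so $H_0$ is rejected at the 5% level when $X^2$ exceeds the upper 5% point of $\chi^2_8$, approximately 15.507 (in R, `qchisq(0.95, df = 8)`). The Python names below are hypothetical.

```python
# Hypothetical sketch: Pearson chi-squared test of homogeneity for the 5 x 3 table.
counts = [
    [65, 773, 4343],    # Pedestrian
    [14, 475, 4134],    # Pedal Cycle
    [22, 488, 3992],    # Powered 2 Wheeler
    [25, 310, 9850],    # Car
    [6, 146, 2556],     # Other Vehicles
]

def pearson_x2(table):
    """Return (X^2, degrees of freedom), using E_ij = n_i. * n_.j / n."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            x2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(col_tot) - 1)
    return x2, df

x2, df = pearson_x2(counts)
# Upper 5% point of chi-squared on 8 df (about 15.507, from standard tables).
# Hard-coded because the Python standard library has no chi-squared quantile;
# this value is valid only when df == 8.
critical = 15.507
print(f"X^2 = {x2:.2f} on {df} df; reject H0 at the 5% level: {x2 > critical}")
```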

(v) Print out appropriate sets of residuals and comment on their values in the light of the conclusions you made in part (iv).

[3 marks]
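Two common sets of residuals are sketched here, assuming the same expected counts $E_{ij} = n_{i\cdot} n_{\cdot j}/n$ as in the test of homogeneity. The first is the Pearson residual $(O_{ij} - E_{ij})/\sqrt{E_{ij}}$, whose squares sum to $X^2$. The second is the adjusted residual $(O_{ij} - E_{ij})/\sqrt{E_{ij}(1 - n_{i\cdot}/n)(1 - n_{\cdot j}/n)}$, which is approximately standard normal under $H_0$, so values well beyond about ±2 point to cells driving any rejection. This is an illustrative Python sketch with hypothetical names, not the required R code.

```python
import math

# Hypothetical sketch of part (v): Pearson and adjusted residuals.
modes = ["Pedestrian", "Pedal Cycle", "Powered 2 Wheeler", "Car", "Other Vehicles"]
severities = ["Fatal", "Serious", "Slight"]
counts = [
    [65, 773, 4343],
    [14, 475, 4134],
    [22, 488, 3992],
    [25, 310, 9850],
    [6, 146, 2556],
]

row_tot = [sum(r) for r in counts]
col_tot = [sum(c) for c in zip(*counts)]
n = sum(row_tot)

pearson, adjusted = [], []
for i, row in enumerate(counts):
    p_row, a_row = [], []
    for j, obs in enumerate(row):
        exp = row_tot[i] * col_tot[j] / n           # expected count under H0
        p_row.append((obs - exp) / math.sqrt(exp))  # Pearson residual
        # Adjusted residual: approximately N(0, 1) under H0.
        a_row.append((obs - exp) / math.sqrt(
            exp * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)))
    pearson.append(p_row)
    adjusted.append(a_row)

print(f"{'Adjusted residuals':<20}" + "".join(f"{s:>9}" for s in severities))
for mode, a_row in zip(modes, adjusted):
    print(f"{mode:<20}" + "".join(f"{r:>9.2f}" for r in a_row))
```

A positive residual means more casualties of that severity than homogeneity would predict for that mode; a negative residual means fewer.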


(vi) Write your own code in R to obtain B = 5000 values of the test statistic, each calculated from a set of data simulated under the assumption that the null hypothesis is true. You should aim to make efficient use of for loops in doing this.

Produce a histogram of these simulated values, superimpose the density of the asymptotic null distribution, and comment informally on the goodness of fit.

[6 marks]
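A minimal sketch of the simulation, assuming Pearson’s $X^2$ is the test statistic and that data under $H_0$ are generated by drawing each row independently from a multinomial with its fixed row total and the pooled column proportions $n_{\cdot j}/n$ (the estimates of the common probabilities under $H_0$). It uses NumPy’s `Generator.multinomial`; the function names and the seed are hypothetical. The sketch only generates and summarises the simulated values. In R, the histogram and overlay could be drawn with `hist(..., freq = FALSE)` followed by `curve(dchisq(x, df = 8), add = TRUE)`.

```python
import numpy as np

# Hypothetical sketch of part (vi) in Python with NumPy (the coursework uses R).
counts = np.array([
    [65, 773, 4343],
    [14, 475, 4134],
    [22, 488, 3992],
    [25, 310, 9850],
    [6, 146, 2556],
])

def pearson_x2(table):
    """Pearson X^2 for one 2-D table of counts."""
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row * col / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

def simulate_null(table, B=5000, seed=2019):
    """Return B values of X^2 from tables simulated under H0 (row totals fixed)."""
    rng = np.random.default_rng(seed)
    row_tot = table.sum(axis=1)
    p0 = table.sum(axis=0) / table.sum()   # pooled column proportions
    sim_stats = np.empty(B)
    for b in range(B):
        # One multinomial draw per row, keeping each row total fixed.
        sim = np.array([rng.multinomial(n_i, p0) for n_i in row_tot])
        sim_stats[b] = pearson_x2(sim)
    return sim_stats

sim_stats = simulate_null(counts)
print(f"mean of simulated X^2 = {sim_stats.mean():.3f} (chi-squared(8) mean is 8)")
print(f"proportion above 15.507 = {(sim_stats > 15.507).mean():.3f} (nominal 0.05)")
```

If the asymptotic approximation is adequate, the simulated mean should be close to 8 and roughly 5% of the values should exceed 15.507.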

(vii) Construct approximate 95% confidence intervals for (a) the difference between the probability that a pedestrian is seriously injured and the probability that a car driver is seriously injured, and (b) for cyclists only, the difference between the probabilities of a serious injury and a slight injury.

[6 marks]
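A sketch of Wald-type intervals, assuming the product-multinomial model with fixed row totals. Other interval constructions exist, and these are not necessarily the intended ones. The key difference between (a) and (b) is the variance. In (a) the two estimates come from different rows, so they are independent and their variances add. In (b) both come from the same multinomial row, where $\mathrm{Cov}(\hat\pi_{j}, \hat\pi_{k}) = -\pi_j \pi_k / n$, giving $\mathrm{Var}(\hat\pi_j - \hat\pi_k) = \{\pi_j + \pi_k - (\pi_j - \pi_k)^2\}/n$. The function names are hypothetical, and `statistics.NormalDist` supplies the 97.5% normal quantile.

```python
import math
from statistics import NormalDist

# Hypothetical sketch of part (vii): approximate 95% Wald intervals.
z = NormalDist().inv_cdf(0.975)   # about 1.96

def ci_two_independent(y1, n1, y2, n2, z):
    """Wald CI for p1 - p2, where the estimates come from independent rows."""
    p1, p2 = y1 / n1, y2 / n2
    d = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d, (d - z * se, d + z * se)

def ci_within_multinomial(yj, yk, n, z):
    """Wald CI for pj - pk, two cells of the same multinomial row.

    Var(pj_hat - pk_hat) = (pj + pk - (pj - pk)^2) / n, since Cov = -pj * pk / n.
    """
    pj, pk = yj / n, yk / n
    d = pj - pk
    se = math.sqrt((pj + pk - d ** 2) / n)
    return d, (d - z * se, d + z * se)

# (a) Serious injury: pedestrians (773 of 5181) versus cars (310 of 10185).
d_a, ci_a = ci_two_independent(773, 5181, 310, 10185, z)
# (b) Cyclists only: serious (475) versus slight (4134), out of 4623.
d_b, ci_b = ci_within_multinomial(475, 4134, 4623, z)

print(f"(a) estimate {d_a:.4f}, 95% CI ({ci_a[0]:.4f}, {ci_a[1]:.4f})")
print(f"(b) estimate {d_b:.4f}, 95% CI ({ci_b[0]:.4f}, {ci_b[1]:.4f})")
```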

[Total marks = 30]
