Date: 2018-04-25 03:31


Instructions

• Assignments are to be placed in the appropriate subject and lab box located just inside the north entrance to the Peter Hall Building. Assignments must be stapled.

• Please label your assignment with the following information:

– your name;

– your student number;

– your lab class;

– your tutor’s name (Sandy, Steve or David).

• You must sign the plagiarism declaration. The link is available on the LMS.

• Your assignment should show all working and reasoning, as marks will be given for method as well as for correct answers. Please spell check your document.

• Paste any R code and output into the appropriate places so that it can be seen easily along with your other work. Graphics from R can be resized within your document; make them smaller as necessary.

• Assignments count for 50% of the assessment in this subject. This one is worth 15%, and covers the work done in weeks 1 to 3.

• Tutors will not help you directly with assignment questions. However, they may give some help with R.

• Solutions to the assignment questions will be made available later.

• When constructing a panel of graphs with multiple plots, it is good to use the R command par(mfrow = c(nrows, ncols)), where nrows is the number of rows and ncols the number of columns in the panel. The default is c(1, 1).
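
As an illustration of the last point, the following is a minimal sketch of a one-row, two-column panel. It uses simulated values rather than any assignment data, and the variable names x and old_par are purely illustrative.

```r
# Simulated data, used only to demonstrate the panel layout.
set.seed(1)
x <- rnorm(100)

# par() returns the previous settings, so they can be restored afterwards.
old_par <- par(mfrow = c(1, 2))   # 1 row, 2 columns

hist(x, main = "Histogram", xlab = "x")
boxplot(x, main = "Boxplot")

# Restore the default single-plot layout.
par(old_par)
```

Saving and restoring the old settings keeps later plots from landing in a leftover panel.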

MAST90044 Thinking and Reasoning with Data Assignment 1

Q.1. The data set unesco.csv, available on the LMS, contains demographic and economic information from the 1990 UNESCO yearbook on about half the world’s countries. Definitions of the variables in the data set are as follows:

• Birth rate per 1,000 of population

• Death rate per 1,000 of population

• Infant deaths per 1,000 of population

• Life expectancy at birth for males

• Life expectancy at birth for females

• Gross National Product (GNP) per capita

• Geopolitical group

1 Eastern Europe (former Soviet Satellite)

2 South America and Mexico

3 Western Europe, North America, Japan

4 Middle East

5 Asia

6 Africa

• Country

Ignoring geopolitical group:

(a) Summarise the GNP values using summary statistics and two graphical tools. Briefly describe any obvious features of the distribution.

(b) Use two graphical tools to compare the observed distribution of infant deaths with a normal distribution. Briefly comment.

(c) Graphically examine the relationship between the infant death rate and GNP. Calculate the correlation coefficient between the two variables. Comment on how useful the correlation coefficient is in this situation.

(d) Graphically examine the relationship between life expectancy at birth for females and the birth rate. Comment on the strength or otherwise of the relationship. Formulate a statistical model to describe the relationship. Graphically fit the model, and use it to roughly estimate one of the parameters in the model (excluding σ).

Taking geopolitical group into account:

(e) Use two graphical tools to examine the relationship between life expectancy at birth for males and geopolitical group. Use suitable R functions to calculate the mean and standard deviation for each group, and the number of countries in each group. Comment on any obvious differences between the groups and identify any clear outliers.

(f) Write a statistical model to describe the relationship between life expectancy at birth for males and geopolitical group. Estimate one of the parameters in the model using the results in (e).

(g) Calculate the net population growth rate per 1,000 of population (we will call this “net growth”). Type library(lattice) in R to ensure that the xyplot() function is available. Use xyplot() to examine the relationship between net growth and GNP for each geopolitical group separately. Note that in the matrix of plots, group 1 is placed in the bottom left-hand corner, and the groups then proceed across each row of plots. Comment on what the plots show about the relationship, and on any limitations of this type of plot here.

(h) Create a plot of net growth vs GNP for group 2 on its own. Calculate the correlation coefficient, and comment on the strength and direction of the relationship.
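
The R mechanics behind parts (e), (g) and (h) might look something like the sketch below. It runs on simulated stand-in data, not unesco.csv: the data frame demo and its column names (group, birth, death, gnp, life_m) are hypothetical, and taking net growth as birth rate minus death rate is an assumption that ignores migration.

```r
library(lattice)   # provides xyplot()

set.seed(42)
n <- 60

# Simulated stand-in data; column names are hypothetical, not those in unesco.csv.
demo <- data.frame(
  group  = factor(rep(1:6, each = 10)),
  birth  = runif(n, 10, 50),
  death  = runif(n, 5, 25),
  gnp    = rlnorm(n, meanlog = 7, sdlog = 1),
  life_m = rnorm(n, mean = 60, sd = 8)
)

# Assumed definition: net growth = birth rate - death rate (per 1,000).
demo$net_growth <- demo$birth - demo$death

# Mean, standard deviation and count for each group.
means  <- tapply(demo$life_m, demo$group, mean)
sds    <- tapply(demo$life_m, demo$group, sd)
counts <- table(demo$group)
group_summary <- data.frame(
  group = names(means),
  mean  = as.vector(means),
  sd    = as.vector(sds),
  n     = as.vector(counts)
)
print(group_summary)

# One scatterplot panel per group; panels fill from the bottom left.
p <- xyplot(net_growth ~ gnp | group, data = demo,
            xlab = "GNP per capita", ylab = "Net growth per 1,000")
print(p)

# A single group on its own, with its correlation coefficient.
grp2 <- subset(demo, group == 2)
plot(grp2$gnp, grp2$net_growth, xlab = "GNP per capita", ylab = "Net growth")
r2 <- cor(grp2$net_growth, grp2$gnp)
```

Because the values are simulated independently, any pattern in these plots is noise; only the function calls carry over to real data.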


Q.2. The data in count10.csv [2, 3, 3, . . . , 0] are counts of the number of items, in batches of ten, that had a particular characteristic.

(a) Describe the data (including appropriate descriptive statistics and plots).

(b) Show that for any binomial distribution, var(X) ≤ E(X).

(c) A binomial distribution would be appropriate for such data if the items were independent and each was equally likely to have the characteristic. Explain why these data are apparently incompatible with the binomial distribution.

(d) The following proposals have been put forward to explain the failure of the binomial distribution to describe these data.

i. The batches are from different sources.

ii. The proportion with the characteristic changes over time.

Discuss briefly (a sentence or two at most) each proposal, indicating whether it could result in data like those obtained, and how it might be checked.
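
One way to explore a proposal such as (d)i. is by simulation. The sketch below compares the mean and variance of counts from a single binomial source with counts from a mixture of two sources. Everything in it is assumed for illustration: the number of batches, the proportions 0.275, 0.05 and 0.5, and the variable names. It does not use count10.csv.

```r
set.seed(2018)
n_batches  <- 2000   # hypothetical number of batches
batch_size <- 10

# Single source: every batch shares the same proportion.
single <- rbinom(n_batches, size = batch_size, prob = 0.275)

# Two sources with assumed proportions 0.05 and 0.5;
# each batch comes from one source chosen at random.
p_source <- sample(c(0.05, 0.5), n_batches, replace = TRUE)
mixed <- rbinom(n_batches, size = batch_size, prob = p_source)

# Compare sample mean and sample variance for the two scenarios.
result <- rbind(
  single = c(mean = mean(single), var = var(single)),
  mixed  = c(mean = mean(mixed),  var = var(mixed))
)
print(round(result, 2))
```

Both scenarios have the same expected count, so any difference in the variance column comes from the variation in the proportion between batches.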

Q.3. The chi-squared distribution, denoted by $X \sim \chi^2_\nu$, is used a great deal in statistics and science, and we will meet it again later. The exact shape of the distribution depends on the degrees of freedom ($\nu$), and smaller values of $\nu$ result in greater skewness, and therefore stronger departure from the normal distribution. Here we will examine how quickly the sampling distribution of the sample mean taken from a $\chi^2_2$ distribution converges to normality (or at least to symmetry).

(a) Take a large sample from the $\chi^2_2$ distribution and test its departure from normality using two graphical tools. You will need the R function rchisq. Comment on the result.

(b) Examine the sampling distribution of the sample mean from samples of size 5, by generating 1000 such samples and looking at a plot of the density (make a comment).

(c) Compare the sampling distribution of the sample mean for a range of sample sizes (e.g. 1, 5, 10, 20, 40, 80), and use your results to suggest how large the sample size needs to be for adequate convergence. The mean of a $\chi^2_\nu$ distribution is $\nu$.
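
The simulation described in (b) and (c) could be set up roughly as follows. This is a sketch under stated assumptions rather than a model answer: the seed, the names sizes, n_rep, sample_means and skew_by_size, and the moment-based skewness helper are all illustrative choices.

```r
set.seed(1)
sizes <- c(1, 5, 10, 20, 40, 80)   # sample sizes suggested in (c)
n_rep <- 1000                      # number of samples per size

# For each sample size n, draw n_rep samples from chi-squared(2)
# and keep the mean of each sample.
sample_means <- lapply(sizes, function(n) {
  replicate(n_rep, mean(rchisq(n, df = 2)))
})
names(sample_means) <- sizes

# Simple moment-based skewness (illustrative helper, not a base R function).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
skew_by_size <- sapply(sample_means, skewness)

# Density of the sample mean for each sample size, in a 2 x 3 panel.
old_par <- par(mfrow = c(2, 3))
for (i in seq_along(sizes)) {
  plot(density(sample_means[[i]]),
       main = paste("n =", sizes[i]), xlab = "sample mean")
  abline(v = 2, lty = 2)   # population mean of chi-squared(2) is 2
}
par(old_par)

print(round(skew_by_size, 2))
```

The skewness values give a rough numerical companion to the density plots; deciding what counts as adequate convergence remains a judgement call.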

