Date: 2019-10-29 09:46

ANLY-511 Homework 7 Problems

Submit problems 73, 75, 78, 80, 84, 89 and 90. Explain your work and give concise

reasoning. Attach R code with comments if applicable. Using Markdown

is the best way to do this. Do not print out any data or any detailed results of

simulations.

73. (2 points) The built-in data set "chickwts" is a data frame with weight

gains in grams for 71 chicks that were each given one of six different treatments

(feed types). It may be loaded with the R command data(chickwts).

Load the dataset and make a single graph with side by side box plots of weight gains

for the six feed types. Which feed type appears to result in the largest weight gain?

The smallest weight gain? Are there feed types whose weight gains are possibly

the same? Do not do any statistical tests to answer these questions. Support your

answers with the graph.

74. (2 points) Import the data in the file GSS2006.csv (available in the zip

file with Chihara/Hesterberg data in Blackboard) as a data frame. Make suitable

two-way tables to explore the following questions. Do not carry out any statistical

tests.

a) Does it appear that the distribution of views towards the death penalty is the

same in all regions of the country?

b) Does it appear that the distribution of views towards the legalization of marijuana

is the same in all regions of the country?

c) Does it appear that views towards the death penalty and towards the legalization

of marijuana are related?

In the next three problems, distribution A is a standard normal distribution and

distribution B is a N(1, 2²) distribution. Generate 20 random numbers from distribution

A and 30 random numbers from distribution B and record these in a suitable

data frame. Use these data for all three problems. Be sure to fix a suitable random

seed before simulating your data.

75. (2 points) Examine the null hypothesis that the means of A and B are the

same against the alternative that the mean of B is larger, using a permutation test.

Report the p-value and state your conclusion.
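
A one-sided permutation test for a difference in means can be sketched as follows. The assignment asks for R; this is only a Python illustration of the logic using the standard library, and the function name, the seed 511, and the number of permutations are arbitrary choices made for the example, not part of the assignment.

```python
import random
from statistics import fmean

def perm_test_mean_greater(a, b, n_perm=10_000, seed=0):
    """Permutation test of H0: mean(A) = mean(B) against Ha: mean(B) > mean(A)."""
    rng = random.Random(seed)
    observed = fmean(b) - fmean(a)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = fmean(pooled[n_a:]) - fmean(pooled[:n_a])
        if diff >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimated p-value away from exactly 0.
    return observed, (hits + 1) / (n_perm + 1)

data_rng = random.Random(511)                           # assumed fixed seed
sample_a = [data_rng.gauss(0, 1) for _ in range(20)]    # A ~ N(0, 1)
sample_b = [data_rng.gauss(1, 2) for _ in range(30)]    # B ~ N(1, 2^2), so sd = 2
observed_diff, p_value = perm_test_mean_greater(sample_a, sample_b)
print(f"observed difference = {observed_diff:.3f}, p-value = {p_value:.4f}")
```

The second argument of `gauss` is the standard deviation, not the variance, which is why B is drawn with 2 rather than 4.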

76. (2 points) Examine the null hypothesis that the variances of A and B are

the same against the alternative that the variance of B is larger, using a permutation

test. Report the p-value and state your conclusion.

77. (2 points) Examine the null hypothesis that the 75th percentiles of A and

B are the same against the alternative that the 75th percentile of B is larger, using

a permutation test. Report the p-value and state your conclusion.
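
The same permutation scheme works for any statistic, including the variance and the 75th percentile. The sketch below is a Python illustration (the assignment itself calls for R); `perm_test_greater` and `q75` are hypothetical helper names, and the quantile convention used here is an assumption that may differ from the default in other software.

```python
import random
from statistics import quantiles, variance

def perm_test_greater(a, b, statistic, n_perm=2_000, seed=0):
    """Permutation p-value for Ha: statistic(B) > statistic(A)."""
    rng = random.Random(seed)
    observed = statistic(b) - statistic(a)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if statistic(pooled[n_a:]) - statistic(pooled[:n_a]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def q75(values):
    # quantiles(..., n=4) returns the three quartile cut points;
    # index 2 is the 75th percentile.
    return quantiles(values, n=4, method="inclusive")[2]

data_rng = random.Random(511)                           # assumed fixed seed
sample_a = [data_rng.gauss(0, 1) for _ in range(20)]    # A ~ N(0, 1)
sample_b = [data_rng.gauss(1, 2) for _ in range(30)]    # B ~ N(1, 2^2)
p_var = perm_test_greater(sample_a, sample_b, variance)
p_q75 = perm_test_greater(sample_a, sample_b, q75)
print(f"variance test p-value = {p_var:.4f}")
print(f"75th percentile test p-value = {p_q75:.4f}")
```

Passing the statistic as an argument keeps the shuffling logic in one place, so only the summary function changes between problems.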

For the next three problems, assume that data are in a random sample of size n

from a normal N(µ, 1) distribution with unknown µ. The null hypothesis is always

H0 : µ = 0. The first step for each problem is to find the sampling distribution of

the sample mean x̄.

78. (2 points) Suppose the alternative is that µa > 0. You are going to use

the sample mean x̄ as the test statistic. You plan to conduct the test by rejecting

H0 if x̄ is sufficiently large, i.e. x̄ > x0 for some x0, and you have already decided

that you will always reject the null hypothesis if the p-value of the test is < 0.05.

Use R to compute x0 as a function of n for n = 5, 10, 20, 50, 100. The interval

[x0, ∞) is called the rejection region.
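
Under H0 the sample mean of n draws from N(0, 1) has a N(0, 1/n) distribution, so the threshold x0 is the 0.95 quantile of that distribution. A minimal Python sketch of the computation (the assignment asks for R, where a normal quantile function plays the same role); `rejection_threshold` is a hypothetical name:

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05

def rejection_threshold(n, alpha=ALPHA):
    """Smallest x0 such that P(sample mean > x0 | mu = 0) equals alpha."""
    # Sampling distribution of the mean under H0: N(0, 1/n), i.e. sd = 1/sqrt(n).
    null_dist = NormalDist(mu=0.0, sigma=1 / sqrt(n))
    return null_dist.inv_cdf(1 - alpha)

thresholds = {n: rejection_threshold(n) for n in (5, 10, 20, 50, 100)}
for n, x0 in thresholds.items():
    print(f"n = {n:3d}: x0 = {x0:.4f}")
```

The threshold shrinks like 1/sqrt(n), so larger samples reject for smaller observed means.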

79. (2 points) Suppose the alternative is again that µa > 0. A different

approach consists in rejecting the null hypothesis if x̄ ≥ x0 for some predetermined

x0 > 0.

a) Suppose you do this with x0 = 0. What is the largest possible p-value for which

you would still reject?

b) Suppose x0 = .4 and n = 20. What is the largest possible p-value for which you

would still reject?

c) Suppose x0 < 0. Explain why in this case you might reject H0 even if the p-value

is larger than 0.5.

80. (2 points) As in the previous problem, you have decided that you will

reject the null hypothesis if x̄ ≥ x0. You have chosen x0 = .4 and the sample size

is n = 205. Suppose now that the alternative hypothesis is actually true and that

in fact µ = 0.5. You don’t know this, of course. Compute the probability that

you will reject the null hypothesis, i.e. that you make the correct decision, using

R. This probability is called the power of the test. It depends on µ, n, x0, among

other things.
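
The power is the probability that x̄ ≥ x0 when x̄ follows its sampling distribution under the true mean, N(µ, 1/n). A short Python sketch of that calculation (the assignment asks for R); the function name `power` is hypothetical and the values plugged in are the ones printed in the problem statement:

```python
from math import sqrt
from statistics import NormalDist

def power(mu, n, x0):
    """P(sample mean >= x0) when the data are N(mu, 1) and the sample size is n."""
    sampling_dist = NormalDist(mu=mu, sigma=1 / sqrt(n))
    return 1 - sampling_dist.cdf(x0)

# Values as stated in the problem: mu = 0.5, n = 205, x0 = 0.4.
result = power(mu=0.5, n=205, x0=0.4)
print(f"power = {result:.4f}")
```

Calling `power(mu=0.0, n=205, x0=0.4)` instead gives the probability of rejecting when H0 is true, which is the type I error rate of this rule.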

81. (5 points) Distribution of p-values. In this exercise you will gain

insight into the behavior of p-values if the null hypothesis is true. Consider data

coming from a certain N(µ, σ²) distribution. The null hypothesis is that µ = 1, σ² = 8.

The alternative is that µ > 1, σ² = 8. We use the sample mean x̄ of a random

sample of size n = 15 as test statistic.

a) Find the exact sampling distribution of x̄, assuming the null hypothesis is true.

b) Since each observed x̄ results in a p-value, we can regard the p-value as a random

variable. And since the exact null distribution of x̄ is known from part a), one can

compute this p-value, using the cdf of this distribution. Use R to simulate

10,000 sample means x̄ and find all p-values. Make a histogram and plot the ecdf.

What is the distribution of the p-values? Can you explain this?
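
The simulation in part b) can be sketched as below, in Python rather than the R the assignment asks for. It assumes the null sampling distribution of the mean is N(1, 8/15) and evaluates the empirical cdf at a few points instead of plotting it; the seed and the helper name `ecdf` are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

MU0, SIGMA2, N = 1.0, 8.0, 15
# Sampling distribution of the mean under H0: variance sigma^2 / n.
null_dist = NormalDist(mu=MU0, sigma=sqrt(SIGMA2 / N))

rng = random.Random(511)  # assumed fixed seed
p_values = []
for _ in range(10_000):
    sample = [rng.gauss(MU0, sqrt(SIGMA2)) for _ in range(N)]
    # One-sided p-value for Ha: mu > 1 is the upper tail beyond the sample mean.
    p_values.append(1 - null_dist.cdf(fmean(sample)))

def ecdf(values, t):
    """Fraction of values less than or equal to t."""
    return sum(v <= t for v in values) / len(values)

for t in (0.05, 0.25, 0.5, 0.75):
    print(f"ecdf({t}) = {ecdf(p_values, t):.3f}")
```

Comparing each printed value with its argument t is a quick numerical stand-in for reading the ecdf plot.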

82. (2 points) Problem 3.9 #12abc in Chihara/Hesterberg.

83. (2 points) Problem 3.9 #14 in Chihara/Hesterberg.

84. (2 points) Problem 3.9 #25 in Chihara/Hesterberg. Import the dataset

Lottery.csv and conduct a test of the null hypothesis that the data in the file come

from a multinomial distribution on {1, ..., 39} with all p_i = 1/39, using a suitable

built-in procedure. Report the p-value and state your conclusion. This is similar

to the question whether birth dates of soccer players follow a uniform distribution.
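
In R the natural built-in procedure for this is `chisq.test`. To show what such a test computes, here is a Python sketch of the Pearson goodness-of-fit statistic with a Monte Carlo p-value, which is a swapped-in technique and not the built-in procedure the problem asks for. The draws used below are randomly generated placeholders, not the contents of Lottery.csv, and all function names are hypothetical.

```python
import random
from collections import Counter

def chi_square_stat(counts, expected):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    return sum((c - expected) ** 2 / expected for c in counts)

def uniform_gof_pvalue(draws, k=39, n_sim=1_000, seed=0):
    """Monte Carlo p-value for H0: draws are uniform on {1, ..., k}."""
    rng = random.Random(seed)
    n = len(draws)
    expected = n / k
    tally = Counter(draws)
    observed = chi_square_stat([tally[v] for v in range(1, k + 1)], expected)
    hits = 0
    for _ in range(n_sim):
        # Simulate a data set of the same size under the null hypothesis.
        sim = Counter(rng.randint(1, k) for _ in range(n))
        if chi_square_stat([sim[v] for v in range(1, k + 1)], expected) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)

demo_rng = random.Random(511)
placeholder_draws = [demo_rng.randint(1, 39) for _ in range(300)]  # NOT real data
stat, p_value = uniform_gof_pvalue(placeholder_draws)
print(f"chi-square statistic = {stat:.2f}, Monte Carlo p-value = {p_value:.4f}")
```

A large statistic means the observed counts are far from n/39 in at least some cells, and the p-value measures how often uniform data would look at least that uneven.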

85. (2 points) Problem 3.9 #30 in Chihara/Hesterberg.

86. (2 points) Consider the following pairs of attributes from the GSS2002

dataset. Associations between all these pairs could be examined with a χ² test.

Which of these would be questions about homogeneity of distributions across several

populations, which would be questions about independence of attributes? Explain

each answer in one sentence. Do not carry out any tests for this problem.

• Gender and education

• Race and education

• Happiness and political party

• Gender and views of death penalty

• Views of gun laws and race

87. (2 points) Import the GSS2002 data set.

Use a χ² test to determine if the following attributes are independent. Explain to

yourself why the number of degrees of freedom is correct in each case.

• Gender and education

• Happiness and political party

88. (2 points) Consider the data frame Problem58 in the R workspace hw7.RData.

Explain why a χ² test should not be used to investigate the question whether the

variables X and Y are independent. Then use a permutation test to study this

question.

89. (5 points) Import the data set Titanic.csv which contains survival data

(0 = death, 1 = survival) and ages of 658 passengers of the Titanic which sank on

April 15, 1912. Examine the null hypothesis that the mean ages of survivors and

of victims are the same against the alternative that these mean ages are different,

using a permutation test. Compute the p-value and state your conclusion. This is

a two-sided test. How should the p-value be computed in this case?
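
One common convention for a two-sided permutation p-value is to compute both one-sided tail proportions and double the smaller one, capping the result at 1. The Python sketch below illustrates that convention (the assignment asks for R). The ages are randomly generated placeholders, not values from Titanic.csv, and `perm_test_two_sided` is a hypothetical name.

```python
import random
from statistics import fmean

def perm_test_two_sided(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test of H0: mean(x) = mean(y)."""
    rng = random.Random(seed)
    observed = fmean(x) - fmean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    upper = lower = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_x]) - fmean(pooled[n_x:])
        upper += diff >= observed   # count for the upper tail
        lower += diff <= observed   # count for the lower tail
    p_upper = (upper + 1) / (n_perm + 1)
    p_lower = (lower + 1) / (n_perm + 1)
    # Double the smaller tail, but never report more than 1.
    return observed, min(1.0, 2 * min(p_upper, p_lower))

demo_rng = random.Random(511)
survivor_ages = [demo_rng.gauss(29, 14) for _ in range(40)]  # placeholder values
victim_ages = [demo_rng.gauss(31, 14) for _ in range(60)]    # placeholder values
observed_diff, p_value = perm_test_two_sided(survivor_ages, victim_ages)
print(f"observed difference = {observed_diff:.2f}, two-sided p-value = {p_value:.4f}")
```

An alternative convention counts permutations whose absolute difference is at least as large as the observed absolute difference; the two agree when the permutation distribution is symmetric.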

90. (5 points) The dataset NCBirths2004.csv contains data from over 1000

births in the state of North Carolina. One of the columns contains the weight of

the newborn baby in grams. Another column tells you whether the mother was

a smoker (Yes or No). We want to determine whether the data contain evidence

that babies born to mothers who smoke weigh less on average than babies born to

non-smoking mothers.

Import the dataset, make side by side boxplots of birth weights for smoking and

non-smoking mothers, formulate suitable hypotheses, carry out a permutation test,

and state your conclusion.

