Date: 2019-05-26 11:21

STA 141A, Homework 4

Due May 28th 2019 (by 8 am)


Please submit on Canvas as a compiled R Markdown file (PDF or HTML).


All code in this assignment should be cleanly written and well commented, with appropriate use of functions and arguments. Imagine you are sending this code to your colleagues or supervisors for review, which they can only do if they can understand it.


1 Sampling Schemes

This problem is designed to emphasize the effect of the sampling strategy on the actual performance of estimators of population parameters.


(1) Use the following code to generate a population of 20000 realizations of a variable X, stratified into two subpopulations of 15000 and 5000 measurements, respectively. Provide brief numerical and graphical summaries of the data for each subpopulation. (Note that the data record which subpopulation each observation belongs to.)

## Data generation
N1 = 15000 # population size for stratum 1
N2 = 5000  # population size for stratum 2
N = N1+N2

set.seed(1000)
mydata = matrix(0,N,2)  # column 1: stratum label, column 2: value of X
mydata[1:N1,1] = 1  # stratum 1
mydata[(N1+1):N,1] = 2 # stratum 2
mydata[1:N1,2] = rgamma(N1,shape=3,scale=2) # data for stratum 1
mydata[(N1+1):N,2] = rgamma(N2,shape=5,scale=5) # data for stratum 2
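One possible way to produce the requested summaries is sketched below. It regenerates the population with the same seed so that it runs on its own. The data frame `pop`, its column names `stratum` and `x`, and the choice of summaries (size, mean, standard deviation, quartiles, boxplots, density curves) are illustrative assumptions, not requirements of the assignment.

```r
# Regenerate the population exactly as in the assignment code.
N1 <- 15000; N2 <- 5000; N <- N1 + N2
set.seed(1000)
mydata <- matrix(0, N, 2)
mydata[1:N1, 1] <- 1
mydata[(N1 + 1):N, 1] <- 2
mydata[1:N1, 2] <- rgamma(N1, shape = 3, scale = 2)
mydata[(N1 + 1):N, 2] <- rgamma(N2, shape = 5, scale = 5)

# Hypothetical column names, added for readability only.
pop <- data.frame(stratum = factor(mydata[, 1]), x = mydata[, 2])

# Numerical summaries per stratum: size, mean, sd, and quartiles.
stratum_summary <- do.call(rbind, lapply(split(pop$x, pop$stratum), function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), quantile(x, c(0.25, 0.5, 0.75)))
}))
print(round(stratum_summary, 3))

# Graphical summaries: side-by-side boxplots and overlaid density estimates.
boxplot(x ~ stratum, data = pop, xlab = "Stratum", ylab = "X")
d1 <- density(pop$x[pop$stratum == 1])
d2 <- density(pop$x[pop$stratum == 2])
plot(d1, xlim = range(pop$x), ylim = c(0, max(d1$y, d2$y)),
     main = "Density of X by stratum", xlab = "X")
lines(d2, lty = 2)
legend("topright", legend = c("Stratum 1", "Stratum 2"), lty = c(1, 2))
```

For reference, a gamma distribution with shape k and scale s has mean ks and variance ks^2, so the stratum means should be close to 6 and 25.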

(2) Write an R function that takes the data set and the sample size n as input, and returns the sample mean and sample standard deviation of a sample of size n drawn from the entire population without replacement. That is, the sampling scheme is SRSWOR (Simple Random Sampling Without Replacement).
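A minimal sketch of such a function follows. The name `srswor_estimates` is hypothetical, and the function assumes the two-column layout of `mydata` (stratum label in column 1, value of X in column 2). The toy population is made up purely to show the call.

```r
# Draw one SRSWOR sample of size n and return its mean and standard deviation.
# `data` is assumed to be a two-column matrix: stratum label, then value of X.
srswor_estimates <- function(data, n) {
  stopifnot(n >= 2, n <= nrow(data))
  # Sample row indices without replacement from the whole population.
  idx <- sample.int(nrow(data), size = n, replace = FALSE)
  x <- data[idx, 2]
  c(mean = mean(x), sd = sd(x))
}

# Usage on a small toy population (not the assignment data).
set.seed(1)
toy <- cbind(rep(1:2, times = c(30, 10)), c(rnorm(30, 5), rnorm(10, 20)))
srswor_estimates(toy, n = 12)
```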


(3) Write an R function that takes the data set and the sample sizes n1 and n2 (corresponding to the two subpopulations) as input, and returns the sample mean and sample standard deviation of a sample of size n = n1 + n2 drawn from the entire population by Stratified Sampling. That is, ni observations are drawn without replacement from the i-th subpopulation, i = 1, 2.
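The sketch below is one way to read this task: draw from each stratum separately, pool the two subsamples, and report the pooled mean and standard deviation. The name `stratified_estimates` is hypothetical. Note the assumption built into pooling: the pooled mean matches the usual weighted stratified estimator only when the allocation is proportional to the stratum sizes.

```r
# Draw n1 and n2 observations without replacement from strata 1 and 2,
# pool them, and return the pooled sample mean and standard deviation.
# `data` is assumed to be a two-column matrix: stratum label, then value of X.
stratified_estimates <- function(data, n1, n2) {
  idx1 <- which(data[, 1] == 1)
  idx2 <- which(data[, 1] == 2)
  stopifnot(n1 <= length(idx1), n2 <= length(idx2), n1 + n2 >= 2)
  # Sampling positions with sample.int avoids the sample(x, ...) surprise
  # that occurs when x happens to be a single number.
  s1 <- idx1[sample.int(length(idx1), n1)]
  s2 <- idx2[sample.int(length(idx2), n2)]
  x <- data[c(s1, s2), 2]
  c(mean = mean(x), sd = sd(x))
}

# Usage on a small toy population (not the assignment data).
set.seed(1)
toy <- cbind(rep(1:2, times = c(30, 10)), c(rnorm(30, 5), rnorm(10, 20)))
stratified_estimates(toy, n1 = 9, n2 = 3)
```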


(4) Compare the performance of the two sampling schemes (SRSWOR and Stratified Sampling) in terms of estimating the population mean and population standard deviation. For stratified sampling, choose $n_i = n(N_i/N)$, where $N_i$ is the population size of the i-th subpopulation and n is the total sample size. For each value of n = 100, 400, 1000, report the mean and variance of the estimators of each population parameter, computed by drawing 10000 samples of both types.
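A rough sketch of the simulation is given below. It rebuilds the population as two vectors (`x_pop`, `stratum`) instead of the matrix `mydata`; the draws are the same because the seed and call order are unchanged. The helper name `compare_schemes` and the output layout are assumptions. The example call uses `reps = 2000` to keep the run short; set `reps = 10000` to match the assignment.

```r
# Rebuild the population (same seed and draw order as the assignment code).
N1 <- 15000; N2 <- 5000; N <- N1 + N2
set.seed(1000)
x_pop <- c(rgamma(N1, shape = 3, scale = 2), rgamma(N2, shape = 5, scale = 5))
stratum <- rep(1:2, times = c(N1, N2))

# For one total sample size n, draw `reps` samples under each scheme and
# return the mean and variance of the sample mean and the sample sd.
compare_schemes <- function(x, stratum, n, reps = 10000) {
  idx1 <- which(stratum == 1)
  idx2 <- which(stratum == 2)
  n1 <- round(n * length(idx1) / length(x))  # proportional allocation
  n2 <- n - n1
  one_srs <- function() {
    s <- x[sample.int(length(x), n)]
    c(mean(s), sd(s))
  }
  one_strat <- function() {
    s <- x[c(idx1[sample.int(length(idx1), n1)],
             idx2[sample.int(length(idx2), n2)])]
    c(mean(s), sd(s))
  }
  srs <- replicate(reps, one_srs())      # 2 x reps matrix
  strat <- replicate(reps, one_strat())  # 2 x reps matrix
  data.frame(
    n = n,
    scheme = rep(c("SRSWOR", "Stratified"), each = 2),
    estimator = rep(c("mean", "sd"), times = 2),
    avg = c(rowMeans(srs), rowMeans(strat)),
    variance = c(apply(srs, 1, var), apply(strat, 1, var))
  )
}

results <- do.call(rbind, lapply(c(100, 400, 1000), function(n) {
  compare_schemes(x_pop, stratum, n, reps = 2000)
}))

# Population targets for comparison (sd computed with divisor N).
c(pop_mean = mean(x_pop), pop_sd = sqrt(mean((x_pop - mean(x_pop))^2)))
print(results, row.names = FALSE)
```

With $N_1/N = 0.75$, proportional allocation gives (n1, n2) = (75, 25), (300, 100), and (750, 250) for the three sample sizes.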


(5) Based on your results in (4), write a very brief summary comparing the accuracy of SRSWOR and Stratified Sampling in estimating the parameters.


2 Comparative performance of bootstrap procedures

(1) Generate a random sample of size $n = 100$ from the univariate regression model

$$Y_i = -5 + 2X_i + \varepsilon_i, \quad i = 1, \dots, n,$$

where the $X_i$ are independent chi-square variables with 6 degrees of freedom, and the $\varepsilon_i$ are i.i.d. $N(0, \sigma^2)$ with $\sigma = 1$. Fix a random seed to ensure that the results are reproducible.
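A short sketch of the data generation step is shown below. It assumes the intercept in the model is -5, and the seed value 141 and the name `reg_data` are arbitrary choices made for illustration.

```r
set.seed(141)                       # arbitrary seed, fixed for reproducibility
n <- 100
x <- rchisq(n, df = 6)              # chi-square predictors with 6 df
eps <- rnorm(n, mean = 0, sd = 1)   # N(0, 1) errors, i.e. sigma = 1
y <- -5 + 2 * x + eps               # assumed model: intercept -5, slope 2
reg_data <- data.frame(x = x, y = y)
head(reg_data)
```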


(2) Fit the least squares regression line to the data and obtain the estimate of $(\beta_0, \beta_1, \sigma^2)$.
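The fit can be sketched with `lm()` as follows. The data are regenerated under the same assumptions as the model statement (intercept read as -5, arbitrary seed 141), and $\sigma^2$ is estimated by the residual sum of squares divided by n - 2, which is one common choice.

```r
# Regenerate a sample from the assumed model so the block runs on its own.
set.seed(141)
n <- 100
x <- rchisq(n, df = 6)
y <- -5 + 2 * x + rnorm(n, sd = 1)

fit <- lm(y ~ x)
beta_hat <- coef(fit)

# Unbiased estimate of the error variance: RSS / (n - 2).
sigma2_hat <- sum(residuals(fit)^2) / df.residual(fit)

estimates <- c(beta0 = unname(beta_hat[1]),
               beta1 = unname(beta_hat[2]),
               sigma2 = sigma2_hat)
estimates
```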


(3) Obtain 95% confidence intervals for $\beta_0$ and $\beta_1$ by using the nonparametric bootstrap procedure with 500 bootstrap replicates. Write an R function that accepts the data, the number of bootstrap replicates (default value = 100), and the confidence level (default value = 0.95) as input, and returns the two confidence intervals as output. (Note: Roughly speaking, a nonparametric bootstrap refers to the procedure where one (i) draws bootstrap samples from the original sample with replacement, and (ii) fits the regression model on each bootstrap sample.)
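One way this could look is sketched below. The assignment does not say which kind of bootstrap interval to report; this sketch uses percentile intervals, which is an assumption. The function name `np_boot_ci` and the requirement that `data` has columns `x` and `y` are also illustrative.

```r
# Nonparametric (pairs) bootstrap: resample rows (x_i, y_i) with replacement,
# refit the regression, and take percentile intervals of the coefficients.
# `data` is assumed to be a data frame with columns `x` and `y`.
np_boot_ci <- function(data, B = 100, level = 0.95) {
  n <- nrow(data)
  boot_coef <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)
    coef(lm(y ~ x, data = data[idx, ]))
  })                                   # 2 x B matrix of refitted coefficients
  alpha <- 1 - level
  ci <- t(apply(boot_coef, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2)))
  dimnames(ci) <- list(c("beta0", "beta1"), c("lower", "upper"))
  ci
}

# Usage on data simulated from the assumed model (intercept -5, slope 2).
set.seed(141)
x <- rchisq(100, df = 6)
reg_data <- data.frame(x = x, y = -5 + 2 * x + rnorm(100))
np_ci <- np_boot_ci(reg_data, B = 500, level = 0.95)
np_ci
```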


(4) Obtain 95% confidence intervals for $\beta_0$ and $\beta_1$ by using the residual-based bootstrap procedure with 500 bootstrap replicates. Write an R function that accepts the data, the number of bootstrap replicates (default value = 100), and the confidence level (default value = 0.95) as input, and returns the two confidence intervals as output. (Note: The residual-based bootstrap resamples only the residuals from a fitted model, keeping the predictor values fixed. You can read more about the residual-based bootstrap here.)
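A sketch of the residual-based version follows. As with the nonparametric case, percentile intervals are an assumed choice, and `resid_boot_ci` and the `x`/`y` column names are hypothetical. The residuals are resampled as they come out of `lm()`, without rescaling.

```r
# Residual bootstrap: keep x fixed, resample the fitted model's residuals
# with replacement, rebuild y* = fitted + resampled residual, and refit.
# `data` is assumed to be a data frame with columns `x` and `y`.
resid_boot_ci <- function(data, B = 100, level = 0.95) {
  fit <- lm(y ~ x, data = data)
  fitted_y <- fitted(fit)
  res <- residuals(fit)
  x_obs <- data$x
  n <- length(res)
  boot_coef <- replicate(B, {
    y_star <- fitted_y + res[sample.int(n, n, replace = TRUE)]
    coef(lm(y_star ~ x_obs))
  })                                   # 2 x B matrix of refitted coefficients
  alpha <- 1 - level
  ci <- t(apply(boot_coef, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2)))
  dimnames(ci) <- list(c("beta0", "beta1"), c("lower", "upper"))
  ci
}

# Usage on data simulated from the assumed model (intercept -5, slope 2).
set.seed(141)
x <- rchisq(100, df = 6)
reg_data <- data.frame(x = x, y = -5 + 2 * x + rnorm(100))
rb_ci <- resid_boot_ci(reg_data, B = 500, level = 0.95)
rb_ci
```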


(5) How do the confidence intervals in (3) and (4) compare with the theoretical confidence intervals for $\beta_0$ and $\beta_1$? To compare the accuracy of the confidence intervals, repeat steps (1)–(4) 200 times, using a different random seed for data generation in each simulation run. For each type of confidence interval, report the following and compare the results with those for the theoretical confidence intervals:

  • Length of the confidence intervals
  • Coverage probability, that is, the fraction of simulation runs in which the true parameter falls within the confidence interval
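The comparison can be sketched as a small simulation. To keep 200 runs of 500 replicates reasonably fast, this version computes the bootstrap coefficients in closed form instead of calling `lm()` each time, which gives the same least squares estimates. All helper names are hypothetical, the bootstrap intervals are percentile intervals (an assumption), the theoretical intervals are the t-based ones from `confint()`, and the true values (-5, 2) assume the intercept in the model is -5.

```r
# Closed-form simple linear regression coefficients.
ols_coef <- function(x, y) {
  b1 <- cov(x, y) / var(x)
  c(beta0 = mean(y) - b1 * mean(x), beta1 = b1)
}

# Percentile interval from a 2 x B matrix of bootstrap coefficients.
pct_ci <- function(boot_coef, level) {
  a <- (1 - level) / 2
  t(apply(boot_coef, 1, quantile, probs = c(a, 1 - a)))
}

# One simulation run: generate data with its own seed, then build the
# theoretical, nonparametric, and residual-based intervals.
one_run <- function(seed, n = 100, B = 500, level = 0.95) {
  set.seed(seed)
  x <- rchisq(n, df = 6)
  y <- -5 + 2 * x + rnorm(n)
  fit <- lm(y ~ x)
  res <- residuals(fit)
  fit_y <- fitted(fit)
  np <- replicate(B, {
    i <- sample.int(n, n, replace = TRUE)
    ols_coef(x[i], y[i])
  })
  rb <- replicate(B, ols_coef(x, fit_y + res[sample.int(n, n, replace = TRUE)]))
  list(theory = confint(fit, level = level),
       nonparametric = pct_ci(np, level),
       residual = pct_ci(rb, level))
}

# Average interval length and empirical coverage over all runs.
evaluate_cis <- function(runs = 200, truth = c(-5, 2), ...) {
  sims <- lapply(seq_len(runs), one_run, ...)
  out <- lapply(names(sims[[1]]), function(m) {
    lens <- sapply(sims, function(s) s[[m]][, 2] - s[[m]][, 1])
    cover <- sapply(sims, function(s) s[[m]][, 1] <= truth & truth <= s[[m]][, 2])
    data.frame(method = m, parameter = c("beta0", "beta1"),
               mean_length = unname(rowMeans(lens)),
               coverage = unname(rowMeans(cover)))
  })
  do.call(rbind, out)
}

ci_summary <- evaluate_cis(runs = 200, B = 500)
print(ci_summary, row.names = FALSE)
```

Seeds 1 through 200 are used for the 200 runs, so each run generates a different data set while the whole table stays reproducible.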

