
Date: 2023-03-04 01:29


STAT4620/5620 WINTER 2023

Assignment 3: Due Thursday, March 2, 2023

1. Suppose that you are interested in studying intravenous drug use among high
school students in Canada. Drug use is characterized as a binary random variable,
where 1 indicates that an individual has injected drugs within the past year and
0 that he/she has not. Covariate information related to drug use includes:
information about drug use provided in school (y/n), age of student (years),
employed part-time (y/n), school connectedness (Likert scale), and gender (m/f).

(a) [4pts] Propose and defend a suitable model for the aforementioned data. Be
sure to write down the model equation.

(b) [3pts] Discuss any potential interactions that might be worthwhile including in
your model and provide justification as to why (or why not).

(c) [1pt] Which R package(s) would you use to fit the above model?

(d) [3pts] What tools would you use to assess model fit and proceed with variable
selection?
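
As a point of reference for part (a), one candidate for a binary outcome is a logistic regression. The sketch below is only one defensible formulation, not a model answer. The symbol names (info, age, employed, connect, male) are illustrative assumptions, and entering school connectedness as a numeric score rather than an ordered factor is also an assumption.

```latex
\[
Y_i \mid \mathbf{x}_i \sim \mathrm{Bernoulli}(p_i), \qquad
\log\frac{p_i}{1 - p_i}
  = \beta_0 + \beta_1\,\mathrm{info}_i + \beta_2\,\mathrm{age}_i
  + \beta_3\,\mathrm{employed}_i + \beta_4\,\mathrm{connect}_i
  + \beta_5\,\mathrm{male}_i
\]
```

Under this formulation, exp(beta_j) is the odds ratio for a one-unit change in covariate j with the others held fixed, and an interaction for part (b) would enter as an additional product term on the right-hand side.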

2. [13pts] Install the R Package faraway. Consider the esdcomp data that were recorded
on 44 doctors working in an emergency service at a hospital to study the factors
affecting the number of complaints received. Build a model for the number of
complaints received, justify your choices, and report your conclusions. (250 words).

3. [13pts] The bootstrap is a general tool for assessing uncertainty. Describe the
bootstrap in general and then use it to investigate a statistic of relevance to the
dataset you have selected for your project. Take advantage of the functions available
in the R Package bootstrap and be sure to include your references. (500 words).
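
The resampling idea behind the nonparametric bootstrap can be sketched in a few lines. The assignment asks for R, so the Python below is only a language-neutral illustration of the logic. The function name `bootstrap_se`, the made-up sample values, and the choice of the median as the statistic are all assumptions for the example.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Nonparametric bootstrap: resample with replacement and recompute the statistic.

    Returns the bootstrap standard error and a 95% percentile interval.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    replicates.sort()
    se = statistics.stdev(replicates)
    lower = replicates[int(0.025 * n_boot)]
    upper = replicates[int(0.975 * n_boot) - 1]
    return se, (lower, upper)


# Made-up illustrative values, not data from the assignment.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
se, ci = bootstrap_se(sample, statistics.median)
print(f"bootstrap SE of the median: {se:.3f}, 95% percentile interval: {ci}")
```

The percentile interval used here is the simplest bootstrap interval; other constructions exist and may behave better for skewed statistics.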

4. [7pts] Cross validation is probably the simplest and most widely used method for
estimating prediction error. Ideally if we had enough data, we would set aside a
validation set and use it to assess the performance of our model. Since data are
sometimes scarce, this may not always be possible. We finesse this problem by
using K-fold cross-validation. Explain. (150 words).
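
The mechanics of K-fold cross-validation can be sketched as follows: split the data into K parts, fit on K-1 of them, predict the held-out part, and average the squared errors over all observations. The assignment itself targets R, so this Python is only an illustration. The helper names (`k_fold_indices`, `fit_line`, `cv_mse`), the simple linear model, and the synthetic data are assumptions for the example.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cv_mse(xs, ys, k=5, seed=0):
    """K-fold estimate of mean squared prediction error for the fitted line."""
    total_sq_err = 0.0
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the K-1 training folds only.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score on the fold that was left out of the fit.
        total_sq_err += sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
    return total_sq_err / len(xs)


# Synthetic data for illustration: y = 1 + 2x plus unit-variance noise.
noise = random.Random(1)
xs = [i / 2 for i in range(30)]
ys = [1.0 + 2.0 * x + noise.gauss(0, 1) for x in xs]
cv_error = cv_mse(xs, ys, k=5)
print(f"5-fold CV estimate of prediction MSE: {cv_error:.3f}")
```

Every observation is used for validation exactly once and for training K-1 times, which is how the method avoids setting aside a separate validation set.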

5. For the analysis of count (or semicontinuous) data there are models available to
deal with the common situation where there is an excessive number of zeros.

(a) [7pts] Discuss the various potential sources of zeros. (150 words).

(b) [11pts] Describe mixture and two-part models and show how their formulations
handle different types of zeros. (250 words).
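
For count data, the contrast in part (b) can be written down compactly. The sketch below assumes a Poisson count distribution with mean lambda and a zero-generating probability pi. Both symbols and the Poisson choice are assumptions for illustration, and other count or positive continuous distributions could take its place.

```latex
% Mixture (zero-inflated Poisson): zeros arise from two sources,
% a point mass at zero and the Poisson component itself.
\[
P(Y = 0) = \pi + (1 - \pi)\,e^{-\lambda}, \qquad
P(Y = y) = (1 - \pi)\,\frac{e^{-\lambda}\lambda^{y}}{y!}, \quad y = 1, 2, \dots
\]

% Two-part (hurdle): all zeros come from the binary part,
% and positive counts follow a zero-truncated Poisson.
\[
P(Y = 0) = \pi, \qquad
P(Y = y) = (1 - \pi)\,\frac{e^{-\lambda}\lambda^{y}}{y!\,\bigl(1 - e^{-\lambda}\bigr)},
\quad y = 1, 2, \dots
\]
```

In the mixture form a zero may come from either component, whereas in the two-part form every zero is attributed to the binary part alone.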

GUIDELINES FOR SUBMISSION:

Submit the R markdown file (.RMD), the .csv file containing your datasets, AND the
resulting knitted .PDF file to BrightSpace Assignments under Assignment 3.

