Mid-term Project of 6289 (Due: Oct 29, 2019) Name:
• You need to submit your answers before 2:00pm on Oct 29.
• You may talk with one another about the project, but the work you turn in should be
your own.
• Use a word processor that can handle mathematics (like LaTeX or Word) and can
include graphics. Handwritten work is not accepted.
• Finalize your code (with file name: yourname-6289.*) and email me a copy
for verification.
1. (80) Duchenne Muscular Dystrophy (DMD) is a sex-linked genetic disease. Boys with
the disease usually die at a young age, while affected girls usually do not suffer symptoms
and may unknowingly carry the disease and pass it to their offspring. It is
desirable to have some kind of test to detect whether or not a woman is a carrier of
the disease. The dataset dystrophy.txt contains information from a 1981 study attempting
to develop such a test based on two serum enzymes, creatine kinase (CK) and
hemopexin (H) for 38 known DMD carriers (Case) and 82 women who are not carriers
(Control). (Note: In the last 30 years, advances in DNA sequencing technology have
made it possible to obtain definitive answers; however, tests based on the above proteins
are still used as rapid and inexpensive alternatives.)
(a) Use logistic regression to model the way in which case/control status depends
on creatine kinase and hemopexin. Construct (using the Wald approach) a table
containing the estimated odds ratios and p-values for the two enzymes. Provide
confidence intervals for the odds ratios, and give some thought as to what would
constitute a meaningful difference (δj ) for the two enzymes when calculating the
odds ratios.
(b) Can you calculate confidence intervals for the odds ratios in part (a) using the
likelihood ratio approach? If so, calculate them. If not, explain why you can’t do
so.
(c) Can you carry out the hypothesis testing in part (a) using the likelihood ratio
approach? If so, perform the tests. If not, explain why you can’t do so.
(d) Describe (quantitatively) the relationship between creatine kinase levels and the
likelihood that a woman is a carrier without using the phrase “odds ratio” (you
can use “odds”, just not “odds ratio”).
(e) Suppose a woman randomly selected from the population has a hemopexin level
of 100 and a creatine kinase level of 150. Can you estimate the probability that
she is a carrier? If so, estimate it. If not, explain why you can’t do so.
(f) It is estimated that 1 in 3,300 women are carriers. Treating this as a known
constant, calculate the sampling ratio τ1/τ0.
(g) Based on your answer to (f), calculate the probability from part (e).
(h) Compare the following three numbers: (i) the probability you calculated in (g),
and (ii) the marginal probability of being a carrier (i.e., if you don’t know a
woman’s hemopexin/creatine kinase levels).
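The quantities in parts (a), (c), (f), and (g) can be sketched with a few small helper functions. This is a minimal sketch under stated assumptions, not a prescribed solution: the function names are hypothetical, the coefficient estimates, standard errors, and log-likelihoods are assumed to come from whatever software fits the model, and τ1 and τ0 are assumed to denote the probabilities that a carrier and a non-carrier, respectively, are sampled into the study.

```python
import math

def wald_odds_ratio(beta_hat, se, delta=1.0, z_crit=1.959964):
    """Wald odds ratio, interval, and two-sided p-value for one coefficient.

    beta_hat, se : estimate and standard error on the log-odds scale
    delta        : predictor change treated as a meaningful difference
    z_crit       : normal quantile for the interval (1.959964 gives 95%)
    """
    log_or = delta * beta_hat
    half_width = z_crit * abs(delta) * se
    # For standard normal z: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(beta_hat / se) / math.sqrt(2.0))
    return {
        "odds_ratio": math.exp(log_or),
        "ci_lower": math.exp(log_or - half_width),
        "ci_upper": math.exp(log_or + half_width),
        "p_value": p_value,
    }

def likelihood_ratio_test_1df(loglik_full, loglik_reduced):
    """Likelihood ratio test for dropping a single coefficient (1 df)."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)  # guard rounding noise
    # For chi-square with 1 df: P(X > x) == erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))

def sampling_ratio(n_cases, n_controls, prevalence):
    """tau1 / tau0, ASSUMING tau_k = P(sampled into the study | status k)."""
    return (n_cases / n_controls) * (1.0 - prevalence) / prevalence

def population_probability(sample_linear_predictor, ratio):
    """Shift the case-control log-odds by -log(tau1/tau0), then invert the logit."""
    eta = sample_linear_predictor - math.log(ratio)
    return 1.0 / (1.0 + math.exp(-eta))
```

For example, `wald_odds_ratio(0.05, 0.01, delta=10)` summarizes a made-up coefficient of 0.05 with standard error 0.01 for a 10-unit change in the predictor; these numbers are illustrative only and are not taken from dystrophy.txt. The same likelihood ratio statistic can be inverted to give a confidence interval: a 95% interval keeps every coefficient value whose profile log-likelihood lies within 3.84/2 of the maximum.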
2. (20) Consider a binary response variable Y and logistic regression. We focus on the
group Lasso with loss function given by the negative log-likelihood as (see equation
(3.3) in the HDDA book as well)
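The displayed formula does not appear in this copy. A standard form of the group-Lasso objective for logistic regression is sketched below as an assumption, not as a transcription of the book's equation. The notation is assumed: n observations, q predictor groups G_1, ..., G_q, group weights m_j (commonly the square root of the group size), an unpenalized intercept β_0, and a 1/n scaling of the log-likelihood.

```latex
Q_\lambda(\beta_0, \beta)
  = -\frac{1}{n} \sum_{i=1}^{n}
      \Bigl[ y_i \, \eta_i - \log\bigl(1 + e^{\eta_i}\bigr) \Bigr]
    + \lambda \sum_{j=1}^{q} m_j \, \lVert \beta_{G_j} \rVert_2 ,
\qquad
\eta_i = \beta_0 + \sum_{j=1}^{q} x_{i, G_j}^{\top} \beta_{G_j}
```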
Write the block coordinate gradient descent algorithm (Algorithm 3 in the HDDA
book) with explicit formulae (see equations (4.20) and (4.21) in the HDDA book,
where 0 < δ < 1, 0 < σ < 1, and ∆^[m] is the improvement in the objective function
Qλ(·) when using a linear approximation for the objective function, i.e.,