Homework Three
Karla Ballman
26 September 2018
DUE DATE
This assignment is due at noon on Wednesday, October 3, 2018.
Human papillomavirus (HPV) is the most common sexually transmitted infection in the United States. Some
HPV types can cause genital warts and are considered low risk, with a small chance for causing cancer. Other
types are considered high risk, causing cancer in different areas of the body including the cervix and vagina
in women, penis in men, and anus and oropharynx in both men and women. A report provides the most
recent national estimates of genital HPV prevalence among adults aged 18–59 from the National Health and
Nutrition Examination Survey (NHANES) 2013–2014. During 2013–2014, prevalence of any genital HPV was
42.5% among adults aged 18–59. This information applies to Questions 1-4.
Question 1: Sampling distribution for a sample proportion
1(a) If we use a sample proportion, P, based on a sample of size n = 20, to estimate the
population proportion, π = 0.425, would it be very surprising to get an estimate that is off
by more than 0.10 (that is, the sample proportion is less than 0.325 or greater than 0.525)?
Support your answer.
1(b) If we use a sample proportion, P, based on a sample of size n = 100, to estimate the
population proportion, π = 0.425, would it be very surprising to get an estimate that is off by
more than 0.10? Support your answer.
1(c) If we use a sample proportion, P, based on a sample of size n = 500, to estimate the
population proportion, π = 0.425, would it be very surprising to get an estimate that is off by
more than 0.10? Support your answer.
1(d) If we use a sample proportion, P, based on a sample of size n = 20, to estimate the
population proportion, π = 0.425, would it be very surprising to get an estimate that is off
by more than 0.05 (that is, the sample proportion is less than 0.375 or greater than 0.475)?
Support your answer.
1(e) If we use a sample proportion, P, based on a sample of size n = 100, to estimate the
population proportion, π = 0.425, would it be very surprising to get an estimate that is off by
more than 0.05? Support your answer.
1(f) If we use a sample proportion, P, based on a sample of size n = 500, to estimate the
population proportion, π = 0.425, would it be very surprising to get an estimate that is off by
more than 0.05? Support your answer.
1(g) Using parts information obtained in your answers above, comment on the effect that
sample size has on the accuracy of an estimate.
Question 2: Maximum likelihood
Suppose we sample of size of n = 222 random adult individuals age 18-59 and determine that 98 individuals
were infected with genital HPV.
1
2(a) Draw the liklihood function for an estimate of the population parameter as a function of
π (with sample size n = 222).
2(b) What is the maximum likelihood estimate of the population proportion based on the
observed data?
2(c) Suppose that the sample size is increased to n = 444 random adults between the age of 18-
59 and observe that 196 individuals were infected with gential HPV. How does the likelihood
function change from that in 2(a)? Explain and/or show.
2(d) What is the maximum likelihood estimate of the population proportion based on the
observed data in 2(c)? How does it compare to the estimate in 2(b)?
Question 3: Confidence interval for HPV
Suppose that you obtain a random sample of adult individuals age 18-59 and determine whether they were
infected with genital HPV (π = 0.425). Use the R function binom.test().
3(a) Use R to get a random sample of n = 20 from adults 18-59. Determine the width of the
exact 99% confidence interval for π for your sample. To get the width, you would subtract
the lower bound of the 99% confidence interval from the upper bound of the 99% confidence
interval.
3(b) Use R to get a random sample of n = 100 from adults 18-59. Determine the width of the
exact 99% confidence interval for π for your sample.
3(c) Use R to get a random sample of n = 500 from adults age 18-59. Determine the width of
the exact 99% confidence interval for π for your sample.
3(d) How do the widths of your intervals in 3(a) - 3(c) compare?
Question 4: Hypothesis test for HPV
Suppose that you obtain a random sample of n = 123 adult individuals age 18-59 and determine whether
they were infected with genital HPV. Use the R function binom.test().
4(a) Use R to get a random sample of n = 123 from a population of adults age 18-59 with
π = 0.405. Report the sample proportion and perform a test to determine whether it differs
from π = 0.425
4(b) Use R to get a random sample of n = 123 from a population of adults age 18-59 with
π = 0.445. Report the sample proportion and perform a test to determine whether it differs
from π = 0.425
4(c) Use R to get a random sample of n = 123 from a population of adults age 18-59 with
π = 0.385. Report the sample proportion and perform a test to determine whether it differs
from π = 0.425
4(d) Use R to get a random sample of n = 123 from a population of adults age 18-59 with
π = 0.465. Report the sample proportion and perform a test to determine whether it differs
from π = 0.425
4(e) Use R to get a random sample of n = 123 from a population of adults age 18-59 with
π = 0.365. Report the sample proportion and perform a test to determine whether it differs
from π = 0.425
2
4(f) Use R to get a random sample of n = 123 from a population of adults age 18-59 with
π = 0.485. Report the sample proportion and perform a test to determine whether it differs
from π = 0.425
4(g) Using parts information obtained in your answers above, comment on what you observe
about the relationship between the sample proporiton, the hypothesized π, and the p-value.
Question 5: Normal approximation to the binomial
1 out of 4 women that smoke will die from smoking related diseases.
5(a) What is the probability that out of 1,000 women smokers more than 300 will die from
smoking related diseases? Use a normal approximation to the binomial to estimate this quantity
and compare it to the exact probability from a binomial.
5(b) What is the probability that the death from smoking related disease will be between 238
and 245 cases in 1000 random women smokers? Again, compare the estimate obtained from a
normal distribution to that obtained from the binomial distribution.
5(c) What is the probability that out of 1,000 women smokers the sample proportion of women
who will die from smoking related diseases is greater than will be greater than 0.30? Use a
normal approximation to the binomial to estimate this quantity and compare it to the exact
probability from a binomial.
5(d) What is the probability that the sample proportion of death from smoking related disease
will be between 0.238 and 0.245 cases in 1000 random women smokers? Again, compare the
estimate obtained from a normal distribution to that obtained from the binomial distribution.
5(e) What is the probability that out of 24 women smokers less than 6 will die from smoking
related diseases? Use a normal approximation to the binomial to estimate this quantity and
compare it to the exact probability from a binomial.
5(f) What is the probability that out of 24 women smokers between 5 and 9 will die from smoking
related diseases? Use a normal approximation to the binomial to estimate this quantity
and compare it to the exact probability from a binomial.
5(g) Which normal approximations to the binomial above were good? Which normal approximations
to the binomal distribution above were poor? How do you explain this?
Question 6: Confidence interval data analysis
Colon cancer is the most common gastrointestinal malignancy and the second-leading cause of cancer death
in the United States. Approximately 80% of colon cancer patients present with resectable, localized disease,
and in these patients, nodal metastases have long been recognized as the most important factor predicting
long-term survival. Nodal involvement is an important determinant in the decision to administer adjuvant
chemotherapy, and with the demonstration over the last decade of highly effective systemic therapies for colon
cancer, it is essential to ensure that all patients who would benefit from such treatment receive counseling
concerning these therapies and have access to them. Numerous studies have shown an improvement in
disease-specific and overall survival when increasing numbers of lymph nodes are examined for colon cancer.
There has been a considerable effort to determine the minimum number of nodes that need to be evaluated
to deem a patient free of nodal metastases with reasonable certainty. Estimates have varied from 6 to 40
lymph nodes; however, numerous studies and consensus guidelines have suggested that examination of 12
regional lymph nodes is a reasonable minimum for adequate nodal evaluation for colon cancer. Despite these
findings, population-based assessments have shown that the majority of patients in the United States do not
have 12 or more nodes examined.
3
The American College of Surgeons (ACoS), American Society of Clinical Oncology (ASCO), and the National
Comprehensive Cancer Network (NCCN) harmonized a quality measure requiring resection and pathological
examination of 12 or more lymph nodes for colon cancer. Subsequently, the National Quality Forum
(NQF) endorsed the 12-node measure for quality surveillance. A large, national health-care insurer, United
Healthcare, is already basing referral recommendations for colectomy on a one-time requirement that surgeons
provide pathology reports demonstrating examination of 12 or more lymph nodes for colon cancer. These
organizations deem a hospital to be compliant with the 12-node measure if examination of at least 12 nodes
occurred for at least 75% of patients at that hospital. Due to statistical variation, hospitals were considered
“statistically compliant” if the upper limit of the (two-sided) 95% confidence interval (CI) of the estimate of
their performance rate was greater than or equal to 75%.
Dr. Smith has requested that you analyze the data provided to determine whether the hospital
currently meets this cancer care quality metric for node examination in colorectal cancer
patients. Write a short paragraph that summarizes your findings and answers Dr. Smith’s request.
DATA The data are in a file called crc.csv
BMI: an indicator whether a person has a normal BMI (BMI less than or equal to 25) or an overweight
BMI (BMI greater than 25)
Nodes: the number of lymph nodes examined
Year: year in which the patient underwent her/his colorectal cancer surgery
Question 7: Hypothesis test data analysis
Hopevale Hospital wants to understand the quality of care for their breast cancer patients. In particular,
the quality measurement in which they are most interested is the reoperation rate of women who undergo
lumpectomy. The leadership is interested in how their lumpectomy reoperation rates in women who are newly
diagnosed with breast cancer compares to their peer institutions. They know that the average reoperation
rate for a set of their peer institutions is 15%. They would like you to look at the data they provide on 700
procedures and tell them whether they meet or exceed this.
Data The data are in a file called BreastCancerOutcome.csv
size: the size of the cancer in cm
reoperation: indicates whether the patient had to undergo a reoperation
7(a) Is there evidence that the reoperation rate at Hopevale Hospital differs from that of their
peer institutions? Use a significance level of 0.05. Perform an hypothesis test. Write your answer in
a short paragraph that includes the p-value and an indication of how the rate differs (if it is statistically
significant).
7(b) The hospital also collected data on the size of the tumor. It is known that reoperation
rates likely differs for different size tumors. Is there evidence that the reoperation rates for
the different size tumors differs from that of Hopevale Hospital peer institutions? Use a
significance level of 0.05. Perform two hypothesis tests, one for each tumor size category. Summarize
your findings in a paragraph.
7(c) What is something that would be important to know regarding the tumor size at the peer
hospitals?
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。