
Date: 2022-10-30 09:59

MATH5945: Categorical Data Analysis

Term 3, 2022

Assignment 2

Submission deadline: Friday 28 October, 12:00pm

Deliverables: 2 files uploaded to Moodle: (1) a PDF file of your worked solutions, and (2) a SAS file for ALL computations. File names should be surname firstname z123456789 ASS2.

Assignment length: There is a 5 page limit and a minimum 12pt font size. Any pages exceeding this limit or submissions with smaller font sizes will not be marked. Handwritten assignments will not be accepted. The page limit does not include the SAS file of your code. Your document should begin with the Plagiarism Statement below (copy-and-paste it).

SAS code: All computations must be performed using SAS. Your SAS code must run as is and I should not need to modify your code in any way to make it work. You may create a library to import data, but any other code should only use the WORK library (you may assume data files of the same name are in my WORK library). SAS should be used for computing only and answers given only within SAS code will not be marked.

Penalties: Failure to adhere to instructions will result in a minimum 5% mark reduction.

Name: Student Number:

I declare that this assessment item is my own work, except where acknowledged, and has not been submitted for academic credit elsewhere, and acknowledge that the assessor of this item may, for the purpose of assessing this item:

Reproduce this assessment item and provide a copy to another member of the University; and/or,

Communicate a copy of this assessment item to a plagiarism checking service (which may then retain a copy of the assessment item on its database for the purpose of future plagiarism checking).

I certify that I have read and understood the University Rules in respect of Student Academic Misconduct.

Signed: Date:


1. Consider the observed 2 × 2 × 2 table created from the binary variables X, Y and Z. In this case, we are interested in assessing the relationship between X and Y, while the variable Z may interact with or confound this relationship.

    z = 0
             y = 0   y = 1
    x = 0    a0      b0
    x = 1    c0      d0

    z = 1
             y = 0   y = 1
    x = 0    a1      b1
    x = 1    c1      d1

The stratified tables have conditional odds ratios $\hat{\psi}_0 = a_0 d_0/(b_0 c_0)$ and $\hat{\psi}_1 = a_1 d_1/(b_1 c_1)$.

(a) Using this setup, show that the square root of Woolf's test statistic for interaction,

$$X^2_W = \sum_{i=0}^{1} w_i \left( \log\hat{\psi}_i - \log\hat{\psi}_W \right)^2,$$

can be written as the difference of two independent log odds ratios divided by its standard error, where $\log\hat{\psi}_W$ is Woolf's summary log odds ratio and $w_i = (1/a_i + 1/b_i + 1/c_i + 1/d_i)^{-1}$.
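As a numerical illustration of the identity in (a), not a proof, the sketch below evaluates both sides for a made-up table. The cell counts in `tables` are hypothetical, and the assignment itself requires SAS for all submitted computations; Python is used here only to make the algebra concrete.

```python
import math

# Hypothetical cell counts (a_i, b_i, c_i, d_i) for strata z = 0 and z = 1.
tables = [(20, 10, 12, 18), (15, 15, 9, 21)]

# Stratum log odds ratios and Woolf weights (inverse variances).
log_or = [math.log(a * d / (b * c)) for a, b, c, d in tables]
w = [1.0 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]

# Woolf's summary log odds ratio: inverse-variance weighted mean.
log_or_w = sum(wi * li for wi, li in zip(w, log_or)) / sum(w)

# Woolf's statistic for interaction (homogeneity of odds ratios).
x2_w = sum(wi * (li - log_or_w) ** 2 for wi, li in zip(w, log_or))

# Difference of the two log odds ratios divided by its standard error.
se_diff = math.sqrt(1 / w[0] + 1 / w[1])
z = (log_or[1] - log_or[0]) / se_diff

print(x2_w, z ** 2)  # the two values agree up to rounding error
```

With only two strata the weighted sum collapses to a single squared difference, which is why the two printed values coincide.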

(b) Using the values in the above table, now consider the saturated logistic model for $\pi_{ij} = P(Y_{ij} = 1)$:

$$\log\frac{\pi_{ij}}{1 - \pi_{ij}} = \beta_0 + \beta_1 x_i + \beta_2 z_j + \beta_3 x_i z_j$$

i. Write out the log-likelihood function for $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)$.

ii. Find the score function $\partial \log L(\beta)/\partial\beta$, the partial derivatives of the log-likelihood function.

iii. Find the maximum likelihood estimator for $e^{\beta_3}$, the exponentiated parameter for the interaction term, by solving the system of equations

$$\frac{\partial \log L(\hat{\beta})}{\partial \hat{\beta}} = 0.$$
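One way to sanity-check a candidate solution to the score equations is to evaluate the score numerically. The sketch below does this for hypothetical counts, assuming grouped binomial data with one covariate pattern per (x, z) cell and using the fact that a saturated model reproduces the observed proportions; the names `counts`, `cells` and `score` are illustrative only.

```python
import math

# Hypothetical counts: counts[z] = (a, b, c, d), laid out as in the tables above.
counts = {0: (20, 10, 12, 18), 1: (15, 15, 9, 21)}

def cells(counts):
    """Yield (x, z, number with y = 1, number of trials) per covariate pattern."""
    for z, (a, b, c, d) in counts.items():
        yield 0, z, b, a + b
        yield 1, z, d, c + d

def score(beta, counts):
    """Score vector of the binomial log-likelihood for the saturated logit model."""
    b0, b1, b2, b3 = beta
    u = [0.0, 0.0, 0.0, 0.0]
    for x, z, y1, n in cells(counts):
        eta = b0 + b1 * x + b2 * z + b3 * x * z
        pi = 1.0 / (1.0 + math.exp(-eta))
        resid = y1 - n * pi
        for k, cov in enumerate((1, x, z, x * z)):
            u[k] += resid * cov
    return u

(a0, b0c, c0, d0), (a1, b1c, c1, d1) = counts[0], counts[1]

# Candidate estimates: match each fitted logit to the observed logit.
beta0 = math.log(b0c / a0)
beta1 = math.log(d0 / c0) - beta0
beta2 = math.log(b1c / a1) - beta0
beta3 = math.log(d1 / c1) - math.log(b1c / a1) - beta1

u = score((beta0, beta1, beta2, beta3), counts)
ratio_of_ors = (a1 * d1 / (b1c * c1)) / (a0 * d0 / (b0c * c0))

print(u)                              # all components are numerically zero
print(math.exp(beta3), ratio_of_ors)  # the two values agree
```

For these counts the exponentiated interaction estimate matches the ratio of the two stratum odds ratios, which is the quantity the algebraic solution in iii should recover.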

(c) Using your results from (a), (b) and this element of the inverse of the observed Fisher's information matrix for the logistic model

J ,

demonstrate that Woolf's test for interaction is equivalent to inference for the interaction term in a saturated logistic regression model for a 2 × 2 × 2 table.


2. The SAS data file injury contains data from motor vehicle passengers injured in a crash. The dataset contains the variables:

    Name       Values
    sex        female, male
    location   rural, urban
    seatbelt   no, yes
    injury     no, yes
    freq       0, 1, 2, ...

We would like to fit a log-linear model to the four-way contingency table created from these variables. For ease of interpretation, denote these variables by S, L, B, I, respectively, in model shorthand and use the numbers 1, 2 to identify the levels of a variable. For example, $\tau^S_1$ represents female sex.

Make this data available in SAS by creating a libname for its location on your computer and copy this file to your WORK folder using the same filename, i.e., injury.

(a) Check the goodness of fit of the following hierarchical models:

(M1) main effects only

(M2) all two-way interaction terms

(M3) all three-way interaction terms, and

(M4) all four-way interaction terms (saturated)

What is the lowest order model that reasonably fits the data? Give reasons.
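To make the goodness-of-fit comparison concrete, the sketch below computes the deviance $G^2$ for the main-effects model (M1) only, whose fitted values have a closed form under mutual independence. The frequencies are invented for illustration and are not the injury data; models M2 and M3 need iterative fitting, and the assignment requires SAS for the actual analysis.

```python
import math
from itertools import product

# Hypothetical frequencies, keyed by 0/1 levels of (sex, location, seatbelt, injury).
values = [60, 12, 80, 9, 45, 15, 70, 11, 55, 20, 75, 14, 40, 22, 65, 16]
freq = dict(zip(product((0, 1), repeat=4), values))
total = sum(freq.values())

# One-way margins for each of the four variables.
margins = [
    [sum(n for cell, n in freq.items() if cell[v] == level) for level in (0, 1)]
    for v in range(4)
]

def fitted(cell):
    """Expected count under mutual independence (main effects only)."""
    m = float(total)
    for v, level in enumerate(cell):
        m *= margins[v][level] / total
    return m

# Likelihood-ratio statistic G^2 = 2 * sum(observed * log(observed / fitted)).
g2 = 2.0 * sum(n * math.log(n / fitted(cell)) for cell, n in freq.items() if n > 0)

# Residual df: 16 cells minus 1 intercept and 4 main-effect parameters.
df = 16 - (1 + 4)

print(g2, df)
```

The same statistic, computed for each of M1 to M4, is what gets compared against a chi-square distribution on the corresponding residual degrees of freedom.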

(b) Based on the model chosen in part (a), perform forward selection using partitioned $G^2$ statistics to choose a “best model”. Justify your steps.

(c) Answer these questions regarding the model chosen in part (b).

i. Write out the log-linear model and its logit equivalent using Injury (I) as the response variable in symbolic form (i.e., τ notation).

ii. Using the symbolic log-linear model, what is the odds ratio ψ(I) of injury for an individual who wore a seatbelt compared to someone who did not?

iii. What is the estimate of ψ(I) and its 95% confidence interval using the estimated model? Be sure to provide strata-level estimates if the final model includes interaction term(s).
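A Wald-type interval for an odds ratio is usually built on the log scale and then exponentiated. The sketch below assumes the fitted model supplies an estimate of log ψ(I) and its standard error; the numbers passed in are hypothetical and the function name `odds_ratio_ci` is illustrative.

```python
import math

def odds_ratio_ci(log_or, se, z=1.96):
    """Return (estimate, lower, upper) for an odds ratio from its log-scale estimate."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )

# Hypothetical log odds ratio and standard error, not taken from the injury data.
estimate, lower, upper = odds_ratio_ci(log_or=-0.8, se=0.1)
print(estimate, lower, upper)
```

If the chosen model includes an interaction involving seatbelt and injury, the same calculation would be repeated within each stratum, using the stratum-specific log odds ratio and its standard error.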

