联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Algorithm 算法作业Algorithm 算法作业

日期:2019-09-13 11:11

Assignment 2, Question 3 MAST90125: Bayesian

Statistical Learning

Due: Friday 20 September 2019

There are places in this assignment where R code will be required. Therefore set the random

seed so assignment is reproducible.

set.seed(123456) #Please change random seed to your student id number.

Question Three (18 marks)

A group of 453 Bangladeshi women in 5 districts were asked about contraceptive use. The response variable

use is an indicator for contraceptive use (coded N for no and Y for yes). Other covariates of interest are

categorical variables for geographical location district (5 levels), and urban (2 levels), and number of living

children livch (4 levels), and the continuous covariate for standardised age age.A random intercept for the

district was suggested. This suggested the following model should be fitted,

θ = Zu + Xβ,

where θ is a link function, Z is an indicator variable for district, u is a random intercept with prior

p(u) = N (0, σ2u

I), and X is a design matrix for fixed effects β, where β includes the coefficients for the

intercept, urban status, living children, and age.

Data can be downloaded from LMS as Contraceptionsubset.csv.

a) Fit a generalised linear mixed model assuming a logistic link using Stan. The R and stan code below

covers the following steps.

• Importing the data.

• Constructing design matrices.

• Provides code to go into the stan file.

• Running stan in R. This assumes your stan file is called *logitmm.stan*, and that you will run the

sampler for 2000 iterations and 4 chains.

Note that provided code assumes everything required is located in your working directory in R.

#Step one: Importing data, constructing design matrices and calculating matrix dimensions.

dataX= read.csv("Contraceptionsubset.csv",header=TRUE)

n<-dim(dataX)[1]

Z = table(1:n,dataX$district) #incidence matrix for district

Q = dim(Z)[2]

D1 = table(1:n,dataX$livch) #Dummy indicator for living children

D2 = table(1:n,dataX$urban) #Dummy indicator for urban status

#fixed effect design matrix

X = cbind(rep(1,n),dataX$age,D1[,-1],D2[,-1])

P = dim(X)[2]

1

y = rep(0,n)

y[dataX$use %in% 'Y'] = 1

An example stan file.

//

// This Stan program defines a logistic mixed model

//

// Learn more about model development with Stan at:

//

// http://mc-stan.org/users/interfaces/rstan.html

// https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started

//

data {

int<lower=0> n; //number of observations

int<lower=0> Q; //number of random effect levels

int<lower=0> P; //number of fixed effect levels

int y[n]; //response vector

matrix[n,Q] Z; //indicator matrix for random effect levels

matrix[n,P] X; //design matrix for fixed effects

}

// The parameters accepted by the model.

// accepts three sets of parameters 'beta', 'u' and 'sigma'.

parameters {

vector[P] beta; //vector of fixed effects of length P.

vector[Q] u; //vector of random effects of length Q.

real<lower=0> sigma; //random effect standard deviation

}

// The model to be estimated. We model the output

// 'y' to be bernoulli with logit link function,

// and assume a i.i.d. normal prior for u.

model {

u ~ normal(0,sigma); //prior for random effects.

y ~ bernoulli_logit(X*beta+ Z*u); //likelihood

}

library(rstan)

logistic.mm <-stan(file="logitmm.stan",data=c('Z','X','y','n','P','Q'),iter=2000,chains=4)

print(logistic.mm)

Note that in Stan, defaults for burn-in (warm-up) is one half of all iterations in stan, and no thinning. Note

the code is written using the stan file and csv is in your working directory. Use the print function to report

posterior means, standard deviations, 95 % central credible intervals and state from the output whether you

believe the chains have converged. Also report the reference categories for urban and livch.

b) An alternative to the logit link when analysing binary data is the probit. The probit link is defined as,

yi =1 if zi ≥ 0

In lecture 14, we showed how by letting zi be normal, probit regression can be fitted using a Gibbs sampler,

but to do so, it requires the ability to sample from a truncated normal defined on either (−∞, 0) (if yi = 0)

or (0, ∞) (if yi = 1). Check by comparing the empirical and the true density that a modified version of

the inverse cdf method can be used to produce draws from a truncated normal. Do this for the case where

x ∈ (0, ∞) and x ∈ (−∞, 0) with parameters µ = 0.5 and σ = 1.

Hints: If y is drawn from a truncated normal with lower bound a, upper bound b and parameters µ, σ2

which in R means the truncated normal density can be written as

dnorm(x,mean=mu,sd=sigma)/(pnorm(b,mean=mu,sd=sigma)-pnorm(a,mean=mu,sd=sigma))

The inverse cdf method involves drawing v from U(0, 1) so that x ∼ p(x) can be found solving x = F

where F is the cdf. If the only change compared to drawing from a normal distribution is truncation, think

about what happens to the bounds of the uniform distribution.

c) Implement a Gibbs sampler to fit the same mixed model as fitted in Stan in a), but now with a probit

link. As before, fit 4 chains, each running for 2000 iterations, with the first 1000 iterations discarded as

burn-in. Perform graphical convergence checks and Gelman-Rubin diagnostics. Report posterior means,

standard deviations and 95 % central credible intervals for σ, β, u by combining chains.

d) For the co-efficients β, u, calculate the mean of the ratio of the posterior means βi,logit/βi,probit, ui,logit/ui,probit

obtained when fitting the logistic mixed model and the probit mixed model. To do this, you will need to

apply the extract function to the stan model object. Once calculated, multiply the iterations obtained

assuming a probit link by this constant and compare to the iterations obtained assuming a logit link.

e) The logistic link can be written in the same way as the probit link, but instead of ei ∼ N (0, 1), the

error term is ei ∼ Logistic(0, 1). By evaluating the standard normal and logistic inverse cdfs and

superimposing the line y = mx where m is the posterior ratio, do you think the results in d) were

surprising.

3


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp