联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Algorithm 算法作业Algorithm 算法作业

日期:2019-10-24 11:31

MATH6152: Statistical Computing (R) Coursework

Assignment

This coursework assignment is worth 50% of the overall mark for the module.

The deadline is 1200 on Friday 25th October 2019 and your completed work must be submitted

electronically via Blackboard.

Your submission must consist of exactly one script containing R code to answer the two questions below.

Your script must have the filename <your student ID>.R. For example, if your student ID is 12345678, then

your script must have the filename 12345678.R.

You must informatively comment your code including indicating where each question and task begins.

Your entire script should be reprodicible and run without any errors.

Some tasks in the questions require you to create R objects. These should be named exactly as requested in

the question.

You must not load any add-on packages, e.g. using library or require.

If you are required to import data from a file, assume that this file is in your working directory. This means

you should not call the function setwd.

At the end of each question, there are some hints that you may find useful.

The standard Faculty rules on late submissions apply: for every weekday, or part of a weekday, that the

coursework is late, the mark will be reduced by a factor of 10%. No credit will be given for work which is

more than one week late.

All coursework must be carried out independently. You are reminded of the University’s Academic Integrity

Policy.

In the interests of fairness, queries which relate directly to the coursework must be made via the MATH6152:

Statistical Computing (R) Coursework Assignment Discussion Forum on Blackboard. This ensures

that everybody has access to the same information.

Total marks available: 100

1

Question 1: Random walk Metropolis-Hastings algorithm (50

marks)

In this question, you will write a function to simulate a sample from a given distribution using a random

walk Metropolis-Hastings algorithm.

Often in a statistical analysis, it is essential to be able to simulate (or generate) a sample from a given

continuous probability distribution. For example, R functions rnorm, runif and rgamma can simulate samples

from the normal, uniform and gamma distributions, respectively. However, sometimes we encounter a

continuous distribution which does not have a known form and for which an in-built R function does not

exist. In this case, we can use, for example, the following random walk Metropolis-Hastings algorithm to

simulate a sample.

Suppose we wish to simulate a sample of size M from the probability distribution of the continuous random

variable X œ (≠Œ,Œ). We are given the function g(x) Ø 0 such that the probability density function of X is

f(x) = g(x)

s Œ≠Œ g(x)dx.

The steps of the random walk Metropolis-Hastings algorithm are as follows.

1. Let x0 be the starting value.

2. For i = 1,...,M complete the following steps.

(a) Simulate a proposal

yi ≥ N (xi≠1, exp(„)),

i.e. a normal distribution with mean xi≠1 and variance exp(„).

(b) Calculate acceptance probability

–i = min ;

(c) Simulate ui ≥ uniform(0, 1). If –i Ø ui, then accept the proposal and set xi = yi; otherwise reject

the proposal and set xi = xi≠1.

The chain of values x1,...,xM represents an approximate (and not independent) sample from the probability

distribution of X (irrespective of the value of „). The proportion of proposals which are accepted is known

as the acceptance rate.

Complete the following tasks.

(A) Write a function called rwmet that has exactly two arguments:

• M: the sample size, M;

• phi: the quantity „ controlling the variance of the proposal distribution in step 2(a) of the

algorithm;

that will simulate a sample of size M from a probability distribution with

g(x) = exp (◊x) exp (≠ exp (◊x)),

where ◊ is the last digit of your student ID using the random-walk Metropolis-Hastings algorithm.

(If the last digit of your student ID is 0, then ◊ = 0.5). Set the starting value in the random walk

Metropolis-Hastings algorithm to be x0 = 1. [20 marks]

2

(B) We can use a sample simulated from the distribution of X to approximate the expectation of X.

This is simply done by approximating E(X) by the sample mean. It is a random (or Monte Carlo)

approximation in that if another sample is generated a dierent

approximation of E(X) will be produced.

We want to ensure that it is a good approximation, i.e. it has low variance. Use your function to

generate N = 100 samples of size M = 10000 with „ = ≠4. Calculate the variance of the N means of

your samples. [5 marks]

(C) In general, if „ is increased then the average acceptance probability will decrease. Conversely, if „

is decreased then the average acceptance probability will increase. Theoretical results suggest that

under certain conditions the optimal acceptance rate for a random walk Metropolis-Hastings algorithm

is 0.234. Create a plot of acceptance rate against an equally-spaced sequence of C = 100 dierent

values of „ ranging from -5 to 10. For each „, the accptance rate should be based on a sample of size

M = 10000. Based on your plot, approximately what value of „ is optimal? [20 marks]

(D) Use your function to generate N = 100 samples of size M = 10000 with „ equal to the optimal value

chosen in Task (C). Calculate the variance of the N means of your samples. Comment on the dierence

between the variance calculated here and in Task (B). [5 marks]

Hints:

Question 2: Iterated weighted least squares (50 marks)

In this question, you will write a function to find the maximum likelihood estimates for a logistic regression

model using iterated weighted least squares (IWLS).

Under a logistic regression model, the ith response has

yi ≥ Bernoulli(µi), (1)

for i = 1,...,n. In (1), the mean µi is linked to the linear predictor, ÷i, via

g(µi) = log 3 µi

where — = (—1,..., —p) is a p ◊ 1 vector of unknown parameters and xi = (xi1,...,xip)

T is a p ◊ 1 vector of

explanatory variables for the i response.

IWLS involves iterating the following weighted least squares estimate for j = 1, 2,... until convergence

where z(j≠1) is an n ◊ 1 vector with ith value given by

and W(j≠1) is an n ◊ n diagonal matrix with ith diagonal value given by. (3)

Note that, in (2) and (3).

Convergence occurs when

max

k=1,...,p |—ˆ(j≠1)

k ≠ —ˆ(j)

k | < ‘,

where ‘ > 0 is a pre-specified tolerance and —ˆ(j)

k is the kth element of —ˆ(j)

. At the point of convergence, —ˆ(j)

is returned as the maximum likelihood estimates of —. Note that —(j) does not mean — to the power of j, it

means that —(j) is one vector in a sequence —(0), —(1),..., —(j)

,.... The same applies for µ(j)

i and W(j)

ii .

Complete the following tasks.

(A) Write a function called iwls that given a vector of binary responses (0s and 1s), and a model matrix

will return the maximum likelihood estimates of — for a logistic regression model. The function should

have exactly four arguments:

4

• X: the n ◊ p model matrix X;

• y: the n ◊ 1 vector of responses y = (y1,...,yn);

• itmax: the maximum number of iterations of the IWLS algorithm, i.e. the highest value that j

can take;

• eps: the tolerance, ‘, used to assess convergence.

Set itmax and eps to have default values of 25 and 10≠9, respectively. The starting value —(0) should

be the corresponding least squares estimate for —. [35 marks]

(B) The data stored in the file diab.txt is originally from the US National Institute of Diabetes and

Digestive and Kidney Diseases and contains measurements from n = 768 female individuals. The

objective of the formation of the data is to predict whether or not a patient has diabetes based on a

series of eight measurements. The variables in the dataset are described in Table 1.

Table 1: Variables in the diab.txt file.

R name Symbol Description

preg z1 Number of times pregnant

gluc z2 Plasma glucose concentration a 2 hours in an oral glucose tolerance test

bp z3 Diastolic blood pressure

skin z4 Triceps skin fold thickness

insulin z5 2-Hour serum insulin

bmi z6 Body mass index

ped z7 Diabetes pedigree function

age z8 Age

out y Binary response variable (0 = no diabetes, 1 = diabetes)

Read in the data in the file diab.txt. Consider a logistic regression model with yi as the response (see

Table 1) and the following linear predictor:

÷i = ÿ

9

k=1

—kxik,

where, if k = 1, then xik = 1 and if k > 1 then xik is given by the value of zk≠1 for the ith person (see

Table 1). In other words, the model has a linear predictor with an intercept, and a term for each of the

explanatory variables (the z’s) in Table 1.

Using your iwls function, find the maximum likelihood estimates of — = (—1,..., —9)T for the above

model. Use the default values for itmax and eps. Fit the same model with the in-built glm function

and comment on any dierences

between the estimates found by your iwls function and by glm.

[15 marks]

Hints:

• A diagonal matrix is a matrix with non-zero elements on the leading diagonal and zeros elsewhere.

• The R function as.vector can convert an R matrix into an R vector.

• The R function as.matrix can convert an R data frame into an R matrix.

5


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp