
Student Number

Semester 2 Assessment, 2019

School of Mathematics and Statistics

MAST90083 Computational Statistics and Data Mining

Writing time: 3 hours

Reading time: 15 minutes

This is NOT an open book exam

This paper consists of 3 pages (including this page)

Authorised Materials

• Mobile phones, smart watches and internet or communication devices are forbidden.

• No handwritten or print materials may be brought into the exam venue.

• This is a closed book exam.

• No calculators of any kind may be brought into the examination.

Instructions to Students

• You must NOT remove this question paper at the conclusion of the examination.

Instructions to Invigilators

• Students must NOT remove this question paper at the conclusion of the examination.

This paper must NOT be held in the Baillieu Library


Question 1 Suppose we have a model p(x, z | θ), where x is the observed dataset and z are the latent variables.

(a) Suppose that q(z) is a distribution over z. Explain why the following

F(q, θ) = E_q[log p(x, z | θ) − log q(z)]

is a lower bound on log p(x | θ).

(b) Show that F(q, θ) can be decomposed as follows

F(q, θ) = −KL(q(z) || p(z|x, θ)) + log p(x | θ)

where, for any two distributions p and q, KL(q || p) = −E_q[log(p(z)/q(z))] is the Kullback–Leibler (KL) divergence.

(c) Describe the EM algorithm in terms of F(q, θ).

(d) Note that the KL divergence is always non-negative. Furthermore, it is zero if and only if p = q. Conclude that the optimal q that maximises F is p(z | x, θ).

[10 + 10 + 5 + 5 = 30 marks]
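A minimal numerical sketch of parts (a), (c) and (d), assuming a toy two-component Gaussian mixture (the mixture, the data, and all variable names below are illustrative assumptions, not taken from the question): it checks that F(q, θ) never exceeds log p(x | θ), that equality holds at q(z) = p(z | x, θ), and that the EM updates of part (c), viewed as coordinate ascent on F, never decrease it.

```python
import numpy as np

# Toy two-component Gaussian mixture (illustrative assumption): theta = (pi, mu, sigma),
# observed data x, latent component labels z.
rng = np.random.default_rng(0)
pi = np.array([0.3, 0.7])        # mixing weights
mu = np.array([-2.0, 1.5])       # component means
sigma = np.array([1.0, 0.5])     # component standard deviations
z_true = rng.choice(2, size=50, p=pi)
x = rng.normal(mu[z_true], sigma[z_true])

def log_norm(x, m, s):
    return -0.5 * np.log(2 * np.pi * s**2) - (x - m)**2 / (2 * s**2)

# log p(x_i, z_i = k | theta) and log p(x_i | theta) under the true theta.
log_joint = np.log(pi) + log_norm(x[:, None], mu, sigma)        # shape (n, 2)
log_evidence = np.logaddexp(log_joint[:, 0], log_joint[:, 1])   # shape (n,)

def F(q, log_joint):
    """F(q, theta) = E_q[log p(x, z | theta) - log q(z)]."""
    return np.sum(q * (log_joint - np.log(q)))

posterior = np.exp(log_joint - log_evidence[:, None])  # p(z | x, theta)
uniform = np.full_like(posterior, 0.5)                  # an arbitrary alternative q

print("log p(x | theta):", log_evidence.sum())
print("F at q = posterior:", F(posterior, log_joint))   # equals the evidence, part (d)
print("F at q = uniform:  ", F(uniform, log_joint))     # strictly smaller, part (a)

# Part (c): EM alternates an E-step, q <- p(z | x, theta), which maximises F over q,
# with an M-step that maximises F over theta for that q; F never decreases.
pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for it in range(5):
    lj = np.log(pi) + log_norm(x[:, None], mu, sigma)
    q = np.exp(lj - np.logaddexp(lj[:, 0], lj[:, 1])[:, None])   # E-step
    nk = q.sum(axis=0)                                            # M-step (weighted MLE)
    pi, mu = nk / nk.sum(), (q * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((q * (x[:, None] - mu)**2).sum(axis=0) / nk)
    lj_new = np.log(pi) + log_norm(x[:, None], mu, sigma)
    print(f"F after EM iteration {it + 1}:", F(q, lj_new))
```

In the decomposition of part (b), the gap log p(x | θ) − F(q, θ) is exactly KL(q(z) || p(z | x, θ)), which is why the uniform q above falls short of the evidence while the posterior attains it.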

Question 2 Let {(x_i, y_i)}_{i=1}^n be our dataset, with x_i ∈ R^p and y_i ∈ R. Classic linear regression can be posed as empirical risk minimisation, where the model predicts y using the class of functions f(x) = w^T x, parametrised by a vector w ∈ R^p, using the squared loss, i.e. we minimise

∑_{i=1}^n (y_i − w^T x_i)^2.

(a) Show that the optimal parameter vector is

ŵ_n = (X^T X)^{−1} X^T Y,

where X is the n × p matrix whose i-th row is x_i^T, and Y is the n × 1 column vector whose i-th entry is y_i.

(b) Consider regularising the empirical risk by incorporating an l2 penalty. That is, find the w minimising

∑_{i=1}^n (y_i − w^T x_i)^2 + λ‖w‖^2.

Show that the optimal parameter is given by the ridge regression estimator

ŵ_n^ridge = (X^T X + λI)^{−1} X^T Y.
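A minimal numerical sketch of parts (a) and (b) on assumed simulated data (the data, dimensions, and variable names below are illustrative assumptions): it computes both closed forms with a linear solve and checks that each one makes the gradient of its objective vanish.

```python
import numpy as np

# Illustrative simulated regression data (assumption for this sketch).
rng = np.random.default_rng(1)
n, p, lam = 200, 5, 3.0
X = rng.normal(size=(n, p))
Y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

# Part (a): w_hat = (X^T X)^{-1} X^T Y, computed via a linear solve.
w_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# Part (b): w_ridge = (X^T X + lam * I)^{-1} X^T Y.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Each estimator should zero the gradient of its objective:
#   OLS:   -2 X^T (Y - X w)                for  ||Y - X w||^2
#   ridge: -2 X^T (Y - X w) + 2 lam w      for  ||Y - X w||^2 + lam ||w||^2
print(np.linalg.norm(-2 * X.T @ (Y - X @ w_ols)))                        # ~ 0
print(np.linalg.norm(-2 * X.T @ (Y - X @ w_ridge) + 2 * lam * w_ridge))  # ~ 0
```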

(c) Suppose we now wish to introduce nonlinearities into the model, by transforming x to φ(x). Let Φ be the matrix with i-th row given by φ(x_i)^T.

(i) Show that the optimal parameters would be given by

ŵ_n^kernel = (Φ^T Φ + λI)^{−1} Φ^T Y.

(ii) Express the predicted y values on the training set, Φŵ_n^kernel, only in terms of Y and the Gram matrix K = ΦΦ^T, with K_ij = φ(x_i)^T φ(x_j) = k(x_i, x_j), where k is some kernel function. (This is known as the kernel trick.) Hint: You will find the following matrix inversion formula useful:


(iii) Compute an expression for the value y* predicted by the model at an unseen test vector x*.

[5+5+5+10+5 = 30 marks]
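A minimal numerical sketch of part (c), assuming an explicit degree-3 polynomial feature map (the map, the data, and the variable names below are illustrative assumptions): the primal predictions Φ(Φ^T Φ + λI)^{−1} Φ^T Y agree with the dual predictions K(K + λI)^{−1} Y built from the Gram matrix and Y alone, and the prediction at a new point x* needs only the kernel evaluations k(x*, x_i).

```python
import numpy as np

# Illustrative 1-D data and an explicit polynomial feature map (assumptions for this sketch).
rng = np.random.default_rng(2)
n, lam = 30, 0.5
x = rng.uniform(-1, 1, size=n)
Y = np.sin(3 * x) + 0.1 * rng.normal(size=n)

def phi(x):
    # Degree-3 polynomial features, so that k(x, x') = phi(x)^T phi(x').
    return np.stack([np.ones_like(x), x, x**2, x**3], axis=-1)

Phi = phi(x)           # n x 4 feature matrix
K = Phi @ Phi.T        # n x n Gram matrix

# Primal (explicit-feature) prediction on the training set, part (i).
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ Y)
yhat_primal = Phi @ w

# Dual (kernel-trick) prediction, part (ii): only K and Y are needed.
alpha = np.linalg.solve(K + lam * np.eye(n), Y)
yhat_dual = K @ alpha

print("max difference:", np.max(np.abs(yhat_primal - yhat_dual)))   # ~ 1e-12

# Part (iii): prediction at an unseen x* uses only k(x*, x_i) = phi(x*)^T phi(x_i).
x_star = np.array([0.3])
k_star = phi(x_star) @ Phi.T        # 1 x n vector of kernel evaluations
print("prediction at x*:", (k_star @ alpha)[0], "vs", (phi(x_star) @ w)[0])
```

The agreement of the two prediction paths is the push-through identity at work: Φ(Φ^T Φ + λI)^{−1} Φ^T = ΦΦ^T(ΦΦ^T + λI)^{−1}, so once K can be evaluated, φ never has to be formed explicitly.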

Total marks = 60

End of Exam


