
Assignment 3

Theoretical Aspects

You are asked to solve the problems and submit an answer sheet in PDF format (one file). LaTeX, Typst, Microsoft Word, or any word-processor-generated PDF documents are preferable, while clear scans of handwritten answer sheets are acceptable; scoring will not differentiate between them. Note that you are supposed to write down the process of derivation in detail so that we can score the steps in case the final answer is wrong.

Q1| Consider a cancer screening problem. Assume C = 1 corresponds to “cancer” and C = 0 corresponds to “no cancer,” and the prevalence of cancer is low, with probability p(C = 1) = 0.001. Let T represent the outcome of a screening test, where T = 1 denotes a positive result (“having cancer”) and T = 0 denotes a negative result (“having no cancer”). Given p(T = 1 | C = 1) = 0.9 and p(T = 0 | C = 0) = 0.97, if a test is positive, what is the probability that the person has cancer?
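The quantity asked for follows from Bayes' theorem combined with the law of total probability. As a quick numerical cross-check of your written derivation, a minimal Python sketch (the variable names are ours, not part of the assignment):

```python
# Quick numerical check of Bayes' rule with the numbers given in Q1:
# p(C=1 | T=1) = p(T=1 | C=1) p(C=1) / p(T=1).
p_c1 = 0.001                      # prevalence p(C = 1)
p_t1_given_c1 = 0.90              # p(T = 1 | C = 1)
p_t0_given_c0 = 0.97              # p(T = 0 | C = 0)

p_t1_given_c0 = 1.0 - p_t0_given_c0                          # false-positive rate
p_t1 = p_t1_given_c1 * p_c1 + p_t1_given_c0 * (1.0 - p_c1)   # law of total probability
p_c1_given_t1 = p_t1_given_c1 * p_c1 / p_t1                  # Bayes' theorem

print(f"p(C=1 | T=1) = {p_c1_given_t1:.4f}")
```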

Q2| The Gaussian distribution is defined by the probability density function

\[
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\} \tag{3.1}
\]

where µ is the mean and σ² is the variance. Drawing N independent data points from this distribution, we have a dataset x ≜ {x1, x2, · · · , xN}, and its joint probability is

\[
p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2). \tag{3.2}
\]

This is a likelihood function because the samples in x are known values, while µ and σ² are tunable parameters of the function. If we maximize the function p(x | µ, σ²) with respect to µ and σ², we obtain the maximum likelihood estimation (MLE) solution µ_MLE and σ²_MLE. Please

1. Calculate µ_MLE and σ²_MLE; and
2. Considering µ_MLE and σ²_MLE as random variables, calculate the expectations E(µ_MLE) and E(σ²_MLE).

Hint: µ_MLE is a function of {x1, x2, · · · , xN}; σ²_MLE is a function of {x1, x2, · · · , xN} and µ_MLE; E(µ_MLE) is a function of µ, and E(σ²_MLE) is a function of σ².
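Whatever closed-form estimators you derive can be cross-checked by numerically maximizing the log of the likelihood (3.2) on simulated data; a minimal sketch, assuming NumPy and SciPy are available (none of this is required for the submission):

```python
# Numerically maximize the log of the likelihood (3.2) on simulated data,
# as a cross-check for the closed-form estimators derived by hand.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
mu_true, sigma2_true, N = 1.5, 4.0, 1000
x = rng.normal(mu_true, np.sqrt(sigma2_true), size=N)

def neg_log_likelihood(params):
    mu, log_sigma2 = params          # optimize ln(sigma^2) so the variance stays positive
    sigma2 = np.exp(log_sigma2)
    return 0.5 * np.sum(np.log(2.0 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)

res = minimize(neg_log_likelihood, x0=np.zeros(2))
print(f"numerical MLE: mu = {res.x[0]:.4f}, sigma^2 = {np.exp(res.x[1]):.4f}")
```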

Q3| The Kullback–Leibler divergence between two distributions p(x) and q(x) is given by

\[
\mathrm{KL}(p \,\|\, q) = -\int_{-\infty}^{\infty} p(x) \ln \frac{q(x)}{p(x)} \, dx. \tag{3.3}
\]

Please calculate the Kullback–Leibler divergence between the two Gaussians p(x) = N(x | µ, σ²) and q(x) = N(x | m, s²).
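A closed-form answer can likewise be sanity-checked by evaluating the integral in (3.3) numerically for concrete parameter values; a small sketch, again assuming NumPy and SciPy:

```python
# Numerically evaluate the integral in (3.3) for two concrete Gaussians,
# to sanity-check a closed-form expression for KL(p || q).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 0.0, 1.0      # p(x) = N(x | mu, sigma^2)
m, s = 1.0, 2.0           # q(x) = N(x | m, s^2)

def integrand(x):
    log_p = norm.logpdf(x, mu, sigma)
    log_q = norm.logpdf(x, m, s)
    return np.exp(log_p) * (log_p - log_q)   # p(x) ln(p(x)/q(x)) = -p(x) ln(q(x)/p(x))

kl, _ = quad(integrand, -np.inf, np.inf)
print(f"KL(p || q) ≈ {kl:.6f}")
```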

Q4| The two-dimensional or “bivariate” Gaussian density function is defined by

\[
f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x_1-a_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-a_1)(x_2-a_2)}{\sigma_1\sigma_2} + \frac{(x_2-a_2)^2}{\sigma_2^2} \right] \right\} \tag{3.4}
\]

where, for random variables X1 and X2, the expectations are E(X1) = a1 and E(X2) = a2 and the variances are σ1² and σ2², respectively; the correlation coefficient is ρ = E[(X1 − a1)(X2 − a2)]/(σ1σ2).

Is there any probability density function g(x1, x2), where (x1, x2) ∈ R², such that the conditional distributions g(x1 | x2) and g(x2 | x1) are both one-dimensional Gaussian, but g(x1, x2) is not two-dimensional Gaussian?

If there is any, give an example of g(x1, x2) and demonstrate that such a g(x1, x2) satisfies the conditions. If there is none, please demonstrate why there is none.
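Whichever way you answer, a candidate g(x1, x2) can be probed numerically: fix x2, renormalize the slice over x1, and compare it with the Gaussian that has the same mean and variance. A rough sketch of such a check (the candidate density below is only a placeholder; substitute the one you want to test):

```python
# Numerically inspect one conditional slice g(x1 | x2 = c) of a candidate
# joint density and compare it with the Gaussian that matches its mean and
# variance; a small discrepancy over a fine grid suggests the slice is Gaussian.
import numpy as np
from scipy.stats import norm

def g(x1, x2):
    # Placeholder candidate (an unnormalized standard bivariate Gaussian);
    # replace it with whatever g(x1, x2) you want to test.
    return np.exp(-0.5 * (x1 ** 2 + x2 ** 2))

c = 0.7                                     # the value at which x2 is fixed
x1 = np.linspace(-10.0, 10.0, 20001)
dx = x1[1] - x1[0]

cond = g(x1, c)
cond /= cond.sum() * dx                     # renormalize the slice -> g(x1 | x2 = c)

mean = (x1 * cond).sum() * dx
var = ((x1 - mean) ** 2 * cond).sum() * dx
matched = norm.pdf(x1, mean, np.sqrt(var))  # Gaussian with the same mean and variance

print("max |g(x1 | x2=c) - matched Gaussian| =", np.abs(cond - matched).max())
```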

Q5| (Challenging Problem) A restricted Boltzmann machine (RBM) is a generative stochastic neural network that can learn a probability distribution. As this challenging problem can be reduced to a math problem with an elementary description, the mechanism of the RBM is not introduced in depth here.

The standard type of RBM has binary-valued units, hidden and visible. The value of each unit is either 0 or 1. For convenience, we denote the finite set {0, 1} as B. The hidden and visible units are connected through a weight matrix W. Ignoring the biases, an RBM can be illustrated as in Figure 3.1. Let v ∈ B^{v×1} be the vector of visible units and h ∈ B^{h×1} be the vector of hidden units; then the energy of a specific case of v and h (called a configuration) is defined as

\[
E(\mathbf{v}, \mathbf{h}) = -\mathbf{v}^{T} W \mathbf{h} \tag{3.5}
\]

where W ∈ R^{v×h} is a constant weight matrix, and note that every element of v or h is either 0 or 1. The joint probability for a configuration is defined as

\[
P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} e^{-E(\mathbf{v}, \mathbf{h})} = \frac{1}{Z} e^{\mathbf{v}^{T} W \mathbf{h}} \tag{3.6}
\]

where Z is the normalization factor that guarantees

\[
\sum_{\text{all } \mathbf{v}} \; \sum_{\text{all } \mathbf{h}} P(\mathbf{v}, \mathbf{h}) = 1 \tag{3.7}
\]

as required by the definition of a probability. Thus, Z is the sum of exp(vᵀWh) over all possible configurations.

A specific W ∈ R^{10×256} is provided in the attachment w2.csv, which also implies the dimensions of the vectors v ∈ B^{10×1} and h ∈ B^{256×1}. Please calculate the value of

\[
\ln Z \tag{3.8}
\]

with two digits after the decimal point for this specific W. As you need to write a program to solve this problem, you are required to paste your code or a descriptive algorithm in the answer sheet along with the final result.

[Figure 3.1: Restricted Boltzmann Machine (RBM) without biases. Nodes v1 and v2 are the visible units, and nodes h1, h2, and h3 are the hidden units. The edge connecting vi and hj has a weight wij.]

Example with naïve algorithm

Consider a small RBM with 2 visible units, 1 hidden unit, and the weight matrix

\[
W = \begin{bmatrix} 0.2 \\ 0.3 \end{bmatrix}. \tag{3.9}
\]

There are 2^2 × 2^1 = 8 possible configurations, i.e.

Configuration   v1   v2   h1
      1          0    0    0
      2          0    0    1
      3          0    1    0
      4          0    1    1
      5          1    0    0
      6          1    0    1
      7          1    1    0
      8          1    1    1

Thus, we have

\[
Z = e^{[0\,0]W[0]} + e^{[0\,0]W[1]} + e^{[0\,1]W[0]} + e^{[0\,1]W[1]} + e^{[1\,0]W[0]} + e^{[1\,0]W[1]} + e^{[1\,1]W[0]} + e^{[1\,1]W[1]} \tag{3.10}
\]
\[
= 1 + 1 + 1 + e^{0.3} + 1 + e^{0.2} + 1 + e^{0.5} \tag{3.11}
\]
\[
= 9.219982836436301 \tag{3.12}
\]

so ln Z = 2.22 (with two digits after the decimal point).
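For reference, the enumeration above takes only a few lines of code; a minimal sketch in Python (the assignment does not prescribe a language):

```python
# Brute-force computation of Z for the 2-visible / 1-hidden example,
# reproducing (3.10)-(3.12): Z = 9.219982836436301, so ln Z ≈ 2.22.
import itertools
import numpy as np

W = np.array([[0.2], [0.3]])                         # the 2 x 1 weight matrix of (3.9)

Z = 0.0
for v in itertools.product([0, 1], repeat=2):        # all visible configurations
    for h in itertools.product([0, 1], repeat=1):    # all hidden configurations
        Z += np.exp(np.array(v) @ W @ np.array(h))   # e^{v^T W h}
print(Z, np.log(Z))
```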

Hints

1. The number of configurations for 10 visible units and 256 hidden units is very large, i.e. 2^10 × 2^256 = 2^266, so you must find an efficient way to calculate ln Z. The naïve brute-force algorithm described above does not work. (One possible way to restructure the sum is sketched after these hints.)

2. The math used to simplify the calculation is not beyond the middle school level.

3. The computation does not require many resources. There is no doubt that your laptop is able to solve the problem within seconds.

4. The result ln Z lies in the range [200, 300]. It is not a rough estimate, but a value with a precision of at least 5 significant figures.

5. The specific W is generated by a random number generator. There is no exploitable mathematical structure in the matrix.

6. Please do not waste too much time on this problem, as it is difficult. It only serves as a bonus problem that helps with final score normalization.
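Consistent with hints 1–3, one possible way to restructure the calculation is to sum out the hidden units analytically for each visible configuration and to keep everything in log space. This is a sketch of one approach, not necessarily the intended one, and it assumes w2.csv stores the 10×256 matrix as plain comma-separated values:

```python
# For a fixed v, summing exp(v^T W h) over all h in B^{256x1} factorizes as
# prod_j (1 + exp((v^T W)_j)), because each h_j is 0 or 1 independently.
# Only the 2^10 = 1024 visible configurations are then enumerated, and the
# computation stays in log space since ln Z is in the hundreds.
import itertools
import numpy as np
from scipy.special import logsumexp

W = np.loadtxt("w2.csv", delimiter=",")   # assumed to hold the 10 x 256 matrix as plain CSV

log_terms = []
for v in itertools.product([0, 1], repeat=W.shape[0]):
    a = np.asarray(v, dtype=float) @ W               # a_j = (v^T W)_j, j = 1..256
    log_terms.append(np.logaddexp(0.0, a).sum())     # ln prod_j (1 + e^{a_j}), overflow-safe
print(f"ln Z = {logsumexp(log_terms):.2f}")
```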


