Assignment 3
Theoretical Aspects
You are asked to solve the problems and submit an answer sheet in PDF format (one
file). LaTeX, Typst, Microsoft Word, or any word-processor-generated PDF documents
are preferable, while clear scans of handwritten answer sheets are acceptable; scoring
will not differentiate between them. Note that you are supposed to write down the
process of derivation in detail so that we can score the steps in case the final answer is
wrong.
Q1| Consider a cancer screening problem. Assume C = 1 corresponds to “cancer” and
C = 0 corresponds to “no cancer,” and the prevalence of cancer is low with a probability
p(C = 1) = 0.001. Let T represent the outcome of a screening test, where T = 1 denotes
a positive result “having cancer” and T = 0 denotes a negative result “having no cancer.”
Given p(T = 1|C = 1) = 0.9 and p(T = 0|C = 0) = 0.97, if a test is positive, what is
the probability that the person has cancer?
Q2| The Gaussian distribution is defined by the probability density function

    N(x | µ, σ²) = (1 / (2πσ²)^{1/2}) exp{ −(1 / (2σ²)) (x − µ)² }    (3.1)
where µ is the mean and σ² is the variance. Drawing N independent data points from
this distribution, we have a dataset x ≜ {x1, x2, · · · , xN} and its joint probability is
    p(x | µ, σ²) = ∏_{n=1}^{N} N(xn | µ, σ²)    (3.2)
This is a likelihood function because the samples in x are known values, while µ and
σ² are tunable parameters of the function. If we maximize the function p(x | µ, σ²) with
respect to µ and σ², we obtain the maximum likelihood estimation (MLE) solutions
µ_MLE and σ²_MLE. Please

1. calculate µ_MLE and σ²_MLE; and
2. considering µ_MLE and σ²_MLE as random variables, calculate the expectations
   E(µ_MLE) and E(σ²_MLE).
Hint: µ_MLE is a function of {x1, x2, · · · , xN}; σ²_MLE is a function of
{x1, x2, · · · , xN} and µ_MLE; E(µ_MLE) is a function of µ, and E(σ²_MLE) is a
function of σ².
Q3| The Kullback–Leibler divergence between two distributions p(x) and q(x) is given by

    KL(p ‖ q) = − ∫_{−∞}^{+∞} p(x) ln( q(x) / p(x) ) dx    (3.3)
Please calculate the Kullback–Leibler divergence between the two Gaussians
p(x) = N(x | µ, σ²) and q(x) = N(x | m, s²).
Q4| The two-dimensional or “bivariate” Gaussian density function is defined by

    f(x1, x2) = 1 / (2πσ1σ2 √(1 − ρ²))
                · exp{ −1 / (2(1 − ρ²)) [ (x1 − a1)²/σ1²
                  − 2ρ (x1 − a1)(x2 − a2)/(σ1σ2) + (x2 − a2)²/σ2² ] }    (3.4)
where for random variables X1 and X2, their expectations are E(X1) = a1 and E(X2) =
a2, and their variances are σ1² and σ2², respectively; the correlation coefficient is
ρ = E[(X1 − a1)(X2 − a2)] / (σ1σ2).
Is there any probability density function g(x1, x2), where (x1, x2) ∈ R², such that the
conditional distributions g(x1|x2) and g(x2|x1) are both one-dimensional Gaussian, but
g(x1, x2) is not two-dimensional Gaussian?
If there is any, give an example of g(x1, x2), and demonstrate that such a g(x1, x2)
satisfies the conditions. If there is none, please demonstrate why there is none.
Q5| (Challenging Problem) A restricted Boltzmann machine (RBM) is a generative
stochastic neural network that can learn a probability distribution. As this challenging
problem can be reduced to a math problem with an elementary description, the
mechanism of the RBM is not introduced in depth here.
The standard type of RBM has binary-valued units, hidden and visible. The value of
each unit is either 0 or 1. For convenience, we denote the finite set {0, 1} as B. A
weight matrix W associates the hidden units with the visible units. Ignoring the biases,
an RBM can be illustrated as in Figure 3.1. Let v ∈ B^{v×1} be the vector of visible
units and h ∈ B^{h×1} be the vector of hidden units; then the energy of a specific case
of v and h (called a configuration) is defined as
    E(v, h) = −vᵀWh    (3.5)
where W ∈ R^{v×h} is a constant weight matrix, and note that every element of v
or h is either 0 or 1. The joint probability for a configuration is defined as
    P(v, h) = (1/Z) e^{−E(v,h)} = (1/Z) e^{vᵀWh}    (3.6)
where Z is the normalization factor that guarantees

    Σ_{all v} Σ_{all h} P(v, h) = 1    (3.7)
required by the probability definition. Thus, Z is the sum of e^{vᵀWh} over all possible
configurations.
A specific W ∈ R^{10×256} is provided in the attachment w2.csv, which also implies
the dimensions of the vectors v ∈ B^{10×1} and h ∈ B^{256×1}. Please calculate the
value of

    ln Z    (3.8)
with two digits after the decimal point for this specific W. As you need to write a
program to solve this problem, along with the final result, you are also required to paste
your code or a descriptive algorithm in the answer sheet.
Figure 3.1: Restricted Boltzmann Machine (RBM) without biases. Nodes v1 and v2 are
the visible units, and nodes h1, h2, and h3 are the hidden units. The edge connecting
vi and hj has a weight wij.
Example with naïve algorithm
Consider a small RBM with 2 visible units, 1 hidden unit, and the weight matrix

    W = [ 0.2 ]
        [ 0.3 ]    (3.9)
The configuration will have 2^2 × 2^1 = 8 possibilities, i.e.
Configuration v1 v2 h1
1 0 0 0
2 0 0 1
3 0 1 0
4 0 1 1
5 1 0 0
6 1 0 1
7 1 1 0
8 1 1 1
Thus, we have

    Z = e^{[0 0] W [0]} + e^{[0 0] W [1]} + e^{[0 1] W [0]} + e^{[0 1] W [1]}
        + e^{[1 0] W [0]} + e^{[1 0] W [1]} + e^{[1 1] W [0]} + e^{[1 1] W [1]}    (3.10)
      = 1 + 1 + 1 + e^{0.3} + 1 + e^{0.2} + 1 + e^{0.5}    (3.11)
      = 9.219982836436301    (3.12)

so ln Z = 2.22 (with two digits after the decimal point).
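The naïve enumeration above can be sketched in a few lines of Python. This is only an illustration of Eq. (3.10)–(3.12) for the toy 2×1 weight matrix; it is deliberately the brute-force method, not the efficient approach needed for the full problem.

```python
# Brute-force computation of Z for the toy RBM of Eq. (3.9).
# Enumerate all 2^2 x 2^1 = 8 configurations and sum e^{v^T W h}.
import itertools
import math

W = [[0.2], [0.3]]  # 2 visible units, 1 hidden unit

Z = 0.0
for v in itertools.product([0, 1], repeat=2):      # all visible vectors
    for h in itertools.product([0, 1], repeat=1):  # all hidden vectors
        # energy E(v, h) = -v^T W h, so each Boltzmann factor is e^{v^T W h}
        vWh = sum(v[i] * W[i][j] * h[j] for i in range(2) for j in range(1))
        Z += math.exp(vWh)

print(Z)                      # approximately 9.2200
print(round(math.log(Z), 2))  # 2.22
```

Running this reproduces Z ≈ 9.2200 and ln Z = 2.22, matching Eq. (3.12).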
Hints
1. The number of configurations for 10 visible units and 256 hidden units is very
large, i.e. 2^10 × 2^256 = 2^266, so you must find an efficient way to calculate ln Z.
The naïve brute-force algorithm described above does not work.
2. The math used to simplify the calculation is not beyond the middle school level.
3. The computation does not require many resources. There is no doubt that your
laptop can solve the problem within seconds.
4. The result ln Z lies in the range [200, 300]. It is not a rough estimation, but a
value with a precision of at least 5 significant figures.
5. The specific W is generated by a random number generator. There is no exploitable mathematical structure for the matrix.
6. Please do not waste too much time on this problem—it is difficult. It
only serves as a bonus problem that helps with final score normalization.