May Examination Period 2024
ECS659P Resit
Neural Networks and Deep Learning
Duration: 2 hours (+1 for uploads)
Question 1
(a) Consider a regression dataset $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(n)}, y^{(n)})$, where each observation $x^{(i)}$ and target $y^{(i)}$ is a real number.
Suppose that the function $f$ given by $f(x) = 2 \log(x) + x$ is a perfect predictive model, so that $y^{(i)} = f(x^{(i)})$ for every $i$.
Define a function $\phi : \mathbb{R} \to \mathbb{R}^2$ that can transform the original regression dataset into a regression dataset $(\phi(x^{(1)}), y^{(1)}), (\phi(x^{(2)}), y^{(2)}), \ldots, (\phi(x^{(n)}), y^{(n)})$ that can be used to recover the function $f$ using linear regression.
In other words, define a function $\phi$ such that
$$f(x) = \phi(x) \cdot w,$$
where $\cdot$ denotes the dot product and $w \in \mathbb{R}^2$ is a vector of parameters. [13 marks]
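The requirement that f be linear in the transformed features can be checked numerically. The sketch below tries one candidate feature map, phi(x) = (log x, x), paired with w = (2, 1). Both are assumptions made for illustration, not the only valid choice, and the helper names `phi` and `dot` are hypothetical.

```python
import math

def f(x):
    """Target function from the question: f(x) = 2 log(x) + x, for x > 0."""
    return 2 * math.log(x) + x

def phi(x):
    """One candidate feature map R -> R^2 (an assumption, not the only choice)."""
    return (math.log(x), x)

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

# Weights paired with the candidate phi above (also an assumption).
w = (2.0, 1.0)

# Check that phi(x) . w reproduces f(x) on a few positive inputs.
for x in (0.5, 1.0, 3.0, 10.0):
    assert math.isclose(f(x), dot(phi(x), w))
```

Any invertible linear recombination of these two features would work equally well with correspondingly adjusted weights.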
(b) Consider a regression dataset with 3 examples and 2 features per observation. Let $X \in \mathbb{R}^{3 \times 2}$ denote the observation matrix that contains one row for each observation and one column for each feature, so that
$$X = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ -1 & 1 \end{bmatrix}.$$
Let $y \in \mathbb{R}^3$ denote the target vector that contains one row for each target, so that
$$y = \begin{bmatrix} 4 \\ -1 \\ 1 \end{bmatrix}.$$
Compute the mean squared error of a linear regression model that employs a weight vector
$$w = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$$
and a bias
b = 1.
[12 marks]
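A minimal sketch of the computation, using plain Python lists. The prediction for each row is taken to be x · w + b, and the mean squared error is the average of the squared residuals. The weight vector is taken here to be w = [1, 3]^T, which is an assumption about how the vector above should be read. The function names are hypothetical.

```python
def predict(X, w, b):
    """Linear model: one prediction x . w + b per row of X."""
    return [sum(xj * wj for xj, wj in zip(row, w)) + b for row in X]

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

X = [[0, 2], [1, -1], [-1, 1]]  # observation matrix from the question
y = [4, -1, 1]                  # target vector from the question
w = [1, 3]                      # assumed reading of the weight vector
b = 1                           # bias from the question

y_hat = predict(X, w, b)
mse = mean_squared_error(y, y_hat)
print(y_hat, mse)
```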
Question 2
(a) Let $[C_1, C_2, \ldots, C_k]$ denote an image (rank 3 tensor) composed of $k$ channels, where each channel $C_i$ is a matrix of a fixed shape.
Let A be an image given by
$$
A = \left[
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 5 & 1 & 2 \\ 3 & 2 & 1 & 0 \\ 1 & 2 & 2 & 1 \end{bmatrix},
\begin{bmatrix} 2 & 1 & 3 & 4 \\ 1 & 1 & 2 & 5 \\ 3 & 2 & 1 & 6 \\ 1 & 3 & 3 & 4 \end{bmatrix}
\right].
$$
Compute the output image B of a max-pooling layer that receives A as input and uses a window of size 2 × 2 and a stride 2. [12 marks]
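Max-pooling acts on each channel independently: a 2 × 2 window slides over the channel in steps of 2, and each window is replaced by its largest entry. The following is a rough sketch in plain Python. The two 4 × 4 channels are an assumed reading of A, and `max_pool2d` is a hypothetical helper, not a library function.

```python
def max_pool2d(channel, size=2, stride=2):
    """Max-pool a single channel given as a list of rows."""
    rows, cols = len(channel), len(channel[0])
    out = []
    for i in range(0, rows - size + 1, stride):
        out_row = []
        for j in range(0, cols - size + 1, stride):
            # Collect the entries of the size x size window at (i, j).
            window = [channel[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            out_row.append(max(window))
        out.append(out_row)
    return out

# Assumed reading of the two channels of A.
A = [
    [[1, 2, 3, 4], [3, 5, 1, 2], [3, 2, 1, 0], [1, 2, 2, 1]],
    [[2, 1, 3, 4], [1, 1, 2, 5], [3, 2, 1, 6], [1, 3, 3, 4]],
]

# Pool every channel separately; the channel count is unchanged.
B = [max_pool2d(channel) for channel in A]
print(B)
```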
(b) Consider a 3 × 32 × 32 image that goes through a convolutional layer with 64 kernels, each a 3 × 7 × 7 image. What is the shape of the corresponding output if:
1. The convolutional layer uses padding 3 and stride 1.
2. The convolutional layer uses padding 0 and stride 1.
3. The convolutional layer uses padding 3 and stride 2.
Assume the conventional ordering of dimensions (number of channels, height, width). [13 marks]
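The three cases can be checked with the usual output-size formula, floor((n + 2p - k) / s) + 1 per spatial dimension, with one output channel per kernel. The sketch below assumes square kernels, symmetric padding, and floor rounding when the stride does not divide evenly. The function name is hypothetical.

```python
def conv2d_output_shape(in_shape, num_kernels, kernel_size, padding, stride):
    """Output shape (channels, height, width) of a 2D convolution.

    Assumes square kernels, symmetric padding, and floor rounding.
    """
    _, height, width = in_shape
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    return (num_kernels, out_h, out_w)

image_shape = (3, 32, 32)
for padding, stride in [(3, 1), (0, 1), (3, 2)]:
    shape = conv2d_output_shape(image_shape, 64, 7, padding, stride)
    print(f"padding={padding}, stride={stride}: {shape}")
```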
Question 3
(a) Suppose that a 1 × 128 × 128 (grayscale) image is flattened and given directly to a multilayer perceptron. Suppose that this multilayer perceptron has 64 units in its first layer.
How many weights does this first layer have? How many biases does this first layer have? [10 marks]
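A fully connected layer has one weight per (input, unit) pair and, when biases are used, one bias per unit. A small sketch of the count, assuming the first layer is an ordinary dense layer with biases; the function name is hypothetical.

```python
def dense_layer_param_counts(num_inputs, num_units):
    """Return (number of weights, number of biases) of a dense layer."""
    return num_inputs * num_units, num_units

# Flattening a 1 x 128 x 128 image yields one input per pixel.
num_inputs = 1 * 128 * 128
weights, biases = dense_layer_param_counts(num_inputs, 64)
print(weights, biases)
```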
(b) Consider a recurrent layer that has two matrices of parameters: $A \in \mathbb{R}^{32 \times 32}$ and $B \in \mathbb{R}^{32 \times 64}$. Suppose that this recurrent layer does not employ biases and uses a tanh activation function.
Write the equation that this layer would use to compute the current hidden state vector $h_t \in \mathbb{R}^{32 \times 1}$ based on the previous hidden state vector $h_{t-1} \in \mathbb{R}^{32 \times 1}$, the current observation $x_t \in \mathbb{R}^{64 \times 1}$, and the parameter matrices $A \in \mathbb{R}^{32 \times 32}$ and $B \in \mathbb{R}^{32 \times 64}$.
Hint: Ensure that the matrix-vector multiplications are valid. [15 marks]
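The shapes dictate how the pieces fit: A (32 × 32) can only multiply the previous hidden state, and B (32 × 64) can only multiply the observation. The sketch below implements one step of a bias-free recurrent update of the form tanh(A h_prev + B x_t), which is the standard form and is assumed here. The parameter values are random placeholders, not taken from the question.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(A, B, h_prev, x_t):
    """One bias-free recurrent step: tanh(A h_prev + B x_t)."""
    recurrent = matvec(A, h_prev)   # 32 x 32 times 32 -> 32
    inputs = matvec(B, x_t)         # 32 x 64 times 64 -> 32
    return [math.tanh(r + i) for r, i in zip(recurrent, inputs)]

random.seed(0)
# Placeholder parameters and inputs with the shapes from the question.
A = [[random.uniform(-0.1, 0.1) for _ in range(32)] for _ in range(32)]
B = [[random.uniform(-0.1, 0.1) for _ in range(64)] for _ in range(32)]
h_prev = [0.0] * 32
x_t = [random.uniform(-1.0, 1.0) for _ in range(64)]

h_t = rnn_step(A, B, h_prev, x_t)
print(len(h_t))
```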
Question 4
(a) Consider a linear regression model with two weights and no bias.
Suppose that the weight vectors $[2, 4]^T$ and $[3, 3]^T$ achieve the same mean squared error on the training dataset. If weight decay were employed with $\lambda > 0$, which of these weight vectors would be preferred by optimization? [9 marks]
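Weight decay adds a penalty proportional to the squared Euclidean norm of the weights, so when two weight vectors have equal data error, the comparison reduces to their norms. The sketch below assumes the penalty takes the form lambda * ||w||^2 (some texts use lambda / 2, which does not change the ordering). The shared error value and lambda are made-up placeholders.

```python
def squared_l2_norm(w):
    """Sum of squared weights."""
    return sum(v * v for v in w)

def penalised_loss(data_error, w, lam):
    """Data error plus a weight decay penalty lam * ||w||^2 (assumed form)."""
    return data_error + lam * squared_l2_norm(w)

shared_mse = 1.0   # placeholder: both vectors have the same training error
lam = 0.1          # placeholder: any lam > 0 gives the same ordering

for w in ([2, 4], [3, 3]):
    print(w, squared_l2_norm(w), penalised_loss(shared_mse, w, lam))
```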
(b) Consider a loss function $L : \mathbb{R}^2 \to \mathbb{R}$ given by
$$L(w_1, w_2) = w_1^2 + w_2^2,$$
and note that the corresponding gradient function $\nabla L : \mathbb{R}^2 \to \mathbb{R}^2$ is given by
$$\nabla L(w_1, w_2) = [2 w_1, 2 w_2]^T.$$
Let $w = [2, 4]^T$ be the initial point for gradient descent with the goal of minimizing $L$. What are the next two points?
Assume a learning rate η = 0.25. [16 marks]
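Each gradient descent step moves the current point against the gradient, scaled by the learning rate: the new point is the old point minus the learning rate times the gradient there. A short sketch of the iteration for this loss, using plain Python lists; the function names are hypothetical.

```python
def grad_L(w):
    """Gradient of L(w1, w2) = w1^2 + w2^2, namely [2 w1, 2 w2]."""
    return [2 * wi for wi in w]

def gradient_descent(w0, learning_rate, steps):
    """Return the list of visited points, starting with w0."""
    points = [list(w0)]
    w = list(w0)
    for _ in range(steps):
        g = grad_L(w)
        # Step against the gradient, scaled by the learning rate.
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
        points.append(w)
    return points

points = gradient_descent([2.0, 4.0], learning_rate=0.25, steps=2)
print(points)
```

With this learning rate, each step halves both coordinates, since w - 0.25 * 2w = 0.5 w.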