
CS 8395: Homework 2

Overall directions

Any resource available to you may be used to answer the homework questions, and you may collaborate with anyone in the class, with the following caveats:

- You must record the names of your collaborators at the top of your homework.
- You are strongly encouraged to avoid copying code/answers directly; instead, where possible, write your own versions! (Copying boilerplate data/plotting code is often most efficient, but you'll learn more about the Deep Learning content by implementing those parts yourself.)

This homework is due November 1st at Midnight, via Brightspace. Please zip your code using either the .zip or .tar.gz formats, and please submit written work in PDF.

Please also include the approximate amount of time you spent on this homework at the top of your submission.

Rate-Distortion and Echoes

In class, the VAE we trained used a loss function composed of a data term (the MSE) and a prior term (the KL divergence). A week or so later we derived the β-VAE, which relaxes a constrained optimization problem to the following (unconstrained) loss:

L[p, q] = −log p(x|z) + β · KL[ q(z|x) ∥ p(z) ]

where the first term is the data (MSE) term and the second is the prior term.
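For reference, a minimal PyTorch sketch of this loss, assuming a Gaussian encoder that outputs mu and logvar, an MSE reconstruction term, and a standard-normal prior (all names here are illustrative, not part of the assignment):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """beta-VAE loss: MSE reconstruction term plus a beta-weighted KL term
    between the Gaussian encoder q(z|x) = N(mu, exp(logvar)) and N(0, I)."""
    recon = F.mse_loss(x_recon, x, reduction="sum")                # data (MSE) term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL[q(z|x) || p(z)]
    return recon + beta * kl
```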

a) Describe VAE behavior at the extreme cases of β → 0 and β → ∞.

b) We also discussed the Rate-Distortion version of the loss function, using q(z), the posterior or empirical marginal distribution of z induced by q(z|x) (integrated over data x):

L[p, q] = −log p(x|z) + β · KL[ q(z|x) ∥ q(z) ]

where the KL term is now taken against the marginal posterior q(z) rather than the prior.

Explain why calculation of q(z) is hard with Gaussian encoders q(z|x).

c) Suppose instead we choose a different encoder distribution, using two (deterministic) functions m(x) and s(x):

q̃(z|x) = m(x) + s(x) · z1

Here, z1 is ALSO sampled from q̃(z|x = x1), but using x1, a new data sample. Write out two recursions of q̃(z|x), expanding z1 into its definition using q̃, etc., stopping at z3.

d) We know that KL[ q(z|x) ∥ q(z) ] is equivalent to I(x, z). Using the identity I(x, z) = H(z) − H(z|x), show that KL[ q̃(z|x) ∥ q̃(z) ] = E_x[log det S(x)], where q̃(z) is the marginal induced by q̃(z|x).

Hint: use H(z|x) = E_x[ H[ m(x) + s(x) · z1 | X = x ] ]

e) What are the main computational issues with sampling from q̃? Devise a scheme to sample from q̃, using any approximation/simplifying assumptions you deem necessary.
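For illustration only (not necessarily the intended answer to (e)), one simplifying scheme is to truncate the recursion at a fixed depth and reuse other points from the current minibatch as the x1, x2, ... samples; the callables m and s, the batch layout, and the truncation depth below are all assumptions:

```python
import torch

def echo_sample(m, s, x_batch, depth=16):
    """Approximately sample z ~ q~(z|x) for x = x_batch[0] by unrolling
    z = m(x) + s(x) * z1, z1 = m(x1) + s(x1) * z2, ... for `depth` levels,
    treating the deepest noise term as zero (truncation assumption)."""
    z = torch.zeros_like(m(x_batch[:1]))                     # deepest level: truncated to 0
    for k in range(depth, 0, -1):                            # unroll back toward x
        xk = x_batch[k % x_batch.shape[0]].unsqueeze(0)      # a "new" data sample
        z = m(xk) + s(xk) * z
    return m(x_batch[:1]) + s(x_batch[:1]) * z
```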


GAN Construction

In this assignment you will construct this basic GAN and then the W-GAN using the MNIST dataset. You are free to use whichever architecture you prefer. However, unless you have access to a GPU, I strongly recommend modifying the decoder structure from the VAE assignment. There are several extant examples using fully connected networks with modules like:

[Linear → LeakyReLU] × 4

using increasing linear layer sizes, e.g., 128 → 256 → 512 → 1024 → (28 × 28). LeakyReLU often replaces the ReLU non-linearity in direct generative models, both in convolutional generators and fully connected generators. If you do choose to use convolution, I recommend an architecture that looks like:

[ConvTranspose → BatchNorm → LeakyReLU] × 2

after several fully connected layers. Using a stride of 2 for the conv transpose, this means you can go from 7 × 7 to 14 × 14 to 28 × 28, which is the image size.
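As one concrete reading of the suggested architectures, here is a PyTorch sketch (the latent size of 100, LeakyReLU slope, Tanh output, and kernel sizes are illustrative choices, not requirements of the assignment):

```python
import torch.nn as nn

# Fully connected generator: [Linear -> LeakyReLU] x 4 with increasing widths,
# ending in 28*28 outputs that get reshaped into an image downstream.
fc_generator = nn.Sequential(
    nn.Linear(100, 128),  nn.LeakyReLU(0.2),
    nn.Linear(128, 256),  nn.LeakyReLU(0.2),
    nn.Linear(256, 512),  nn.LeakyReLU(0.2),
    nn.Linear(512, 1024), nn.LeakyReLU(0.2),
    nn.Linear(1024, 28 * 28), nn.Tanh(),
)

# Convolutional alternative: fully connected layers up to a 7x7 feature map,
# then two stride-2 [ConvTranspose -> BatchNorm -> LeakyReLU] blocks taking
# 7x7 -> 14x14 -> 28x28 (the output block drops BatchNorm).
conv_generator = nn.Sequential(
    nn.Linear(100, 128 * 7 * 7), nn.LeakyReLU(0.2),
    nn.Unflatten(1, (128, 7, 7)),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(64), nn.LeakyReLU(0.2),
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),
)
```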

As a reminder, for the “vanilla” GAN, the loss function of the discriminator D is

L_adv = BCEntropy[D(x_real), 1] + BCEntropy[D(G(z)), 0]

The loss for the generator G is

L_gen = BCEntropy[D(G(z)), 1]
      = −BCEntropy[D(G(z)), 0]

Fully connected discriminators should be sufficient in both cases. I also recommend a slower learning rate (~0.0001) and at least 100 epochs. For MNIST there generally should be no need to tune the critic index (the number of critic updates per generator update); training 1-to-1 should be sufficient.
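Putting the two losses together, a minimal training-step sketch (this assumes D returns one logit per image, G maps a latent batch to images of the same shape as x_real, and the optimizers are built elsewhere; the batch handling and latent dimension are assumptions):

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, x_real, opt_g, opt_d, z_dim=100):
    """One vanilla-GAN iteration: a discriminator update, then a generator update."""
    b, device = x_real.shape[0], x_real.device
    ones, zeros = torch.ones(b, 1, device=device), torch.zeros(b, 1, device=device)

    # Discriminator: BCEntropy[D(x_real), 1] + BCEntropy[D(G(z)), 0]
    x_fake = G(torch.randn(b, z_dim, device=device)).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(x_real), ones)
              + F.binary_cross_entropy_with_logits(D(x_fake), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: BCEntropy[D(G(z)), 1]
    g_loss = F.binary_cross_entropy_with_logits(
        D(G(torch.randn(b, z_dim, device=device))), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```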

a) Plot the results of your GAN (even if they’re bad).

b) Implement the gradient penalty of W-GAN, and compare results to the regular GAN.
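For part (b), a sketch of the gradient-penalty term added to the critic loss (the interpolation scheme and the coefficient of 10 follow the usual WGAN-GP recipe, stated here as assumptions rather than requirements):

```python
import torch

def gradient_penalty(D, x_real, x_fake, lambda_gp=10.0):
    """WGAN-GP penalty: lambda * (||grad_x D(x_hat)||_2 - 1)^2 averaged over
    random interpolates x_hat between real and generated samples."""
    b = x_real.shape[0]
    eps = torch.rand([b] + [1] * (x_real.dim() - 1), device=x_real.device)
    x_hat = (eps * x_real + (1 - eps) * x_fake.detach()).requires_grad_(True)
    grads = torch.autograd.grad(outputs=D(x_hat).sum(), inputs=x_hat,
                                create_graph=True)[0]
    grad_norm = grads.reshape(b, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```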

c) Given an image, compute its GAN embedding z by gradient descent; that is, given x, freeze the generator and optimize ∥G(z) − x∥ for z. Test this out on the first few points in the MNIST test dataset. By starting at different initializations of z, do we recover different z* = argmin_z ∥G(z) − x∥? (Essentially, is the recovered argmin only a local minimum?)
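A sketch of this inversion loop (step count, learning rate, and the squared-error objective are illustrative; the generator's parameters are simply left out of the optimizer, which keeps G frozen):

```python
import torch

def invert(G, x, z_dim=100, steps=1000, lr=0.01):
    """Recover z minimizing ||G(z) - x||^2 by gradient descent on z only."""
    z = torch.randn(1, z_dim, device=x.device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)   # only z is updated; G stays fixed
    for _ in range(steps):
        loss = (G(z) - x).pow(2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach(), loss.item()
```

Running this several times from different random starts and comparing the recovered codes (and their final losses) is one way to probe the question above.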

d) Given two images x_A and x_B and their inverted codes z_A and z_B, describe in English/pseudocode a method for traversing between their codes with minimal differences per step length, i.e., construct a (discretization of a) path Z(t) through the space of z's that minimizes

U(Z(t)) = ∫₀¹ [ ∥G(Z(t)) − x_A∥₂² + ∥G(Z(t)) − x_B∥₂² ] · ∥dZ(t)/dt∥² dt,   such that Z(0) = z_A, Z(1) = z_B   (1)

(bonus points for implementing the proposed method)
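One possible discretization of (1), sketched in PyTorch (the number of interior knots, the left-endpoint Riemann sum, and the optimizer settings are all simplifying assumptions; z_a and z_b are taken to be the detached codes from part (c), shaped (1, z_dim)):

```python
import torch

def optimize_path(G, x_a, x_b, z_a, z_b, n_knots=16, steps=500, lr=0.01):
    """Discretize Z(t) into fixed endpoints plus n_knots free interior points
    and minimize a Riemann-sum approximation of U(Z(t))."""
    ts = torch.linspace(0, 1, n_knots + 2, device=z_a.device)[1:-1].unsqueeze(1)
    knots = ((1 - ts) * z_a + ts * z_b).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([knots], lr=lr)
    for _ in range(steps):
        path = torch.cat([z_a, knots, z_b], dim=0)           # (n_knots + 2, z_dim)
        imgs = G(path)
        bracket = ((imgs - x_a).flatten(1).pow(2).sum(1)     # ||G(Z) - x_A||^2
                   + (imgs - x_b).flatten(1).pow(2).sum(1))  # + ||G(Z) - x_B||^2
        speed_sq = (path[1:] - path[:-1]).pow(2).sum(1)      # ~ ||dZ/dt||^2 (constant dt dropped)
        loss = (bracket[:-1] * speed_sq).sum()               # left-endpoint Riemann sum
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.cat([z_a, knots.detach(), z_b], dim=0)
```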

e) Suppose we wanted to learn decodings of z so that min_{Z(t)} U(Z(t)) is minimal. How could we modify our GAN training procedure to accomplish this?

