
HW4 for CIS667: Intro to AI (Fall 2020) NetID: kli146 Due Sat Nov 21, 11:59pm EST

Instructions: Answer the following questions and submit your answers on Blackboard in a single pdf file named written.pdf. It is fine to photograph hand-written answers or type answers in MS Word, as long as you collate your answers and convert them into this single pdf file before submission. It is also fine to use Python to perform intermediate computations, as long as you show code snippets to explain your reasoning. Questions begin below.


1. Consider a neural network y = f(w, x) with the following form:

$$v = \mathrm{conv}(w, x)$$
$$y = \tanh\Big(\sum_{m,n} v_{m,n}\Big)$$

where w, x, and v are 2D arrays, and y is a scalar. The v = conv(w, x) operation is defined as follows:

$$v_{m,n} = \sum_{i,j} w_{i,j}\, x_{m+i,\, n+j}$$

The indices i, j range over the rows and columns of w. The result $v_{m,n}$ is only defined for indices m, n where $x_{m+i,n+j}$ will not produce an array out-of-bounds error. In other words, the shape of v will be smaller than the shape of x.
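A minimal numpy sketch may make the indexing concrete (the explicit double loop and the name conv are illustrative; this is the "valid" sliding-window product exactly as defined above, with no kernel flipping):

    import numpy as np

    def conv(w, x):
        # v[m, n] = sum_{i,j} w[i, j] * x[m+i, n+j], defined only where
        # the filter fits entirely inside x, so v is smaller than x
        wr, wc = w.shape
        xr, xc = x.shape
        v = np.empty((xr - wr + 1, xc - wc + 1))
        for m in range(v.shape[0]):
            for n in range(v.shape[1]):
                v[m, n] = np.sum(w * x[m:m + wr, n:n + wc])
        return v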

The network is given a dataset with two images:

$$x^{(0)} = \begin{bmatrix} +1 & +1 & -1 & -1 & -1 & -1 \\ +1 & +1 & +1 & +1 & +1 & +1 \\ +1 & +1 & +1 & +1 & +1 & +1 \end{bmatrix}
\qquad
x^{(1)} = \begin{bmatrix} -1 & +1 & -1 & +1 & +1 & +1 \\ +1 & -1 & -1 & +1 & +1 & +1 \\ -1 & -1 & -1 & +1 & +1 & +1 \end{bmatrix}$$

Each image $x^{(n)}$ is fed through the network to produce a hidden activation $v^{(n)} = \mathrm{conv}(w, x^{(n)})$ and an output activation $y^{(n)} = f(w, x^{(n)})$. The target outputs are $-1$ for $x^{(0)}$ and $+1$ for $x^{(1)}$. Therefore the total squared error is

$$e = \big(y^{(0)} - (-1)\big)^2 + \big(y^{(1)} - (+1)\big)^2$$

The filter weights w will be optimized using gradient descent to minimize e. Currently, the filter weights are

$$w = \begin{bmatrix} +1 & -1 \\ -1 & +1 \end{bmatrix}$$

For each of the following, give your answers to four decimal places (e.g., “0.1234”):

• For each image ($n \in \{0, 1\}$), what are the numeric values of the hidden activation $v^{(n)}$ and output activation $y^{(n)}$? What is the numeric value of the total squared error e?
• What is the numeric value of the gradient $\nabla_w e$?
• After one step of gradient descent, using a learning rate $\eta = 1000$, what are the new numeric values of the filter weights w? What are the new values of each $v^{(n)}$ and $y^{(n)}$? What is the new total squared error? (A numerical sanity-check sketch follows this list.)
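Since Python is allowed for intermediate computations, one possible sanity check reuses the conv sketch above. The finite-difference gradient below is only a numerical cross-check, not the analytic gradient $\nabla_w e$ the question asks for; the helper names f, error, and num_grad are illustrative:

    import numpy as np

    def f(w, x):
        # forward pass: y = tanh of the sum over all entries of v = conv(w, x)
        return np.tanh(conv(w, x).sum())

    def error(w, xs, targets):
        # total squared error over the dataset
        return sum((f(w, x) - t) ** 2 for x, t in zip(xs, targets))

    def num_grad(w, xs, targets, h=1e-6):
        # central finite differences, perturbing one weight at a time
        g = np.zeros(w.shape)
        for idx in np.ndindex(*w.shape):
            wp, wm = w.astype(float).copy(), w.astype(float).copy()
            wp[idx] += h
            wm[idx] -= h
            g[idx] = (error(wp, xs, targets) - error(wm, xs, targets)) / (2 * h)
        return g

    # one gradient-descent step with learning rate eta = 1000:
    # w_new = w - 1000 * num_grad(w, [x0, x1], [-1.0, +1.0])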


2. Figure 1 shows one particular state of a simple toy domain. Each position contains either a wall, a mouse, or nothing. This state will be encoded in a numeric array so that it can be used as input to a neural network.

• Consider a monotonic encoding in which 0 represents empty positions, 1 represents walls, and 2 represents mice. What is the monotonic encoding of the state in Figure 1? Your answer should be a 4 × 8 array with one entry per grid position.
• Consider a one-hot encoding in which indices 0, 1, and 2 correspond to walls, empty positions, and mice, respectively. What is the one-hot encoding of the state in Figure 1? Your answer should show the three “pages” of a 3 × 4 × 8 array, where there is a one-hot vector for each grid position, and the one-hot vectors extend along the first array dimension. (A conversion sketch follows the figure caption.)

Figure 1: Mouse grid world. Black = wall, gray = mouse, white = empty.
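As an illustration only (not part of the assignment), one way to build such a one-hot array in numpy, assuming the grid has already been transcribed as a 4 × 8 integer array using the one-hot question's code order (0 = wall, 1 = empty, 2 = mouse):

    import numpy as np

    def one_hot(codes, num_classes=3):
        # codes: integer array of shape (4, 8); returns shape (3, 4, 8), where
        # out[c, i, j] == 1 exactly when codes[i, j] == c, so the one-hot
        # vectors extend along the first array dimension as required
        return (np.arange(num_classes)[:, None, None] == codes).astype(int)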


3. Consider an artificial neuron with two synaptic inputs $(x, y) \in \mathbb{R}^2$ and scalar output $z \in \mathbb{R}$:

$$z = f(x, y) = \mathrm{sign}(w_0 x + w_1 y + b)$$

The neuron parameters are weights $(w_0, w_1) \in \mathbb{R}^2$ and bias $b \in \mathbb{R}$.

The neuron is provided a dataset with four examples of input $(x^{(n)}, y^{(n)})$ and target output $z^{(n)}$, where $n \in \{0, 1, 2, 3\}$. The goal is to find values for $w_0$, $w_1$, and $b$ such that $f(x^{(n)}, y^{(n)}) = z^{(n)}$ for all n.

For dataset 1 below, do such values for $w_0$, $w_1$, and $b$ exist? Repeat the question for datasets 2 and 3. (A mechanical separability check is sketched after the datasets.)

• Dataset 1:

      n        0    1    2    3
      x^(n)   +2   -1   +3   -1
      y^(n)   +2   -1   -3   +1
      z^(n)   +1   +1   +1   -1

• Dataset 2:

      n        0    1    2    3
      x^(n)   -3   -2   +3   +2
      y^(n)   -3   +1   -1   +3
      z^(n)   +1   -1   +1   -1

• Dataset 3:

      n        0    1    2    3
      x^(n)   -2   +3   +2   -2
      y^(n)   +1   -1   +2   -2
      z^(n)   -1   -1   +1   +1
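Whether such parameter values exist is precisely the question of strict linear separability, which can be checked mechanically. Below is a sketch using a feasibility linear program via scipy.optimize.linprog; requiring $z^{(n)}(w_0 x^{(n)} + w_1 y^{(n)} + b) \ge 1$ is a standard trick, since any strictly separating parameters can be rescaled to satisfy it (the function name separable is illustrative):

    import numpy as np
    from scipy.optimize import linprog

    def separable(x, y, z):
        # feasibility LP: find (w0, w1, b) with z*(w0*x + w1*y + b) >= 1 for
        # every example; such a point exists iff the data are strictly separable
        x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
        A_ub = -np.column_stack([z * x, z * y, z])  # -z*(w0*x + w1*y + b) <= -1
        b_ub = -np.ones(len(z))
        res = linprog(c=[0, 0, 0], A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * 3)    # parameters are unconstrained
        return res.success

    # e.g., dataset 1:
    # separable([+2, -1, +3, -1], [+2, -1, -3, +1], [+1, +1, +1, -1])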


4. • Consider the following three arrays:

$$a^{(0)} = [+0, +1, -1, +0, +0]$$
$$a^{(1)} = [-3, -2, -4, -3, -3]$$
$$a^{(2)} = [+1, +2, +0, +1, +1]$$

What are the element-wise differences $a^{(1)} - a^{(0)}$ and $a^{(2)} - a^{(0)}$? What are the numeric values (give your answer to four decimal places) of $\mathrm{softmax}(a^{(n)})$ for each $n \in \{0, 1, 2\}$? (A stable softmax sketch appears after this question.)

• Suppose you are using a neural network to fit a dataset. The target output values fall in the range [-1, 4]. Which of the following activation functions $f_k$ is most appropriate for the output layer?

$$f_1(a) = 3.0 \cdot \tanh(a) - 1.0$$
$$f_2(a) = 9 \cdot \mathrm{sigmoid}(a) - 8$$
$$f_3(a) = 4 \cdot \mathrm{relu}(a) - 1$$

Here, $\tanh(x) = (e^{2x} - 1)/(e^{2x} + 1)$ is the standard hyperbolic tangent, $\mathrm{sigmoid}(x) = 1/(1 + e^{-x})$ is the standard logistic sigmoid, and $\mathrm{relu}(x) = \max(x, 0)$ is the standard linear rectifier.
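For the softmax part, here is a numerically stable sketch. Subtracting the maximum is safe because softmax is invariant to adding a constant to every entry, which is also why the element-wise differences asked for above are worth computing first:

    import numpy as np

    def softmax(a):
        # softmax(a) == softmax(a + c) for any constant c, so shifting by
        # the max changes nothing while avoiding overflow in exp
        e = np.exp(np.asarray(a, dtype=float) - np.max(a))
        return e / e.sum()

    # e.g., np.round(softmax([+0, +1, -1, +0, +0]), 4)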


5. Consider the Markov Decision Process (MDP) shown in Figure 2. The agent uses a discount factor of $\gamma = \frac{1}{2}$.

• Determine the probabilities for the unlabeled edges. What is the probability transition array for this MDP? Your answer should consist of two 7 × 7 matrices, one for each action. Use the convention that row index i and column index j reflect transitions from state $s_i$ to state $s_j$.
• Suppose the agent starts in state $s_0$ at time-step t = 0, and then performs the following sequence of actions: $[a_0, a_1, a_0]$. What is the expected net discounted reward after this sequence is complete? More formally, what is the numeric value of $E\big[\sum_{t=0}^{3} \gamma^t R(t)\big]$? Give your answer as an exact fraction or to four decimal places (e.g., “0.1234”).
• Using an initial policy estimate $\pi^{(0)} = [a_1, a_1, a_1, a_1, a_0, a_1, a_1]^\top$, perform the policy iteration algorithm to find an optimal policy in this MDP. For each iteration $k \in \{0, 1, 2\}$, report the numerical values (to four decimal places) of the following: the current policy estimate $\pi^{(k)}$, the current “collapsed” probability transition matrix $P_{\pi^{(k)}}$, and the current utility vector estimate $u^{(k)}$. Is $\pi^{(2)}$ an optimal policy?

Hint: In policy iteration you need to solve the linear system of equations $(I - \gamma P_{\pi^{(k)}})\, u^{(k)} = r$ for $u^{(k)}$. numpy has a method that can do this:
https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html
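A sketch of both computations in numpy, under the following assumptions: the transition array has been entered as P with shape (2, 7, 7), so P[a, i, j] is the probability of moving from $s_i$ to $s_j$ under action a; the rewards form a length-7 vector r; and R(t) is the reward of the state occupied at time t. All names are illustrative:

    import numpy as np

    def expected_discounted_reward(P, r, actions, gamma=0.5, start=0):
        # propagate the state distribution d through the action sequence,
        # accumulating gamma**t times the expected reward at each time step
        d = np.zeros(len(r))
        d[start] = 1.0
        total = d @ r                     # R(0): reward in the start state
        for t, a in enumerate(actions, start=1):
            d = d @ P[a]                  # next-state distribution under action a
            total += gamma ** t * (d @ r)
        return total

    def policy_iteration_step(P, r, policy, gamma=0.5):
        n = len(r)
        # collapse P under the current policy: row i comes from P[policy[i], i, :]
        P_pi = P[np.asarray(policy), np.arange(n), :]
        # policy evaluation: solve (I - gamma * P_pi) u = r, as in the hint
        u = np.linalg.solve(np.eye(n) - gamma * P_pi, r)
        # greedy improvement: the immediate reward r[i] is the same for every
        # action, so it suffices to maximize the discounted successor utility
        q = gamma * np.einsum('aij,j->ai', P, u)  # q[a, i]
        return u, P_pi, np.argmax(q, axis=0)

    # e.g., expected_discounted_reward(P, r, [0, 1, 0]) for the sequence [a0, a1, a0],
    # and repeated calls to policy_iteration_step starting from
    # pi0 = [1, 1, 1, 1, 0, 1, 1] for the policy iteration question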

Figure 2: A toy MDP with 7 states and 2 actions. Each circle represents a state $s_i$ and is labeled with that state's reward $r_i$. Arrows represent transitions between states, and outer arrows are labeled with their transition probability. Transition probabilities for inner arrows are not shown. If there is no arrow between two states, the corresponding transition has probability 0. The left and right subplots show transition probabilities when actions $a_0$ and $a_1$ are performed, respectively.
