
Computing Science Department

CMPUT 366 - Fall 2021

Problem Set

Unit 2 - Learners

Intelligent Systems (CMPUT 366)

1. Consider the trees below representing zero-sum two-player games. Write the values of x that allow Minimax search with Alpha-Beta pruning to prune the dashed nodes. You should write the word ‘None’ if no value of x allows the nodes to be pruned. You should assume that the search algorithm visits the children from left to right.

[Figure: three game trees, each with a max root, min nodes on the second level, and max nodes on the third level; the dashed nodes are the candidates for pruning. Leaf values from left to right: (a) 5 8 5 9 5 x 10 20; (b) 8 5 7 / 2 1 / x; (c) x 5 5 7 / 2 1.]
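
Since the tree figures are not reproduced in this text version, a generic reference may help when reasoning about which values of x trigger a cutoff. Below is a minimal Alpha-Beta sketch over trees encoded as nested lists; it is a standard textbook formulation, not the course's exact pseudocode, and the example tree at the bottom is hypothetical rather than one of the trees above.

```python
import math

def alpha_beta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of `node`, pruning children that cannot
    influence the value at the root.

    A node is either a number (leaf) or a list of child nodes; `maximizing`
    says whether max or min moves at this node.
    """
    if not isinstance(node, list):        # leaf
        return node
    value = -math.inf if maximizing else math.inf
    for child in node:
        child_value = alpha_beta(child, not maximizing, alpha, beta)
        if maximizing:
            value = max(value, child_value)
            alpha = max(alpha, value)
        else:
            value = min(value, child_value)
            beta = min(beta, value)
        if alpha >= beta:                 # remaining children are pruned
            break
    return value

# Hypothetical example (not one of the trees above): a max root with two min children.
print(alpha_beta([[3, 5], [2, 9]], maximizing=True))  # 3; the leaf 9 is pruned
```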

2. The tree below represents a zero-sum two-player game where the green nodes (including the root of the tree) are max's nodes and the yellow nodes are min's nodes. Assuming a left-to-right ordering of visits, prune the branches that Alpha-Beta pruning would prune. This example was created by Ariel Felner.

3. Consider the following training set for a regression problem, where given a value of x one wants to predict the value of y with the hypothesis h_w(x) = wx + b.


x   y
1   1
2   2
3   3

a) Given that w = b = 0 and α = 0.1, what are the values of w and b after the first iteration of gradient descent?
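
As a minimal sketch of the computation in part (a), the snippet below performs one batch gradient-descent update on this training set, assuming the loss is the Mean Squared Error L = (1/n) Σ (h_w(x_i) − y_i)²; under a different convention (a summed loss or an extra 1/2 factor) the gradients, and therefore the numbers, change.

```python
# One batch gradient-descent step for h_w(x) = w*x + b on the toy set,
# assuming MSE loss L = (1/n) * sum((w*x_i + b - y_i)**2).
data = [(1, 1), (2, 2), (3, 3)]
w, b, alpha = 0.0, 0.0, 0.1

n = len(data)
grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in data)  # dL/dw
grad_b = (2 / n) * sum((w * x + b - y) for x, y in data)      # dL/db

w -= alpha * grad_w
b -= alpha * grad_b
print(w, b)  # roughly 0.933 and 0.4 under this particular loss convention
```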

b) For α = 0.1 the algorithm takes approximately 300 iterations to converge to values of w and b for which further gradient updates will not decrease the Mean Squared Error of the model. What happens if we reduce the value of α to 0.01? Do you think it will converge with more or fewer iterations than when α = 0.1? What is the difference between the solution encountered with α = 0.1 and the one encountered with α = 0.01?

c) Linear regression learns a linear function of the input values x. Explain two ways we could learn a non-linear function of the input values.
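
One illustration of this idea (a sketch, not necessarily one of the intended answers) is to keep the linear-regression machinery but feed it non-linear transformations of x, such as polynomial features; the helper names below are ours.

```python
# Sketch: h(x) = w1*x + w2*x**2 + b is non-linear in x but still linear in the
# parameters (w1, w2, b), so the same gradient-descent update rule applies.
def features(x):
    return (x, x ** 2)          # hypothetical basis expansion of a single input

def predict(weights, bias, x):
    return sum(w * f for w, f in zip(weights, features(x))) + bias

print(predict((1.0, 0.5), 0.0, 3.0))  # 1*3 + 0.5*9 = 7.5
```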

4. Consider the classification problem given by the training data below. In this problem we are trying to classify images of dogs while considering only three binary features: ears, fur, and hat. The column y gives the labels: 0 for an image without a dog and 1 for an image with a dog. For example, the first image contains no ears, no fur, and no hat and it is labelled as ‘not dog’, while the last image in the data set contains ears, fur, and a hat and it is labelled as ‘dog’. Considering the training set below, answer the following questions.

ears  fur  hat  y
0     0    0    0
1     0    0    0
1     1    0    0
0     1    1    0
1     0    1    1
1     1    1    1

a) Is this problem linearly separable? Justify your answer.

b) If we modify the training set so that [1, 1, 0] has the label 1 instead of 0 and [1, 1, 1] has the label 0 instead of 1, is the problem linearly separable?

c) How many hidden layers does a neural network need to solve the problem given by the data set above where the instance [1, 1, 0] has the label 1 instead of 0 and [1, 1, 1] has the label 0 instead of 1?
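
As an empirical aid for parts (a)–(c) (a sketch only: a perceptron that stops making mistakes demonstrates separability, while a capped loop that never converges is merely suggestive of non-separability, not a proof), the snippet below runs the classic perceptron rule on the rows of the table; the function names are ours.

```python
# Perceptron sketch: returns (weights, bias) with zero training mistakes if one
# is found within max_epochs, otherwise None.
def perceptron(rows, max_epochs=1000):
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for *x, y in rows:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != y:
                delta = y - pred                      # +1 or -1
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None

original = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0),
            (0, 1, 1, 0), (1, 0, 1, 1), (1, 1, 1, 1)]
modified = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 1),   # [1,1,0] relabelled to 1
            (0, 1, 1, 0), (1, 0, 1, 1), (1, 1, 1, 0)]   # [1,1,1] relabelled to 0
print(perceptron(original), perceptron(modified))
```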

5. Consider the XOR problem shown in the table below. One of your classmates is trying to train a neural network to learn the mapping from the x0 and x1 values to the labels in the training data.

x0  x1  ¬(x0 ⊕ x1)
0   0   1
0   1   0
1   0   0
1   1   1

Your classmate told you that their model is unable to learn the correct mapping from the input values to the output values. Suspecting they were using a model with no hidden layers, you asked them to draw the model they have implemented. Your classmate then draws the following model on the back of a napkin and mentions that they initialize all weights of the model with the value of zero. Suddenly you understand why their model is unable to learn the XOR function. Explain to your classmate why the weights can't be initialized with the same value.

[Figure: the classmate's network, with inputs x0 and x1, a bias unit (+1) feeding each layer, one hidden layer with units a0 and a1, and a single output hw.]
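
As a small illustration of the symmetry problem (a sketch under assumed conventions: sigmoid activations and squared error, neither of which is stated in the drawing), the snippet below runs one forward and backward pass of a 2-2-1 network with all-zero weights and shows that the two hidden units compute the same activation and receive the same gradient, so no update can ever make them differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 2-2-1 network with every weight and bias initialised to zero.
W1 = [[0.0, 0.0], [0.0, 0.0]]   # input -> hidden weights, one row per hidden unit
b1 = [0.0, 0.0]
W2 = [0.0, 0.0]                 # hidden -> output weights
b2 = 0.0

x, y = (0.0, 1.0), 0.0          # one XOR training example

# Forward pass: both hidden units compute exactly the same function of x.
a = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
out = sigmoid(W2[0] * a[0] + W2[1] * a[1] + b2)

# Backward pass (squared error): the error signal reaching hidden unit j is
# delta_out * W2[j] * a[j] * (1 - a[j]); identical weights give identical
# signals, so the two hidden units are updated identically and stay clones.
delta_out = (out - y) * out * (1 - out)
hidden_grads = [delta_out * W2[j] * a[j] * (1 - a[j]) for j in range(2)]
print(a, hidden_grads)   # a[0] == a[1] and the two gradients are equal
```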

6. Jorge, the capybara, needs your help to compute the Q(s, a) values for the problem below. Jorge starts in position (1, 1) (see the figure below). The states marked with a dashed square and with the letter G have only the action “Out” available. All other states have the actions up (U), down (D), left (L), and right (R) available. Once Jorge applies the “Out” action, it moves to a terminal state of the Markov decision process.

The table shows the experiences Jorge has collected by following an ε-greedy policy. Assuming that Q(s, a) = 0 for all s and a, γ = 1, and α = 0.5, what are the values of Q(s, a) after processing the experiences shown in the table below with Q-Learning?

Each experience tuple below is (state, action, next state, reward):

Episode 1: (1, 1), D, (1, 2), 0 → (1, 2), R, (2, 2), 0 → (2, 2), U, (2, 1), 0 → (2, 1), Out, Terminal, -10
Episode 2: (1, 1), D, (1, 2), 0 → (1, 2), R, (2, 2), 0 → (2, 2), R, (3, 2), 0 → (3, 2), R, (4, 2), 0 → (4, 2), U, (4, 1), 0 → (4, 1), Out, Terminal, +50
Episode 3: (1, 1), D, (1, 2), 0 → (1, 2), R, (2, 2), 0 → (2, 2), U, (2, 1), 0 → (2, 1), Out, Terminal, -10
Episode 4: (1, 1), D, (1, 2), 0 → (1, 2), R, (2, 2), 0 → (2, 2), R, (3, 2), 0 → (3, 2), R, (4, 2), 0 → (4, 2), U, (4, 1), 0 → (4, 1), Out, Terminal, +50
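
A minimal tabular Q-Learning sketch that replays these four episodes in order is given below. It assumes the standard update Q(s, a) ← Q(s, a) + α (r + γ max_a' Q(s', a') − Q(s, a)) with the terminal state's value taken as 0; the Python encoding of the episodes and the assumption about which states offer only “Out” are inferred from the table, not given by the assignment.

```python
from collections import defaultdict

alpha, gamma = 0.5, 1.0
Q = defaultdict(float)          # Q[(state, action)] defaults to 0

# Experience tuples (state, action, next_state, reward) copied from the table.
ep_a = [((1, 1), "D", (1, 2), 0), ((1, 2), "R", (2, 2), 0),
        ((2, 2), "U", (2, 1), 0), ((2, 1), "Out", "Terminal", -10)]
ep_b = [((1, 1), "D", (1, 2), 0), ((1, 2), "R", (2, 2), 0),
        ((2, 2), "R", (3, 2), 0), ((3, 2), "R", (4, 2), 0),
        ((4, 2), "U", (4, 1), 0), ((4, 1), "Out", "Terminal", +50)]
episodes = [ep_a, ep_b, ep_a, ep_b]   # episodes 1-4 in order

def available_actions(s):
    # (2, 1) and (4, 1) appear to be the dashed/G states where only "Out" is
    # available (inferred from the episodes); all other states have U, D, L, R.
    return ["Out"] if s in {(2, 1), (4, 1)} else ["U", "D", "L", "R"]

for episode in episodes:
    for s, a, s_next, r in episode:
        # Terminal state is worth 0; otherwise bootstrap from the best available action.
        best_next = 0.0 if s_next == "Terminal" else max(
            Q[(s_next, a2)] for a2 in available_actions(s_next))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

for (s, a), v in sorted(Q.items(), key=str):
    if v != 0.0:
        print(s, a, v)
```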

7. Sarsa is an on-policy algorithm that employs an ε-greedy policy to gather experience by interacting with the environment. Sarsa then learns an optimal ε-greedy policy by interacting with the environment. If the value of ε decreases over time and eventually converges to ε = 0, then Sarsa is able to learn an optimal policy for the problem. Q-Learning also uses an ε-greedy policy to gather experience by interacting with the environment. However, instead of learning an ε-greedy policy, Q-Learning learns a greedy policy directly. What are the negative effects if Q-Learning also employed a greedy policy to drive its behavior while interacting with the environment?
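
For reference, the only mechanical difference between the two behavior policies discussed here is the chance of taking a non-greedy action; a small sketch of ε-greedy action selection is below (the function name and the tabular Q representation are ours), with ε = 0 recovering the purely greedy behavior that the question asks about.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon pick a random action, otherwise a greedy one.

    Setting epsilon = 0 gives the purely greedy behavior policy: the agent
    never deliberately explores, so it can keep repeating whichever action
    currently looks best.
    """
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```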


