联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Java编程Java编程

日期:2018-12-11 10:35

CS 540 Fall 2018

CS 540: Introduction to Artificial Intelligence

Homework Assignment # 10

Assigned: 12/4

Due: 12/11 before class

Question 1: Neural Network [100 points, evenly among parts]

We will build a neural network to perform binary classification. Each input item has two real-valued features

x = (x1, x2), and the class label y is either 0 or 1. Our neural network has a very simple structure:

ReLU A

C Sigmoid

B ReLU

x1 x2

1 1

1

w8 w9

w2

w5 w3

w6

w1 w4

w7

There is one hidden layer with two hidden units A, B, and one output layer with a single output unit C.

The input layer is fully connected to the hidden layer, and the hidden layer is fully connected to the output

layer. Each unit also has a constant bias 1 input with the corresponding weight. Units A are B are ReLU,

namely

fA(z) = fB(z) = max(z, 0).

Unit C is a sigmoid:

fC (z) = σ(z) = 1

1 + e?z

.

Write a program Neural.java with the following command line format:

CS 540 Fall 2018

$java Neural FLAG [args]

Where the optional arguments are real valued.

1. This neural network is fully defined by the nine weights w1, . . . , w9. The first question first focuses on

predictions given fixed weights.

Remark: we will use both single index and double index to refer to a weight. Single index corresponds

to the figure above. Double index, on the other hand, is used to describe the algorithm and denotes

the “from → to” nodes that the edge is connecting. For example, w8 is the same as wA,C , w2 is the

same as wx1,A, and w1 is the same as w1,A where we used “1” to denote the constant bias input of

one. These should be clear from the context.

Recall in a neural network for any unit j, it first collects input from lower units:

uj =

X

i:i→j

wijvi

where vi

is the output of lower unit i. Specifically, if i is an input unit then vi = xi

; if i is the bias

then vi = 1. The unit j then passes uj through its nonlinear function fj () to produce its output vj :

vj = fj (uj ).

When FLAG=100, arg1 ... arg9 are the weights w1, . . . , w9, and arg10=x1, arg11=x2. Print uA, vA, uB, vB, uC , vC

on the same line separated by space. When printing real values, show 5 digits after decimal point by

rounding (but do not round the actual variables). For example,

$java Neural 100 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -1

0.00000 0.00000 0.30000 0.30000 0.97000 0.72512

$java Neural 100 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 -0.2 1.7

2.18000 2.18000 1.43000 1.43000 1.34000 0.79249

$java Neural 100 4 3 2 1 0 -1 -2 -3 -4 -4 1

-6.00000 0.00000 0.00000 0.00000 -2.00000 0.11920

2. Given a training item x = (x1, x2) and its label y, the error made by the neural network on the item

is defined as

E =

1

2

(vC y)

2

.

The partial derivative with respect to the output layer variable vC is

E

vC

= vC y.

The partial derivative with respect to the intermediate variable uC is

E

uC

=

E

vC

f

0

C (uC ).

CS 540 Fall 2018

Recall f

0

C (uC ) = σ

0

(uC ) = σ(uC )(1 σ(uC )) = vC (1 vC ).

When FLAG=200, arg1 ... arg9 are the weights w1, . . . , w9, arg10=x1, arg11=x2, and arg12=y. Print

E,

E

vC

, and E

uC

on the same line separated by space. For example,

$java Neural 200 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -1 1

0.03778 -0.27488 -0.05479

$java Neural 200 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 -0.2 1.7 0

0.31402 0.79249 0.13032

$java Neural 200 4 3 2 1 0 -1 -2 -3 -4 -4 1 0

0.00710 0.11920 0.01252

3. The partial derivative with respect to hidden layer variable vj is

E

vj

=

X

k:j→k

wjk

Euk.

And

Euj=E

vj

vj

uj.

Recall our hidden layer units are ReLU, for which

vj

uj=

max(uj , 0)

uj

=



1, uj ≥ 0

0, uj < 0

Note we define the derivative to be 1 when uj = 0 (look up subderivative to learn more).

When FLAG=300, arg1 ... arg9 are the weights w1, . . . , w9, arg10=x1, arg11=x2, and arg12=y. Print

vA

E

uA,

E

vB

, and E

uB

on the same line separated by space.

$java Neural 300 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -1 1

-0.04383 -0.04383 -0.04931 -0.04931

$java Neural 300 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 -0.2 1.7 0

0.03910 0.03910 0.02606 0.02606

$java Neural 300 4 3 2 1 0 -1 -2 -3 -4 -4 1 0

-0.03755 0.00000 -0.05006 -0.05006

4. Now we can compute the partial derivative with respect to the edge weights:

E

wij

= vi

E

uj

.

When FLAG=400, arg1 ... arg9 are the weights w1, . . . , w9, arg10=x1, arg11=x2, and arg12=y. Print

E

w1

, . . .

E

w9

on the same line separated by space. For example,

CS 540 Fall 2018

$java Neural 400 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -1 1

-0.04383 -0.04383 0.04383 -0.04931 -0.04931 0.04931 -0.05479 0.00000 -0.01644

$java Neural 400 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 -0.2 1.7 0

0.03910 -0.00782 0.06647 0.02606 -0.00521 0.04431 0.13032 0.28411 0.18636

$java Neural 400 4 3 2 1 0 -1 -2 -3 -4 -4 1 0

0.00000 0.00000 0.00000 -0.05006 0.20025 -0.05006 0.01252 0.00000 0.00000

5. Now we perform one step of stochastic gradient descent. With step size η, we update the weights:

wi = wi η

E

wi

, i = 1 . . . 9.

When FLAG=500, arg1 ... arg9 are the weights w1, . . . , w9, arg10=x1, arg11=x2, arg12=y, and

arg13=η. Print four lines:

(a) the old w1 . . . w9

(b) the error E under the old w

(c) the updated w1 . . . w9

(d) the error E after the update

For example,

$java Neural 500 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -1 1 0.1

0.10000 0.20000 0.30000 0.40000 0.50000 0.60000 0.70000 0.80000 0.90000

0.03778

0.10438 0.20438 0.29562 0.40493 0.50493 0.59507 0.70548 0.80000 0.90164

0.03617

$java Neural 500 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 -0.2 1.7 0 0.1

1.00000 0.90000 0.80000 0.70000 0.60000 0.50000 0.40000 0.30000 0.20000

0.31402

0.99609 0.90078 0.79335 0.69739 0.60052 0.49557 0.38697 0.27159 0.18136

0.29972

$java Neural 500 4 3 2 1 0 -1 -2 -3 -4 -4 1 0 0.1

4.00000 3.00000 2.00000 1.00000 0.00000 -1.00000 -2.00000 -3.00000 -4.00000

0.00710

4.00000 3.00000 2.00000 1.00501 -0.02002 -0.99499 -2.00125 -3.00000 -4.00000

0.00371

6. Long long ago, in a galaxy far far away, 117 students took an AI class. We have a data set where x1

is each student’s score on homework 2, x2 is their midterm score (both scaled to between 0 and 1),

and the binary label y is whether they received an A at the end of semester (y = 1) or not (y = 0).

We already randomly split the data set into a training set of n = 67 items, an evaluation set with 25

CS 540 Fall 2018

items, and a test set with 25 items. These files are denoted by the file names. We will train the neural

network to predict if a student will receive an A at the end of semester.

We first define one “epoch” as going over the training items once in the natural order:

(a) FOR j = 1 TO n

(b) wi = wi η

Ej

wi

, i = 1 . . . 9

(c) PRINT w’s evaluation set error

(d) END

where Ej is the error on the j-th training item. In other words, this is stochastic gradient descent

(except that we go over training items in order rather than randomly, but we shall ignore the difference

for now). Step (c) computes the evaluation set error as follows:

X

25

k=1

1

2

(vC (xk, w) yk)

2

where (xk, yk) is an evaluation set item, and we used the notation vC (xk, w) to mean the neural network

output when the input is xk and the weights are w.

When FLAG=600, arg1 ... arg9 are the initial weights w1, . . . , w9, arg10=η. Run one epoch (i.e. going

over the 67 training items once). For each training item, print three lines:

(a) x1, x2, y

(b) the updated w1 . . . w9

(c) the evaluation set error under the updated w

Therefore, you will print 67 ? 3 = 201 lines in total. For example,

$java Neural 600 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1

0.98000 0.87000 1.00000

0.10049 0.20048 0.30043 0.40055 0.50054 0.60048 0.70062 0.80034 0.90087

7.18915

0.92000 0.88000 0.00000

0.09486 0.19530 0.29548 0.39422 0.49471 0.59491 0.69358 0.79648 0.89110

7.12401

...

0.81000 0.75000 0.00000

-0.13841 0.01138 0.16820 0.06903 0.24590 0.41273 0.22295 0.73651 0.52301

3.93300

0.94000 0.69000 0.00000

-0.13841 0.01138 0.16820 0.06135 0.23868 0.40744 0.20827 0.73651 0.51442

3.87761

7. We now run T epochs. For clarity, we will only print evaluation set error at the end of each epoch.

When FLAG=700, arg1 ... arg9 are the initial weights w1, . . . , w9, arg10=η, arg11=T. At the end of

each epoch, print w, then on a separate line the evaluation set error.

CS 540 Fall 2018

$java Neural 700 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 3

-0.13841 0.01138 0.16820 0.06135 0.23868 0.40744 0.20827 0.73651 0.51442

3.87761

-0.15422 0.01050 0.18620 -0.10760 0.12065 0.33611 -0.14930 0.73903 0.41698

2.99174

-0.18275 0.01188 0.22380 -0.19706 0.06861 0.31792 -0.35328 0.74892 0.41936

2.71163

8. FLAG=800 has the same input as FLAG=700. But we will stop as soon as evaluation set error starts

to increase after completing a whole epoch. If evaluation set error decreases or stays the same, we

will stop after T epochs. Print the number of epochs that actually happened, the final w, and the

evaluation set error on different lines. Now you have a well-trained neural network. For a test item

x, you can make a binary classification prediction by thresholding the network output at 1/2 (predict

positive class if the output is exactly 1/2). Compute the test set classification accuracy (note: not the

squared error) and print it, too.

$java Neural 800 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 10000

788

-3.51459 1.20379 4.45346 -2.72310 0.93258 3.44859 -3.99435 2.11219 1.63381

0.87178

0.96000

$java Neural 800 -0.8 0.5 0 -1 0 -0.7 -0.7 0.9 0.1 0.1 10000

2

-0.80000 0.50000 0.00000 -1.00000 0.00000 -0.70000 -0.65318 0.90000 0.10000

2.56847

0.72000


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp