联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Java编程Java编程

日期:2018-11-29 11:05

CS 540 Fall 2018

CS 540: Introduction to Artificial Intelligence

Homework Assignment # 9

Assigned: 11/20

Due: 12/4 before class

Question 1: Lake Mendota Ice [100 points]

Can you predict how harsh this Wisconsin winter will be?

The Wisconsin State Climatology Office keeps a record on the number of days Lake Mendota was covered

by ice at http://www.aos.wisc.edu/~sco/lakes/Mendota-ice.html.

Write a program Ice.java with the following command line format:

$java Ice FLAG [arg1 arg2]

Where the two optional arguments are real valued.

Questions 5 is 20 points; other questions 10 points each.

1. As with any real problems, the data is not as clean or as organized as one would like for machine

learning. Curate a clean data set starting from 1855-56 and ending in 2017-18. Let x be the year: for

1855-56, x = 1855; for 2017-18, x = 2017; and so on. Let y be the ice days in that year: for 1855-56,

y = 118; for 2017-18, y = 94; and so on. Some years have multiple freeze thaw cycles such as 2001-02,

that one should be x = 2001, y = 21. Although we do not ask you to hand in any visualization code

or figures, we strongly advise you to plot the data (using any plotting tool inside or outside Java) and

see what it is like.

For simplicity, hard code the data set in your program. When FLAG=100, print out the data set. One

year per line with the x value first, a space, then the y value. For example,

$java Ice 100

1855 118

...

2001 21

...

2017 94

2. When FLAG=200, print n the number of data points, the sample mean ˉy =

1

n

Pn

i=1 yi

, and the sample

standard deviation q 1

n1

Pn

i=1(yi yˉ)

2, on three lines. For real values in this homework, keep two

digits after decimal point. For example (for this example the numbers are made-up):

$java Ice 200

321

123.45

32.10

CS 540 Fall 2018

3. We will perform linear regression with the model

f(x) = β0 + β1x.

We first define the mean squared error as a function of β0, β1:

MSE(β0, β1) = 1

n

Xn

i=1

(β0 + β1xi ? yi)

2

.

When FLAG=300, arg1=β0 and arg2=β1. Print the corresponding MSE.

$java Ice 300 300.00 -0.10

331.15

$java Ice 300 0 0

10885.20

$java Ice 300 100.00 0

384.59

$java Ice 300 400 0.1

241661.16

$java Ice 300 200 -0.2

84225.76

4. We perform gradient descent on MSE. At current parameter (β0, β1), the gradient is defined by the

vector of partial derivatives

?MSE(β0, β1)

?β0

=

2

n

Xn

i=1

(β0 + β1xi yi) (1)

MSE(β0, β1)

β1

=

2

n

Xn

i=1

(β0 + β1xi yi)xi

. (2)

When FLAG=400, arg1=β0 and arg2=β1. Print the corresponding gradient as two numbers on separate

lines.

$java Ice 400 0 0

-205.01

-396046.91

$java Ice 400 100 0

-5.01

-8846.91

$java Ice 400 300 -0.1

7.79

15491.09

$java Ice 400 400 0.1

982.19

1902815.09

CS 540 Fall 2018

$java Ice 400 200 -0.2

-579.41

-1121770.91

5. Gradient descent starts from initial parameter (β

(0)0, β(0)1), and iterates the following updates at time

t = 1, 2, . . . , T:

(4)

When FLAG=500, arg1=η and arg2=T. Start from initial parameter (β

(0)

0

, β(0)

1

) = (0, 0). Perform

T iterations of gradient descent. Print the following in each iteration on a line, separated by space:

t, β(t)

0

, β(t)

1

, MSE(β

(t)

0

, β(t)

1

). For example,

$java Ice 500 1e-7 5

1 0.00 0.04 1082.37

2 0.00 0.05 469.99

3 0.00 0.05 431.74

4 0.00 0.05 429.35

5 0.00 0.05 429.20

$java Ice 500 1e-8 5

1 0.00 0.00 9375.50

2 0.00 0.01 8083.77

3 0.00 0.01 6978.55

4 0.00 0.01 6032.91

5 0.00 0.02 5223.81

$java Ice 500 1e-9 5

1 0.00 0.00 10728.94

2 0.00 0.00 10575.01

3 0.00 0.00 10423.38

4 0.00 0.00 10274.02

5 0.00 0.00 10126.89

$java Ice 500 1e-6 5

1 0.00 0.40 442280.27

2 -0.00 -2.18 18672210.33

3 0.01 14.56 789034169.47

4 -0.05 -94.24 33343056376.02

5 0.32 613.00 1409013738521.32

Note with η = 1e ? 6 gradient descent is diverging.

CS 540 Fall 2018

It is not required for hand in, but try different initial parameters, η, and much larger T and see how

small you can make MSE.

6. Instead of using gradient descent, we can compute the closed-form solution for the parameters directly.

For ordinary least squared in 1D, this is

β

1 =

P(xi xˉ)(yiyˉ)

P(xixˉ)

2β0 = ˉyβ

1x, ˉ

where ˉx = (Pxi)/n and ˉy = (Pyi)/n. Required: When FLAG=600, print β

0, β

1 (note the order),

and the corresponding MSE on a single line separated by space.

It is not required for hand in, but give it some thought: what does a negative β

1 mean

7. Use β

0, β

1 you can predict the number of ice days for a future year. Required: When FLAG=700,

arg1=year. Print a single real number which is the predicted ice days for that year. For example,

$java Ice 700 2050

80.75

It is not required for hand in, but give it some thought:

What’s the prediction for this winter?

Which year will the predicted ice day become negative?

What does that say about the model?

8. If you are agonizing over your inability to get gradient descent to match the closed-form solution in

question 5, you are not alone. The culprit is the scale of input x compared to the scale of the implicit

offset value 1 (think β0 = β0 · 1). Gradient descent converges slowly when these scales differ greatly,

a situation known as bad condition number in optimization. When FLAG=800, we ask you to first

normalize the input x (note: not the labels y):

xˉ =

1

n

Xn

i=1

xi (5)

stdx =

vuut

1

n 1

Xn

i=1

(xi xˉ)

2 (6)

xi ← (xi xˉ)/stdx. (7)

Then proceed exactly as when FLAG=500. The two arguments are again arg1=η and arg2=T. For

example,

$java Ice 800 1 5

1 205.01 -17.90 10883.24

2 -0.00 -0.22 10881.32

CS 540 Fall 2018

3 205.01 -17.69 10879.45

4 -0.00 -0.43 10877.62

5 205.01 -17.47 10875.84

$java Ice 800 0.1 5

1 20.50 -1.79 7073.86

2 36.90 -3.22 4634.55

3 50.02 -4.37 3073.35

4 60.52 -5.29 2074.16

5 68.91 -6.03 1434.66

$java Ice 800 0.01 5

1 2.05 -0.18 10465.96

2 4.06 -0.35 10063.31

3 6.03 -0.53 9676.61

4 7.96 -0.70 9305.22

5 9.85 -0.86 8948.54

With η = 0.1 you should get convergence within 100 iterations.

(Note the β’s are now for the normalized version of x, but you can easily translate them back for the

original x with a little algebra. This is not required for the homework.)

9. Now we implement Stochastic Gradient Descent (SGD). With everything the same as part 8 (including

x normalization), we modify the definition of gradient in equations (1) and (2) as follows. In iteration

t we randomly pick one of the n items. Say we picked (xjt

, yjt

). We approximate the gradient using

that item only:

MSE(β0, β1)

β0

≈ 2(β0 + β1xjtyjt

) (8)

MSE(β0, β1)

β1

≈ 2(β0 + β1xjt yjt

)xjt

. (9)

When FLAG=900, print the same information as in part 8. For example (your results will differ

because of randomness in the items picked):

$java Ice 900 0.1 5

1 17.80 24.89 8614.29

2 45.98 -12.13 3502.30

3 56.93 -12.36 2385.58

4 67.55 -16.63 1577.17

5 76.76 -17.41 1030.71

With η = 0.1 you should approximately converge within a few hundred iterations.

Since n is small in our data set, there is little advantage of SGD over gradient descent. However, on

large data sets SGD becomes more desirable.

CS 540 Fall 2018

Hints:

1. Update β0, β1 simultaneously in an iteration. Don’t use a new β0 to calculate β1.

2. Use double instead of float in Java.

3. Don’t round the variables themselves to 2 digits in the middle stages. That is for printing only


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp