
###### Date: 2020-10-26 10:49

Coursework 1

Mathematics for Machine Learning (70015)

This coursework has both writing and coding components. The python code you submit must compile on a standard CSG Linux installation.

You are not permitted to use any symbolic manipulation libraries (e.g. sympy) or automatic differentiation tools (e.g. tensorflow) for your submitted code (though you may use them to check your work); submissions will be checked for imports. Note that if you use python you should not need to import anything other than numpy for the submitted code for this assignment.

The writing assignment requires plots, which you can create using any method of your choice. You should not submit the code used to create these plots. No aspect of your submission may be hand-drawn. You are strongly encouraged to use LaTeX to create the written component.

In summary, you are required to submit a zip-file named cw2.zip containing the following:

• A file coding_answers.py which implements all the methods for the coding exercises.


1 Differentiation

In this question, we define the following constants:

We also define the following functions, which all map ℝ² to ℝ:

a) [3 marks] Write f1(x) in the completed-square form (x − c)ᵀC(x − c) + c₀, i.e., determine C, c, c₀.

b) [2 marks] Find the Hessian of f1. Explain what condition must hold for f1 to have a minimum point. State the minimum value of f1 and find the input which achieves this minimum.
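As a general reminder for quadratics in completed-square form (this sketch uses a generic symmetric matrix C, not the specific f1 from the handout):

```latex
% For f(x) = (x - c)^T C (x - c) + c_0 with C symmetric:
\nabla f(x)   = 2C(x - c)
\nabla^2 f(x) = 2C
% If C is positive definite, the unique minimiser is x = c,
% where f attains its minimum value c_0.
```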

c) Write python functions that return the gradient for each of the functions above. All functions must accept numpy (2,) array inputs and return numpy (2,) outputs.
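A minimal sketch of the required input/output contract, using a stand-in quadratic f(x) = xᵀx in place of the actual f1, f2, f3 (which are defined in the handout); the name grad_f_example is illustrative only:

```python
import numpy as np

def grad_f_example(x):
    """Gradient of the stand-in f(x) = x^T x, i.e. 2x.

    Accepts a numpy (2,) array and returns a numpy (2,) array,
    matching the contract required above.
    """
    x = np.asarray(x, dtype=float)
    return 2.0 * x
```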

d) Use gradient descent with 50 iterations to find a local minimum for both f2 and f3. Show the steps of your algorithm on a contour plot of the function. Start from the point (0.3, 0) and state the step size you used. Produce separate contour plots for the two functions, using the first component of x on the x axis and the second on the y axis.
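One way to make the steps plottable is to keep the whole trajectory of iterates. A minimal sketch, assuming a generic gradient function grad and using a stand-in quadratic in the example (the step size 0.1 is only an illustration, not a recommended value):

```python
import numpy as np

def gradient_descent(grad, x0, step_size, n_iters=50):
    # Plain gradient descent; the full path is returned so the
    # iterates can be overlaid on a contour plot afterwards.
    path = [np.asarray(x0, dtype=float)]
    for _ in range(n_iters):
        path.append(path[-1] - step_size * grad(path[-1]))
    return np.stack(path)  # shape (n_iters + 1, 2)

# Example on the stand-in f(x) = x^T x (gradient 2x):
path = gradient_descent(lambda x: 2.0 * x, np.array([0.3, 0.0]), 0.1)
```

The rows of `path` are the successive iterates, ready to pass to a contour-plotting routine.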

e) [5 marks] For the two functions f2 and f3:

• Discuss the qualitative differences you observe when performing gradient descent with step sizes varying between 0.01 and 1, again starting from the point (0.3, 0).

• Briefly describe what happens in the two cases with grossly mis-specified step sizes (i.e. greater than 1).
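The effect of a grossly mis-specified step size can be seen even on a stand-in quadratic f(x) = xᵀx (not the actual f2 or f3): each update multiplies x by (1 − 2α), so the iterates shrink when |1 − 2α| < 1 and blow up otherwise. A minimal sketch:

```python
import numpy as np

def final_norm(step_size, x0=(0.3, 0.0), n_iters=50):
    # Gradient descent on the stand-in f(x) = x^T x, gradient 2x.
    # Each step is x <- x - step_size * 2x = (1 - 2*step_size) * x.
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - step_size * 2.0 * x
    return float(np.linalg.norm(x))
```

With step_size = 0.1 the norm decays geometrically toward the minimiser; with step_size = 1.1 the per-step factor is −1.2, so the iterates oscillate in sign while growing exponentially in magnitude.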
