Coursework 1

Mathematics for Machine Learning (70015)

This coursework has both writing and coding components. The Python code you submit must compile on a standard CSG Linux installation.

You are not permitted to use any symbolic manipulation libraries (e.g. sympy) or automatic differentiation tools (e.g. tensorflow) in your submitted code, though you may of course find these useful for checking your answers. Your code will be checked for imports. Note that if you use Python, you should not need to import anything other than numpy in the submitted code for this assignment.

The writing assignment requires plots, which you can create using any method of your choice. You should not submit the code used to create these plots.

No aspect of your submission may be hand-drawn. You are strongly encouraged to use LaTeX to create the written component.

In summary, you are required to submit a zip file named cw2.zip containing the following:

• A file write_up.pdf for your written answers.

• A file coding_answers.py which implements all the methods for the coding exercises.


1 Differentiation

In this question, we define the following constants:

We also define the following functions, which are all $\mathbb{R}^2 \to \mathbb{R}$:

a) [3 marks] Write f1(x) in the completed square form $(x - c)^T C (x - c) + c_0$, i.e., determine $C$, $c$ and $c_0$.
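As a rough illustration of the completed square form, the sketch below uses a generic quadratic $x^T A x + d^T x + e$ with a symmetric, invertible $A$. The names A, d, e and all numbers are hypothetical and are not the coursework's constants. It only checks numerically that the expanded and completed forms agree.

```python
import numpy as np

# Hypothetical quadratic q(x) = x^T A x + d^T x + e (NOT the coursework's f1).
A = np.array([[2.0, -1.0], [-1.0, 2.0]])  # assumed symmetric and invertible
d = np.array([1.0, -3.0])
e = 0.5

def q(x):
    """Expanded form of the quadratic."""
    return x @ A @ x + d @ x + e

# Completed square: q(x) = (x - c)^T C (x - c) + c0.
# Matching terms gives C = A, -2 A c = d, and c0 = e - c^T A c.
C = A
c = -0.5 * np.linalg.solve(A, d)
c0 = e - c @ A @ c

def q_completed(x):
    """Completed-square form of the same quadratic."""
    return (x - c) @ C @ (x - c) + c0

# Compare the two forms at a few random points.
rng = np.random.default_rng(0)
for x in rng.normal(size=(5, 2)):
    assert np.isclose(q(x), q_completed(x))
```

A check of this kind can catch sign or factor-of-two slips in a hand derivation before it goes into the write-up.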

b) [2 marks] Find the Hessian of f1. Explain what condition must hold for f1 to have a minimum point. State the minimum value of f1 and find the input which achieves this minimum.
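One way to sanity-check an answer of this kind numerically is sketched below, assuming a function already written as $(x - c)^T C (x - c) + c_0$ with symmetric $C$, so that its Hessian is $2C$. The values of C, c and c0 are made up for illustration and are not the coursework's.

```python
import numpy as np

# Hypothetical symmetric C, c, c0 (NOT the coursework's values).
C = np.array([[2.0, -1.0], [-1.0, 2.0]])
c = np.array([0.5, -1.0])
c0 = -0.75

def q(x):
    """Quadratic in completed-square form."""
    return (x - c) @ C @ (x - c) + c0

H = 2.0 * C  # Hessian of q when C is symmetric

# eigvalsh assumes a symmetric matrix and returns real eigenvalues.
eigenvalues = np.linalg.eigvalsh(H)
is_positive_definite = bool(np.all(eigenvalues > 0))

# If H is positive definite, q has a unique minimum with value c0 at x = c.
if is_positive_definite:
    print("minimum value", q(c), "attained at", c)
```

If any eigenvalue were negative, the quadratic would be unbounded below along the corresponding eigenvector and no minimum would exist.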

c) [6 marks] Write three Python functions grad_f1(x), grad_f2(x) and grad_f3(x) that return the gradient of each of the functions above.

All functions must accept numpy (2,) array inputs and return numpy (2,) outputs.
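A hand-derived gradient can be checked against a finite-difference approximation using numpy alone. The sketch below does this for a hypothetical function g, which is not one of f1, f2 or f3. The helper name numerical_grad and the tolerance are assumptions chosen for illustration.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference approximation to the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad

# Hypothetical test function g(x) = sin(x0) + x0 * x1**2 (not from the coursework).
def g(x):
    return np.sin(x[0]) + x[0] * x[1] ** 2

def grad_g(x):
    """Analytic gradient of g, returned as a (2,) array."""
    return np.array([np.cos(x[0]) + x[1] ** 2, 2.0 * x[0] * x[1]])

x = np.array([0.3, 0.0])
analytic = grad_g(x)
assert analytic.shape == (2,)
assert np.allclose(analytic, numerical_grad(g, x), atol=1e-6)
```

The same comparison could be run on grad_f1, grad_f2 and grad_f3 at several points as a check, keeping the checker itself out of the submitted file if it is not needed there.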

d) [4 marks] Use your gradients to implement a gradient descent algorithm with 50 iterations to find a local minimum for both f2 and f3. Show the steps of your algorithm on a contour plot of the function. Start from the point (0.3, 0) and state the step size you used. Produce separate contour plots for the two functions, with the first component of x on the x-axis and the second on the y-axis.
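A minimal sketch of fixed-step gradient descent that records every iterate is given below. The gradient grad_h, the point m and the step size 0.1 are hypothetical stand-ins, not the coursework's f2 or f3 or a recommended setting.

```python
import numpy as np

def gradient_descent(grad, x0, step_size, n_iters=50):
    """Run fixed-step gradient descent.

    Returns all iterates as an array of shape (n_iters + 1, len(x0)),
    with the starting point in row 0.
    """
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_iters):
        x = x - step_size * grad(x)
        path.append(x.copy())
    return np.array(path)

# Hypothetical gradient of h(x) = (x - m)^T (x - m), with a made-up m.
m = np.array([1.0, -0.5])

def grad_h(x):
    return 2.0 * (x - m)

path = gradient_descent(grad_h, x0=[0.3, 0.0], step_size=0.1)
print(path[-1])  # final iterate after 50 steps
```

Keeping the whole path rather than only the final point means the columns path[:, 0] and path[:, 1] can be drawn directly over a contour plot with whichever plotting tool is used for the write-up.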

e) [5 marks] For the two functions f2 and f3:

• Discuss the qualitative differences you observe when performing gradient descent with step sizes varying between 0.01 and 1, again starting from the point (0.3, 0).

• Also briefly describe what happens in the two cases with grossly mis-specified step sizes (i.e. greater than 1).
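The effect of the step size can be illustrated on a simple quadratic, where the behaviour is easy to reason about. In the sketch below the objective, the matrix D and the point m are all made up, so the step sizes at which behaviour changes will not match those for f2 and f3.

```python
import numpy as np

# Hypothetical objective h(x) = 0.5 * (x - m)^T D (x - m); D and m are made up.
D = np.diag([1.0, 3.0])
m = np.array([1.0, -0.5])

def grad_h(x):
    return D @ (x - m)

def final_distance(step_size, n_iters=50):
    """Distance to the minimiser m after n_iters fixed-step gradient steps."""
    x = np.array([0.3, 0.0])
    for _ in range(n_iters):
        x = x - step_size * grad_h(x)
    return np.linalg.norm(x - m)

distances = {s: final_distance(s) for s in (0.01, 0.3, 0.6, 1.0)}
for s, dist in distances.items():
    print(f"step size {s}: distance to minimiser {dist:.3e}")
```

For this quadratic, each step multiplies the error along an eigen-direction with eigenvalue $\lambda$ by $(1 - \text{step size} \cdot \lambda)$. Very small steps therefore converge slowly, a factor between $-1$ and $0$ gives iterates that overshoot and oscillate while still converging, and any factor below $-1$ makes the iterates diverge. Non-quadratic functions can behave differently, for example by jumping between basins.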
