
Homework 7 for Math 173A - Fall 2022

1. Indicate whether the following functions are strongly convex. Justify your answer.

(a) f(x) = x

(b) f(x) = x²

(c) f(x) = log(1 + eˣ)
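For reference, the standard one-dimensional second-order test is the natural tool here: a twice-differentiable f : R → R is strongly convex with constant c if and only if f''(x) ≥ c > 0 for all x. A worked instance on a function not in the list above:

```latex
% Second-order test applied to f(x) = e^x (not one of the functions above):
% f''(x) = e^x tends to 0 as x -> -infinity, so no constant c > 0
% satisfies f''(x) >= c everywhere.
\[
  f(x) = e^{x}, \qquad f''(x) = e^{x} \longrightarrow 0 \quad (x \to -\infty),
\]
\[
  \text{hence } \inf_{x} f''(x) = 0,\ \text{so } f \text{ is convex but not strongly convex.}
\]
```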

2. Let f : Rⁿ → R be a differentiable function. Prove that the following two conditions are equivalent (i.e. each one implies the other):

(a) f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (c/2)‖x − y‖² for all x, y ∈ Rⁿ.

(b) The function g(x) := f(x) − (c/2)‖x‖² is convex.

Hint: Recall that g(x) is convex if and only if g(y) ≥ g(x) + ∇g(x)ᵀ(y − x) for all x, y ∈ Rⁿ.
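Two small computations reduce the hint to an algebra check (a sketch; both identities follow directly from the definitions above):

```latex
% Gradient of g(x) = f(x) - (c/2)||x||^2:
\[
  \nabla g(x) = \nabla f(x) - c\,x.
\]
% Expansion used to match (a) and (b) after substituting into the hint:
\[
  \|x - y\|^{2} = \|x\|^{2} - 2\,x^{T} y + \|y\|^{2}.
\]
```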

3. Here we will prove the inequality used in class to prove fast convergence for strongly convex functions.

Let F(x) be a strongly convex function with constant c. Our goal is to show

F(x) − F(x*) ≤ (1/(2c))‖∇F(x)‖² for all x ∈ Rᵈ. (1)

(a) Fix x ∈ Rᵈ and define the quadratic function

q(y) = F(x) + ∇F(x)ᵀ(y − x) + (c/2)‖x − y‖².

Find the y* that minimizes q(y).

(b) Show that q(y*) = F(x) − (1/(2c))‖∇F(x)‖².

(c) Use the above to deduce (1).

(d) Explain the proof technique in your own words to demonstrate understanding of what we did.
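For parts (a) and (b), one possible route is to set the gradient of q to zero and substitute back (a sketch, using only the definition of q above):

```latex
% (a) q is quadratic in y, so its minimizer solves grad q(y) = 0:
\[
  \nabla q(y) = \nabla F(x) + c\,(y - x) = 0
  \quad\Longrightarrow\quad
  y^{*} = x - \tfrac{1}{c}\,\nabla F(x).
\]
% (b) Substituting y* back into q(y):
\[
  q(y^{*}) = F(x) - \tfrac{1}{c}\,\|\nabla F(x)\|^{2}
           + \tfrac{c}{2}\cdot\tfrac{1}{c^{2}}\,\|\nabla F(x)\|^{2}
           = F(x) - \tfrac{1}{2c}\,\|\nabla F(x)\|^{2}.
\]
```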

4. Let A be a diagonal matrix with diagonal entries Aᵢᵢ = i (i.e. the entries run from 1 to n), and let f(x) = (1/2)xᵀAx + bᵀx + c. We plan to run gradient descent on this function, and want to understand in theory how fast this will converge.

(a) Show which assumptions are satisfied by f (i.e. bounded Hessian, strong convexity).

(b) What should we pick as a step size?

(c) What can we say about F(x⁽ᵗ⁾) − F(x*) and its rate of decay?
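As a quick numerical check of whatever theory answers you arrive at, here is a minimal gradient-descent sketch in numpy; the values n = 10, b as a vector of ones, the zero start, and the step size μ = 1/n (one over the largest diagonal entry of A) are illustrative assumptions, not part of the problem:

```python
import numpy as np

n = 10
A = np.diag(np.arange(1.0, n + 1.0))    # diagonal entries A_ii = i
b = np.ones(n)                          # assumed b, for illustration
# The constant term c does not affect the gradient, so it is omitted.

def grad_f(x):
    # gradient of f(x) = (1/2) x^T A x + b^T x + c
    return A @ x + b

x = np.zeros(n)
x_star = -np.linalg.solve(A, b)         # exact minimizer: A x* = -b
mu = 1.0 / n                            # assumed step: 1/L with L = n

for t in range(200):
    x = x - mu * grad_f(x)

print("distance to minimizer:", np.linalg.norm(x - x_star))
```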

5. We will implement the SVM algorithm with gradient descent to classify two Gaussians in 2D.

(a) Sample 100 points from one Gaussian with mean (−3, −3) and one Gaussian with mean (3, 3). Both Gaussians have identity covariance. You can do this by calling randn (Matlab) or np.random.randn (Python) to generate each point cloud, and then add or subtract 3 from each coordinate. Create a vector Y of the labels that is 1 on one class and −1 on the other class. Create and turn in a scatter plot of the data points colored by label to ensure you have correctly created your data.
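A minimal sketch of part (a) in Python (the random seed and the names X, Y are my own choices):

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)                  # assumed seed, for reproducibility only

# 100 points per class; identity covariance, means (-3,-3) and (3,3)
X_neg = np.random.randn(100, 2) - 3
X_pos = np.random.randn(100, 2) + 3
X = np.vstack([X_neg, X_pos])      # shape (200, 2)
Y = np.concatenate([-np.ones(100), np.ones(100)])

plt.scatter(X[:, 0], X[:, 1], c=Y)
plt.title("Data colored by label")
plt.show()
```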

(b) Create a function for the gradient of the loss

L(w) = (1/2)‖w‖² + ∑ᵢ₌₁ⁿ max(0, 1 − yᵢ⟨xᵢ, w⟩),

∇L(w) = w + ∑ᵢ₌₁ⁿ (−yᵢxᵢ) · 𝟙[1 − yᵢ⟨xᵢ, w⟩ > 0],

where 𝟙[1 − yᵢ⟨xᵢ, w⟩ > 0] is 1 if 1 − yᵢ⟨xᵢ, w⟩ > 0 and 0 otherwise. Also, here n = 200. To compute the gradient, you'll have to compute an indicator of whether 1 − yᵢ⟨xᵢ, w⟩ is positive or negative at every point, and sum up the contribution of this term for all points where it's positive.
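In numpy the indicator-and-sum can be vectorized in a few lines; a sketch, assuming the X (200×2) and Y conventions from the sketch in part (a), with a function name of my choosing:

```python
def svm_loss_grad(w, X, Y):
    """Gradient of L(w) = (1/2)||w||^2 + sum_i max(0, 1 - y_i <x_i, w>)."""
    margins = 1 - Y * (X @ w)      # 1 - y_i <x_i, w> for every point
    active = margins > 0           # indicator: hinge term is "on"
    # sum of -y_i x_i over the active points, plus the w from the norm term
    return w - X.T @ (Y * active)
```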

(c) Setting the step size μ = 10⁻⁴ and starting at w⁽⁰⁾ = (1, 1), run 1000 iterations of gradient descent. You will create two plots.

i. Plot the classification error (averaged over all the points) as a function of the iteration number. The classification of xᵢ is determined by sign(⟨xᵢ, w⟩).

ii. Plot the margin 2/‖w‖ as a function of the iteration number. This shows how much of a gap you have between the classes you've learned.
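A sketch of the descent loop and the two plots, reusing svm_loss_grad from the previous sketch:

```python
mu = 1e-4
w = np.array([1.0, 1.0])                   # w^(0) = (1, 1)
errors, margins = [], []

for t in range(1000):
    w = w - mu * svm_loss_grad(w, X, Y)
    pred = np.sign(X @ w)                  # classification of each x_i
    errors.append(np.mean(pred != Y))      # average classification error
    margins.append(2 / np.linalg.norm(w))  # SVM margin 2/||w||

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(errors); ax1.set_xlabel("iteration"); ax1.set_ylabel("classification error")
ax2.plot(margins); ax2.set_xlabel("iteration"); ax2.set_ylabel("margin 2/||w||")
plt.show()
```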

(d) Create another scatter plot of your data, but this time color the points by the function f(xᵢ) = 1 − yᵢ⟨xᵢ, w⟩. The values closest to 0 (the smallest positive numbers or the largest negative numbers) will show you which points were "most important" in determining the classification.
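Continuing the same sketch, matplotlib's c= argument does the coloring; the colormap choice is mine:

```python
vals = 1 - Y * (X @ w)             # f(x_i) = 1 - y_i <x_i, w>
plt.scatter(X[:, 0], X[:, 1], c=vals, cmap="coolwarm")
plt.colorbar(label="1 - y_i <x_i, w>")
plt.title("Points colored by hinge margin value")
plt.show()
```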

Note: Here we’re only defining a subspace classifier (i.e. the classifier goes through

the origin). This is fine for our problem as the gaussians are on opposite sides of

the origin. If you want to create an intercept term, simply append a vector of all

1’s as a column of your data, and now your weight vector will be of dimension 3

instead of 2. This is done the same way as when running least squares regression.
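If you do want the intercept, the augmentation described in the note is one line in numpy (the starting value for the third coordinate is my assumption):

```python
# Append a column of 1's; the weight vector gains an intercept coordinate.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
w0 = np.array([1.0, 1.0, 0.0])     # assumed start: w^(0) = (1, 1) plus zero intercept
```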

