Homework # 6
1. Let f(x) be a function from R
n
to R. Suppose we would like to
maximize f(x). Show that if Hf(x) is negative definite then
the Newton’s method direction at x, [Hf(x)]1f(x), is an
ascent direction. What does this imply for Newton’s method
with backtracking? (We did this in class, except that we considered
the minimization case and Hf(x) as positive definite;
here I want you to go through the argument yourself for this
slightly altered case.)
2. Consider the log-likelihood for the single covariate (i.e. each
xi ∈ R) logistic regression:
log L(α) = XNi=1
(1 yi)(α0 α1xi) log(1 + exp(α0 α1xi))
(1)
(a) Let g(α) = log(1+exp(α0α1xi)). To make the notation
simpler, set xi as follows,(2)
and explain why we can rewrite g(x) in the more compact
form: g(α) = log(1 + exp(α · xi)). (This compact form
makes it easier to take derivatives.
(b) Show the following
(4)
(You computed the gradient and Hessian of g(α) in previous
hws, but here I want you to see the form above so the
next subproblem is easier.)
1
(c) Let v ∈ R
n
. Thinking of v as a column vector, define the
matrix A = vvT
. Show that A is positive semidefinite,
meaning that x
TAx ≥ 0 for all x ∈ R
n
. (Hint: Consider
(xTv)(vT x)). Use this fact to show that the function g(x)
is convex.
(d) Show the following facts. You can prove them from the
definition or just explain the intuition through a graph.
i. If two functions f(x) and h(x) are convex then so is
their sum f(x) + g(x).
ii. If a function f(x) is convex, then f(x) is concave.
Then, show that log L(α) is a concave function.
(e) Generate a plot of log L(α) over some line in R
2
that contains
the maximum of log L(α) (you computed this point
in a previous hw.). Explain why the graph you produce is
concave.
3. The MNIST dataset is a popular dataset for practicing machine
learning algorithms. Read about the dataset here
https://en.wikipedia.org/wiki/MNIST_database
Attached you will find two files. mnist_train.csv, mnist_test.csv.
Each file contains a matrix. Each row of the matrix corresponds
to an image of a hand written digit. The first entry in the row
is the digit in the image (i.e. 7 if the digit image is a seven), the
rest of the values, of which there are 784 (from a 28 × 28 pixel
image) are the pixel values. See the script mnist_intro.R for
an example.
In this problem, you will build a classifier that identifies when
a hand written digit equals 3. To build the classifier you will
fit a logistic regression to the data. The response variable,
y ∈ {0, 1} will be 1 if the number is 3 and 0 otherwise. The
covariates, x ∈ R
784 are the pixel values. Set
α = (α0, α1, α2, . . . , α784) (5)
The logistic-regression model assumes
P(y 1 | x, α) = 1
1 + exp[α · x]
(6)
2
where x is defined as in problem 2 (i.e. we just add a 1 to the
beginning of the x vector.)
(a) Show that the log-likelihood is given by
log L(α) = XNi=1
(1yi)(α· xi)log(1 + exp(α· xi)) (7)
(b) Set g(α) = log(1 + exp(α · xi)) and show that g(x) and
Hg(x) have the same form given in problem 2
(c) Write a damped Newton’s method algorithm to compute
the optimal α by maximizing the log likelihood on the
training dataset. How do you know Newton’s method will
converge to a maximum? (NOTE: You may run into dif-
ficulties with non-invertible Hessians; we will address that
issue in coming classes. If that happens, try other starting
points.)
(d) Recall that a classifier is a function
C(x) : R
784 → {0, 1}. (8)
Once you compute an α in (c), you can build a classifier
as follows
C(x) = 1 if P(x | α) = 1
1+exp[α·x] ≥ p
0 otherwise.
(9)
where p is some cutoff probability. Above p, you call the
images as 3’s, below p you call images as not 3’s. Select
a p and test the accuracy of your classifier using the test
dataset.
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。