Date: 2023-10-15 10:57

COMPSCI 4ML3, Introduction to Machine Learning

Assignment 1, Fall 2023

Hassan Ashtiani, McMaster University

Due date: Thursday, October 5th, 11pm

Notes. Type your solutions in LaTeX and upload a single PDF file that includes all your answers in Avenue. Use Teams to ask/answer questions.

Review (Linear Algebra). A set of $k$ $d$-dimensional vectors $v_1, v_2, \dots, v_k \in \mathbb{R}^d$ is linearly dependent if there exist $a_1, a_2, \dots, a_k \in \mathbb{R}$ such that at least one of the $a_i$'s is non-zero and $\sum_{i=1}^{k} a_i v_i = \vec{0}$. Also, a set of vectors is linearly independent if it is not linearly dependent.

Furthermore, the column rank of a matrix (i.e., the maximum number of linearly independent column vectors) is equal to its row rank (i.e., the maximum number of linearly independent row vectors); in fact, this is why we can call it just the “rank” of the matrix. A $k$-by-$k$ square matrix is invertible if and only if it is full rank (i.e., its rank is $k$). Also, a $k$-by-$k$ matrix $A$ is said to be positive definite if $u^T A u > 0$ for every non-zero $u \in \mathbb{R}^k$. All positive definite matrices are invertible.
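The definitions in this review can be checked numerically. The sketch below is only an illustration: the matrix `A` is a made-up example (not part of the assignment), and NumPy is an assumed tool. It compares column rank with row rank and tests positive definiteness of a symmetric matrix through its eigenvalues.

```python
import numpy as np

# Hypothetical 3-by-3 example: the third column is the sum of the first two,
# so the columns are linearly dependent and A is not full rank.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 3.0, 5.0]])

rank_A = np.linalg.matrix_rank(A)     # column rank
rank_At = np.linalg.matrix_rank(A.T)  # row rank (rank of the transpose)

# B = A^T A + I is symmetric, and u^T B u = ||Au||^2 + ||u||^2 > 0 for u != 0,
# so B is positive definite. For a symmetric matrix this is equivalent to
# all eigenvalues being strictly positive.
B = A.T @ A + np.eye(3)
is_pd = bool(np.all(np.linalg.eigvalsh(B) > 0))
rank_B = np.linalg.matrix_rank(B)     # full rank, hence invertible

print("rank(A) =", rank_A, "rank(A^T) =", rank_At)
print("B positive definite:", is_pd, "rank(B) =", rank_B)
```

A numerical check like this can suggest whether a claim is true, but it does not replace a proof.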

Review (Ordinary Least Squares). In the ordinary least squares problem we are given $n$ data points $\{(x^i, y^i)\}_{i=1}^{n}$ where each $x^i \in \mathbb{R}^d$ and each $y^i \in \mathbb{R}$, and the goal is fitting a line/hyperplane (represented by a $d$-dimensional vector $W$) with the minimum sum of squared errors:

$$\min_{W} \sum_{i=1}^{n} \left( (x^i)^T W - y^i \right)^2 = \min_{W} \| XW - Y \|_2^2$$

where in the matrix form (on the right side) $X$ is an $n$-by-$d$ matrix, and $Y$ is an $n$-dimensional vector.
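As a rough illustration of the two forms of the objective, the following sketch builds a small synthetic data set and evaluates both the summation form and the matrix form at a least-squares solution. The sizes, the seed, and `W_true` are hypothetical; `np.linalg.lstsq` is used as one convenient solver, and the normal-equations solve assumes $X^T X$ is invertible.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, for reproducibility only
n, d = 50, 3                        # hypothetical sizes
X = rng.normal(size=(n, d))         # n-by-d matrix; row i is (x^i)^T
W_true = np.array([1.0, -2.0, 0.5]) # hypothetical generating weights
Y = X @ W_true + 0.1 * rng.normal(size=n)

# A minimizer of ||XW - Y||_2^2.
W_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Summation form of the objective, one data point at a time.
sum_form = sum((X[i] @ W_ls - Y[i]) ** 2 for i in range(n))

# Matrix form of the same objective.
matrix_form = np.linalg.norm(X @ W_ls - Y) ** 2

# When X^T X is invertible, the normal equations give the same minimizer.
W_normal = np.linalg.solve(X.T @ X, X.T @ Y)

print(sum_form, matrix_form)
print(W_ls, W_normal)
```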

1. Consider the least-squares setting discussed above and assume $(X^T X)$ is invertible. In each of the following cases, either prove that $(Z^T Z)$ is necessarily invertible, or give a counterexample.

(a) [5 points] $Z = X^T X + 0.1 I_{d \times d}$, where $I_{d \times d}$ is the $d$-by-$d$ identity matrix.

(b) [5 points] We add an arbitrary new column $u \in \mathbb{R}^n$ (i.e., a new feature) to $X$ and call it $Z$ (so $Z = [X | u]$ is an $n$-by-$(d+1)$ matrix).

(c) [5 points] We add an arbitrary new row $v \in \mathbb{R}^d$ (i.e., a new data point) to $X$ and call it $Z$ (so $Z = [X^T | v]^T$ is an $(n+1)$-by-$d$ matrix).

2. Consider the least squares problem.

(a) [5 points] Assume $\mathrm{Rank}(X) = n = d$ and let $W_{LS}$ be the solution of the least squares problem. Show $X^T (Y - X W_{LS}) = 0$.

(b) [5 points] Assume $\mathrm{Rank}(X) = n < d$, and let $W$ be “one of the solutions” of the least squares minimization problem. Can we always say $X^T (Y - XW) = 0$? Why?

(c) [5 points] Assume $\mathrm{Rank}(X) = n = d$. Prove that $\| X W_{LS} - Y \|_2^2 = 0$.

(d) [10 points] Assume $\mathrm{Rank}(X) = n < d$. Can we say that $\min_W \| XW - Y \|_2^2 = 0$? Prove your answer.

3. [20 points] In this question we will use least squares to find the best line ($\hat{y} = ax + b$) that fits a non-linear function, namely $f(x) = 2x - 5x^3 + 1$. For this, assume that you are given a set of $n$ training points $\{(x^i, y^i)\}_{i=1}^{n} = \{(i/n,\ 2(i/n) - 5(i/n)^3 + 1)\}_{i=1}^{n}$. Find a line (i.e., $a, b \in \mathbb{R}$) that best fits the training data as $n \to \infty$. Write down your calculations as well as the final values for $a$ and $b$. (Additional notes: the $n \to \infty$ assumption basically means that we are dealing with an integral rather than a finite summation. If it makes it easier for you, instead of working with actual training data you can assume $x$ is uniformly distributed on $[0, 1]$.)
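One way to sanity-check a hand-derived answer is to fit the line for increasing finite $n$ and watch the coefficients settle. The sketch below is only a numerical check under that idea, not the requested derivation; the helper name `fit_line` and the chosen values of $n$ are illustrative assumptions.

```python
import numpy as np

def fit_line(n):
    """Least-squares fit of y = a*x + b to the n training points x^i = i/n."""
    i = np.arange(1, n + 1)
    x = i / n
    y = 2 * x - 5 * x**3 + 1                  # f(x) from the question
    X = np.column_stack([x, np.ones(n)])      # columns: x and the constant 1
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, b

# The fitted (a, b) should stabilise as n grows.
for n in (10, 100, 10_000):
    a, b = fit_line(n)
    print(f"n={n}: a={a:.5f}, b={b:.5f}")
```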

4. [20 points] This question is similar to the previous one, except that you are allowed to use a program to find the final answer. Assume the input is three dimensional $(x_1, x_2, x_3)$, and the target function is $f(x_1, x_2, x_3) = x_1 + 3x_2 + 4x_3 + 5x_1 x_2 - 5x_2 x_3 + x_1^2 x_3^2 + (x_1 + x_2)^{x_3}$. Find $a, b, c, d \in \mathbb{R}$ such that the hyperplane $\hat{y} = ax_1 + bx_2 + cx_3 + d$ fits the data the best when $x$ is uniformly distributed in $[1, 2]^3$ (the least squares solution). Report the values of $a, b, c, d$, and include your code in the PDF file for the solutions. You can use the Python OLS script that is provided in Avenue as a starting point.
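A possible starting point is sketched below. It is not the OLS script from Avenue (which is not reproduced here); it is a minimal Monte Carlo approximation that assumes the target function reads $x_1 + 3x_2 + 4x_3 + 5x_1x_2 - 5x_2x_3 + x_1^2 x_3^2 + (x_1 + x_2)^{x_3}$. The sample size `m`, the seed, and the helper name `target` are arbitrary illustrative choices.

```python
import numpy as np

def target(x):
    """Target function, under the reading of f stated in the lead-in."""
    x1, x2, x3 = x[:, 0], x[:, 1], x[:, 2]
    return (x1 + 3 * x2 + 4 * x3 + 5 * x1 * x2 - 5 * x2 * x3
            + x1**2 * x3**2 + (x1 + x2) ** x3)

rng = np.random.default_rng(0)               # fixed seed, illustrative only
m = 200_000                                  # number of sampled points (assumption)
x = rng.uniform(1.0, 2.0, size=(m, 3))       # x uniform on [1, 2]^3
y = target(x)

# Append a column of ones so the last coefficient plays the role of d.
X = np.column_stack([x, np.ones(m)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b, c, d = coef

mse = np.mean((X @ coef - y) ** 2)
print("a, b, c, d =", a, b, c, d)
print("mean squared error =", mse)
```

Because the points are sampled, the coefficients are approximations that vary slightly with the seed and with `m`; a larger sample or numerical integration would tighten them.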

5. [20 points] In this question we would like to fit a line with zero y-intercept ($\hat{y} = ax$) to the curve $y = x^2$. However, instead of minimizing the sum of squares of errors, we want to minimize the following objective function:

$$\sum_i \left[ \log \frac{\hat{y}^i}{y^i} \right]^2$$

Assume that the distribution of $x$ is uniform on $[1, 2]$. What is the optimal value for $a$? Show your work.
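A hand-derived value of $a$ can be cross-checked numerically. The sketch below assumes the objective is the sum over data points of the squared logarithm of the ratio $\hat{y}^i / y^i$ with $\hat{y} = ax$ and $y = x^2$; that reading is an assumption about the objective, and the grid sizes are arbitrary. It approximates the uniform distribution on $[1, 2]$ by a dense grid and scans candidate values of $a$.

```python
import numpy as np

# Dense grid standing in for x uniformly distributed on [1, 2].
x = np.linspace(1.0, 2.0, 10_001)
y = x**2

def objective(a):
    """Mean of (log(y_hat / y))^2 with y_hat = a * x (assumed objective)."""
    return np.mean(np.log(a * x / y) ** 2)

# Scan candidate slopes; the range [1, 2] is an illustrative choice.
candidates = np.linspace(1.0, 2.0, 2001)
values = np.array([objective(a) for a in candidates])
a_best = candidates[np.argmin(values)]

print("approximate minimizer a =", a_best)
```

The scan only locates the minimizer up to the grid spacing, so it serves as a check on the analytical work rather than a substitute for it.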
