代写python编程、program程序代写-代写Python编程

联系方式

QQ：99515681
邮箱：99515681@qq.com
工作时间：8:00-21:00
微信：codinghelp

您当前位置：首页 >> Python编程Python编程

代写python编程、program程序代写

日期：2021-10-01 10:57

Instructions: Submit two files. One should be a write-up of all solutions and observations, as LastnameFirstnameSolution.pdf.

The second should be an archive LastnameFirstnameCode.zip containing code and any results

files.

1 [5 pts.] Irreducible data example

In class we discussed that not all datasets’ dimensionality can be successfully reduced using PCA.

(a)[1 pts] Discuss the cases when PCA will fail.

(b)[2 pts] How do we quantify that it fails?

(c)[2 pts] Provide an example dataset of 2D points (specify the points as vectors of numbers) in which PCA will

not work well for dimensionality reduction. Explain why. Hint: Think of 2D points and reduction to 1D.

2 [40 pts] Dimensionality reduction

For this question you can use the cloud.data.pdf from the UCI ML repository: https://archive.ics.uci.edu/ml/

datasets/Cloud. Read about it to get familiar with what is measured. Within the data, there are two datasets: DB

#1 and DB #2. For this homework, just use the 1024 vectors in DB #1. Use python for all your programming. You

will have to submit your code in LastnameFirstnameCode.zip together with the relevant write-up in the main

solution file LastnameFirstnameSolution.pdf.

(a)[5 pts] Load the data into a python program and center it. Note: there should be a function called center()

in your code that achieves this.

(b)[5 pts] Compute the covariance matrix of the data Σ. Hint: by using the definition of sample covariance, as a

matrix product or as a sum of outer products. See book for details. Use Numpy for linear algebra computations

(https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.linalg.html).As a result you should have a function

covar() in your code which does not use the built-in covariance functions.

(c)[5 pts] Compute the eigenvectors and eigenvalues of Σ. The numpy linear algebra module referenced above has a

function that can help.

(d)[10 pts] Plot the percentage of retained variance as a function of the number of principle components used.

Determine the number of principal components (PCs) r that will ensure 90% retained variance? How did you

compute this? Provide a function in your code that determines r based on an arbitrary percentage α of retained

variance.

(e)[10 pts] Compute the reduced dimension data matrix A with two dimensions by projection on the first two PCs.

Plot the points using a scatter plot (two dimensional diagram that places each sample i according to its new

dimensions ai1, ai2). Discuss the observations. Are there clusters of close-by points? Argue for or against whether

these are sufficient dimensions.

(f)[5 pts] Study the PCA implementation in pythons’ sklearn library https://scikit-learn.org/stable/

modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA. Do PCA using the library

on the same data. Do the eigenvalues approximately match to what you computed above?

3 [20 pts.] Kernel methods

Consider the problem of finding the most dissimilar diametric pair (MDDP): this is a pair of data points that are

dissimilar from the mean and also dissimilar from each other. Below is an algorithm that would find such a pair

given a data matrix D:

Algorithm 1: MDDP(D)

Result: a, b - the most dissimilar diametric points in D

Compute the data mean µ = mean(D);

s = +∞;

for i in (1 . . . n) do

for j in (i + 1 . . . n) do

temp = x

i µ + x

j µ + x

i xj ;

if temp < s then

s = temp;

a = xi

;

b = xj ;

end

The algorithm computes the sum of inner products x

i µ + x

j µ + x

i xj for each pair of points and returns the

pair with the lowest such quantity.

(a)[5 pts] Demonstrate the execution of this algorithm on the following data matrix of 2D instances: D=





0 1

1 3

5 0

2 4



.

Show the steps and the resulting MDD pair of points.

(b)[15 pts] As we discussed in class sometimes we would like to kernelize methods to handle non-linearity in data.

Provide a pseudo-code for a kernel version of the MDDP algorithm above. The goal is to kernelize the algorithm

for an arbitrary kernel Hint: Assume that you can compute the kernel matrix K, corresponding to

some mapping φ() and then use the basic kernel operations we discussed in class and also in the

book, to derive the steps of MDDP in terms of elements in K.

4 [10 pts.] Orthogonality of Error in Regression:

Prove that Yˆ T

= 0, where Yˆ is the predicted response and = Y −Yˆ is the error between the actual and predicted

response. Hint: Use the solution for the predicted response as a transformation of Y through the

hat matrix.

5 [25 pts.] Regression to understand CO2 pollution:

For this task, you will use data about CO2 emissions in European cities included as an excel sheet. The data has

total emissions per city (Column E) as well as other variables including number of airports, buildings, etc

(columns G and later). Use only data for cities (Column C: admin level=8). Feel free to prepare the data into

simpler text format before you use code.

The goal will be to train linear regression models of the total CO2 as a function of the predictor variables in

columns G and on.

(a) [7pts] Use scikitlearn’s linear regression: linear model.LinearRegression to learn an unregularized model.

Report the corresponding coefficient for each predictor variable.

(b) [7pts] Use scikitlearn’s ridge regression: linear model.Ridge to learn a regularized model. Try 2 values for

alpha: 1 and 10. Report the coefficients of the predictor variables for the two models.

do the loading coefficients for factors differ between regularized and non-regularized models? How does the

residual error differ? Use visuazlizations as necessary to discuss your findings.

【返回顶部】【打印本稿】【关闭本页】

【上一篇】：COMP3670编程代写、代写Java，c++程序

【下一篇】：COMP3670编程代写、代写Java，c++程序

联系方式

最新辅导

热门辅导

您当前位置：首页 >> Python编程Python编程

代写python编程、program程序代写

日期：2021-10-01 10:57

相关文章