Date: 2019-12-03 09:34

Homework #6

Data Science I

1. In this problem, you will generate simulated data, and then perform PCA and K-means clustering on the data.

a. Generate a simulated data set with 20 observations in each of three classes (i.e. 60 observations total), and 50 variables.
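
One way to approach part (a) is sketched below with NumPy. The seed, the `shift` value, and the choice to move each class mean in its own block of five variables are all assumptions made for illustration; the assignment does not prescribe a particular simulation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n_per_class, n_vars = 20, 50
shift = 1.0  # assumed class separation; adjust after inspecting the PCA plot

# Hypothetical class means: each class is shifted in its own block of 5 variables.
means = np.zeros((3, n_vars))
means[0, :5] = shift
means[1, 5:10] = shift
means[2, 10:15] = shift

labels = np.repeat(np.arange(3), n_per_class)  # true class of each row
# Each observation is its class mean plus standard normal noise.
X = means[labels] + rng.normal(size=(3 * n_per_class, n_vars))
```

A larger `shift` pulls the classes further apart, and a smaller one increases their overlap.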

b. Perform PCA on the 60 observations and plot the first two principal component score vectors. Use a different color to indicate the observations in each of the three classes. If the three classes appear separated in this plot, then continue on to part (c). If not, then return to part (a) and modify the simulation so that there is greater separation between the three classes. Do not continue to part (c) until the three classes show at least some separation in the first two principal component score vectors. Make sure there is also some overlap between the classes!
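
The score vectors in part (b) can be computed from a singular value decomposition of the centred data. The sketch below uses placeholder data with assumed mean shifts, not the data set you build in part (a), and it leaves the variables unscaled.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)

# Placeholder data: standard normal noise with assumed mean shifts for two classes.
X = rng.normal(size=(60, 50))
X[labels == 1, :5] += 1.0
X[labels == 2, 5:10] += 1.0

Xc = X - X.mean(axis=0)                    # centre each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                     # 60 x 2 matrix of PC1 and PC2 scores
explained = s[:2] ** 2 / np.sum(s ** 2)    # share of variance for PC1 and PC2
```

With matplotlib installed, `plt.scatter(scores[:, 0], scores[:, 1], c=labels)` would draw the requested plot with one color per class.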

c. Perform K-means clustering of the observations with K = 3. How well do the clusters that you obtained in K-means clustering compare to the true class labels?
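
For part (c), a cross-tabulation of true classes against cluster assignments is a convenient way to make the comparison, because cluster numbers are arbitrary and need not match the class numbers. The sketch below is a minimal NumPy version of Lloyd's algorithm with random restarts; the helper names `kmeans` and `crosstab`, the seeds, and the placeholder data are assumptions for illustration, and a library routine such as scikit-learn's `KMeans` could be used instead.

```python
import numpy as np

def kmeans(X, k, n_init=20, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns the lowest-WSS run."""
    rng = np.random.default_rng(seed)
    best_assign, best_wss = None, np.inf
    for _ in range(n_init):
        # Start from k distinct observations chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Squared Euclidean distance from every row to every centre.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            assign = d2.argmin(axis=1)
            # Recompute centres; an empty cluster keeps its old centre.
            new_centers = np.array([
                X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Within-cluster sum of squares at the last assignment step.
        wss = d2[np.arange(len(X)), assign].sum()
        if wss < best_wss:
            best_assign, best_wss = assign, wss
    return best_assign, best_wss

def crosstab(true, pred, n_true, n_pred):
    """Count observations for each (true class, cluster) pair."""
    table = np.zeros((n_true, n_pred), dtype=int)
    np.add.at(table, (true, pred), 1)
    return table

# Placeholder data with assumed mean shifts, standing in for part (a).
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 20)
X = rng.normal(size=(60, 50))
X[labels == 1, :5] += 1.0
X[labels == 2, 5:10] += 1.0

assign, wss = kmeans(X, k=3)
table = crosstab(labels, assign, 3, 3)   # rows: true class, columns: cluster
print(table)
```

If each row of the table has most of its count in a single column, and the rows use different columns, the clusters recover the classes well.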

d. Perform K-means clustering with K = 2. Describe your results.

e. Now perform K-means clustering with K = 4, and describe your results.

f. Now perform K-means clustering with K = 3 on the first two principal component score vectors, rather than on the raw data. That is, perform K-means clustering on the 60 × 2 matrix of which the first column is the first principal component score vector, and the second column is the second principal component score vector. Comment on the results.

g. Using the scale() function, perform K-means clustering with K = 3 on the data after scaling each variable to have standard deviation one. How do these results compare to those obtained in (c)? Explain.
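
The `scale()` named in part (g) appears to be R's function, which centres each column and divides it by its sample standard deviation. A NumPy sketch of the same operation follows; the placeholder data, with deliberately unequal column variances, is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data whose columns have different standard deviations.
X = rng.normal(size=(60, 50)) * rng.uniform(0.5, 5.0, size=50)

# Centre each column, then divide by its sample standard deviation (ddof=1),
# which is how R's scale() behaves with its default arguments.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```

`sklearn.preprocessing.scale` performs a similar standardization but, as far as I know, divides by the population standard deviation instead. The two results differ only by a constant factor shared by all columns, which should not change the K-means partition.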

