
Date: 2023-02-17 10:55

Assignment #2 STA437H1S/2005H1S

Due Friday, February 17, 2023

Instructions: Solutions to problems 1–3 are to be submitted on Quercus (PDF files only).

1. Andrews curves (conceived by the University of Toronto’s own David Andrews) represent an

interesting approach to multivariate visualization. The idea is to represent each multivariate

observation $(x_{i1}, \ldots, x_{ip})$ (which is possibly normalized) by a sinusoidal function on [0, 1]:

$$g_i(t) = \frac{x_{i1}}{\sqrt{2}} + x_{i2}\sin(2\pi t) + x_{i3}\cos(2\pi t) + x_{i4}\sin(4\pi t) + x_{i5}\cos(4\pi t) + \cdots$$

Observations that are similar will have similar Andrews curves while outlying observations

will often have curves that are distinctively different.
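As a minimal sketch of the definition above, the following R code evaluates Andrews curves directly from the formula. The helper name `andrews_curve` is hypothetical and is not the `andrews` function supplied in andrews.txt; the data are synthetic and serve only as an illustration.

```r
# Hypothetical helper (NOT the course's andrews() from andrews.txt):
# evaluate the Andrews curve of one observation x at the points t in [0, 1].
andrews_curve <- function(x, t) {
  p <- length(x)
  g <- rep(x[1] / sqrt(2), length(t))        # constant term x_1 / sqrt(2)
  for (k in seq_len(p)[-1]) {                # k = 2, ..., p (empty if p = 1)
    freq <- k %/% 2                          # k = 2,3 -> 1; k = 4,5 -> 2; ...
    basis <- if (k %% 2 == 0) sin(2 * pi * freq * t) else cos(2 * pi * freq * t)
    g <- g + x[k] * basis
  }
  g
}

set.seed(1)
x <- scale(matrix(rnorm(20 * 5), ncol = 5))  # 20 synthetic observations, standardized columns
t <- seq(0, 1, length.out = 201)

# One column per observation: curves[, i] holds g_i evaluated on the grid t.
curves <- sapply(seq_len(nrow(x)), function(i) andrews_curve(x[i, ], t))
dim(curves)
```

Calling `matplot(t, curves, type = "l", lty = 1)` on the result would draw one curve per observation.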

On Quercus, there is a file andrews.txt, which contains a function andrews that computes

Andrews curves for a data matrix whose columns are variables and rows are observations;

for example,

> source("andrews.txt") # read the function into R

> x <- cbind(rnorm(100),rnorm(100),rnorm(100),rnorm(100),rnorm(100))

> r <- andrews(x,scale=T) # scales columns to have mean 0 and variance 1

The file testdata.txt contains 100 − k observations from a 10-variate normal distribution

and k outliers generated from another distribution (where k ≤ 15).

(a) Look at the data using Andrews curves. How many clear outliers do there seem to be?

(b) Using the information from the Andrews curves as well as pairwise scatterplots, principal

components, etc., give an estimate of how many outliers are in the data.

2. (a) If $\{g_i(t)\}$ are the Andrews curves defined in question 1, show that

$$2\int_0^1 [g_i(t) - g_j(t)]^2 \, dt = \sum_{k=1}^{p} (x_{ik} - x_{jk})^2.$$
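The identity in (a) can be checked numerically before attempting a proof. The sketch below is a sanity check only, not a derivation: it builds the sine and cosine basis on an equally spaced grid over [0, 1) and approximates the integral by the grid average. The helper name `andrews_basis` and the two random observations are assumptions made for illustration.

```r
# Hypothetical helper: matrix whose k-th column is the k-th Andrews basis
# function (1/sqrt(2), sin(2*pi*t), cos(2*pi*t), sin(4*pi*t), ...) evaluated at t.
andrews_basis <- function(t, p) {
  B <- matrix(1 / sqrt(2), nrow = length(t), ncol = p)
  for (k in seq_len(p)[-1]) {
    freq <- k %/% 2
    B[, k] <- if (k %% 2 == 0) sin(2 * pi * freq * t) else cos(2 * pi * freq * t)
  }
  B
}

set.seed(2)
p  <- 5
xi <- rnorm(p)                     # two synthetic observations
xj <- rnorm(p)

t  <- (0:999) / 1000               # equally spaced grid on [0, 1)
B  <- andrews_basis(t, p)
gi <- drop(B %*% xi)               # g_i(t) on the grid
gj <- drop(B %*% xj)               # g_j(t) on the grid

lhs <- 2 * mean((gi - gj)^2)       # grid average approximates the integral over [0, 1]
rhs <- sum((xi - xj)^2)            # squared Euclidean distance
c(lhs = lhs, rhs = rhs)
```

The two values should agree up to rounding error, which is consistent with the identity but does not replace the argument the question asks for.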

(b) If $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, what is the Andrews curve of $\bar{x}$?

(c) Suppose that $x_k$ lies on a line between $x_i$ and $x_j$, that is, $x_k = \lambda x_i + (1 - \lambda) x_j$ for some

$0 < \lambda < 1$. What can you say about the Andrews curve of $x_k$ relative to those of $x_i$ and $x_j$?

3. In Assignment #1, you looked at two dimensional scatterplots of data on two species of

rock crabs; here, you will do a principal components analysis of these data.

As before, the data are in a file crabs.txt on Quercus; the columns of the file are species (B

or O), sex (M or F), index (1-50 within each species-sex combination), width of the frontal

lip (FL), the rear width of the shell (RW), length along the midline of the shell (CL), the

maximum width of the shell (CW), and the body depth (BD).

The data can be read into R using the following code:

> x <- scan("crabs.txt",skip=1,what=list("c","c",0,0,0,0,0,0))

> colour1 <- ifelse(x[[1]]=="B","blue","orange") # species colours

> colour2 <- ifelse(x[[2]]=="M","black","red") # sex colours

> sex <- x[[2]]

> FL <- x[[4]]

> RW <- x[[5]]

> CL <- x[[6]]

> CW <- x[[7]]

> BD <- x[[8]]

(a) Using the correlation matrix, do a principal component analysis of the 5 variables.

> r <- princomp(~FL+RW+CL+CW+BD,cor=T)

> summary(r,loadings=T)

Give an interpretation of the first two principal components based on their loadings.

(b) Look at pairwise scatterplots of the 5 principal components using colour1 to distinguish

the two species:

> pairs(r$scores,col=colour1)

Which pairs of principal components seem to separate the two species?

(c) Now look at pairwise scatterplots of the 5 principal components using colour2 to

distinguish the two sexes:

> pairs(r$scores,col=colour2)

Which pairs of principal components seem to separate the two sexes?

(d) Suppose you are given the following measurements for the 5 variables: FL = 18.7,

RW = 15.0, CL = 35.0, CW = 40.3, BD = 16.6. What is your prediction of the species and

sex of this crab?
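One possible starting point for (d) is to compute the principal component scores of the new measurements and compare them with the scatterplots from (b) and (c). The sketch below uses synthetic stand-in data, because crabs.txt is not reproduced here, so its numbers say nothing about the real crabs. The object name `new_crab` is hypothetical.

```r
# Synthetic stand-in for the crab measurements (NOT the data in crabs.txt).
set.seed(3)
n    <- 200
size <- rnorm(n, mean = 15, sd = 3)          # common "size" factor
FL <- size       + rnorm(n, sd = 0.5)
RW <- 0.8 * size + rnorm(n, sd = 0.7)
CL <- 2.0 * size + rnorm(n, sd = 1.0)
CW <- 2.3 * size + rnorm(n, sd = 1.0)
BD <- 0.9 * size + rnorm(n, sd = 0.5)

# PCA on the correlation matrix, as in part (a).
r <- princomp(~ FL + RW + CL + CW + BD, cor = TRUE)

# The measurements given in part (d).
new_crab <- data.frame(FL = 18.7, RW = 15.0, CL = 35.0, CW = 40.3, BD = 16.6)

# Scores of the new observation on the fitted components.
new_scores <- predict(r, newdata = new_crab)

# The same computation by hand: standardize with the fitted centre and
# scale, then project onto the loadings.
z      <- (unlist(new_crab) - r$center) / r$scale
manual <- drop(z %*% unclass(r$loadings))

rbind(predict = drop(new_scores), manual = manual)
```

With the real data, the resulting scores could be overlaid on the plots from (b) and (c), for example with `points()`, to see which species and sex cluster the new crab falls closest to.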

