Date: 2022-10-15 02:46

MATH5855: Multivariate Analysis

Assignment 2

Due date: 5 pm on Tuesday, October 25, 2022

Instructions:

Assignment 2 contains 3 questions worth a total of 100 points, which count towards 15% of the final mark for the subject.

Use tables, graphs and concise text explanations to support your answers. Unclear answers may not be marked, at your own cost. All tables and graphs must be clearly commented and identified.

You may choose to submit two files (the PDF file of the answers and the R Markdown file containing the R code), OR answer all the questions in an R Markdown file.

Questions

Question 1. (Test and confidence region for mean) [20 Marks] Municipal wastewater treatment plants release their discharges into rivers and streams, and they are required to test the biochemical oxygen demand (BOD) and suspended solids (SS) of their discharges on a regular basis. There are some concerns about the reliability of the results provided. To confirm the results, a study was conducted in which n = 11 samples of effluent were divided and sent to two laboratories for testing. One half of each sample was sent to the Wisconsin State Laboratory of Hygiene, and the other half was sent to a private commercial laboratory routinely used in the monitoring program. The data are displayed in Table 1.

Assume the data follow a multivariate normal distribution. We want to determine whether there is enough statistical evidence to indicate that the two laboratories' analysis procedures differ, in the sense that they produce systematically different results.

(a) Use R to find the p-value for testing the hypothesis H0 : μ1 = μ2 versus H1 : μ1 ≠ μ2, where μ1 and μ2 are the mean vectors for measurements from the commercial and state labs, respectively. You can write your own function or use a predefined function in R.

          Commercial lab      State lab
Sample    BOD      SS         BOD      SS

Table 1: Effluent Data.

(b) Use R to find and draw the T² confidence region for μ1 − μ2 at the 95% confidence level. Does this confidence region confirm the result obtained in part (a)?
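Because each effluent sample is split between the two labs, one possible way to set up parts (a) and (b) is a one-sample Hotelling T² test on the paired differences. The following is a minimal sketch of that computation, written in Python with the standard library only rather than in R, and restricted to the bivariate case (BOD and SS). The function names `paired_hotelling_t2` and `t2_critical` and the small data set are hypothetical illustrations; the data are not the Table 1 values. The sketch relies on the standard result that (n − p) T² / (p (n − 1)) follows an F(p, n − p) distribution under H0, and on the closed form of the F survival function when the numerator degrees of freedom equal 2.

```python
import math


def paired_hotelling_t2(diffs):
    """One-sample Hotelling T^2 test of H0: mean difference = (0, 0).

    diffs: list of (d1, d2) paired differences (bivariate case only).
    Returns (T2, F statistic, p-value).
    """
    n, p = len(diffs), 2
    # Sample mean vector of the differences
    m1 = sum(d[0] for d in diffs) / n
    m2 = sum(d[1] for d in diffs) / n
    # Unbiased sample covariance matrix S = [[s11, s12], [s12, s22]]
    s11 = sum((d[0] - m1) ** 2 for d in diffs) / (n - 1)
    s22 = sum((d[1] - m2) ** 2 for d in diffs) / (n - 1)
    s12 = sum((d[0] - m1) * (d[1] - m2) for d in diffs) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # T^2 = n * dbar' S^{-1} dbar, using the closed-form inverse of a 2x2 matrix
    t2 = n * (s22 * m1 ** 2 - 2 * s12 * m1 * m2 + s11 * m2 ** 2) / det
    # Under H0, (n - p) / (p (n - 1)) * T^2 ~ F(p, n - p)
    f_stat = (n - p) / (p * (n - 1)) * t2
    # For numerator df = 2: P(F > f) = (1 + 2 f / m)^(-m / 2), with m = n - p
    m = n - p
    p_value = (1 + 2 * f_stat / m) ** (-m / 2)
    return t2, f_stat, p_value


def t2_critical(n, alpha=0.05):
    """Value c with P(T^2 > c) = alpha under H0 (bivariate case only)."""
    p, m = 2, n - 2
    # Invert the closed-form survival function of F(2, m)
    f_crit = (m / 2) * (alpha ** (-2 / m) - 1)
    return p * (n - 1) / (n - p) * f_crit


# Hypothetical paired differences (commercial minus state); NOT the Table 1 data
diffs = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (2.0, 2.0)]
t2, f_stat, p_value = paired_hotelling_t2(diffs)
crit = t2_critical(len(diffs))
print(f"T2 = {t2:.4f}, F = {f_stat:.4f}, p-value = {p_value:.4f}")
# The 95% region is {delta : n (dbar - delta)' S^{-1} (dbar - delta) <= crit},
# so the zero vector lies inside it exactly when T2 <= crit.
print("zero vector inside the 95% region:", t2 <= crit)
```

In this setup the zero vector falls inside the 95% confidence region exactly when the p-value is at least 0.05, which is the link between parts (a) and (b). In R, `pf` and `qf` give the same F probabilities and quantiles for any dimension.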

Question 2. (Principal component analysis) [45 Marks] The dataset "consum2007.dat" contains information about per capita consumption expenditures of urban households in 31 regions in China in 2007 (Lang and Jin, 2021). The variables are the consumption expenditures on food (Food), clothing (Cloth), residence (Resid), household facilities, articles and services (HousF), health care and medical services (Health), transport and communication (TranC), education, culture and recreation (Educ) and miscellaneous goods (Miscel).

(a) Use R to calculate the correlations between the variables. Between which variables is the correlation different from zero?

(b) For principal component analysis, do you suggest using the covariance matrix or the correlation matrix? Why?

(c) Use R to perform the principal component analysis using the suggested matrix in part (b).

(d) What percentage of the variability of the data does each principal component explain? Also compute the cumulative percentages of variance and draw a scree plot for these data.

(e) Give explicitly the linear combinations of the original data that create the first and second principal components, and give an interpretation of these linear combinations, describing which variables play the biggest roles in the construction of those two PCs.

(f) Draw the biplot for the first 2 principal components. Describe what you can extract from the plot.
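The choice in part (b) and the percentages in part (d) can be illustrated on a toy example. The sketch below, in Python with the standard library only, compares the share of variance explained by each principal component when the components come from the covariance matrix versus the correlation matrix of two variables. The data and the helper names `sample_cov`, `eigenvalues_sym_2x2` and `explained_variance` are hypothetical; nothing here is taken from consum2007.dat, and whether the same effect appears there depends on the spread of its variables.

```python
import math


def sample_cov(x, y):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    centre = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return [centre + radius, centre - radius]


def explained_variance(eigenvalues):
    """Proportion of total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Hypothetical data: two variables measured on very different scales
x = [1, 2, 3, 4, 5]
y = [100, 300, 200, 500, 400]

sxx, syy, sxy = sample_cov(x, x), sample_cov(y, y), sample_cov(x, y)
r = sxy / math.sqrt(sxx * syy)  # sample correlation

# PCA on the covariance matrix: the large-variance variable dominates
cov_share = explained_variance(eigenvalues_sym_2x2(sxx, sxy, syy))
# PCA on the correlation matrix: both variables are standardised first
cor_share = explained_variance(eigenvalues_sym_2x2(1.0, r, 1.0))
# Cumulative proportions, as asked for alongside a scree plot
cor_cumulative = [sum(cor_share[: k + 1]) for k in range(len(cor_share))]

print("covariance-based shares: ", [round(s, 5) for s in cov_share])
print("correlation-based shares:", [round(s, 5) for s in cor_share])
print("cumulative (correlation):", [round(s, 5) for s in cor_cumulative])
```

In R, `prcomp` with `scale. = TRUE` corresponds to the correlation-based analysis, and its squared `sdev` values play the role of the eigenvalues used here.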

Question 3. (Canonical Analysis) [35 Marks]

(a) Let X and Y be p-variate and q-variate random vectors, respectively. Assume that

Let X* = AᵀX + u and Y* = BᵀY + v, where A and B are non-singular matrices with properly defined dimensions. Show that the first canonical correlation between X* and Y* is the same as the first canonical correlation between X and Y, and that the canonical correlation vectors are given by a* = A⁻¹a and b* = B⁻¹b, where a and b are the first canonical correlation vectors of X and Y, respectively.

(b) Consider the data provided in Question 2. Let X and Y denote the sets of variables {Food, Cloth, Resid, HousF, Miscel} and {Health, TranC, Educ}, respectively. Calculate the canonical correlations between X and Y and write the first canonical variables in explicit form. Do they have a clear interpretation?

(c) Test the significance of the correlation between the two sets and comment on the results. How many canonical correlations are significant?
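One common way to approach part (c) is Bartlett's sequential chi-square approximation; whether the course expects this test or another one is an assumption here. The sketch below, in Python with the standard library only, computes the statistic −(n − 1 − (p + q + 1)/2) Σ ln(1 − ρᵢ²) over the canonical correlations that remain after the first k, with (p − k)(q − k) degrees of freedom. The function name `bartlett_sequential` and the canonical correlations are hypothetical; only the dimensions n = 31, p = 5, q = 3 follow the setup in part (b).

```python
import math


def bartlett_sequential(canon_corrs, n, p, q):
    """Bartlett's sequential tests for canonical correlations.

    canon_corrs: sample canonical correlations, largest first.
    Returns a list of (k, statistic, df), where step k tests the hypothesis
    that all canonical correlations after the first k are zero.
    """
    scale = n - 1 - (p + q + 1) / 2
    results = []
    for k in range(len(canon_corrs)):
        # Log of Wilks' lambda built from the remaining correlations
        log_lambda = sum(math.log(1 - r ** 2) for r in canon_corrs[k:])
        results.append((k, -scale * log_lambda, (p - k) * (q - k)))
    return results


# Hypothetical canonical correlations; NOT computed from consum2007.dat
rho = [0.9, 0.5, 0.1]
results = bartlett_sequential(rho, n=31, p=5, q=3)
for k, stat, df in results:
    print(f"k = {k}: statistic = {stat:.3f}, df = {df}")
```

Each statistic is compared with an upper chi-square quantile on the listed degrees of freedom (for example via `qchisq` in R), and the number of significant canonical correlations is the number of steps rejected before the first non-rejection.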

