###### Date: 2020-11-11 11:30

2020/11/8 EE516 Take Home Mid-Term Exam

EE516 Take Home Mid-Term Exam

Due: November 9

The due date for this is firm - no exceptions. Because this is an exam, no collaboration is allowed, aside from

discussions regarding clarification of questions. You can discuss what is being asked in any question among

being asked, and provide any relevant results, scripts, or plots. As always, I will be available for questions, but

I may be less forthcoming in terms of specific questions related to solutions. I’m happy to provide guidance

regarding what is being asked, but I will be more guarded in my responses regarding how to solve the

questions. Good luck!

1: Probability and PDFs

Probability theory underlies all inferential statistics. For example, in all hypothesis tests, we either specify a

significance level, or more commonly in the era of computers, a p-value is provided from the analysis. Within

this framework, answer the following questions:

A. Explain the basic probability theory that provides the basis for univariate t-tests. Specifically, what is a “t”

statistic and what does the associated p-value represent?

Explain.

B. Explain what confidence intervals on the sample mean represent in terms of the underlying probability

theory for a data set drawn from a univariate normal distribution. How do the size (i.e., n) and variance

of the sample affect confidence intervals on the sample mean?

Explain.

C. Write a short R program to illustrate your answers.

# Insert code here.
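
As one possible illustration of parts A and B, the sketch below computes a one-sample t statistic and its two-sided p-value by hand, compares them with `t.test()`, and shows how the width of a 95% confidence interval on the mean changes with n and with the standard deviation. All data are simulated; the helper `ci_width`, the sample sizes, the mean, and the standard deviations are assumptions chosen only for demonstration.

```r
# Simulated illustration only: sample sizes, means, and standard deviations
# are arbitrary assumptions, not exam data.
set.seed(1)

# Hypothetical helper: full width of a t-based confidence interval on the mean
ci_width <- function(n, sigma, mu = 10, level = 0.95) {
  x <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(x) / sqrt(n)                          # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)  # critical t value
  2 * t_crit * se
}

# One-sample t statistic and two-sided p-value computed by hand
x <- rnorm(25, mean = 10.5, sd = 2)
mu0 <- 10
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
p_val <- 2 * pt(-abs(t_stat), df = length(x) - 1)

# Built-in test for comparison
fit <- t.test(x, mu = mu0)

# Interval widths for growing n (fixed sigma) and growing sigma (fixed n)
widths_n <- sapply(c(10, 100, 1000), ci_width, sigma = 2)
widths_s <- sapply(c(1, 2, 4), function(s) ci_width(n = 100, sigma = s))
print(widths_n)
print(widths_s)
```

The interval half-width is the critical t value times s/sqrt(n), so it shrinks roughly with the square root of the sample size and grows in proportion to the sample standard deviation.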

2: Linear Regression

Many researchers have attempted to relate estimates of vegetation leaf area index measured on the ground to

satellite measurements of surface reflectance collected at the same locations. In the data file “LAI_NDVI.txt”, I

have provided a data set for you to develop a statistical model to do this. The file consists of 50 rows, where

each row contains the following fields:

Station ID, LAI1, LAI2, LAI3, LAI4, LAI5, NDVI

For each station there are five randomly located LAI measurements that were collected within a 15-m radius of

the station (where LAI= the total one-sided area of leaves per unit ground area), with the station located at the

center of a 30-m remotely sensed NDVI measurement (pixel). Having more than one measurement in each

pixel is useful because LAI can be highly variable. For this problem, your first task is to average the LAI

measurements at each site to provide a single representative LAI value that corresponds to the NDVI

measurement centered over each station.
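
The averaging step described above can be sketched as follows. Because the contents and exact layout of “LAI_NDVI.txt” are not shown here, the example builds a hypothetical stand-in table with the same seven fields; the column names, value ranges, and the extra per-station standard deviation are assumptions.

```r
# Hypothetical stand-in for LAI_NDVI.txt: 50 stations, five LAI readings each,
# plus one NDVI value. Values are simulated, not real measurements.
set.seed(2)
n_station <- 50
lai_cols <- paste0("LAI", 1:5)
lai <- matrix(runif(n_station * 5, min = 0.5, max = 6), ncol = 5,
              dimnames = list(NULL, lai_cols))
dat <- data.frame(Station = seq_len(n_station), lai,
                  NDVI = runif(n_station, min = 0.2, max = 0.9))

# Average the five LAI readings at each station (one value per row)
dat$LAI_mean <- rowMeans(dat[, lai_cols])

# Within-station spread, a simple summary of how variable LAI is in each pixel
dat$LAI_sd <- apply(dat[, lai_cols], 1, sd)

head(dat[, c("Station", "LAI_mean", "LAI_sd", "NDVI")])
```

With the real file, the simulated table would be replaced by a call such as `read.table()` with arguments matched to the file's actual separator and header.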

Your main task is to estimate an appropriate and valid statistical model to predict LAI from NDVI. Carefully

describe how you go about doing this, including the justification and rationale for your approach and an

assessment of your final model in terms of its quality and the degree to which it meets the required

assumptions. Explain your results. Can you exploit the fact that there are multiple LAI measurements at each

station to improve your estimated model?

# Insert code here.

Explain.

3: Multivariate Normal Distribution

The file plainspcp.txt contains monthly precipitation data for 250 stations in the Great Plains, where the 1st

column provides the precipitation data for each station in January, the 2nd column provides the precipitation

data for February, and so on. Using these data:

A. Perform an analysis where you assess the univariate normality for precipitation in June, July and August

(i.e., each month individually). That is, assess whether the precipitation data in each month is univariate

normal. As part of this analysis, examine the data from each of these months and identify any potential

outliers.

B. Now do the same for these three months in a multivariate context. Is the precipitation data for June, July

and August multivariate normal? Are there any multivariate outliers? To answer this question, you should

compute the standardized distance of each point relative to the mean vector, and follow the basic

procedure outlined in the tutorial covering this material.
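
A common way to compute the standardized (Mahalanobis) distance of each point from the mean vector is sketched below. This is a generic approach and may differ from the tutorial's procedure; the data are simulated stand-ins for the June, July and August columns, and the 0.975 chi-square cutoff is an assumed choice.

```r
# Simulated stand-in for three monthly precipitation columns (not real data)
set.seed(3)
X <- matrix(rnorm(250 * 3, mean = 3, sd = 1), ncol = 3,
            dimnames = list(NULL, c("Jun", "Jul", "Aug")))
p <- ncol(X)

# Squared Mahalanobis distance of each station from the sample mean vector
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Under multivariate normality, d2 is approximately chi-square with p df
cutoff <- qchisq(0.975, df = p)   # assumed cutoff; other choices are common
flagged <- which(d2 > cutoff)     # candidate multivariate outliers

# Chi-square Q-Q plot: points near the 1:1 line are consistent with normality
qqplot(qchisq(ppoints(nrow(X)), df = p), d2,
       xlab = "Chi-square quantiles",
       ylab = "Squared Mahalanobis distance")
abline(0, 1)
```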

4: Tests of Mean Vectors

Using the precipitation data (again), but this time using data from February through November, test the

following hypothesis:

$H_0: \mu = (1.19, 0.97, 1.06, 2.07, 2.67, 4.08, 3.87, 3.35, 2.81, 2.94)$

# Insert code here.

Explain.
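
One standard test for a hypothesized mean vector is the one-sample Hotelling T² test, sketched below under stated assumptions. The matrix `X` is a simulated stand-in for the February through November columns (the real file is not reproduced here), and the simulated standard deviation of 0.5 is arbitrary; only the hypothesized vector comes from the question.

```r
# Hypothesized mean vector from the question (February through November)
mu0 <- c(1.19, 0.97, 1.06, 2.07, 2.67, 4.08, 3.87, 3.35, 2.81, 2.94)

# Simulated stand-in for the 250 x 10 data matrix (not real precipitation)
set.seed(4)
n <- 250
p <- length(mu0)
X <- matrix(rnorm(n * p, mean = rep(mu0, each = n), sd = 0.5), ncol = p)

xbar <- colMeans(X)   # sample mean vector
S <- cov(X)           # sample covariance matrix
d <- xbar - mu0

# Hotelling's T^2 and its exact F transformation under multivariate normality
T2 <- n * drop(t(d) %*% solve(S) %*% d)
F_stat <- (n - p) / (p * (n - 1)) * T2
p_val <- pf(F_stat, df1 = p, df2 = n - p, lower.tail = FALSE)

cat("T2 =", T2, " F =", F_stat, " p-value =", p_val, "\n")
```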

5: Analysis of Variance

In this question you will use data from a data set called LAI.txt. This file consists of 6 columns where the first

two columns correspond to leaf area index measurements collected in May and July, respectively, at a

grassland site in Kansas. The next two columns correspond to the “greenness vegetation index” (GVI; i.e., a

measure of how “green” the surface is) from remote sensing for pixels corresponding to sites on the ground

where LAI was measured on the same dates. The final two columns correspond to codes for each ground

location indicating burning treatment (1=burned in early spring, 2=unburned) and hillslope position (1=lowland,

2=slope, 3=upland). Using these data:

A. Write an R program to manually compute the univariate between-sample sum of squares and the within-sample

sum of squares for LAI in May and in July as a function of burning treatment (i.e., 2 distinct

ANOVAs, not using built-in functions). Also, compute the F-statistic in each case, and then use the built-in

probability model in R for the F distribution to compute the corresponding p-values for $\alpha = 0.05$. Explain

implemented manually in R.

# Insert code here.

Explain.
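
The between-sample and within-sample sums of squares for a single response can be computed by hand as in the sketch below. The data are simulated: the group sizes, means, and standard deviation are assumptions standing in for one LAI column and the burning-treatment code of LAI.txt.

```r
# Simulated stand-in for one LAI column and the burning code (not real data)
set.seed(5)
burn <- factor(rep(c(1, 2), times = c(20, 25)))
lai <- rnorm(45, mean = ifelse(burn == 1, 2.5, 2.0), sd = 0.6)

grand <- mean(lai)                      # grand mean
grp_mean <- tapply(lai, burn, mean)     # mean of each treatment group
grp_n <- tapply(lai, burn, length)      # size of each treatment group

# Between-sample SS: group size times squared deviation of group mean
ssb <- sum(grp_n * (grp_mean - grand)^2)

# Within-sample SS: squared deviations from each observation's group mean
ssw <- sum((lai - ave(lai, burn))^2)

k <- nlevels(burn)
N <- length(lai)
F_stat <- (ssb / (k - 1)) / (ssw / (N - k))

# Upper-tail probability from the F distribution, and the 0.05 critical value
p_val <- pf(F_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
F_crit <- qf(0.95, df1 = k - 1, df2 = N - k)

cat("SSB =", ssb, " SSW =", ssw, " F =", F_stat, " p =", p_val, "\n")
```

The null hypothesis of equal group means is rejected at the 0.05 level when `p_val` is below 0.05, which is equivalent to `F_stat` exceeding `F_crit`.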

B. Repeat part A, but perform a MANOVA using burning treatment as a grouping variable for LAI in both

May and July. That is, write a program in R to compute the Wilks statistic manually (i.e., perform a

MANOVA without using the built-in manova function in R). To do this use the getH and getE functions

that I provided in lecture (available on Blackboard) to compute the H and E matrices, and then use the

built-in function “det” to compute determinants, as appropriate. Explain and interpret your results. Note,

you do not need to compute the p-value for the Wilks statistic.

# Insert code here.

Explain.
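
The Wilks statistic is the ratio det(E) / det(E + H), where H is the between-group and E the within-group sums-of-squares-and-cross-products matrix. The sketch below builds H and E directly in a loop as a stand-in, since the course's getH and getE functions are not reproduced here and their interfaces are unknown. The data are simulated, and the group sizes and means are assumptions.

```r
# Simulated stand-in for May and July LAI plus the burning code (not real data)
set.seed(6)
burn <- factor(rep(c(1, 2), times = c(20, 25)))
Y <- cbind(May = rnorm(45, mean = ifelse(burn == 1, 1.5, 1.2), sd = 0.4),
           Jul = rnorm(45, mean = ifelse(burn == 1, 3.0, 2.4), sd = 0.7))

grand <- colMeans(Y)        # grand mean vector
H <- matrix(0, 2, 2)        # between-group (hypothesis) SSCP matrix
E <- matrix(0, 2, 2)        # within-group (error) SSCP matrix

for (g in levels(burn)) {
  Yg <- Y[burn == g, , drop = FALSE]
  mg <- colMeans(Yg)
  H <- H + nrow(Yg) * tcrossprod(mg - grand)  # n_g (m_g - m)(m_g - m)'
  E <- E + crossprod(sweep(Yg, 2, mg))        # deviations from group mean
}

# Wilks' lambda: values near 0 indicate strong group separation
wilks <- det(E) / det(E + H)
cat("Wilks lambda =", wilks, "\n")
```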