2020/11/8 EE516 Take Home Mid-Term Exam
EE516 Take Home Mid-Term Exam
Your Name
Due: November 9
The due date for this is firm - no exceptions. Because this is an exam, no collaboration is allowed, aside from
discussions regarding clarification of questions. You can discuss what is being asked in any question among
yourselves, but you may not collaborate on the answers. Your mid-term should clearly address the questions
being asked, and provide any relevant results, scripts, or plots. As always, I will be available for questions, but
I may be less forthcoming on specific questions related to solutions. I’m happy to provide guidance
regarding what is being asked, but I will be more guarded in my responses regarding how to solve the
questions. Good luck!
1: Probability and PDFs
Probability theory underlies all inferential statistics. For example, in all hypothesis tests, we either specify a
significance level, or more commonly in the era of computers, a p-value is provided from the analysis. Within
this framework, answer the following questions:
A. Explain the basic probability theory that provides the basis for univariate t-tests. Specifically, what is a “t”
statistic and what does the associated p-value represent?
Explain.
B. Explain what confidence intervals on the sample mean represent in terms of the underlying probability
theory for a data set drawn from a univariate normal distribution. How do the size (i.e., n) and variance
of the sample affect confidence intervals on the sample mean?
Explain.
C. Write a short program in R to illustrate your answers.
# Insert code here.
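As a starting point for part C, here is a minimal sketch of how a t statistic, its p-value, and a confidence interval on the mean can be computed by hand and compared with `t.test`. The sample, the hypothesized mean `mu0`, and the helper `half_width` are assumptions made for illustration; none of them come from the exam.

```r
# Simulated sample: the values and parameters below are illustrative assumptions.
set.seed(1)
mu0 <- 10                              # hypothesized mean under H0
x   <- rnorm(25, mean = 11, sd = 2)
n   <- length(x)

# t statistic: distance of the sample mean from mu0 in standard-error units
se     <- sd(x) / sqrt(n)
t_stat <- (mean(x) - mu0) / se

# Two-sided p-value: probability, under H0, of observing |t| at least this large
p_val <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# 95% confidence interval on the sample mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

# Built-in test, used only to check the manual calculation
check <- t.test(x, mu = mu0)
print(c(t = t_stat, p = p_val))
print(ci)

# Hypothetical helper: CI half-width as a function of sample size n and
# sample standard deviation s
half_width <- function(n, s, level = 0.95) {
  qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)
}

# Rows: n = 10, 40, 160; columns: s = 1, 2, 4
print(outer(c(10, 40, 160), c(1, 2, 4), half_width))
```

The last table gives the interval half-width for each combination of `n` and `s`, which is one way to see how sample size and variance enter the calculation.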
2: Linear Regression
Many researchers have attempted to relate estimates of vegetation leaf area index measured on the ground to
satellite measurements of surface reflectance collected at the same locations. In the data file “LAI_NDVI.txt”, I
have provided a data set for you to develop a statistical model to do this. The file consists of 50 rows, where
each row contains the following fields:
Station ID, LAI1, LAI2, LAI3, LAI4, LAI5, NDVI
For each station there are five randomly located LAI measurements that were collected within a 15-m radius of
the station (where LAI= the total one-sided area of leaves per unit ground area), with the station located at the
center of a 30-m remotely sensed NDVI measurement (pixel). Having more than one measurement in each
pixel is useful because LAI can be highly variable. For this problem, your first task is to average the LAI
measurements at each site to provide a single representative LAI value that corresponds to the NDVI
measurement centered over each station.
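The averaging step might look like the sketch below. Because the layout of “LAI_NDVI.txt” (header row, separator) is not specified here, the data frame is simulated. The column names follow the field list above, and every value is a made-up placeholder.

```r
# Simulated stand-in for LAI_NDVI.txt; the values are NOT real measurements.
# With the real file, something like read.table() would replace this block,
# with arguments chosen to match the file's actual layout.
set.seed(2)
n_sta <- 50
dat <- data.frame(
  Station = seq_len(n_sta),
  matrix(runif(n_sta * 5, 0.5, 6), ncol = 5,
         dimnames = list(NULL, paste0("LAI", 1:5))),
  NDVI = runif(n_sta, 0.2, 0.9)
)

lai_cols <- paste0("LAI", 1:5)

# One representative LAI per station: the mean of the five measurements
dat$LAI_mean <- rowMeans(dat[, lai_cols])

# Within-station spread, kept because it may be useful later
dat$LAI_sd <- apply(dat[, lai_cols], 1, sd)

print(head(dat[, c("Station", "LAI_mean", "LAI_sd", "NDVI")]))
```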
Your main task is to estimate an appropriate and valid statistical model to predict LAI from NDVI. Carefully
describe how you go about doing this, including the justification and rationale for your approach and an
assessment of your final model in terms of its quality and the degree to which it meets the required
assumptions. Explain your results. Can you exploit the fact that there are multiple LAI measurements at each
station to improve your estimated model?
# Insert code here.
Explain.
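One possible workflow is sketched below on simulated data: fit a simple linear model, inspect the residuals, and then use the replicate variances as weights. The linear signal, the noise level, and the choice of weighted least squares are all assumptions made for illustration. Other approaches, such as transformations or a lack-of-fit test, may suit the real data better.

```r
# Simulated stations: the linear relationship and noise level are assumptions.
set.seed(3)
n_sta <- 50
ndvi  <- runif(n_sta, 0.2, 0.9)

# Five simulated LAI replicates per station (50 x 5 matrix)
lai_rep  <- sapply(1:5, function(j) 0.5 + 5 * ndvi + rnorm(n_sta, sd = 0.8))
lai_mean <- rowMeans(lai_rep)
lai_var  <- apply(lai_rep, 1, var)

# Ordinary least squares on the station means
fit <- lm(lai_mean ~ ndvi)
print(summary(fit)$coefficients)
print(summary(fit)$r.squared)

# Residual checks: normality test here; diagnostic plots when run interactively
res <- residuals(fit)
print(shapiro.test(res))
# plot(fit, which = 1:2)   # residuals vs. fitted, normal Q-Q

# One way to use the replicates: weight each station by the inverse of the
# estimated variance of its mean (replicate variance divided by 5)
w     <- 1 / (lai_var / 5)
fit_w <- lm(lai_mean ~ ndvi, weights = w)
print(coef(fit_w))
```

Weights estimated from only five replicates are noisy, so it is worth comparing the weighted and unweighted fits rather than assuming the weighted one is better.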
3: Multivariate Normal Distribution
The file plainspcp.txt contains monthly precipitation data for 250 stations in the Great Plains, where the 1st
column provides the precipitation data for each station in January, the 2nd column provides the precipitation
data for February, and so on. Using these data:
A. Perform an analysis where you assess the univariate normality for precipitation in June, July and August
(i.e., each month individually). That is, assess whether the precipitation data in each month is univariate
normal. As part of this analysis, examine the data from each of these months and identify any potential
outliers.
B. Now do the same for these three months in a multivariate context. Is the precipitation data for June, July
and August multivariate normal? Are there any multivariate outliers? To answer this question, you should
compute the standardized distance of each point relative to the mean vector, and follow the basic
procedure outlined in the tutorial covering this material.
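The sketch below shows one way to set up both parts on simulated data. It assumes June through August are columns 6 to 8, following the column order described above. The gamma-distributed values, the |z| > 3 screen, and the 0.999 chi-square cutoff are illustrative choices, and the tutorial's exact procedure may differ.

```r
# Simulated stand-in for columns 6:8 (June-August) of plainspcp.txt.
set.seed(4)
pcp <- matrix(rgamma(250 * 3, shape = 8, rate = 2.5), ncol = 3,
              dimnames = list(NULL, c("Jun", "Jul", "Aug")))
n <- nrow(pcp)
p <- ncol(pcp)

# Part A: univariate normality, one month at a time
sw_p <- apply(pcp, 2, function(v) shapiro.test(v)$p.value)
print(sw_p)
# qqnorm(pcp[, "Jun"]); qqline(pcp[, "Jun"])   # visual check, run interactively

# Simple univariate outlier screen: standardized values beyond 3 (arbitrary)
z <- scale(pcp)
print(which(abs(z) > 3, arr.ind = TRUE))

# Part B: squared standardized (Mahalanobis) distance from the mean vector
d2 <- mahalanobis(pcp, center = colMeans(pcp), cov = cov(pcp))

# Under multivariate normality, d2 is approximately chi-square with p df,
# so sorted distances should track these quantiles
chi_q <- qchisq((seq_len(n) - 0.5) / n, df = p)
# plot(chi_q, sort(d2)); abline(0, 1)           # chi-square Q-Q plot

# Candidate multivariate outliers (cutoff is an arbitrary illustrative choice)
outliers <- which(d2 > qchisq(0.999, df = p))
print(outliers)
```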
4: Tests of Mean Vectors
Using the precipitation data (again), but this time using data from February through November, test the
following hypothesis:
$H_0: \mu = (1.19, 0.97, 1.06, 2.07, 2.67, 4.08, 3.87, 3.35, 2.81, 2.94)$
Explain your method and your results.
# Insert code here.
Explain.
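A one-sample Hotelling T² test is one standard way to test a hypothesized mean vector, and a minimal sketch follows. The data are simulated stand-ins for the February through November columns (columns 2 to 11 under the column order given earlier). The simulated spread is an assumption, so the printed result says nothing about the real data.

```r
# Hypothesized mean vector from the question
mu0 <- c(1.19, 0.97, 1.06, 2.07, 2.67, 4.08, 3.87, 3.35, 2.81, 2.94)
p   <- length(mu0)
n   <- 250

# Simulated stand-in for columns 2:11 of plainspcp.txt (values are made up)
set.seed(5)
X <- sapply(mu0, function(m) rnorm(n, mean = m, sd = 0.5))

xbar <- colMeans(X)
S    <- cov(X)
d    <- xbar - mu0

# Hotelling's T^2 = n * (xbar - mu0)' S^{-1} (xbar - mu0)
T2 <- n * drop(t(d) %*% solve(S, d))

# Exact F transformation: (n - p) / (p (n - 1)) * T^2 ~ F(p, n - p) under H0
F_stat <- (n - p) / (p * (n - 1)) * T2
p_val  <- pf(F_stat, df1 = p, df2 = n - p, lower.tail = FALSE)

print(c(T2 = T2, F = F_stat, p = p_val))
```

The F transformation relies on multivariate normality, so the checks from the previous question are relevant when interpreting the p-value.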
5: Analysis of Variance
In this question you will use data from a data set called LAI.txt. This file consists of 6 columns where the first
two columns correspond to leaf area index measurements collected in May and July, respectively, at a
grassland site in Kansas. The next two columns correspond to the “greenness vegetation index” (GVI; i.e., a
measure of how “green” the surface is) from remote sensing for pixels corresponding to sites on the ground
where LAI was measured on the same dates. The final two columns correspond to codes for each ground
location indicating burning treatment (1=burned in early spring, 2=unburned) and hillslope position (1=lowland,
2=slope, 3=upland). Using these data:
A. Write an R program to manually compute the univariate between-sample sum of squares and the
within-sample sum of squares for LAI in May and in July as a function of burning treatment (i.e., 2 distinct
ANOVAs, not using built-in functions). Also, compute the F-statistic in each case, and then use the built-in
probability model in R for the F distribution to compute the corresponding p-values for α = 0.05. Explain
your results. You can use the built-in aov function in R to check your results, but your answer must be
implemented manually in R.
# Insert code here.
Explain.
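The manual calculation can be sketched as follows for a single response variable. The group sizes and LAI values are simulated assumptions (the number of rows in LAI.txt is not stated here), and `aov` appears only as a check on the manual result.

```r
# Simulated stand-in for one LAI column and the burn code of LAI.txt.
# Group sizes (12 burned, 14 unburned) and values are arbitrary assumptions.
set.seed(6)
burn <- factor(rep(c(1, 2), times = c(12, 14)))
lai  <- rnorm(length(burn), mean = ifelse(burn == 1, 2.4, 1.8), sd = 0.5)

N <- length(lai)
k <- nlevels(burn)

grand  <- mean(lai)
n_g    <- tapply(lai, burn, length)   # group sizes
mean_g <- tapply(lai, burn, mean)     # group means

# Between-sample SS: group size times squared deviation of group mean
ssb <- sum(n_g * (mean_g - grand)^2)

# Within-sample SS: squared deviations of observations from their group mean
ssw <- sum((lai - ave(lai, burn))^2)

# F statistic and upper-tail p-value from the F distribution
F_stat <- (ssb / (k - 1)) / (ssw / (N - k))
p_val  <- pf(F_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
print(c(SSB = ssb, SSW = ssw, F = F_stat, p = p_val))

# Check against the built-in ANOVA table
tab <- summary(aov(lai ~ burn))[[1]]
print(tab)
```

The same steps would be run twice with the real data, once for May LAI and once for July LAI.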
B. Repeat part A, but perform a MANOVA using burning treatment as a grouping variable for LAI in both
May and July. That is, write a program in R to compute the Wilks statistic manually (i.e., perform a
MANOVA without using the built-in manova function in R). To do this use the getH and getE functions
that I provided in lecture (available on Blackboard) to compute the H and E matrices, and then use the
built-in function “det” to compute determinants, as appropriate. Explain and interpret your results. Note,
you do not need to compute the p-value for the Wilks statistic.
# Insert code here.
Explain.
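The sketch below illustrates the Wilks statistic on simulated data. Because the lecture's `getH` and `getE` functions are not reproduced here, the H and E matrices are built inline from their definitions. That inline construction is an assumption about what those functions return, not a copy of them. The `manova` call at the end serves only as a check.

```r
# Simulated stand-in for May and July LAI plus the burn code (made-up values).
set.seed(7)
burn <- factor(rep(c(1, 2), times = c(12, 14)))
Y <- cbind(May = rnorm(26, mean = ifelse(burn == 1, 1.5, 1.2), sd = 0.4),
           Jul = rnorm(26, mean = ifelse(burn == 1, 2.6, 2.0), sd = 0.5))

grand <- colMeans(Y)
E <- matrix(0, 2, 2)   # within-group (error) sums of squares and cross-products
H <- matrix(0, 2, 2)   # between-group (hypothesis) sums of squares and cross-products

for (g in levels(burn)) {
  Yg <- Y[burn == g, , drop = FALSE]
  mg <- colMeans(Yg)
  E  <- E + crossprod(sweep(Yg, 2, mg))          # deviations from group mean
  H  <- H + nrow(Yg) * tcrossprod(mg - grand)    # group mean vs. grand mean
}

# Wilks' lambda = |E| / |E + H|; values near 0 indicate strong group separation
wilks <- det(E) / det(E + H)
print(wilks)

# Check only: built-in MANOVA
fit       <- manova(Y ~ burn)
wilks_ref <- summary(fit, test = "Wilks")$stats[1, "Wilks"]
print(wilks_ref)
```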