1.[19] The air.csv data consist of 111 observations from an observational environmental study that measured five variables: ozone (surface PPM), solar radiation, temperature (degrees F), wind speed (MPH) and sky condition (C = Cloudy, PC = Partially Cloudy, S = Sunny), for 111 consecutive days in New York. We are modeling ozone as the dependent variable.
a)Read this data into a data frame called air.
b)Run a function that will provide a summary of all variables in air including sky.
c)Attach air so you can work with its components directly for the rest of this question.
d)Create a graphics page with 4 figures (2 by 2). Put ozone on the y-axis, and temperature, radiation, wind and sky on the x-axis, respectively. For sky, produce boxplots of ozone with one box for each level of sky; for the other variables, produce scatter plots with loess smoothers overlaid.
e)Remove all rows with any missing measurements from air. Use this updated data for the remainder of this assignment. Show any code you used to ensure that the objects you’re working with were updated.
f)Run a multiple linear regression with ozone as the outcome and the following predictors: radiation, temperature and wind. Save the results of this regression in an object named reg.results.
g)Use a single function to provide a summary of reg.results. In a comment, state how much of the variance in ozone is explained by this regression model (i.e., what is the multiple R²?).
h)Since these values were measured on 111 consecutive days, we might expect each value to be correlated with the values from previous days (autocorrelated). Produce the partial autocorrelation function (PACF) for ozone. Is ozone autocorrelated? What is the correlation coefficient between each day and the previous day?
i)Ordinary linear regression requires that the residuals be independent of one another. Run the PACF on the residuals from the previous linear regression fit. Is there any substantial evidence that the residuals are autocorrelated? Why or why not?
j)Make an ordered factor named ozone.cat that classifies ozone into three equal sized groups. Do this by using functions which will group the observations according to increasing ozone into three groups with roughly one third of the observations in each group. Do this in such a way that the expression would remain valid if the data in ozone changed. Label these three groups as Low, Medium and High respectively.
k)Produce a contingency table of sky condition vs. ozone.cat.
l)Perform a Chi-Square test on this table.
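A minimal R sketch of parts a) to l) follows. It assumes air.csv is in the working directory and that its columns are named ozone, radiation, temperature, wind and sky; the header names are not given above, so treat them as assumptions. make_ozone_cat is a hypothetical helper name for part j). It uses quantile() to compute tertile cut points from whatever ozone vector it receives, so the expression stays valid if the data change. The file-dependent steps run only when air.csv is present.

```r
# Minimal sketch for Question 1. ASSUMPTIONS: air.csv is in the working
# directory and its columns are named ozone, radiation, temperature, wind, sky.

# Hypothetical helper for part j): ordered tertile factor whose cut points are
# recomputed from whatever ozone vector it is given.
make_ozone_cat <- function(ozone) {
  cut(ozone,
      breaks = quantile(ozone, probs = 0:3 / 3, na.rm = TRUE),
      labels = c("Low", "Medium", "High"),
      include.lowest = TRUE,
      ordered_result = TRUE)
}

if (file.exists("air.csv")) {
  # a) stringsAsFactors = TRUE keeps sky a factor, so summary() counts its levels
  air <- read.csv("air.csv", stringsAsFactors = TRUE)

  # b) summary of every column, including sky
  print(summary(air))

  # c) make the columns visible by name
  attach(air)

  # d) 2 x 2 page: scatter plots with loess smoothers, plus a boxplot for sky
  op <- par(mfrow = c(2, 2))
  scatter.smooth(temperature, ozone, xlab = "Temperature", ylab = "Ozone")
  scatter.smooth(radiation, ozone, xlab = "Radiation", ylab = "Ozone")
  scatter.smooth(wind, ozone, xlab = "Wind", ylab = "Ozone")
  boxplot(ozone ~ sky, xlab = "Sky", ylab = "Ozone")
  par(op)

  # e) drop incomplete rows, then re-attach so the names point at the new data
  detach(air)
  air <- na.omit(air)
  attach(air)
  print(dim(air))          # rows left after removing missing values
  print(length(ozone))     # matches nrow(air) if the re-attach worked
  print(anyNA(air))        # should be FALSE

  # f) multiple linear regression
  reg.results <- lm(ozone ~ radiation + temperature + wind)

  # g) the "Multiple R-squared" line of summary() answers the question
  print(summary(reg.results))
  print(summary(reg.results)$r.squared)

  # h) PACF of ozone; the lag-1 partial autocorrelation equals the lag-1 autocorrelation
  pacf(ozone)
  print(pacf(ozone, plot = FALSE)$acf[1])

  # i) PACF of the regression residuals
  pacf(residuals(reg.results))

  # j) ordered factor with roughly a third of the days in each group
  ozone.cat <- make_ozone_cat(ozone)

  # k) contingency table and l) chi-square test
  sky.tab <- table(sky, ozone.cat)
  print(sky.tab)
  print(chisq.test(sky.tab))
}
```

Dropping incomplete rows in e) means some neighbouring rows are no longer consecutive days. The lag-1 values from pacf() in h) and i) are therefore approximate.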
2.(For graduate students only)[8] Make a binary infix operator called %ols% that takes the arguments x and y and returns the ordinary least squares regression coefficients for regressing y on x. In this function, x and y may be any combination of vectors or matrices. Rather than using a built-in R function, use the equation we saw in Question 2d) of the second assignment. This function should add a column of 1s to x so that an intercept is fit. For example, x<-1:10; y<-2+3*x; y%ols%x returns the vector 2, 3.
i)Regress ozone on temperature.
ii)Try replicating the model in Q1 f) by setting y to ozone and x to a matrix where the first column contains radiation and the next two columns contain wind and temperature, respectively. Do this part in one expression.
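The equation from Question 2d) of the second assignment is not reproduced here. This sketch assumes it is the standard normal-equations estimator, beta = (X'X)^(-1) X'y. The example above writes y %ols% x, so the sketch takes the response as the left operand. It solves the normal equations with solve(crossprod(X), crossprod(X, y)) instead of forming the inverse explicitly, which is numerically a little safer and gives the same coefficients.

```r
# Sketch of %ols%, ASSUMING the normal-equations estimator beta = (X'X)^-1 X'y.
# Left operand: response y; right operand: predictor vector or matrix x.
"%ols%" <- function(y, x) {
  X <- cbind(1, as.matrix(x))                # prepend a column of 1s for the intercept
  beta <- solve(crossprod(X), crossprod(X, y))  # solve (X'X) beta = X'y
  drop(beta)                                 # plain vector when y is a vector
}

x <- 1:10
y <- 2 + 3 * x
y %ols% x   # intercept 2, slope 3
```

With air attached, part i) would be `ozone %ols% temperature` and part ii) `ozone %ols% cbind(radiation, wind, temperature)`. The latter's coefficients should match coef(reg.results), apart from the order in which wind and temperature appear.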
3.[8] Some properties of a square matrix.
Given a numeric square matrix A, write a function called matprop which will return a list with three components named: trace, symmetric, and idem. For full marks you should be able to write this function in one line. The three components should contain the following values.
a)trace – the sum of the elements on the diagonal of A (a numeric vector of length one).
b)symmetric – A value of TRUE if matrix A is symmetric, that is, if it satisfies $A_{ij} = A_{ji}$ for all rows i and all columns j; in other words, A is equal to its transpose. If the matrix is not symmetric, return FALSE.
c)idem – Return TRUE if the matrix A is idempotent and FALSE otherwise. A matrix is idempotent if $AA = A$.
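A possible sketch of matprop follows. Its body is a single list() call, so it can be collapsed onto one line. Using all.equal() rather than == or identical() is a design choice not stated in the assignment: it stops floating-point rounding in A %*% A from turning an idempotent matrix into FALSE. check.attributes = FALSE ignores dimnames, which t() swaps.

```r
# matprop: one list() expression (can be written on one line).
matprop <- function(A) list(
  trace     = sum(diag(A)),                                              # sum of the diagonal
  symmetric = isTRUE(all.equal(A, t(A), check.attributes = FALSE)),      # A equals its transpose
  idem      = isTRUE(all.equal(A %*% A, A, check.attributes = FALSE))    # AA equals A
)

# Example: this 2 x 2 projection matrix is symmetric and idempotent
P <- matrix(c(1, 0, 0, 0), nrow = 2)
matprop(P)
```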
4.[8] Vectorizing functions.
Provide a vectorized solution that replaces the entire body of each of these functions with a one-line expression that does not use a loop. You can assume x is a numeric vector and val is a single numeric value.
a)val.positions<-function(x,val)
{
val.pos<-NULL
for(i in 1:length(x) )
if (x[i]==val) val.pos<-c(val.pos,i)
val.pos
}
b) monotonic<-function(x)
{ monotonic.increasing<-TRUE
monotonic.decreasing<-TRUE
for(i in 2:length(x)) if(x[i]-x[i-1]<0) monotonic.increasing<-FALSE
for(i in 2:length(x)) if(x[i]-x[i-1]>0) monotonic.decreasing<-FALSE
monotonic.increasing | monotonic.decreasing
}
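One way to vectorize both functions is sketched below. which() returns the indices where a logical vector is TRUE. diff() returns successive differences, so x is monotonic when its differences are all non-negative or all non-positive. Two edge cases behave slightly differently from the loops. val.positions returns integer(0) rather than NULL when there is no match. monotonic returns TRUE for a length-one x, where the loop version fails.

```r
# a) positions of the elements of x that equal val
val.positions <- function(x, val) which(x == val)

# b) TRUE if x never decreases or never increases
monotonic <- function(x) all(diff(x) >= 0) | all(diff(x) <= 0)

val.positions(c(4, 7, 4, 1), 4)   # positions 1 and 3
monotonic(c(1, 2, 2, 5))          # TRUE
```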
5.[7] Plotting the PDF and CDF of the t-distribution
First create a vector named t that contains the values from -3 to 3 in increments of 0.01.
In two figures on one page as shown below, plot the probability density function and the cumulative distribution function, Prob(T < t), for the t-distribution with infinite, 10 and 1 degrees of freedom. Do not worry about the axis numbers or the exact placement of the legend, as they will change with the window size. Make the y-axis limits of the first and second figures range from 0 to 0.5 and from 0 to 1, respectively. Make sure the main title is in a font twice the default size, the y-axes are labeled as shown, and line types 1, 2 and 3 correspond to the t-distribution with Inf, 10 and 1 degrees of freedom, respectively.
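A sketch of the two-panel plot follows. The referenced figure is not reproduced here, so the titles and axis labels ("Density", "Prob(T < t)") are placeholders. dens and cdf are illustrative names. dt() and pt() accept df = Inf, which gives the standard normal. matplot() draws one line per column, and lty = 1:3 maps to df = Inf, 10 and 1 in that order. cex.main = 2 doubles the title size.

```r
t <- seq(-3, 3, by = 0.01)                 # 601 points from -3 to 3
dfs <- c(Inf, 10, 1)                       # line types 1, 2, 3 in this order
dens <- sapply(dfs, function(d) dt(t, df = d))   # one column per df
cdf  <- sapply(dfs, function(d) pt(t, df = d))

op <- par(mfrow = c(1, 2))
# Placeholder titles/labels: the original figure is not available
matplot(t, dens, type = "l", lty = 1:3, col = "black", ylim = c(0, 0.5),
        xlab = "t", ylab = "Density", main = "PDF of t", cex.main = 2)
legend("topright", legend = paste("df =", dfs), lty = 1:3)
matplot(t, cdf, type = "l", lty = 1:3, col = "black", ylim = c(0, 1),
        xlab = "t", ylab = "Prob(T < t)", main = "CDF of t", cex.main = 2)
legend("topleft", legend = paste("df =", dfs), lty = 1:3)
par(op)
```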