Date: 2019-08-02 11:29

2019/7/16 NYU Classes : Advanced Test, Analysis, & Exp, Section 004 : Assignments

Instructions

You are analyzing which factors drive wine ratings. You are given the data set wine_data_clean_v2.csv.

The data contains 129,971 observations with the following variables:

country - country the wine is from

points - points given to rate the wine (0%-100% scale); this is essentially the wine rating

price - price of a bottle of wine in USD ($)

province - province the wine is from

variety - type of wine

First, make sure the following packages are installed so that you can load these libraries:

library(readr)
library(data.table)
library(broom)

options(max.print = 100000) # useful when the regression output is very long

Try reading the file using the base function read.csv()

Notice that some of the names of levels are not readable.

Use the read_csv function from the readr package to read the file:

wine_df2 <- read_csv("C:/Users/Yevgeniy/Desktop/NYU Courses/Fall 2018/Advanced Test and Experimental Design/Assignment 2/wine_data_clean_v2.csv")

(You can also do this by setting file.path() first, as we did in class, to get your folder location.)
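As a minimal sketch of the file.path() approach, the folder location can be stored once and reused. The folder components below are placeholders (an assumption, not the course's actual path), and the file is only read if it exists at that location.

```r
# Placeholder folder components (hypothetical; replace with your own location)
data_dir <- file.path("path", "to", "Assignment 2")

# file.path() joins the components with "/" on every platform
csv_path <- file.path(data_dir, "wine_data_clean_v2.csv")
print(csv_path)

# Read the file only if it is actually present at that location
if (file.exists(csv_path)) {
  wine_df2 <- readr::read_csv(csv_path)
}
```

Keeping the folder in one variable means only data_dir has to change when the script moves to another machine.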

One remaining problem is that the variables that are supposed to be factors are read in as characters. When you run your regression model and add such "character variables", write as.factor(variable_name) so that R treats the levels of the variable as factors (indicators/dummies).
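The effect of as.factor() can be seen on a small example. The sketch below uses a made-up toy data frame (the values are assumptions for illustration, not taken from wine_data_clean_v2.csv) and shows that the fitted model contains one indicator coefficient per level other than the reference level.

```r
# Toy data (hypothetical values, not from wine_data_clean_v2.csv)
toy <- data.frame(
  points  = c(88, 90, 85, 92, 87, 91),
  price   = c(15, 30, 12, 45, 20, 35),
  country = c("US", "France", "US", "Italy", "France", "Italy"),
  stringsAsFactors = FALSE
)

# as.factor() makes the factor treatment explicit: the model gets one
# indicator (dummy) per level except the reference level
fit <- lm(points ~ price + as.factor(country), data = toy)

# Coefficient names: intercept, price, and one dummy per non-reference country
coef_names <- names(coef(fit))
print(coef_names)
```

Here "France" is the reference level because it sorts first, so its effect is absorbed into the intercept and the other coefficients are measured relative to it.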

1) Write down the mathematical model for testing the impact of several drivers on the outcome. (1 point) (Note: this is not the same as R code; it should be a math formula.)

2) Write down the R model for testing the impact of several drivers on the outcome (1 point)

3) Run the model in R and assign it to a variable. Store the model estimates in a data frame by using the tidy() command from the "broom" package:

sum_lm<-as.data.table(tidy(model_name))

Notice that all your regression estimates are now stored as a data frame (a data table in this case, since we coerced it). This makes it easier to output the coefficients that are relevant to us. Since we do not really care about the intercept, we can generate a new data frame without it:

no_icept<-sum_lm[!(term=="(Intercept)")]
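Once the estimates are in a data table, rows can be filtered and sorted like any other data set. The following is a sketch on made-up toy data (the values and the object names fit, pos_sig and sig_ranked are assumptions for illustration); it relies on the column names that tidy() returns for an lm object: term, estimate, std.error, statistic and p.value.

```r
library(broom)
library(data.table)

# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  points  = c(88, 90, 85, 92, 87, 91, 86, 93),
  price   = c(15, 30, 12, 45, 20, 35, 14, 50),
  country = c("US", "France", "US", "Italy", "France", "Italy", "US", "France")
)

fit <- lm(points ~ price + as.factor(country), data = toy)

# Same two steps as in the text: tidy the model, then drop the intercept
sum_lm   <- as.data.table(tidy(fit))
no_icept <- sum_lm[!(term == "(Intercept)")]

# Terms that are significant at the 0.05 level with a positive estimate
pos_sig <- no_icept[p.value < 0.05 & estimate > 0]

# Significant terms ordered by the absolute size of the estimate
sig_ranked <- no_icept[p.value < 0.05][order(-abs(estimate))]

print(pos_sig)
print(sig_ranked)
```

On a toy sample this small, either result may be empty; the point is the filtering pattern, not the numbers.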

a) Which variables are statistically significant at the 0.05 significance level and have a positive coefficient estimate (>0)? What does it mean that these variables have a positive and significant coefficient? (5 points)

[Hint: Your output should be a data set, you can use the data.table package or the plyr package]

b) Which variable is statistically significant at the 0.05 significance level and has the highest impact on wine rating points based on the coefficient estimate? Interpret your result. (5 points)

[Hint: You can use data.table package or the dplyr package]

c) Plot the residuals vs. fitted graph. Which assumptions can you visually inspect from this graph? Do you think the linear regression model assumptions of zero conditional mean and homoscedasticity are satisfied? (2 points)

d) How much variation in wine rating (points) is explained by your independent variables? (1 point)

[Hint: You can use summary(your_reg_model_object)]
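The mechanics of parts c) and d) can be sketched in base R. The toy data and the object names below are assumptions for illustration; the same calls apply to any lm object.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  points = c(88, 90, 85, 92, 87, 91, 86, 93),
  price  = c(15, 30, 12, 45, 20, 35, 14, 50)
)
fit <- lm(points ~ price, data = toy)

# Residuals vs. fitted values: a trend in the mean would speak against the
# zero conditional mean assumption, a changing spread against homoscedasticity
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)

# Share of the variation in the outcome explained by the regressors
r2     <- summary(fit)$r.squared
adj_r2 <- summary(fit)$adj.r.squared
print(c(r2 = r2, adj_r2 = adj_r2))
```

The built-in plot(fit, which = 1) draws the same kind of diagnostic with a smoothed trend line added.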

e) After thinking some more about your experiment, you realize that the wine tasters' unique tastes and preferences may be biasing the results of your model. You decide to use a fixed-effects model to control for unobservable factors due to the wine tasters themselves. To use the fixed-effects approach, you can add a dummy/indicator variable for each taster; R does this automatically when you include the taster_name variable as a factor in your regression:

+as.factor(taster_name)

i) Can you identify which tasters had a negative and significant impact on the average wine rating at the 0.05 significance level? Identify the variable with the highest impact on average wine rating as you did in the previous model. Is it the same as or different from the one you had before? (3 points)

ii) Which model has more predictive power in terms of R^2? Interpret your findings: are wine taster fixed effects contributing any predictive power to your model? (2 points)
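One way to compare the two specifications is to fit both and put their fit statistics side by side. The sketch below uses made-up toy data and hypothetical object names (base_fit, fe_fit); only the taster_name variable name comes from the assignment text.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  points      = c(88, 90, 85, 92, 87, 91, 86, 93),
  price       = c(15, 30, 12, 45, 20, 35, 14, 50),
  taster_name = c("A", "B", "A", "C", "B", "C", "A", "B")
)

# Model without and with taster fixed effects (one dummy per taster)
base_fit <- lm(points ~ price, data = toy)
fe_fit   <- lm(points ~ price + as.factor(taster_name), data = toy)

r2_base <- summary(base_fit)$r.squared
r2_fe   <- summary(fe_fit)$r.squared

# Adjusted R-squared penalizes the extra dummy variables
adj_base <- summary(base_fit)$adj.r.squared
adj_fe   <- summary(fe_fit)$adj.r.squared

# F-test of the nested models: do the taster dummies jointly add explanatory power?
cmp <- anova(base_fit, fe_fit)

print(c(r2_base = r2_base, r2_fe = r2_fe, adj_base = adj_base, adj_fe = adj_fe))
print(cmp)
```

Plain R^2 never decreases when regressors are added, so the adjusted R^2 and the F-test are the more informative checks of whether the fixed effects contribute anything.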

Additional resources for assignment

File attachment wine_data_clean_v2.csv ( 6 MB; Jul 16, 2019 10:27 am )



