Logistic Regression project
For this project, you will need the dataset Film in the Stat2Data package. This dataset is based on 100 movie reviews by the movie critic Leonard Maltin. The variables are described on page 501 of your textbook (problem 9.31). The response variable is Good (1 = a rating of 3 stars or better, 0 = any lower rating).
In the dataset, there is a variable called Origin, which specifies the country of origin. Although the textbook lists 5 values for this variable (0 = USA, 1 = Great Britain, 2 = France, 3 = Italy, 4 = Canada), there are actually 7 (additionally, 5 = Sweden and 6 = Hungary). Before you analyze the data, you will need to create a new variable that consolidates the 7 levels of Origin into 2: (2 points)
Create the variable English, which takes the values
o 1 for movies in English (Origin = 0, 1, or 4)
o 0 for all other movies
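The recoding rule above can be sketched as follows. This is only an illustration in Python with a made-up list of Origin codes; the names ENGLISH_ORIGINS and english_indicator are hypothetical, and the actual project would apply the same rule to the Film data.

```python
# Origin codes treated as English-language, per the rule above:
# 0 = USA, 1 = Great Britain, 4 = Canada
ENGLISH_ORIGINS = {0, 1, 4}

def english_indicator(origin):
    """Return 1 for an English-language origin code, 0 otherwise."""
    return 1 if origin in ENGLISH_ORIGINS else 0

# Hypothetical Origin values, one per film (NOT the real Film data)
origins = [0, 2, 1, 6, 4, 3, 5]
english = [english_indicator(o) for o in origins]
print(english)  # [1, 0, 1, 0, 1, 0, 0]
```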
In the following, NewVariable refers to the new variable you created above (English).
Model 1: Do a Chi-square test of independence to see if there is any relationship between NewVariable and Good (whether or not the movie was rated as “good” by Leonard Maltin). If there is a relationship, use residuals or odds ratios to explain the association.
(4 points: 1 point for performing the test
1 point for checking the assumptions
1 point for the correct decision based on the p-value
1 point for the correct conclusion based on the decision)
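As a rough illustration of the steps in Model 1, the sketch below computes the Pearson chi-square statistic, the expected counts (for the assumption check), the Pearson residuals, and the odds ratio for a 2x2 table. The counts are invented for the example and are not the Film data. The sketch uses no continuity correction, so statistical software that applies one by default may report a slightly different statistic.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # Pearson residuals show which cells depart most from independence
    residuals = [[(table[i][j] - expected[i][j]) / math.sqrt(expected[i][j])
                  for j in range(2)] for i in range(2)]
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected, residuals

def odds_ratio(table):
    """Odds of column 1 in row 1 divided by odds of column 1 in row 2."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Hypothetical counts (NOT the real data):
# rows = NewVariable (1, 0), columns = Good (1, 0)
counts = [[30, 20], [10, 40]]
stat, p_value, expected, residuals = chi_square_2x2(counts)
smallest_expected = min(min(row) for row in expected)
print(stat, p_value, smallest_expected, odds_ratio(counts))
```

A common rule of thumb for the assumption check is that every expected count should be at least 5; smallest_expected makes that easy to verify.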
Model 2: Create a logistic regression with Good as the response and NewVariable as the predictor, and test whether the slope for NewVariable is significantly different from 0. Is your conclusion consistent with that of Model 1?
(4 points: 1 point for setting up the Null Hypothesis and Alternative Hypothesis correctly
1 point for getting the correct p-value
1 point for the correct decision and conclusion based on the p-value
1 point for comparing the conclusions of Models 1 and 2 and pointing out whether they are consistent.)
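To see why Models 1 and 2 are closely related, note that with a single 0/1 predictor the fitted logistic regression reproduces the observed proportions, so the slope equals the log odds ratio of the 2x2 table. The sketch below uses that closed form with the Wald test of H0: slope = 0 against Ha: slope ≠ 0. The counts are the same invented ones as in a typical illustration, not the Film data, and the function name is hypothetical.

```python
import math

def logistic_binary_predictor(a, b, c, d):
    """Closed-form logistic fit for one 0/1 predictor.

    a, b = counts of Good = 1, Good = 0 when the predictor is 1
    c, d = counts of Good = 1, Good = 0 when the predictor is 0
    """
    intercept = math.log(c / d)            # log odds of Good when predictor = 0
    slope = math.log((a / b) / (c / d))    # log odds ratio
    se_slope = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = slope / se_slope                   # Wald statistic for H0: slope = 0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return intercept, slope, se_slope, z, p_value

# Hypothetical counts (NOT the real data)
intercept, slope, se_slope, z, p_value = logistic_binary_predictor(30, 20, 10, 40)
print(slope, math.exp(slope), z, p_value)
```

The Wald p-value and the chi-square p-value are generally not identical, but they test the same association, which is the basis for the consistency comparison.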
Model 3: Find the simplest, best logistic regression model for predicting whether a movie is good from the following variables: NewVariable, Year, Time, Cast, and Description. Perform stepwise selection starting with the additive model. Your simplest model should include NewVariable, and your most complex model should include a 5-way interaction (if a 5-way interaction is not possible, then include all the 4-way interactions).
(2 points: 1 for doing model selection where the starting model is different from the ones in the scope
1 for using the “both” direction.)
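The idea of a "both"-direction search with a starting model that differs from the two ends of the scope can be sketched generically. In the sketch below, stepwise_both and toy_score are hypothetical names, and toy_score is a made-up stand-in for a criterion such as AIC; it does not fit any model. The sketch also ignores the hierarchy constraint (an interaction normally requires its main effects), which real statistical software typically enforces.

```python
def stepwise_both(start, lower, upper, score):
    """Greedy stepwise search in both directions, minimizing score.

    start, lower, upper: sets of term labels with lower <= start <= upper.
    score: caller-supplied function mapping a frozenset of terms to a number
           (for example an AIC computed from a fitted model).
    """
    current = frozenset(start)
    best = score(current)
    while True:
        # Forward steps: add one term that is in the upper scope but not in the model
        candidates = [current | {t} for t in upper - current]
        # Backward steps: drop one term that is not required by the lower scope
        candidates += [current - {t} for t in current - lower]
        if not candidates:
            return current, best
        challenger = min(candidates, key=score)
        if score(challenger) >= best:
            return current, best  # no single step improves the criterion
        current, best = challenger, score(challenger)

def toy_score(terms):
    """Made-up criterion: rewards three 'useful' terms, penalizes model size."""
    useful = {"NewVariable", "Year", "Time"}
    return 100 - 10 * len(terms & useful) + 2 * len(terms)

lower = {"NewVariable"}                                      # simplest model
start = {"NewVariable", "Year", "Time", "Cast", "Description"}  # additive model
upper = start | {"NewVariable:Year", "Year:Time"}            # small illustrative upper scope
selected, best = stepwise_both(start, lower, upper, toy_score)
print(sorted(selected), best)
```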
After you select your final model, write it down and use it to predict the probability of a good rating for the following film:
For Lab Sec 002: A film made in Europe in 1971 that is 90 minutes long, has 5 cast members listed, and a description that is 10 lines of text long.
For Lab Sec 003: A film made in the US in 1950 that is 120 minutes long, has 8 cast members listed, and a description that is 16 lines of text long.
For Lab Sec 005: A film made in Britain in 1982 that is 75 minutes long, has 13 cast members listed, and a description that is 5 lines of text long.
For Lab Sec 006: A film where the actors do not speak English, was made in 1933, is 140 minutes long, has 3 cast members listed, and a description that is 20 lines of text long.
(4 points: 2 points for getting the formula correct
2 points for getting the prediction correct)
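For the prediction step, a fitted logistic model gives the log odds as a linear combination of the predictors, and the probability follows from the inverse logit, p = 1 / (1 + exp(-log odds)). The sketch below shows that calculation only. The coefficients and the film are invented for illustration and are not the result of any model selection; predict_probability is a hypothetical helper name.

```python
import math

def predict_probability(coefs, values):
    """Inverse-logit prediction from a dict of coefficients and predictor values."""
    log_odds = coefs["(Intercept)"] + sum(coefs[name] * values[name]
                                          for name in values)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical coefficients (NOT fitted to the Film data)
coefs = {"(Intercept)": -1.0, "NewVariable": 0.5, "Time": 0.01}
# Hypothetical film: English-language, 100 minutes long
film = {"NewVariable": 1, "Time": 100}

p = predict_probability(coefs, film)
print(round(p, 4))
```

If the selected model contains interaction terms, each interaction contributes its coefficient times the product of the corresponding predictor values to the log odds.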