Instructions
Please complete the following questions. Your answer to each question should have two separate sections. In Section 1, write out your answers using complete sentences. Include descriptive statistics in the text, or in tables or figures as appropriate. Tables and figures should be of publication quality (e.g., fully labelled). Integrate inferential statistics into your description of the results. Your answers might be very short. There should be no R code in Section 1.
Section 2 should include the complete R code that you used. Add comments to explain what the code does. The code should show all of the commands that you used, enough for me to replicate exactly what you did. Check that the code runs in one smooth go when you knit the R Markdown file in a fresh R session. You can include figures here that you used to explore the data but don't wish to include in the first section. I will use the second section to help identify the sources of any mistakes. The first section should stand alone without the second section.
Use both null hypothesis significance testing and the estimation approach.
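As a rough illustration of reporting both approaches from a single model, the sketch below uses R's built-in mtcars data, which is an assumption chosen only for illustration and is unrelated to the assignment data. The coefficient table gives the test statistic and p-value, and confint() gives the interval estimate.

```r
# Illustration only: mtcars is a built-in dataset, not part of the assignment.
fit <- lm(mpg ~ wt, data = mtcars)

# Null hypothesis significance testing: t statistic and p-value per coefficient
nhst <- summary(fit)$coefficients
print(nhst)

# Estimation approach: point estimates with 95% confidence intervals
est <- cbind(estimate = coef(fit), confint(fit, level = 0.95))
print(est)
```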
While there is a word limit of 3,500 words, your answers should be much shorter than this, perhaps 100-200 words per question. You get credit for a clear and concise report; writing more words than necessary is not required.
Use the template "99999999.Rmd". Replace 9999999 with your own student number. Knit the file into "9999999.html". Zip the files "9999999.Rmd" and "9999999.html" into "9999999.zip" and submit this zip file. Do not include data files or project files.
If you have questions, please be sure to post them to the module forum (see the "Forums" tab on the module page on my.wbs). If you are stuck coding, post a minimal working example.
Question 1
(50% of the marks)
You work for a big movie production company. Use the data "movie_releases.csv" to explore whether and how the opening weekend gross take for a movie is affected by the number of words in the movie title. To do this, make two models. The first should model the weekend gross take as a function of the number of words. The second should also include the number of cinemas used for each release.
a. Why does including the number of cinemas used for the release change the number of words from a non-significant predictor into a significant predictor? (Hint: Think about how much of the variance is explained, as we did for repeated measures analysis.)
b. Your boss is obsessed with one-word titles. How much higher do you expect returns to be for a one-word title compared to longer titles?
The data cover movie releases from July 2017 to June 2018. Each row is about one movie. film is the movie title. You might find the code str_count(str_trim(film), "\\S+") useful for counting words in the title. number.of.cinemas is the number of cinemas the film was played in on the opening weekend. weekend.gross is the gross take on the opening weekend in £s.
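The word-counting hint and the two models can be sketched as follows. This is a minimal sketch, not a model answer: the data frame is a hypothetical stand-in for "movie_releases.csv" with invented titles and figures, and the names n.words, one.word, m1, m2 and m3 are assumptions introduced here. With the real file you would start from read.csv("movie_releases.csv") instead.

```r
library(stringr)

# Hypothetical toy data standing in for "movie_releases.csv";
# titles and figures are invented purely for illustration.
movies <- data.frame(
  film = c("Harbour", "  The Long Road ", "Night Shift",
           "A Very Quiet Place Indeed", "Ember", "Two Rivers Meet"),
  number.of.cinemas = c(520, 310, 450, 120, 600, 250),
  weekend.gross = c(2100000, 900000, 1500000, 300000, 2600000, 700000)
)

# Trim stray spaces, then count runs of non-whitespace characters (words)
movies$n.words <- str_count(str_trim(movies$film), "\\S+")

# Model 1: weekend gross as a function of title length only
m1 <- lm(weekend.gross ~ n.words, data = movies)

# Model 2: add the number of cinemas as a second predictor
m2 <- lm(weekend.gross ~ n.words + number.of.cinemas, data = movies)

# Compare how much residual variance each model leaves unexplained
print(anova(m1, m2))

# Test statistics, p-values and confidence intervals for model 2
print(summary(m2)$coefficients)
print(confint(m2))

# One possible coding for part b: an indicator for one-word titles
movies$one.word <- movies$n.words == 1
m3 <- lm(weekend.gross ~ one.word + number.of.cinemas, data = movies)
print(confint(m3))
```

Whether to keep the word count as a number or collapse it into a one-word indicator is a modelling choice; the indicator answers part b directly, at the cost of discarding differences among longer titles.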
Question 2
(50% of the marks)
You work at a credit card company that would like to predict how people pay their credit card bills. At the end of the month, credit card holders can choose whether to pay off their entire balance in full, or to make a smaller partial payment, allowing the remaining outstanding balance to revolve into the next month and incur an interest charge. Using the data "credit_cards.csv", explore how the decision to repay the balance in full is affected by (a) the card holder's balance and (b) the minimum payment that the card provider requires.
The data are a (heavily anonymised) sample of real repayment decisions. The column bal is the outstanding balance in pounds, min is the minimum payment required (in pounds, the card holder must repay at least this much of the balance), relmin is the minimum as a fraction of the balance, and fullrepay is a dummy variable where 1 indicates a full repayment and 0 indicates otherwise.
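Because fullrepay is a 0/1 outcome, one common choice is a logistic regression. The sketch below is an assumption about how such a model might be set up, not a prescribed method: the data are simulated stand-ins for "credit_cards.csv" (the column names follow the description above, but every value and the generating rule are invented), and the choice of bal and relmin as predictors is illustrative.

```r
set.seed(1)  # make the simulated illustration reproducible

# Simulated stand-in for "credit_cards.csv"; all values are invented.
n <- 200
bal <- round(runif(n, 50, 5000), 2)
cards <- data.frame(
  bal = bal,
  min = pmax(5, bal * runif(n, 0.01, 0.05))  # minimum payment in pounds
)
cards$relmin <- cards$min / cards$bal        # minimum as a fraction of balance

# Invented rule: full repayment becomes less likely as the balance grows
cards$fullrepay <- rbinom(n, 1, plogis(1 - 0.001 * cards$bal))

# Logistic regression of the full-repayment decision on balance and
# relative minimum payment
fit <- glm(fullrepay ~ bal + relmin, family = binomial, data = cards)

# Null hypothesis significance testing: Wald z statistics and p-values
print(summary(fit)$coefficients)

# Estimation: odds ratios with Wald 95% confidence intervals
or <- exp(cbind(odds.ratio = coef(fit), confint.default(fit)))
print(or)
```

Odds ratios for bal are per pound of balance, so they will sit very close to 1; rescaling the balance (for example to thousands of pounds) can make the estimates easier to report.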