Predictive Analysis Competition (PAC) Project
This project, to be completed individually, involves performing the steps of predictive analysis, including exploring and tidying data, fitting competing models, selecting features, tuning model hyperparameters, interpreting results, and presenting findings. You will be given a prediction problem to work on. The results of your predictions will be judged against those of your peers.
Details
In the rapidly evolving landscape of digital marketing, businesses strive to drive traffic to their websites to maximize consumer engagement with website content and promote sales. Display advertising is one of the key methods to drive traffic. Success of a display ad is judged by the percentage of visitors who click on the ad, commonly known as Click-Through Rate (CTR). The goal of this competition is to predict CTR based on a set of variables describing the quality, relevance, type, target audience, and content of the display ad.
Arriving at good predictions begins with gaining a thorough understanding of the data. This understanding can be gleaned from examining the descriptions of the predictors, learning the types of the variables, and inspecting their summary characteristics. Visual exploration may yield insights that are missed by merely examining descriptive characteristics. Often the edge in predictive modeling comes from variable transformations such as mean centering or imputing missing values. Review the predictors to look for candidates for transformation.
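As an illustration of the two transformations named above, here is a minimal base R sketch of median imputation followed by mean centering. The data frame and its column names (visual_appeal, headline_length) are hypothetical stand-ins, not the actual competition variables, and median imputation is only one of several reasonable choices.

```r
# Hypothetical toy data; the real competition columns will differ.
ads <- data.frame(
  visual_appeal   = c(4.2, NA, 3.8, 5.1, 4.9),
  headline_length = c(12, 18, NA, 9, 15)
)

# Replace missing values in a numeric vector with the median of the observed values.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Subtract the mean so the transformed variable has mean zero.
mean_center <- function(x) x - mean(x, na.rm = TRUE)

# Apply each transformation column by column.
ads_imputed  <- as.data.frame(lapply(ads, impute_median))
ads_centered <- as.data.frame(lapply(ads_imputed, mean_center))

print(ads_centered)
```

In practice, the medians and means should be computed on the analysis data and then reused unchanged on the scoring data, so that both datasets are transformed in the same way.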
Not all variables are predictive and models with too many predictors often overfit the data they are estimated on. With the large number of predictors available for this project, it is critically important to judiciously select features for inclusion in the model.
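One possible way to screen predictors is backward stepwise selection, sketched below with base R's step() function on simulated data. The variables (x1, x2, noise1, noise2), the seed, and the simulated relationship are all assumptions made for illustration; stepwise selection is only one of many feature selection approaches and is not prescribed by this project.

```r
set.seed(617)  # arbitrary seed so the simulation is reproducible

# Simulated data: CTR depends on x1 and x2 only; noise1 and noise2 are irrelevant.
n <- 200
sim <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n),
  noise1 = rnorm(n), noise2 = rnorm(n)
)
sim$CTR <- 0.5 * sim$x1 - 0.3 * sim$x2 + rnorm(n, sd = 0.2)

# Start from the model with every predictor and drop terms while AIC improves.
full_model <- lm(CTR ~ ., data = sim)
selected   <- step(full_model, direction = "backward", trace = 0)

# Names of the predictors that survive selection.
kept <- attr(terms(selected), "term.labels")
print(kept)
```

Because selection is driven by AIC on the same data used for fitting, an irrelevant variable can occasionally survive, so it is worth confirming the chosen set on held-out data.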
There are a number of predictive techniques discussed in this course; some are strong in one area while others are strong in another. Furthermore, default model parameters seldom yield the best fit. Each problem is different and therefore deserves a model that is tuned for it.
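To make the idea of tuning concrete, the following sketch uses 5-fold cross-validation in base R to choose a single hyperparameter, the degree of a polynomial regression. The simulated data, the seed, the candidate degrees, and the number of folds are illustrative assumptions; the same loop structure applies to the hyperparameters of other techniques.

```r
set.seed(617)  # arbitrary seed so the simulation is reproducible

# Simulated data with a nonlinear relationship between x and y.
n <- 150
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- sin(dat$x) + rnorm(n, sd = 0.2)

k <- 5
fold <- sample(rep(1:k, length.out = n))  # random fold assignment for each row
degrees <- 1:6                            # candidate hyperparameter values

# For each candidate degree, average the out-of-fold RMSE across the k folds.
cv_rmse <- sapply(degrees, function(d) {
  fold_rmse <- sapply(1:k, function(i) {
    train <- dat[fold != i, ]
    test  <- dat[fold == i, ]
    fit   <- lm(y ~ poly(x, d), data = train)
    pred  <- predict(fit, newdata = test)
    sqrt(mean((test$y - pred)^2))
  })
  mean(fold_rmse)
})

# Keep the degree with the lowest cross-validated RMSE.
best_degree <- degrees[which.min(cv_rmse)]
print(data.frame(degree = degrees, cv_rmse = cv_rmse))
print(best_degree)
```

Selecting on out-of-fold error rather than in-sample error guards against choosing a model that merely memorizes the data it was fit on.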
Finally, predictive modeling is an iterative exercise. It is more than likely that after estimating the model, you will want to go back to the data preparation stage to try a different variable transformation.
Goal
The competition is hosted on Kaggle, where you will be able to get the data, submit your predictions, and monitor your performance. Once you construct a set of predictions for CTR in the scoring dataset, you will upload your prediction file to Kaggle. Your submission will be evaluated based on RMSE (root mean squared error), and the results will be posted on Kaggle’s Leaderboard. The lower the RMSE, the better the model.
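RMSE is the square root of the average squared difference between actual and predicted values. A minimal base R sketch of the calculation follows; the function name rmse and the example vectors are made up for illustration, and it assumes the leaderboard uses this standard definition.

```r
# Root mean squared error between two numeric vectors of equal length.
rmse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Made-up CTR values, for illustration only.
actual    <- c(0.10, 0.20, 0.30)
predicted <- c(0.10, 0.25, 0.20)

print(rmse(actual, predicted))
```

Computing this metric on a held-out portion of the analysis data gives an estimate of leaderboard performance before spending one of the limited daily submissions.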
Disclaimer: The data shared has been curated for the purpose of PAC. It is meant to be used only for this course and should not be used for any other purpose.
Submission
This project has the following deliverables.
Submit Predictions
You must submit your first set of predictions through the Kaggle competition page. See class slides for the due date. You can verify your submission on Kaggle. You must submit a total of at least ten sets of predictions before the deadline. Note that there is a limit of four submissions per day.
PAC Presentations
The ability to explain and communicate your analytical findings to a general audience is critical to your success in using data to influence decisions at your organization. Equally important is to Keep it Simple and Short. Accordingly, you will construct and deliver a succinct presentation supported by just one presentation slide. The specific time allowed for your presentation will depend on class size and will be determined by your instructor, but you should expect it to be 1-3 minutes. Your brief presentation should focus on just two issues:
What you did right with the analysis and where you went wrong.
If you had to do it over, what you would do differently.
Click the Submit Assignment button to upload your presentation slide.
PAC Report
This is a short report summarizing the data analysis process and what you learnt from the experience. Your report should include insights from exploring the data, efforts to prepare the data, and analysis techniques explored. The report should cover not only the ingredients of the final analysis but also the failed steps or missteps along the way. The length of the report should be 2-4 pages and must be supplemented by neatly commented R code for the best submission.
You can write the report in a text editor and submit the R code as R syntax files (i.e., .R files), or you can combine the report and code in a knitted R Markdown or Quarto Markdown file (i.e., an HTML file). In addition, you are encouraged to submit separate R code files for your other, unsuccessful submissions.
Click the Submit Assignment button to upload your written report, Quarto Markdown file(s), and/or R code file(s).
Assessment
Your assignment will be graded on three criteria described below:
1. Commitment to the project (25 points): consistent work and completion of interim deliverables
2. Prediction accuracy (75 points): accuracy of predictions at the end of the project.
3. Quality of modeling (50 points): knowledge and use of data exploration, tidying, and prediction techniques.