Semester Project: AD699
In many ways, this assignment is intentionally open-ended: several parts ask you to decide
what to do, and there is not necessarily a single right or wrong answer to each question.
As a group, you will decide things like which variables to use in your models and which
to ignore. The outcome matters, but you should focus mainly on the process.
Between now and the due date, I will post helpful video tutorials to the AD699 Video
Library and/or add pointers as bullet points beneath the assignment.
If any steps are unclear, or you’re not sure how to proceed, please reach out to me or
to Solomon with your questions.
Step I: Data Preparation & Exploration (15 points)
Start by downloading two files from Blackboard:
walmart and walmart_marketbasket.
I. Summary Statistics
A. Choose any five of the summary statistics functions shown in the textbook
(or anywhere else) to learn a little bit about your data set. Show screenshots
of the results. Describe your findings in 1-2 paragraphs.
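As a rough illustration of what such functions report, here is a minimal Python sketch that uses only the standard library's statistics module. In R, functions such as summary(), mean(), and sd() cover the same ground. The weekly_sales values below are made up for illustration and do not come from the walmart file.

```python
import statistics

def summarize(values):
    """Return a dict of basic summary statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }

# Hypothetical weekly_sales values, for illustration only.
weekly_sales = [100.0, 200.0, 150.0, 300.0, 250.0]
for name, value in summarize(weekly_sales).items():
    print(f"{name:>6}: {value:,.2f}")
```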
II. Visualization
A. Using ggplot, create any five plots that help to describe your data (this is
intentionally open-ended). Show the plots that you made. Write a
two-paragraph description that explains the choices that you made, and
what the resulting plots show.
III. Data Preparation
A. Are there any missing values in your dataset? If so, how did your team
decide to handle this issue?
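Two common ways to handle missing values are dropping incomplete rows and filling gaps with a summary value such as the median. The minimal Python sketch below shows both, assuming rows are stored as dictionaries with None marking a missing entry; the column names and values are hypothetical. In R, is.na() plays the role of the None check. Which option fits depends on how many values are missing and in which columns, and that remains the team's decision.

```python
import statistics

def count_missing(rows):
    """Count missing (None) values in each column of a list of row dicts."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts

def drop_incomplete(rows):
    """Option 1: keep only rows with no missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]

def impute_median(rows, column):
    """Option 2: fill missing values in one column with that column's median."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = statistics.median(observed)
    # Build new dicts so the original rows are left unchanged
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]

# Hypothetical weeks; column names and values are illustrative only.
weeks = [
    {"weekly_sales": 100.0, "temperature": 40.0},
    {"weekly_sales": 120.0, "temperature": None},
    {"weekly_sales": None, "temperature": 55.0},
    {"weekly_sales": 90.0, "temperature": 60.0},
]
print(count_missing(weeks))                     # {'weekly_sales': 1, 'temperature': 1}
print(len(drop_incomplete(weeks)))              # 2
print(impute_median(weeks, "temperature")[1])   # {'weekly_sales': 120.0, 'temperature': 55.0}
```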
Step II: Prediction (25 points)
I. Create a multiple regression model with the outcome variable weekly_sales that
aims to predict weekly sales for any Wal-Mart stores of Type A.
A. Describe your process. How did you wind up including the independent
variables that you kept, and discarding the ones that you didn’t keep? In a
narrative of at least two paragraphs, discuss your process and your
reasoning.
B. Show a screenshot of your regression summary, and explain the regression
equation that it generated.
C. In a few sentences, describe/compare your model’s performance against
training data and against validation data.
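Comparing training and validation performance means fitting the model on one partition and scoring it on another. The minimal Python sketch below fits a multiple regression by ordinary least squares (solving the normal equations), then reports root-mean-square error (RMSE) on each partition. The two predictors, the simulated data, and the 60/40 split are assumptions for illustration; with the real file you would first keep only the Type A rows, and in R you would more likely fit the model with lm().

```python
import math
import random

def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations.
    Assumes the predictors are not perfectly collinear."""
    rows = [[1.0] + list(x) for x in X]  # prepend a column of 1s for the intercept
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coefs, x):
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))

def rmse(coefs, X, y):
    squared = [(yi - predict(coefs, x)) ** 2 for x, yi in zip(X, y)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical Type A data: two made-up predictors and a noisy linear outcome.
rng = random.Random(699)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(100)]
y = [50 + 2 * x1 - 3 * x2 + rng.gauss(0, 1) for x1, x2 in X]

# Assumed 60/40 training/validation split after shuffling row indices.
idx = list(range(len(X)))
rng.shuffle(idx)
train, valid = idx[:60], idx[60:]
X_train, y_train = [X[i] for i in train], [y[i] for i in train]
X_valid, y_valid = [X[i] for i in valid], [y[i] for i in valid]

coefs = fit_ols(X_train, y_train)
train_rmse = rmse(coefs, X_train, y_train)
valid_rmse = rmse(coefs, X_valid, y_valid)
print("coefficients:", [round(c, 2) for c in coefs])
print(f"train RMSE {train_rmse:.2f} vs validation RMSE {valid_rmse:.2f}")
```

A validation RMSE far above the training RMSE is a common sign of overfitting; similar values suggest the model generalizes reasonably to weeks it has not seen.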
II. Create a multiple regression model with the outcome variable weekly_sales that
aims to predict weekly sales for any Wal-Mart stores of Type B.
A. Describe your process. How did you wind up including the independent
variables that you kept, and discarding the ones that you didn’t keep? In a
narrative of at least two paragraphs, discuss your process and your
reasoning.
B. Show a screenshot of your regression summary, and explain the regression
equation that it generated.
C. In a few sentences, describe/compare your model’s performance against
training data and against validation data.
III. Create a multiple regression model with the outcome variable weekly_sales that
aims to predict weekly sales for any Wal-Mart stores of Type C.
A. Describe your process. How did you wind up including the independent
variables that you kept, and discarding the ones that you didn’t keep? In a
narrative of at least two paragraphs, discuss your process and your reasoning.
B. Show a screenshot of your regression summary, and explain the regression
equation that it generated.
C. In a few sentences, describe/compare your model’s performance against
training data and against validation data.
IV. Did you notice any significant differences in terms of the predictors that mattered
more for the different types of stores? If so, speculate about some of the possible
reasons why (one paragraph).
Step III: Classification (30 points)
I. For the Type A stores only, categorize your weekly sales totals by assigning each
week to one of four equally sized groups (quartiles of weekly sales): Great Week,
Good Week, Mediocre Week, and Lousy Week.
A. Select any four features from your dataset to use as predictors. Build and
run a k-nearest neighbors model that takes a hypothetical “type” of week
that you create and classifies it into one of the four bins that you built in
the previous step.
B. Write a two-paragraph narrative that describes how you did this. In your
narrative, be sure to mention how you arrived at the particular k value that
you used.
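Read literally, four equally sized groups are sales quartiles, and k-NN needs standardized features because it relies on distances. The minimal Python sketch below bins weeks by sales rank, scales four predictors using training-set statistics, and picks k by validation accuracy over odd values. The four features, the simulated sales, the 70/30 split, and selecting k by validation accuracy are all assumptions for illustration, not requirements of the assignment.

```python
import math
import random
from collections import Counter

LABELS = ["Lousy Week", "Mediocre Week", "Good Week", "Great Week"]

def quartile_bins(sales):
    """Assign each week to a sales quartile by rank (bottom 25% = Lousy Week)."""
    order = sorted(range(len(sales)), key=lambda i: sales[i])
    labels = [None] * len(sales)
    for rank, i in enumerate(order):
        labels[i] = LABELS[rank * 4 // len(sales)]
    return labels

def fit_scaler(X):
    """Column means and sample standard deviations, computed on training rows only."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) for c, m in zip(cols, means)]
    return means, sds

def scale(x, means, sds):
    return [(v - m) / s for v, m, s in zip(x, means, sds)]

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training rows nearest to `query` (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical Type A weeks: four made-up numeric features and a simulated sales figure.
rng = random.Random(699)
X = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(200)]
sales = [100 * f[0] + 50 * f[1] + 10 * f[2] + rng.gauss(0, 10) for f in X]
y = quartile_bins(sales)

# Assumed 70/30 split; these toy rows are generated independently, so slicing acts as a
# random split. Both partitions are scaled with training statistics.
train_X_raw, train_y = X[:140], y[:140]
valid_X_raw, valid_y = X[140:], y[140:]
means, sds = fit_scaler(train_X_raw)
train_X = [scale(x, means, sds) for x in train_X_raw]
valid_X = [scale(x, means, sds) for x in valid_X_raw]

def accuracy(k):
    """Share of validation weeks whose bin k-NN predicts correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == label for x, label in zip(valid_X, valid_y))
    return hits / len(valid_y)

ks = list(range(1, 20, 2))
best_k = max(ks, key=accuracy)  # ties go to the smallest k
print("best k:", best_k, "validation accuracy:", round(accuracy(best_k), 2))

# Classify a hypothetical week (feature values are illustrative only).
new_week = scale([0.9, 0.7, 0.5, 0.2], means, sds)
print("predicted bin:", knn_predict(train_X, train_y, new_week, best_k))
```

Ranking breaks ties in sales arbitrarily, so bins can differ slightly from a quantile-based cut on data with repeated values.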
II. Naive Bayes/Classification Trees.
A. For the Type B stores only, using any four predictors, build another model,
using either a naive Bayes or a classification tree algorithm, that attempts to
predict which bin your hypothetical week (the one you created for your k-NN
model) would fall into.
B. Show a screenshot of the code you used to build your model, the code you
used to run the algorithm, and the code you used to assess it.
C. Write a two-paragraph narrative that describes how you did this. In your
narrative, be sure to talk about things like factor selection and testing against
your training data.
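For the naive Bayes option, one minimal approach is to count how often each predictor value appears within each bin and combine those counts with Laplace smoothing. The Python sketch below assumes four categorical predictors (numeric ones would need to be binned first); the predictor names, their levels, and the six training weeks are hypothetical. In R, the naiveBayes() function in the e1071 package is one widely used implementation.

```python
import math
from collections import Counter

def fit_naive_bayes(X, y, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    n_features = len(X[0])
    class_counts = Counter(y)
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    levels = [set() for _ in range(n_features)]  # distinct values seen per feature
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            value_counts[label][j][value] += 1
            levels[j].add(value)
    return {"n": len(y), "alpha": alpha, "class_counts": class_counts,
            "value_counts": value_counts, "levels": levels}

def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior score."""
    best_class, best_score = None, -math.inf
    for c, count in model["class_counts"].items():
        score = math.log(count / model["n"])  # log prior P(class)
        for j, value in enumerate(row):
            # Laplace-smoothed log P(feature j = value | class)
            numer = model["value_counts"][c][j][value] + model["alpha"]
            denom = count + model["alpha"] * len(model["levels"][j])
            score += math.log(numer / denom)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def accuracy(model, X, y):
    hits = sum(predict_naive_bayes(model, row) == label for row, label in zip(X, y))
    return hits / len(y)

# Hypothetical Type B weeks: holiday, temperature band, fuel-price band, markdown.
train_X = [
    ["yes", "mild", "low", "some"],
    ["yes", "hot", "low", "some"],
    ["no", "cold", "high", "none"],
    ["no", "cold", "high", "some"],
    ["no", "mild", "low", "none"],
    ["yes", "cold", "high", "some"],
]
train_y = ["Great Week", "Great Week", "Lousy Week", "Lousy Week", "Mediocre Week", "Good Week"]

model = fit_naive_bayes(train_X, train_y)
print("training accuracy:", round(accuracy(model, train_X, train_y), 2))
print("hypothetical week:", predict_naive_bayes(model, ["yes", "mild", "high", "some"]))
```

Passing the training rows to accuracy() gives training performance; passing held-out rows the same way gives validation performance.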
Step IV: Clustering (10 points)
I. Perform either a k-means analysis or a hierarchical clustering analysis to group
transactions from the walmart_marketbasket dataset. You may wish to use
variables such as day of week, number of transactions, number of returns, and
departments involved.
II. Show your code and results, and write two paragraphs describing what you
included in your model, and what you found about the transactions. You do not
need to bring in any outside sources here to assess your findings.
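For the k-means option, the core loop alternates between assigning each transaction to its nearest center and moving each center to the mean of its members. The minimal Python sketch below standardizes the variables first, since counts such as items and departments sit on different scales. The three basket features and the six transactions are hypothetical; a categorical variable such as day of week would need to be encoded numerically (for example, one-hot) or used to profile the clusters afterward. In R, the base functions kmeans() and hclust() cover both options named above.

```python
import math
import random

def standardize(X):
    """Z-score each column (population SD); a constant column is only centered."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

def assign(X, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in X]

def kmeans(X, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps until stable."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(X, k)]  # k randomly chosen rows as starting centers
    for _ in range(max_iter):
        labels = assign(X, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(X, labels) if lab == c]
            # Keep the old center if a cluster ends up empty
            new_centers.append([sum(col) / len(members) for col in zip(*members)] if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(X, centers), centers

# Hypothetical transactions: [items purchased, items returned, departments visited].
baskets = [[2, 0, 1], [3, 0, 1], [2, 1, 1], [30, 2, 8], [28, 1, 7], [32, 3, 9]]
labels, centers = kmeans(standardize(baskets), k=2)
print("cluster labels:", labels)
```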
Please hold any clustering-related questions until after we have gone over clustering in
class.
Step V: Conclusions (20 points)
I. Write a 3-5 paragraph summary that describes your overall process and experience
with this assignment. You already summarized your specific steps in other parts of
the write-up, so use this section for big-picture takeaways and conclusions.
Excellent conclusions will include some original thoughts and analysis, rather than
merely recounting the steps taken in the project.
Submit your final report as a PDF to Blackboard before the deadline listed on the
assignment.
Step VI: Presentation
I. Summarize your findings and the overall process of analyzing this data set. Upload
your slides to Blackboard before the deadline listed in the assignment folder.