Date: 2018-11-29 11:12

Semester Project: AD699

In many ways, this assignment is intentionally open-ended -- there are several parts that ask you to decide what to do, and there is not necessarily a single right or wrong answer to the question. As a group, you will decide things like which variables to use for your models, and which to ignore. The outcome matters, but you should focus mainly on the process.

Between now and the due date, I will post helpful video tutorials in the AD699 Video Library and/or pointers in the form of bullet points beneath the assignment.

If any steps are unclear, or you’re not sure about how to proceed, please reach out to me or to Solomon with your questions.

Step I: Data Preparation & Exploration (15 points)

Start by downloading two files from Blackboard: walmart and walmart_marketbasket.

I. Summary Statistics

A. Choose any five of the summary statistics functions shown in the textbook (or anywhere else) to learn a little bit about your data set. Show screenshots of the results. Describe your findings in 1-2 paragraphs.

II. Visualization

A. Using ggplot, create any 5 plots that help to describe your data (this is intentionally open-ended). Show the plots that you made. Write a two-paragraph description that explains the choices that you made, and what the resulting plots show.

III. Data Preparation

A. Are there any missing values in your dataset? If so, how did your team decide to handle this issue?
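
The summary-statistics and missing-value checks above could be sketched roughly as follows. This is only an illustration in Python using the standard library; the assignment itself mentions ggplot, so your team may well be working in R instead. The rows and the column names `temperature` and `markdown1` are made up for the example and are not taken from the actual walmart file.

```python
import math
import statistics

# Hypothetical toy rows; the real walmart file's columns and values will differ.
rows = [
    {"weekly_sales": 24924.5, "temperature": 42.3, "markdown1": None},
    {"weekly_sales": 46039.5, "temperature": 38.5, "markdown1": 120.0},
    {"weekly_sales": 41595.6, "temperature": None, "markdown1": None},
    {"weekly_sales": 19403.5, "temperature": 46.6, "markdown1": 75.5},
]


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_counts(records):
    """Count the missing values in each column."""
    counts = {}
    for record in records:
        for column, value in record.items():
            counts[column] = counts.get(column, 0) + int(is_missing(value))
    return counts


def summarize(records, column):
    """Basic summary statistics for one column, ignoring missing values."""
    values = [r[column] for r in records if not is_missing(r[column])]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


print(missing_counts(rows))
print(summarize(rows, "temperature"))
```

Whether a column with many missing values should be dropped, imputed, or left alone is a judgment call for your team; the counts only tell you where the gaps are.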

Step II: Prediction (25 points)

I. Create a multiple regression model with the outcome variable weekly_sales that aims to predict weekly sales for any Wal-Mart stores of Type A.

A. Describe your process. How did you wind up including the independent variables that you kept, and discarding the ones that you didn’t keep? In a narrative of at least two paragraphs, discuss your process and your reasoning.

B. Show a screenshot of your regression summary, and explain the regression equation that it generated.

C. In a few sentences, describe/compare your model’s performance against training data and against validation data.

II. Create a multiple regression model with the outcome variable weekly_sales that aims to predict weekly sales for any Wal-Mart stores of Type B.

A. Describe your process. How did you wind up including the independent variables that you kept, and discarding the ones that you didn’t keep? In a narrative of at least two paragraphs, discuss your process and your reasoning.

B. Show a screenshot of your regression summary, and explain the regression equation that it generated.

C. In a few sentences, describe/compare your model’s performance against training data and against validation data.

III. Create a multiple regression model with the outcome variable weekly_sales that aims to predict weekly sales for any Wal-Mart stores of Type C.

A. Describe your process. How did you wind up including the independent variables that you kept, and discarding the ones that you didn’t keep? In a narrative of at least two paragraphs, discuss your process and your reasoning.

B. Show a screenshot of your regression summary, and explain the regression equation that it generated.

C. In a few sentences, describe/compare your model’s performance against training data and against validation data.

IV. Did you notice any significant differences in terms of the predictors that mattered more for the different types of stores? If so, speculate about some of the possible reasons why (one paragraph).
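
To make the training-versus-validation comparison concrete, here is a minimal sketch in Python. It fits a multiple regression by solving the normal equations and reports root mean squared error (RMSE) on both partitions. The two predictors and all of the numbers are invented for illustration; in practice you would most likely rely on your statistics software's regression function rather than a hand-written solver like this one.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ols(features, targets):
    """Return [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in features]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Apply the regression equation to one row of predictors."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


def rmse(coefs, features, targets):
    """Root mean squared error of the model on a data partition."""
    errors = [(predict(coefs, row) - y) ** 2 for row, y in zip(features, targets)]
    return math.sqrt(sum(errors) / len(errors))


# Made-up training and validation partitions with two hypothetical predictors.
train_x = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
train_y = [18.5, 16.0, 29.5, 26.5, 39.0, 35.5]
valid_x = [(7, 8), (8, 7)]
valid_y = [48.8, 46.3]

coefs = fit_ols(train_x, train_y)
print("coefficients:", coefs)
print("training RMSE:", rmse(coefs, train_x, train_y))
print("validation RMSE:", rmse(coefs, valid_x, valid_y))
```

A validation RMSE far above the training RMSE is one common sign of overfitting, which is worth mentioning when you compare the two figures for each store type.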

Step III: Classification (30 points)

I. For just the Type A stores, create categories for your weekly sales total data by placing each week into one of four equally sized groups: Great Week, Good Week, Mediocre Week, and Lousy Week.

A. Select any four features from your dataset to use as predictors. Build and run a k-nearest neighbors model that takes a hypothetical “type” of week that you’ve created, and classifies it into one of the four bins that you built in the previous step.

B. Write a two-paragraph narrative that describes how you did this. In your narrative, be sure to mention how you arrived at the particular k value that you used.

II. Naive Bayes/Classification Trees.

A. For just the Type B stores, using any four predictors, build another model, using either a naive Bayes or classification tree algorithm, that attempts to predict which bin your hypothetical week (the one you created for your k-NN model) would fall into.

B. Show a screenshot of the code you used to build your model, the code you used to run the algorithm, and the code you used to assess the algorithm.

C. Write a two-paragraph narrative that describes how you did this. In your narrative, be sure to talk about things like factor selection and testing against your training data.
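
The binning and k-nearest neighbors steps from part I can be sketched as below. This is a bare-bones Python illustration and covers only part I, not the naive Bayes or classification tree model. The four features (temperature, fuel price, unemployment rate, holiday flag), the sales figures, and the hypothetical week are all invented for the example.

```python
import math
from collections import Counter

LABELS = ["Lousy Week", "Mediocre Week", "Good Week", "Great Week"]


def quartile_labels(sales):
    """Rank the weeks by sales and split the ranking into four equal-sized bins."""
    order = sorted(range(len(sales)), key=lambda i: sales[i])
    labels = [None] * len(sales)
    for rank, index in enumerate(order):
        labels[index] = LABELS[rank * 4 // len(sales)]
    return labels


def standardize(rows):
    """Z-score each column; return the scaled rows and the (mean, sd) pairs used."""
    params = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        params.append((mean, sd))
    scaled = [[(v - m) / s for v, (m, s) in zip(row, params)] for row in rows]
    return scaled, params


def knn_predict(train_rows, train_labels, query, k):
    """Majority vote among the k training rows closest to the query (Euclidean)."""
    nearest = sorted(
        range(len(train_rows)),
        key=lambda i: math.dist(train_rows[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up weeks: [temperature, fuel price, unemployment rate, holiday flag].
features = [
    [30, 2.5, 8.0, 0], [35, 2.6, 7.9, 0], [50, 2.8, 7.5, 0], [55, 2.9, 7.4, 0],
    [65, 3.1, 7.0, 0], [70, 3.2, 6.9, 0], [80, 3.5, 6.5, 1], [85, 3.6, 6.4, 1],
]
sales = [11000, 12000, 15000, 15500, 18000, 18500, 24000, 25000]

labels = quartile_labels(sales)
scaled, params = standardize(features)

# A hypothetical week, scaled with the same means and standard deviations.
week = [82, 3.55, 6.45, 1]
scaled_week = [(v - m) / s for v, (m, s) in zip(week, params)]
print(knn_predict(scaled, labels, scaled_week, k=3))
```

The predictors are standardized before distances are computed so that a feature measured on a large scale does not dominate the vote. One common way to settle on k is to repeat the prediction for several values on a validation partition and compare accuracy.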

Step IV: Clustering (10 points)

I. Perform either a k-means analysis or a hierarchical clustering analysis to group transactions from the walmart_marketbasket dataset. You may wish to use variables such as day of week, number of transactions, number of returns, and departments involved.

II. Show your code and results, and write two paragraphs describing what you included in your model, and what you found about the transactions. You do not need to bring in any outside sources here to assess your findings.

Please hold any clustering-related questions until after we have gone over clustering in class.
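
For orientation only, the k-means option might look something like the following Python sketch of Lloyd's algorithm. The two variables (items per transaction, number of returns) and the data points are invented, and starting the centroids at the first k points is a simplification chosen to keep the example deterministic; library implementations typically use random or smarter starting points.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignments = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Made-up transactions: (number of items, number of returns).
points = [(2, 0), (25, 1), (3, 0), (30, 2), (1, 1), (28, 0)]

assignments, centroids = kmeans(points, k=2)
print(assignments)
print(centroids)
```

With real data, variables on very different scales would normally be standardized first, and categorical variables such as day of week would need to be encoded before distances make sense.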

Step V: Conclusions (20 points)

I. Write a 3-5 paragraph summary that describes your overall process and experience with this assignment. You already summarized your specific steps in other parts of the write-up, so use this section for big-picture takeaways and conclusions. Excellent conclusions will include some original thoughts and analysis, rather than merely recounting the steps taken in the project.

Submit your final report as a PDF to Blackboard before the deadline listed on the assignment.

Step VI: Presentation

I. Summarize your findings and the overall process of analyzing this data set. Upload your slides to Blackboard before the deadline listed in the assignment folder.

