Date: 2020-10-16 10:54

Project: Predict future sales of Walmart

Table of Contents


1. Introduction

2. Project assignments

Assignment 1: Frame the problem

Assignment 2: Collect raw data

Assignment 3: Process the data

Assignment 4: Explore the data

Assignment 5: Perform in-depth analysis

Assignment 6: Communicate results

3. Planning

4. Coaches



1. Introduction


In this project you’ll participate in a data science competition on Kaggle.com:

https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/

This competition took place in 2014 and was intended to find suitable candidates for data science jobs at Walmart. The competition is still available, even though Walmart may no longer be recruiting through it.

You’ll find the following text when you click the link to the competition:

“One challenge of modeling retail data is the need to make decisions based on limited history. If Christmas comes but once a year, so does the chance to see how strategic decisions impacted the bottom line.

In this recruiting competition, job-seekers are provided with historical sales data for 45 Walmart stores located in different regions. Each store contains many departments, and participants must project the sales for each department in each store. To add to the challenge, selected holiday markdown events are included in the dataset. These markdowns are known to affect sales, but it is challenging to predict which departments are affected and the extent of the impact.

Want to work in a great environment with some of the world's largest data sets? This is a chance to display your modeling mettle to the Walmart hiring teams.

This competition counts towards rankings & achievements.  If you wish to be considered for an interview at Walmart, check the box "Allow host to contact me" when you make your first entry.

You must compete as an individual in recruiting competitions. You may only use the provided data to make your predictions.

You are provided with historical sales data for 45 Walmart stores located in different regions. Each store contains a number of departments, and you are tasked with predicting the department-wide sales for each store.

In addition, Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labor Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data.
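On Kaggle this weighting is applied through a weighted mean absolute error (WMAE). A minimal Python sketch, assuming weight 5 for weeks where IsHoliday is true and weight 1 otherwise, as described above; the name `weighted_mae` is our own and not part of any Kaggle tooling:

```python
from typing import Sequence

HOLIDAY_WEIGHT = 5.0  # holiday weeks count five times as much (per the description above)


def weighted_mae(actual: Sequence[float], predicted: Sequence[float],
                 is_holiday: Sequence[bool]) -> float:
    """Weighted mean absolute error: sum(w * |y - y_hat|) / sum(w)."""
    if not (len(actual) == len(predicted) == len(is_holiday)):
        raise ValueError("actual, predicted and is_holiday must have equal length")
    if not actual:
        raise ValueError("need at least one observation")
    weights = [HOLIDAY_WEIGHT if h else 1.0 for h in is_holiday]
    weighted_errors = sum(w * abs(y - y_hat)
                          for w, y, y_hat in zip(weights, actual, predicted))
    return weighted_errors / sum(weights)


# A miss of 100 in one holiday week and a perfect ordinary week:
# (5 * 100 + 1 * 0) / (5 + 1) = 500 / 6, roughly 83.33
print(weighted_mae([1000.0, 500.0], [900.0, 500.0], [True, False]))
```

The same absolute error therefore costs five times as much in a holiday week, which is why modeling the markdown effects around holidays matters so much for the score.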

stores.csv

This file contains anonymized information about the 45 stores, indicating the type and size of store.

train.csv

This is the historical training data, which covers 2010-02-05 to 2012-11-01. Within this file you will find the following fields:

• Store - the store number

• Dept - the department number

• Date - the week

• Weekly_Sales - sales for the given department in the given store

• IsHoliday - whether the week is a special holiday week

test.csv

This file is identical to train.csv, except we have withheld the weekly sales. You must predict the sales for each triplet of store, department, and date in this file.
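Because test.csv has the same columns as train.csv minus Weekly_Sales, one record type can represent both, with the sales field left empty for test rows. A minimal stdlib sketch; `SalesRow` and `read_sales` are hypothetical names, and the ISO date format and TRUE/FALSE text for IsHoliday are assumptions to check against the downloaded files (in a notebook, `pandas.read_csv(..., parse_dates=["Date"])` does the same job):

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Optional, TextIO


@dataclass(frozen=True)
class SalesRow:
    store: int
    dept: int
    week: date
    is_holiday: bool
    weekly_sales: Optional[float]  # None for test.csv, where sales are withheld

    @property
    def key(self) -> tuple[int, int, date]:
        """The (store, department, date) triplet a prediction is made for."""
        return (self.store, self.dept, self.week)


def read_sales(f: TextIO) -> list[SalesRow]:
    """Parse train.csv or test.csv into SalesRow objects.

    Assumes ISO dates (YYYY-MM-DD) and IsHoliday stored as TRUE/FALSE text.
    """
    rows = []
    for rec in csv.DictReader(f):
        sales = rec.get("Weekly_Sales")  # column is absent in test.csv
        rows.append(SalesRow(
            store=int(rec["Store"]),
            dept=int(rec["Dept"]),
            week=date.fromisoformat(rec["Date"]),
            is_holiday=rec["IsHoliday"].strip().upper() == "TRUE",
            weekly_sales=float(sales) if sales else None,
        ))
    return rows


# A test.csv-style row (illustrative values, not taken from the real file):
sample = io.StringIO("Store,Dept,Date,IsHoliday\n1,1,2012-11-02,FALSE\n")
print(read_sales(sample)[0].key)
```

For the real files, open them with `open("train.csv", newline="")` and pass the file object to `read_sales`.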

features.csv

This file contains additional data related to the store, department, and regional activity for the given dates. It contains the following fields:

• Store - the store number

• Date - the week

• Temperature - average temperature in the region

• Fuel_Price - cost of fuel in the region

• MarkDown1-5 - anonymized data related to promotional markdowns that Walmart is running. MarkDown data is only available after Nov 2011, and is not available for all stores all the time. Any missing value is marked with an NA.

• CPI - the consumer price index

• Unemployment - the unemployment rate

• IsHoliday - whether the week is a special holiday week
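Since missing MarkDown values are marked with the literal text NA, they need to become a real missing value before any arithmetic. A minimal sketch, assuming the column names listed above and rows read with Python's `csv.DictReader`; `parse_feature_row` is a hypothetical helper (pandas' `read_csv` already treats "NA" as missing by default):

```python
from typing import Optional

MARKDOWN_COLUMNS = [f"MarkDown{i}" for i in range(1, 6)]
NUMERIC_COLUMNS = ["Temperature", "Fuel_Price", *MARKDOWN_COLUMNS, "CPI", "Unemployment"]


def parse_optional_float(value: str) -> Optional[float]:
    """Map the 'NA' marker (or an empty cell) to None, anything else to float."""
    value = value.strip()
    if value in ("", "NA"):
        return None
    return float(value)


def parse_feature_row(rec: dict[str, str]) -> dict[str, object]:
    """Convert one csv.DictReader record from features.csv into typed values.

    All numeric columns go through parse_optional_float, so an NA in any of
    them (not only MarkDown1-5) becomes None instead of raising an error.
    """
    parsed: dict[str, object] = {
        "Store": int(rec["Store"]),
        "Date": rec["Date"],
        "IsHoliday": rec["IsHoliday"].strip().upper() == "TRUE",
    }
    for col in NUMERIC_COLUMNS:
        parsed[col] = parse_optional_float(rec[col])
    return parsed
```

Keeping NA as None (rather than 0) at this stage leaves the choice of how to treat missing markdowns to the cleaning step.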

For convenience, the four holidays fall within the following weeks in the dataset (not all holidays are in the data):

Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13

Labor Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13

Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13

Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
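The IsHoliday flag only says that a week is a holiday week, not which holiday it is, and the four holidays may affect sales differently. A small lookup table built from the weeks listed above can add the holiday name as a feature. A minimal sketch, assuming the Date column holds these same week dates; the names are hypothetical:

```python
from datetime import date
from typing import Optional

# Holiday weeks exactly as listed above.
HOLIDAY_WEEKS: dict[str, list[date]] = {
    "Super Bowl": [date(2010, 2, 12), date(2011, 2, 11), date(2012, 2, 10), date(2013, 2, 8)],
    "Labor Day": [date(2010, 9, 10), date(2011, 9, 9), date(2012, 9, 7), date(2013, 9, 6)],
    "Thanksgiving": [date(2010, 11, 26), date(2011, 11, 25), date(2012, 11, 23), date(2013, 11, 29)],
    "Christmas": [date(2010, 12, 31), date(2011, 12, 30), date(2012, 12, 28), date(2013, 12, 27)],
}

# Reverse index: week date -> holiday name.
WEEK_TO_HOLIDAY: dict[date, str] = {
    week: name for name, weeks in HOLIDAY_WEEKS.items() for week in weeks
}


def holiday_name(week: date) -> Optional[str]:
    """Return the holiday for a week date, or None for an ordinary week."""
    return WEEK_TO_HOLIDAY.get(week)
```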

2. Project assignments


You’ll go through all phases of a data science project:


https://ajgoldstein.com/2017/11/12/deconstructing-data-science/


Assignment 1: Frame the problem

Carefully read the competition description on Kaggle.com and execute the following tasks:

1. Define the problem that Walmart wants you to solve.

2. Describe the decisions that Walmart can make once you have delivered your work.

Assignment 2: Collect raw data

1. Create one profile for your team on Kaggle.com and let us know your team’s username:

• Username: ………………

2. Download the data set and upload it to the Files folder of your MS Teams channel.

Assignment 3: Process the data

1. Import the data set into a Jupyter Notebook as dataframes.

2. Examine the data at a high level:

a. Understand every column.

b. Identify errors, missing values and corrupt records (check for these even though the data set is fairly clean).

3. Clean the data (again, do this even though the data set is fairly clean):

a. Remove, replace or filter out corrupt, error-prone or missing values.
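Once the files are parsed (with missing entries as None), a per-column count is a quick way to do step 2b, and the cleaning in step 3 becomes an explicit choice per column. With pandas dataframes the equivalent calls are `DataFrame.isna().sum()` and `DataFrame.fillna()`; the dependency-free sketch below shows the same logic, with hypothetical helper names:

```python
from collections import Counter
from typing import Iterable

Row = dict[str, object]  # hypothetical row shape: column name -> parsed value (None = missing)


def count_missing(rows: Iterable[Row]) -> Counter:
    """Count None values per column as a first check for missing data."""
    missing: Counter = Counter()
    for row in rows:
        for col, value in row.items():
            if value is None:
                missing[col] += 1
    return missing


def fill_missing(rows: list[Row], columns: list[str], fill: float = 0.0) -> list[Row]:
    """Return copies of `rows` where None in `columns` is replaced by `fill`.

    Filling MarkDown gaps with 0 treats 'no data' as 'no markdown'; that is an
    assumption worth comparing against dropping or imputing instead.
    """
    return [
        {**row, **{c: fill for c in columns if c in row and row[c] is None}}
        for row in rows
    ]
```

The original rows are left untouched, so you can try several cleaning strategies side by side and compare them later in Assignment 5.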

Assignment 4: Explore the data

1. Merge the dataframes so that you can use the result for exploratory data analysis.

2. Play around with the data:

a. Plot the relationship between each input variable and the output variable (sales).

b. Use statistics to identify significant variables and create relevant features.

Do your analysis for the data set as a whole and for at least 3 stores and 3 departments.
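Merging boils down to two joins: stores.csv on Store, and features.csv on Store and Date. With pandas this is `DataFrame.merge(..., on=[...], how="left")`; the stdlib sketch below spells out the same left joins on lists of dicts and adds a Pearson correlation as one simple screening statistic for step 2b. Pearson only captures linear relationships, so treat it as a complement to the plots rather than a substitute. The helper names and the dict-based row layout are assumptions:

```python
import math
from statistics import fmean


def merge_rows(sales: list[dict], stores: dict, features: dict) -> list[dict]:
    """Left-join sales rows with store info (key: Store) and features (key: (Store, Date)).

    Date must have the same type and format in both inputs for the join to match.
    """
    merged = []
    for row in sales:
        store_info = stores.get(row["Store"], {})
        feature_info = features.get((row["Store"], row["Date"]), {})
        # Later dicts win on shared keys, so the sales row's own IsHoliday is kept.
        merged.append({**store_info, **feature_info, **row})
    return merged


def pearson(rows: list[dict], x: str, y: str = "Weekly_Sales") -> float:
    """Pearson correlation between columns x and y, skipping rows with None."""
    pairs = [(r[x], r[y]) for r in rows if r.get(x) is not None and r.get(y) is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete rows")
    xs, ys = zip(*pairs)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)
```

To repeat the analysis for a single store or department, filter the merged rows first (for example on `r["Store"] == 1`) and call `pearson` on the subset.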

Assignment 5: Perform in-depth analysis

1. Create a predictive model:

a. Use the features you found in Assignment 4, plus the date, department and store.

b. Try 3 different types of model.

2. Evaluate and refine the model:

a. You may have to go back to steps 2 and 3.

b. Choose the model you want to use.

c. Improve the model using cross-validation and/or a validation set.
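Because the task is to forecast future weeks, a random train/validation shuffle would let the model learn from weeks that come after the ones it is validated on. A time-based split mimics the real setting more closely. The sketch below (stdlib only, hypothetical names, rows as dicts with a `datetime.date` in Date) splits on a cutoff date and scores a per-store/department mean as a simple baseline that your three models should beat. For cross-validation on time-ordered data, scikit-learn's `TimeSeriesSplit` follows the same idea.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def time_split(rows: list[dict], cutoff: date) -> tuple[list[dict], list[dict]]:
    """Train on weeks before `cutoff`, validate on weeks from `cutoff` onward."""
    train = [r for r in rows if r["Date"] < cutoff]
    valid = [r for r in rows if r["Date"] >= cutoff]
    return train, valid


class MeanBaseline:
    """Baseline: predict each (store, dept)'s mean historical weekly sales."""

    def fit(self, rows: list[dict]) -> "MeanBaseline":
        groups = defaultdict(list)
        for r in rows:
            groups[(r["Store"], r["Dept"])].append(r["Weekly_Sales"])
        self.means = {k: fmean(v) for k, v in groups.items()}
        self.global_mean = fmean(r["Weekly_Sales"] for r in rows)  # for unseen pairs
        return self

    def predict(self, rows: list[dict]) -> list[float]:
        return [self.means.get((r["Store"], r["Dept"]), self.global_mean) for r in rows]


def holiday_weighted_mae(rows: list[dict], predictions: list[float]) -> float:
    """Mean absolute error with holiday weeks weighted 5x, as in the evaluation."""
    weights = [5.0 if r["IsHoliday"] else 1.0 for r in rows]
    errors = [abs(r["Weekly_Sales"] - p) for r, p in zip(rows, predictions)]
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)
```

Each of your candidate models can be given the same fit/predict shape and scored on the same split, which keeps the comparison between them fair.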

Assignment 6: Communicate results

1. Use Power BI to show the results of your analysis. The results should be understandable to business people with no data science background:

a. Explain how you arrived at your results.

b. Explain the results and how they should be interpreted, keeping in mind that your audience is not made up of data scientists.

2. Give advice on the decisions Walmart should make (as described in Assignment 1):

a. Markdown strategy: how do you recommend applying markdowns? Give your recommendations for 3 departments of 3 stores.

b. Holiday markdown strategy: how do you recommend applying markdowns during the holidays? Give your recommendations for 3 departments of 3 stores.

c. Which indicators (CPI, fuel price, etc.) seem to be relevant for which departments of your three stores? How do you recommend Walmart deal with changes in these indicators?

3. Planning


Date | Time | Topic | Preparation
30th September | 13:00 – 14:00 | Introduction of project | -
7th / 8th October | 11:00 – 16:00 | Half hour of project coaching with your group’s coach | Submit a draft version of your assignments for feedback
14th / 15th October | 11:00 – 16:00 | Half hour of project coaching with your group’s coach | Submit a draft version of your assignments for feedback
28th / 29th October | 11:00 – 16:00 | Half hour of project coaching with your group’s coach | Submit a draft version of your assignments for feedback
8th November | 23:59 | Deadline to submit project report and dashboard | -
11th November | 9:30 – 15:30 | Project presentations | -


4. Coaches


Group | Coach | Coaching half hour
1 | Thomas Becker | Wednesday 11:00 – 11:30
2 | Thomas Becker | Wednesday 11:30 – 12:00
3 | Thomas Becker | Wednesday 12:00 – 12:30
4 | Erik van den Ham | Wednesday 13:00 – 13:30
5 | Erik van den Ham | Wednesday 13:30 – 14:00
6 | Erik van den Ham | Wednesday 14:00 – 14:30
7 | Erik van den Ham | Wednesday 14:30 – 15:00
8 | Raymond Hoogendoorn | To be discussed with coach
9 | Raymond Hoogendoorn | To be discussed with coach
10 | Raymond Hoogendoorn | To be discussed with coach
12 | Raymond Hoogendoorn | To be discussed with coach

