Final HW - Two Beautiful Plots For Business Client

Due Thursday, Dec. 5th @ 11:00pm submitted via CANVAS

Background on Patrick’s Midas Automotive Shops

Patrick owns five Midas automotive repair shops (one of them is shown in Figure 1). To keep track of each

shop, he visits one shop every day. He suspects that each shop is more productive when he is there as opposed

to when he is not. He hires you to consult for him as he wants to make a data-driven decision about his visit

schedule to each of the five shops.

Patrick gives you a file which a previously hired data analyst (note: he fired that data analyst for not making

the insights easy enough to understand.) has cleaned up and put into csv format. It contains 10 weeks worth

of data:

library(tidyverse)

library(causact)

library(greta)

carsDF = readr::read_csv("carsFixed.csv")

carsDF %>%

group_by(shopID) %>%

summarize(numberOfObservations = n(),

numberOfBossVisits = sum(boss))

## # A tibble: 5 x 3

## shopID numberOfObservations numberOfBossVisits

## <dbl> <int> <dbl>

## 1 1 50 10

## 2 2 50 5

## 3 3 50 15

## 4 4 50 5

## 5 5 50 15

where the data description for carsDF is given below:

? observation: Unique identifier for an observation. In total, there are 250 observations representing the

previous 10 weeks of business (5 shops × 10 weeks × 5

days

week = 250 observations).

? shopID: A unique shop identifier for each of the five shops.

? boss: A binary variable equal to one when the boss worked at the given store on that day. Note: The

boss did not visit each shop equally.

? carsFixed: This represents the number of cars that the shop fixed on that given workday.

Here are some additional assumptions you should make about Patrick’s business:

Figure 1: A Midas automotive mechanic shop.

1

? For all shops, each additional car that gets fixed yields $50 of profit on average.

? All shops are open 5 days per week for 50 weeks each year - 250 days per year total.

? If you show Patrick a greek letter (e.g. α, β, etc.) in your graphs, he will fire you on the spot and

ignore you completely.

? If you use the words random variable in your analysis, he will fire you on the spot and ignore you

completely.

? If you show Patrick a continuous density function, he will fire you on the spot and ignore you completely.

? If you use the word Bayesian in your analysis, he will fire you on the spot and ignore you completely.

? Patrick does understand the concept of probability and is a pretty good gambler, so he likes making

decisions when having knowledge of outcome probabilities.

? Patrick really likes simple plots that are insightful.

The Model You Should Use for Interpreting The Data

While you should not show any of the below equivalent models (i.e. the below mathematical, corresponding

graphical, and corresponding computational model) to Patrick, this is the model you should use for your

analysis:

Ki ～ Poisson(λi)

λi = exp(αi + βixi)

αj ～ Normal(α, σα)

βj ～ Normal(β, σβ)

α ～ Normal(3, 2)

σα ～ Uniform(0, 2)

β ～ Normal(0, 1)

σβ ～ Uniform(0, 2)

graph = dag_create() %>%

dag_node("Cars Fixed","K",

data = carsDF$carsFixed,

rhs = poisson(lambda)) %>%

dag_node("Exp Cars Fixed - Shop Level","lambda",

rhs = exp(alpha_shop + beta_shop * x),

child = "K") %>%

dag_node("Intercept - Shop Level","alpha_shop",

rhs = normal(alpha,sigma_alpha),

child = "lambda") %>%

dag_node("Boss Effect - Shop Level","beta_shop",

rhs = normal(beta,sigma_beta),

child = "lambda") %>%

dag_node("Intercept - Midas Level","alpha",

rhs = normal(3,2),

child = "alpha_shop") %>%

dag_node("Std Dev - Midas Level","sigma_alpha",

rhs = uniform(0,2),

child = "alpha_shop") %>%

dag_node("Exp Boss Effect - Midas Level","beta",

rhs = normal(0,1),

child = "beta_shop") %>%

dag_node("Std Dev Boss Effect","sigma_beta",

rhs = uniform(0,2),

child = "beta_shop") %>%

2

dag_node("Boss Present","x",

data = carsDF$boss,

child = "lambda") %>%

dag_plate("Observation","i",

nodeLabels = c("K","lambda","x")) %>%

dag_plate("Shop","j",

nodeLabels = c("beta_shop","alpha_shop"),

data = carsDF$shopID,

addDataNode = TRUE)

graph %>% dag_greta()

## ## The specified DAG corresponds to the following greta code:

## x <- as_data(carsDF$boss) #DATA

## K <- as_data(carsDF$carsFixed) #DATA

## j <- as.factor(carsDF$shopID) #DIM

## j_dim <- length(unique(j)) #DIM

## alpha <- normal(mean = 3, sd = 2) #PRIOR

## sigma_alpha <- uniform(min = 0, max = 2) #PRIOR

## beta <- normal(mean = 0, sd = 1) #PRIOR

## sigma_beta <- uniform(min = 0, max = 2) #PRIOR

## alpha_shop <- normal(mean = alpha, sd = sigma_alpha, dim = j_dim) #PRIOR

## beta_shop <- normal(mean = beta, sd = sigma_beta, dim = j_dim) #PRIOR

## lambda <- exp(alpha_shop[j] + beta_shop[j] * x) #OPERATION

## distribution(K) <- poisson(lambda = lambda) #LIKELIHOOD

## gretaModel <- model(alpha_shop,beta_shop,alpha,sigma_alpha,beta,sigma_beta) #MODEL

## draws <- mcmc(gretaModel) #POSTERIOR

## draws <- replaceLabels(draws) #POSTERIOR

## drawsDF <- draws %>% as.matrix() %>% dplyr::as_tibble() #POSTERIOR

## tidyDrawsDF <- drawsDF %>% tidyr::gather() %>%

## addPriorGroups() #POSTERIOR

HW Assignment and Scoring

Create two beautifully formatted ggplot graphs and submit them in one pdf file with one graph per page.

Each graph’s purpose should be obvious from the title, subtitle, annotations, color choices, legends, axes,

labels, etc.

The purpose of the first graph is to convey the historical data that you are relying on to work with Patrick.

This should be an informative graph where a reader could easily imagine the following accompanying narrative:

Graph #1 depicts the last 10 weeks of data for each of the five shops. For some shops, your effect

is seemingly dramatic; they seem to fix more cars when you are there, but for other shops, we

will need to rely on our statistical model to help us understand the effect of your presence at

those shops. Additionally, your past visit schedule did not include visiting shops equally and some

shops appear much busier than others.

The purpose of graph #2 is to convey the probable values (90% credible interval and median value) for the

expected additional number of cars fixed per day when the boss visits. This will test your ability to work

with the exp link function and apply it to the posterior distribution. Note: link function and exponential

functions are not terms Patrick would enjoy seeing.

Note: His trustworthy brother Michael manages shop3 and Patrick thinks his trips there are more fun than

work-related, so he does not want to give them up completely, but is willing to reduce them.

3

Observation i

Shop j [5]

Intercept - Midas Level

alpha ~ normal(3,2)

Std Dev - Midas Level

sigma_alpha ~ uniform(0,2)

ExpBossEffect-MidasLevel

beta ~ normal(0,1)

Std Dev Boss Effect

sigma_beta ~ uniform(0,2)

Cars Fixed [250]

K ~ poisson(lambda)

ExpCarsFixed-ShopLevel

lambda = exp(alpha_shop[j] + beta_shop[j] * x)

Boss Present [250]

x

Shop [250]

j

Intercept - Shop Level

alpha_shop ~ normal(alpha,sigma_alpha)

Boss Effect - Shop Level

beta_shop ~ normal(beta,sigma_beta)

Figure 2: Graphical model for Midas shop observations.

4

Scoring (12 points Total)

For each graph, you will earn 1 point for each of the following criteria:

? graph’s purpose is clear to the audience

? the graph is beautifully formatted

? the graph has helpful labels

? use of color draws attention to a helpful insight from the graph

? default ggplot color scheme is not used

? text annotation is used to help reader extract some meaning from the graph

Inaccuracies in your representation of the data in graph #1 or the modelling for estimates in graph #2 will

result in reduction of 10% - 40% of the score earned above.

Hints

1. The exponential function is built-in to R. If you need a refresher on this function, see

this link: https://www.khanacademy.org/math/algebra/introduction-to-exponential-functions/

exponential-vs-linear-growth/v/exponential-growth-functions . The exponential function is a common

function to use to map linear predictors (i.e. α + βx) that can range from ?∞ to +∞ to a variable

that is between 0 to +∞. Since the rate parameter (λ) of a Poisson distribution is restricted to be

between 0 and +∞, the exponential function is often useful to turn a linear predictor into a rate

parameter (i.e. 0 ≤ exp(α + βx) <= ∞). This is similar to the use of the invLogit function to get

from linear predictor to probability. Here you will use the exp function to get from linear predictor to a

rate parameter for the Poisson distribution. In other words, you will use your posterior distribution for

α and β values to calculate the rate parameter. This type of modelling trick is so common, it gets a

special name, and is known as using a generalized linear model.

2. The rate parameter, λ, of a Poisson distribution represents the expected number of cars being fixed on

the average day. You can use the posterior distribution for αshop and βshop to estimate λ and hence,

estimate the number of cars per day, on average, that will get fixed for shop j (this calculation changes

when the boss is there versus when the boss is not). Remember to use the exponential function when

doing this calculation.

5

版权所有：编程辅导网 2018 All Rights Reserved 联系方式：QQ:99515681 电子信箱：99515681@qq.com

免责声明：本站部分内容从网络整理而来，只供参考！如有版权问题可联系本站删除。