Final HW - Two Beautiful Plots For Business Client
Due Thursday, Dec. 5th @ 11:00pm submitted via CANVAS
Background on Patrick’s Midas Automotive Shops
Patrick owns five Midas automotive repair shops (one of them is shown in Figure 1). To keep track of each
shop, he visits one shop every day. He suspects that each shop is more productive when he is there as opposed
to when he is not. He hires you to consult for him as he wants to make a data-driven decision about his visit
schedule to each of the five shops.
Patrick gives you a file which a previously hired data analyst (note: he fired that data analyst for not making
the insights easy enough to understand.) has cleaned up and put into csv format. It contains 10 weeks worth
of data:
library(tidyverse)
library(causact)
library(greta)
carsDF = readr::read_csv("carsFixed.csv")
carsDF %>%
group_by(shopID) %>%
summarize(numberOfObservations = n(),
numberOfBossVisits = sum(boss))
## # A tibble: 5 x 3
## shopID numberOfObservations numberOfBossVisits
## <dbl> <int> <dbl>
## 1 1 50 10
## 2 2 50 5
## 3 3 50 15
## 4 4 50 5
## 5 5 50 15
where the data description for carsDF is given below:
• observation: Unique identifier for an observation. In total, there are 250 observations representing the
previous 10 weeks of business (5 shops × 10 weeks × 5
days
week = 250 observations).
• shopID: A unique shop identifier for each of the five shops.
• boss: A binary variable equal to one when the boss worked at the given store on that day. Note: The
boss did not visit each shop equally.
• carsFixed: This represents the number of cars that the shop fixed on that given workday.
Here are some additional assumptions you should make about Patrick’s business:
Figure 1: A Midas automotive mechanic shop.
1
• For all shops, each additional car that gets fixed yields $50 of profit on average.
• All shops are open 5 days per week for 50 weeks each year - 250 days per year total.
• If you show Patrick a greek letter (e.g. α, β, etc.) in your graphs, he will fire you on the spot and
ignore you completely.
• If you use the words random variable in your analysis, he will fire you on the spot and ignore you
completely.
• If you show Patrick a continuous density function, he will fire you on the spot and ignore you completely.
• If you use the word Bayesian in your analysis, he will fire you on the spot and ignore you completely.
• Patrick does understand the concept of probability and is a pretty good gambler, so he likes making
decisions when having knowledge of outcome probabilities.
• Patrick really likes simple plots that are insightful.
The Model You Should Use for Interpreting The Data
While you should not show any of the below equivalent models (i.e. the below mathematical, corresponding
graphical, and corresponding computational model) to Patrick, this is the model you should use for your
analysis:
Ki ∼ Poisson(λi)
λi = exp(αi + βixi)
αj ∼ Normal(α, σα)
βj ∼ Normal(β, σβ)
α ∼ Normal(3, 2)
σα ∼ Uniform(0, 2)
β ∼ Normal(0, 1)
σβ ∼ Uniform(0, 2)
graph = dag_create() %>%
dag_node("Cars Fixed","K",
data = carsDF$carsFixed,
rhs = poisson(lambda)) %>%
dag_node("Exp Cars Fixed - Shop Level","lambda",
rhs = exp(alpha_shop + beta_shop * x),
child = "K") %>%
dag_node("Intercept - Shop Level","alpha_shop",
rhs = normal(alpha,sigma_alpha),
child = "lambda") %>%
dag_node("Boss Effect - Shop Level","beta_shop",
rhs = normal(beta,sigma_beta),
child = "lambda") %>%
dag_node("Intercept - Midas Level","alpha",
rhs = normal(3,2),
child = "alpha_shop") %>%
dag_node("Std Dev - Midas Level","sigma_alpha",
rhs = uniform(0,2),
child = "alpha_shop") %>%
dag_node("Exp Boss Effect - Midas Level","beta",
rhs = normal(0,1),
child = "beta_shop") %>%
dag_node("Std Dev Boss Effect","sigma_beta",
rhs = uniform(0,2),
child = "beta_shop") %>%
2
dag_node("Boss Present","x",
data = carsDF$boss,
child = "lambda") %>%
dag_plate("Observation","i",
nodeLabels = c("K","lambda","x")) %>%
dag_plate("Shop","j",
nodeLabels = c("beta_shop","alpha_shop"),
data = carsDF$shopID,
addDataNode = TRUE)
graph %>% dag_greta()
## ## The specified DAG corresponds to the following greta code:
## x <- as_data(carsDF$boss) #DATA
## K <- as_data(carsDF$carsFixed) #DATA
## j <- as.factor(carsDF$shopID) #DIM
## j_dim <- length(unique(j)) #DIM
## alpha <- normal(mean = 3, sd = 2) #PRIOR
## sigma_alpha <- uniform(min = 0, max = 2) #PRIOR
## beta <- normal(mean = 0, sd = 1) #PRIOR
## sigma_beta <- uniform(min = 0, max = 2) #PRIOR
## alpha_shop <- normal(mean = alpha, sd = sigma_alpha, dim = j_dim) #PRIOR
## beta_shop <- normal(mean = beta, sd = sigma_beta, dim = j_dim) #PRIOR
## lambda <- exp(alpha_shop[j] + beta_shop[j] * x) #OPERATION
## distribution(K) <- poisson(lambda = lambda) #LIKELIHOOD
## gretaModel <- model(alpha_shop,beta_shop,alpha,sigma_alpha,beta,sigma_beta) #MODEL
## draws <- mcmc(gretaModel) #POSTERIOR
## draws <- replaceLabels(draws) #POSTERIOR
## drawsDF <- draws %>% as.matrix() %>% dplyr::as_tibble() #POSTERIOR
## tidyDrawsDF <- drawsDF %>% tidyr::gather() %>%
## addPriorGroups() #POSTERIOR
HW Assignment and Scoring
Create two beautifully formatted ggplot graphs and submit them in one pdf file with one graph per page.
Each graph’s purpose should be obvious from the title, subtitle, annotations, color choices, legends, axes,
labels, etc.
The purpose of the first graph is to convey the historical data that you are relying on to work with Patrick.
This should be an informative graph where a reader could easily imagine the following accompanying narrative:
Graph #1 depicts the last 10 weeks of data for each of the five shops. For some shops, your effect
is seemingly dramatic; they seem to fix more cars when you are there, but for other shops, we
will need to rely on our statistical model to help us understand the effect of your presence at
those shops. Additionally, your past visit schedule did not include visiting shops equally and some
shops appear much busier than others.
The purpose of graph #2 is to convey the probable values (90% credible interval and median value) for the
expected additional number of cars fixed per day when the boss visits. This will test your ability to work
with the exp link function and apply it to the posterior distribution. Note: link function and exponential
functions are not terms Patrick would enjoy seeing.
Note: His trustworthy brother Michael manages shop3 and Patrick thinks his trips there are more fun than
work-related, so he does not want to give them up completely, but is willing to reduce them.
3
Observation i
Shop j [5]
Intercept - Midas Level
alpha ~ normal(3,2)
Std Dev - Midas Level
sigma_alpha ~ uniform(0,2)
ExpBossEffect-MidasLevel
beta ~ normal(0,1)
Std Dev Boss Effect
sigma_beta ~ uniform(0,2)
Cars Fixed [250]
K ~ poisson(lambda)
ExpCarsFixed-ShopLevel
lambda = exp(alpha_shop[j] + beta_shop[j] * x)
Boss Present [250]
x
Shop [250]
j
Intercept - Shop Level
alpha_shop ~ normal(alpha,sigma_alpha)
Boss Effect - Shop Level
beta_shop ~ normal(beta,sigma_beta)
Figure 2: Graphical model for Midas shop observations.
4
Scoring (12 points Total)
For each graph, you will earn 1 point for each of the following criteria:
• graph’s purpose is clear to the audience
• the graph is beautifully formatted
• the graph has helpful labels
• use of color draws attention to a helpful insight from the graph
• default ggplot color scheme is not used
• text annotation is used to help reader extract some meaning from the graph
Inaccuracies in your representation of the data in graph #1 or the modelling for estimates in graph #2 will
result in reduction of 10% - 40% of the score earned above.
Hints
1. The exponential function is built-in to R. If you need a refresher on this function, see
this link: https://www.khanacademy.org/math/algebra/introduction-to-exponential-functions/
exponential-vs-linear-growth/v/exponential-growth-functions . The exponential function is a common
function to use to map linear predictors (i.e. α + βx) that can range from −∞ to +∞ to a variable
that is between 0 to +∞. Since the rate parameter (λ) of a Poisson distribution is restricted to be
between 0 and +∞, the exponential function is often useful to turn a linear predictor into a rate
parameter (i.e. 0 ≤ exp(α + βx) <= ∞). This is similar to the use of the invLogit function to get
from linear predictor to probability. Here you will use the exp function to get from linear predictor to a
rate parameter for the Poisson distribution. In other words, you will use your posterior distribution for
α and β values to calculate the rate parameter. This type of modelling trick is so common, it gets a
special name, and is known as using a generalized linear model.
2. The rate parameter, λ, of a Poisson distribution represents the expected number of cars being fixed on
the average day. You can use the posterior distribution for αshop and βshop to estimate λ and hence,
estimate the number of cars per day, on average, that will get fixed for shop j (this calculation changes
when the boss is there versus when the boss is not). Remember to use the exponential function when
doing this calculation.
5
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。