School of Mathematics

Bayesian Data Analysis, 2021/2022, Semester 2

Lecturer: Daniel Paulin

Assignment 1

IMPORTANT INFORMATION ABOUT THE ASSIGNMENT

In this paragraph, we summarize the essential information about this assignment. The format

and rules for this assignment are different from your other courses, so please pay attention.

1) Deadline: The deadline for submitting your solutions to this assignment is 7 March, 12:00 noon Edinburgh time.

2) Format: You will need to submit your work as 2 components: a PDF report, and your

R Markdown (.Rmd) notebook. There will be two separate submission systems on Learn:

Gradescope for the report in PDF format, and a Learn assignment for the code in Rmd

format. You are encouraged to write your solutions into this R Markdown notebook (code

in R chunks and explanations in Markdown chunks), and then select Knit/Knit to PDF in

RStudio to create a PDF report.

It suffices to upload this PDF to the Gradescope submission system, and your Rmd file to the Learn assignment submission system. You will be required to tag every sub-question on Gradescope. A video describing the submission process will be posted on Learn.

Some key points that are different from other courses:

a) Your report needs to contain a written explanation for each question that you solve, and some numbers or plots showing your results. Solutions without a written explanation that clearly demonstrates that you understand what you are doing will be marked as 0, irrespective of whether the numerics are correct or not.

b) It must be possible to run your code for all questions using Run All in RStudio, and the code must reproduce all of the numerics and plots in your report (up to some small randomness due to the stochasticity of Monte Carlo simulations). The parts of the report that contain material that is not reproduced by the code will not be marked (i.e. the score will be 0), and the only feedback in this case will be that the results are not reproducible from the code.

c) Multiple submissions are allowed BEFORE THE DEADLINE for both the report and the code. However, multiple submissions are NOT ALLOWED AFTER THE DEADLINE. YOU WILL NOT BE ABLE TO MAKE ANY CHANGES TO YOUR SUBMISSION AFTER THE DEADLINE. Nevertheless, if you did not submit anything before the deadline, then you can still submit your work after the deadline. Late penalties will apply unless you have a valid extension. The timing of the late penalties will be determined by the time at which you have submitted BOTH the report and the code (i.e. whichever was submitted later counts).

We illustrate these rules by some examples:

Alice has spent a lot of time and effort on her assignment for BDA. Unfortunately she has

accidentally introduced a typo in her code in the first question, and it did not run using Run

All in RStudio. - Alice will get 0 for the whole assignment, with the only feedback “Results

are not reproducible from the code”.

Bob has spent a lot of time and effort on his assignment for BDA. Unfortunately he forgot to

submit his code. - Bob will get no personal reminder to submit his code. Bob will get 0 for

the whole assignment, with the only feedback “Results are not reproducible from the code, as

the code was not submitted.”

Charles has spent a lot of time and effort on his assignment for BDA. He has submitted both

his code and report in the correct formats. However, he did not include any explanations in

the report. Charles will get 0 for the whole assignment, with the only feedback “Explanation

is missing.”

Denise has spent a lot of time and effort on her assignment for BDA. She has submitted her report in the correct format, but thought that she could include her code as a link in the report, and upload it online (such as GitHub or Dropbox). - Denise will get 0 for the whole assignment, with the only feedback “Code was not uploaded on Learn.”

3) Group work: This is an INDIVIDUAL ASSIGNMENT, like a 2 week exam for the course. Communication between students about the assignment questions is not permitted. Students who submit work that has not been done individually will be reported for Academic Misconduct, which can lead to serious consequences. Each problem will be marked by a single instructor, so we will be able to spot students who copy.

4) Piazza: You are NOT ALLOWED to post questions about Assignment Problems visible to Everyone on Piazza. You need to set the visibility of such questions to Instructors only, by selecting Post to / Individual students/Instructors, typing in Instructors, and clicking on the blue Instructors banner that appears below.

Students who post any information related to the solution of assignment problems visible to

their classmates will

a) have their access to Piazza revoked for the rest of the course without prior warning, and

b) be reported for Academic Misconduct.

Only questions regarding clarification of the statement of the problems will be answered by the instructors. The instructors will not give you any information related to the solution of the problems; such questions will simply be answered as “This is not about the statement of the problem so we cannot answer your question.”

THE INSTRUCTORS ARE NOT GOING TO DEBUG YOUR CODE, AND YOU ARE

ASSESSED ON YOUR ABILITY TO RESOLVE ANY CODING OR TECHNICAL DIFFICULTIES

THAT YOU ENCOUNTER ON YOUR OWN.

5) Office hours: There will be two office hours per week (Mondays 16:00-17:00 and Wednesdays 16:00-17:00) during the 2 weeks for this assignment. The links are available on Learn / Course Information. We will be happy to discuss the course/workshop materials. However, we will only answer questions about the assignment that require clarifying the statement of the problems, and will not give you any information about the solutions. Students who ask for feedback on their assignment solutions during office hours will be removed from the meeting.

6) Late submissions and extensions: Students who have existing Learning Adjustments in

Euclid will be allowed to have the same adjustments applied to this course as well, but they

need to apply for this BEFORE THE DEADLINE on the website

https://www.ed.ac.uk/student-administration/extensions-special-circumstances

by clicking on “Access your learning adjustment”. This will be approved automatically.

For students without Learning Adjustments, if there is a justifiable reason (external circumstances)

for not being able to submit your assignment in time, then you can apply for an

extension BEFORE THE DEADLINE on the website

https://www.ed.ac.uk/student-administration/extensions-special-circumstances

by clicking on “Apply for an extension”. Such extensions are processed entirely by the central

ESC team. The course instructors have no role in this decision so you should not write to us

about such applications. You can contact our Student Learning Advisor, Maria Tovar Gallardo

(maria.tovar@ed.ac.uk) in case you need some advice regarding this.

Students who submit their work late will have late submission penalties applied by the ESC team automatically (this means that even if you are 1 second late because your internet connection was slow, the penalties will still apply). The penalties are 5% of the total mark deducted for every day of delay started (i.e. one minute of delay counts as 1 day). The course instructors do not have any role in setting these penalties, and we will not be able to change them.

The first picture is a rotifer (by Steve Gschmeissner), the second is a unicellular alga (by NEON ja, colored by Richard Bartz).

Problem 1 - Rotifer and algae data

In this problem, we study an experimental dataset (Blasius et al. 2020, https://doi.org/10.1038/s41586-019-1857-0) about the predator-prey relationship between two microscopic organisms: rotifer (predator) and unicellular green algae (prey). These were studied in a controlled environment (water tank) in a laboratory over 375 days. The dataset contains daily observations of the concentration of algae and rotifers. The unit of measurement in the algae column is $10^6$ algae cells per ml of water, while in the rotifier column it is the number of rotifers per ml of water.

We are going to apply a simple two dimensional state space model on this data using JAGS.

The first step is to load JAGS and the dataset.

# We load JAGS

library(rjags)

## Loading required package: coda

## Linked to JAGS 4.3.0

## Loaded modules: basemod,bugs

#You may need to set the working directory first before loading the dataset

#setwd("/Users/dpaulin/Dropbox/BDA_2021_22/Assignments/Assignment1")

rotifier_algae=read.csv("rotifier_algae.csv")

#The first 6 rows of the dataframe

print.data.frame(rotifier_algae[1:6,])

## day algae rotifier

## 1 1 1.50 NA

## 2 2 0.82 6.58

## 3 3 0.77 17.94

## 4 4 0.36 17.99

## 5 5 0.41 21.12

## 6 6 0.41 17.06

As we can see, some values in the dataset are missing (NA).

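We are going to model the true log concentrations $x_t$ by the state space model

$$x_t = A x_{t-1} + b + w_t, \quad w_t \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_R^2 & 0 \\ 0 & \sigma_A^2 \end{pmatrix}\right),$$

where $A$, $b$, $\sigma_R^2$, $\sigma_A^2$ are model parameters, and $t$ denotes the time point. In particular, $t = 0$ corresponds to day 0, and $t = 1, 2, \ldots, 375$ correspond to days 1-375.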

Here $x_t$ is a two dimensional vector. The first component denotes the logarithm of the rotifer concentration measured in number of rotifers per ml of water, and the second component denotes the logarithm of the algae concentration measured in $10^6$ algae per ml (these units are the same as in the dataset). $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$ is a two times two matrix, and $b$ is a two dimensional vector.

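The observation process is described as

$$y_t = x_t + v_t, \quad v_t \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \eta_R^2 & 0 \\ 0 & \eta_A^2 \end{pmatrix}\right),$$

where $\eta_R^2$, $\eta_A^2$ are additional model parameters.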
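
To make the structure of such a model concrete, the following is a minimal simulation sketch in base R. It assumes a linear Gaussian state equation of the form x_t = A x_{t-1} + b + w_t with independent noise components, which is an assumption made for this illustration. All numerical values (A, b, the variances, and the number of days) are made up and are not estimates from the rotifer-algae data.

```r
# Toy simulation of a two dimensional linear Gaussian state space model.
# All numerical values are hypothetical and chosen for illustration only.
set.seed(1)
n_days <- 50
A <- matrix(c(0.9, 0.1,
              -0.1, 0.9), nrow = 2, byrow = TRUE)  # hypothetical transition matrix
b <- c(0.1, 0.0)          # hypothetical constant vector
sigma2 <- c(0.05, 0.05)   # process noise variances (rotifer, algae)
eta2 <- c(0.02, 0.02)     # observation noise variances (rotifer, algae)

x <- matrix(NA_real_, nrow = n_days + 1, ncol = 2)  # row 1 holds x_0
y <- matrix(NA_real_, nrow = n_days, ncol = 2)      # row t holds y_t
x[1, ] <- c(log(6), log(1.5))                       # initial state x_0

for (t in 1:n_days) {
  # Assumed state equation: x_t = A x_{t-1} + b + w_t
  x[t + 1, ] <- as.vector(A %*% x[t, ]) + b + rnorm(2, sd = sqrt(sigma2))
  # Observation equation: y_t = x_t + v_t
  y[t, ] <- x[t + 1, ] + rnorm(2, sd = sqrt(eta2))
}

# Mimic a missing measurement, as in the first row of the real dataset
y[1, 1] <- NA
head(y)
```

Note that the hidden states x are defined on every day, while the observations y may contain NA entries.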

a)[10 marks] Create a JAGS model that fits the above state space model on the rotifer-algae dataset for the whole 375 day period.

Use 10000 burn-in steps and obtain 50000 samples from the model parameters $A$, $b$, $\sigma_R^2$, $\sigma_A^2$, $\eta_R^2$, $\eta_A^2$ (4+2+4=10 parameters in total).

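Use a Gaussian prior $N\left(\begin{pmatrix} \log(6) \\ \log(1.5) \end{pmatrix}, \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix}\right)$ for the initial state $x_0$, independent Gaussian $N(0, 1)$ priors for each of the 4 elements of $A$, a Gaussian prior $N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right)$ for $b$, and inverse Gamma(0.1, 0.1) priors for the variance parameters $\sigma_R^2$, $\sigma_A^2$, $\eta_R^2$, $\eta_A^2$.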

Explain how you handled the fact that some of the observations are missing (NA) in the dataset.

Explanation: (Write your explanation here)

b)[10 marks]

Based on your MCMC samples, compute the Gelman-Rubin convergence diagnostics (Hint: you need to run multiple chains in parallel for this by setting the n.chains parameter). Discuss how well the chains have converged to the stationary distribution based on the results.
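
As a rough illustration of what the Gelman-Rubin diagnostic measures, the sketch below computes the basic potential scale reduction factor by hand for toy chains in base R. The function name gelman_rubin and the toy data are hypothetical. The version implemented in the coda package applies additional corrections, so its values may differ slightly from this simplified formula.

```r
# Basic potential scale reduction factor for one scalar parameter.
# `chains` is a matrix with one column per chain and one row per iteration.
gelman_rubin <- function(chains) {
  n <- nrow(chains)
  chain_means <- colMeans(chains)
  B <- n * var(chain_means)            # between-chain variance
  W <- mean(apply(chains, 2, var))     # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(2)
n <- 1000
m <- 3
# Toy "chains" that all sample the same distribution: value close to 1
chains_good <- matrix(rnorm(n * m), nrow = n, ncol = m)
rhat_good <- gelman_rubin(chains_good)

# Toy chains where one chain is stuck in a different region: value well above 1
chains_bad <- chains_good
chains_bad[, 3] <- chains_bad[, 3] + 3
rhat_bad <- gelman_rubin(chains_bad)

print(c(good = rhat_good, bad = rhat_bad))
```

Values close to 1 indicate that the between-chain and within-chain variability agree, while values well above 1 suggest that the chains have not yet mixed.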

Print out the summary of the fitted JAGS model. Do autocorrelation plots for the 4 components

of the model parameter A.

Compute and print out the effective sample sizes (ESS) for each of the model parameters $A$, $b$, $\sigma_R^2$, $\sigma_A^2$, $\eta_R^2$, $\eta_A^2$.

If the ESS is below 1000 for any of these 10 parameters, increase the sample size/number of

chains until the ESS is above 1000 for all 10 parameters.

Explanation: (Write your explanation here)

c)[10 marks]

We are going to perform posterior predictive checks to evaluate the fit of this model on the data

(using the priors stated in question a). First, create replicate observations from the posterior

predictive using JAGS. The number of replicate observations should be at least 1000.

Compute the minimum, maximum, and median for both log-concentrations (i.e. both for rotifer and algae, 3 · 2 = 6 in total).

Plot the histograms for these quantities together with a line that shows the value of the function

considered on the actual dataset (see the R code for Lecture 2 for an example). Compute the

DIC score for the model (Hint: you can use the dic.samples function for this).

Discuss the results.
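
The mechanics of such a posterior predictive check can be sketched in base R as follows. This is only a toy example: y_obs and y_rep are simulated stand-ins (hypothetical names), whereas in the assignment the replicates would come from the fitted JAGS model. If the observed series contains NA values, one reasonable option is to compare the statistic only over the time points where data were actually observed.

```r
set.seed(3)
# Stand-in for an observed series of log-concentrations
y_obs <- rnorm(100, mean = 1, sd = 0.5)

# Stand-in for posterior predictive replicates: one row per replicate dataset
n_rep <- 1000
y_rep <- matrix(rnorm(n_rep * length(y_obs), mean = 1, sd = 0.5), nrow = n_rep)

# Test statistic (here the minimum) on each replicate and on the observed data
rep_min <- apply(y_rep, 1, min)
obs_min <- min(y_obs, na.rm = TRUE)

# Histogram of the replicated statistic with the observed value as a vertical line
hist(rep_min, main = "Minimum of replicated data", xlab = "minimum")
abline(v = obs_min, col = "red", lwd = 2)

# Proportion of replicates whose statistic is at least as large as the observed one
p_val <- mean(rep_min >= obs_min)
print(p_val)
```

An observed value lying far in the tail of the histogram would suggest that the model does not reproduce that feature of the data well.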

Explanation: (Write your explanation here)

d)[10 marks]

Discuss the meaning of the model parameters $A$, $b$, $\sigma_R^2$, $\sigma_A^2$, $\eta_R^2$, $\eta_A^2$. Find a website or paper that contains information about rotifers and unicellular algae (Hint: you can use Google search for this). Using your understanding of the meaning of model parameters and the biological information about these organisms, construct more informative prior distributions for the model parameters. State in your report the source of information and the rationale for your choices of priors.

Re-implement the JAGS model with these new priors. Perform the same posterior predictive

checks as in part c) to evaluate the fit of this new model on the data.

Compute the DIC score for the model as well (Hint: lower DIC score indicates better fit on

the data).

Discuss whether your new priors have improved the model fit compared to the original prior

from a).

Explanation: (Write your explanation here)

e)[10 marks] Update the model with your informative prior from part d) to compute the posterior distribution of the log concentrations ($x_t$) on days 376-395 (20 additional days).

Plot the evolution of the posterior mean of the log concentrations for rotifer and algae during days 376-395 on a single plot, along with curves that correspond to the [2.5%, 97.5%] credible interval of the log concentration ($x_t$) according to the posterior distribution on each day [Hint: you need 2 + 2 · 2 = 6 curves in total, use different colours for the curves for rotifer and algae].

Finally, estimate the posterior probability that the concentration of algae (measured in $10^6$ algae/ml, as in the data) becomes smaller than 0.1 at any time during these 20 additional days (days 376-395).
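
A probability of an event of the form "at any time during a period" can be estimated from posterior samples by checking the event along each sampled path. The base R sketch below illustrates this with simulated stand-in samples. The matrix log_algae and its distribution are hypothetical and only serve to show the computation.

```r
set.seed(4)
n_samples <- 2000
n_future <- 20

# Stand-in for posterior samples of the log algae concentration:
# one row per posterior sample, one column per future day
log_algae <- matrix(rnorm(n_samples * n_future, mean = log(0.5), sd = 0.8),
                    nrow = n_samples)

# For each sampled path, check whether it drops below log(0.1) on at least one day
below <- apply(log_algae, 1, function(path) any(path < log(0.1)))

# Monte Carlo estimate of the probability of the event
p_below <- mean(below)
print(p_below)
```

The threshold is applied on the log scale because the samples are log concentrations. Checking each path as a whole matters: combining the 20 daily marginal probabilities would ignore the dependence between days.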

Explanation: (Write your explanation here)

Problem 2 - Horse racing data

In this problem, we are going to construct a predictive model for horse races. The dataset (races.csv and runs.csv) contains information about 1000 horse races in Hong Kong during the years 1997-1998 (originally from https://www.kaggle.com/gdaley/hkracing). races.csv contains information about each race (such as distance, venue, track conditions, etc.), while runs.csv contains information about each horse participating in each race (such as finish time in the race). A detailed description of all columns in these files is available in the file horse_racing_data_info.txt.

Our goal is to model the mean speed of each horse during the races based on covariates

available before the race begins.

We are going to use INLA to fit several different regression models to this dataset. First, we load INLA and the datasets and display the first few rows.

library(INLA)

## Loading required package: Matrix

## Loading required package: foreach

## Loading required package: parallel

## Loading required package: sp

## This is INLA_21.11.22 built 2021-11-21 16:10:15 UTC.

## - See www.r-inla.org/contact-us for how to get help.

## - Save 80Mb of storage running ’inla.prune()’

#If it loaded correctly, you should see this in the output:

#Loading required package: Matrix

#Loading required package: foreach

#Loading required package: parallel

#Loading required package: sp

#This is INLA_21.11.22 built 2021-11-21 16:13:28 UTC.

# - See www.r-inla.org/contact-us for how to get help.

# - To enable PARDISO sparse library; see inla.pardiso()

#The following code does the full installation. You can try it if INLA has not been installed.

#First installing some of the dependencies

# install.packages("BiocManager")

# BiocManager::install("Rgraphviz")

#if (!requireNamespace("BiocManager", quietly = TRUE))

# install.packages("BiocManager")

#BiocManager::install("graph")

#Installing INLA

# install.packages("INLA",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)

#library(INLA)

runs <- read.csv(file = 'runs.csv')

head(runs)

## race_id horse_no horse_id result won lengths_behind horse_age horse_country

## 1 0 1 3917 10 0 8.00 3 AUS

## 2 0 2 2157 8 0 5.75 3 NZ

## 3 0 3 858 7 0 4.75 3 NZ

## 4 0 4 1853 9 0 6.25 3 SAF

## 5 0 5 2796 6 0 3.75 3 GB

## 6 0 6 3296 3 0 1.25 3 NZ

## horse_type horse_rating horse_gear declared_weight actual_weight draw

## 1 Gelding 60 -- 1020 133 7

## 2 Gelding 60 -- 980 133 12

## 3 Gelding 60 -- 1082 132 8

## 4 Gelding 60 -- 1118 127 13

## 5 Gelding 60 -- 972 131 14

## 6 Gelding 60 -- 1114 127 5

## position_sec1 position_sec2 position_sec3 position_sec4 position_sec5

## 1 6 4 6 10 NA

## 2 12 13 13 8 NA

## 3 3 2 2 7 NA

## 4 8 8 11 9 NA

## 5 13 12 12 6 NA

## 6 11 11 5 3 NA

## position_sec6 behind_sec1 behind_sec2 behind_sec3 behind_sec4 behind_sec5

## 1 NA 2.00 2.00 1.50 8.00 NA

## 2 NA 6.50 9.00 5.00 5.75 NA

## 3 NA 1.00 1.00 0.75 4.75 NA

## 4 NA 3.50 5.00 3.50 6.25 NA

## 5 NA 7.75 8.75 4.25 3.75 NA

## 6 NA 5.00 7.75 1.25 1.25 NA

## behind_sec6 time1 time2 time3 time4 time5 time6 finish_time win_odds

## 1 NA 13.85 21.59 23.86 24.62 NA NA 83.92 9.7

## 2 NA 14.57 21.99 23.30 23.70 NA NA 83.56 16.0

## 3 NA 13.69 21.59 23.90 24.22 NA NA 83.40 3.5

## 4 NA 14.09 21.83 23.70 24.00 NA NA 83.62 39.0

## 5 NA 14.77 21.75 23.22 23.50 NA NA 83.24 50.0

## 6 NA 14.33 22.03 22.90 23.57 NA NA 82.83 7.0

## place_odds trainer_id jockey_id

## 1 3.7 118 2

## 2 4.9 164 57

## 3 1.5 137 18

## 4 11.0 80 59

## 5 14.0 9 154

## 6 1.8 54 34

races<- read.csv(file = 'races.csv')

head(races)

## race_id date venue race_no config surface distance going

## 1 0 1997-06-02 ST 1 A 0 1400 GOOD TO FIRM

## 2 1 1997-06-02 ST 2 A 0 1200 GOOD TO FIRM

## 3 2 1997-06-02 ST 3 A 0 1400 GOOD TO FIRM

## 4 3 1997-06-02 ST 4 A 0 1200 GOOD TO FIRM

## 5 4 1997-06-02 ST 5 A 0 1600 GOOD TO FIRM

## 6 5 1997-06-02 ST 6 A 0 1200 GOOD TO FIRM

## horse_ratings prize race_class sec_time1 sec_time2 sec_time3 sec_time4

## 1 40-15 485000 5 13.53 21.59 23.94 23.58

## 2 40-15 485000 5 24.05 22.64 23.70 NA

## 3 60-40 625000 4 13.77 22.22 24.88 22.82

## 4 120-95 1750000 1 24.33 22.47 22.09 NA

## 5 60-40 625000 4 25.45 23.52 23.31 23.56

## 6 60-40 625000 4 23.47 22.48 23.25 NA

## sec_time5 sec_time6 sec_time7 time1 time2 time3 time4 time5 time6 time7

## 1 NA NA NA 13.53 35.12 59.06 82.64 NA NA NA

## 2 NA NA NA 24.05 46.69 70.39 NA NA NA NA

## 3 NA NA NA 13.77 35.99 60.87 83.69 NA NA NA

## 4 NA NA NA 24.33 46.80 68.89 NA NA NA NA

## 5 NA NA NA 25.45 48.97 72.28 95.84 NA NA NA

## 6 NA NA NA 23.47 45.95 69.20 NA NA NA NA

## place_combination1 place_combination2 place_combination3 place_combination4

## 1 8 11 6 NA

## 2 5 13 4 NA

## 3 11 1 13 NA

## 4 5 3 10 NA

## 5 2 10 1 NA

## 6 9 14 8 NA

## place_dividend1 place_dividend2 place_dividend3 place_dividend4

## 1 36.5 25.5 18.0 NA

## 2 12.5 47.0 33.5 NA

## 3 23.0 23.0 59.5 NA

## 4 14.0 24.5 16.0 NA

## 5 15.5 28.0 17.5 NA

## 6 16.5 408.0 70.0 NA

## win_combination1 win_dividend1 win_combination2 win_dividend2

## 1 8 121.0 NA NA

## 2 5 23.5 NA NA

## 3 11 70.0 NA NA

## 4 5 52.0 NA NA

## 5 2 36.5 NA NA

## 6 9 61.0 NA NA

a)[10 marks] Create a dataframe that includes the mean speed of each horse in each race and the distance of the race in a column [Hint: you can do this by adding two extra columns to the runs dataframe].

Fit a linear regression model (lm) with the mean speed as the response variable. The covariates should be the horse id as a categorical variable, and the race distance, horse rating, and horse age as standard (non-categorical) variables. Scale the non-categorical covariates before fitting the model (i.e. center them and divide by their standard deviation; you can use the scale function in R for this).

Print out the summary of the lm model, and discuss the quality of the fit.
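
The data preparation described above can be sketched on a small toy example in base R. The data frames runs_toy and races_toy are hypothetical, with values loosely modelled on the rows displayed earlier. It is assumed here that distance is in metres and finish_time is in seconds, so that the mean speed is in metres per second; please check horse_racing_data_info.txt for the actual units.

```r
# Toy stand-ins for the runs and races dataframes (hypothetical values)
runs_toy <- data.frame(race_id = c(0, 0, 1),
                       finish_time = c(83.92, 83.56, 70.10),
                       horse_rating = c(60, 60, 45),
                       horse_age = c(3, 3, 4))
races_toy <- data.frame(race_id = c(0, 1),
                        distance = c(1400, 1200))

# Add the race distance to every run by matching on race_id
runs_toy <- merge(runs_toy, races_toy, by = "race_id")

# Mean speed = distance / finish time (assumed metres per second)
runs_toy$mean_speed <- runs_toy$distance / runs_toy$finish_time

# Center and divide by the standard deviation; scale() returns a one-column
# matrix, so it is converted back to a plain numeric vector
runs_toy$distance_s <- as.numeric(scale(runs_toy$distance))
runs_toy$rating_s <- as.numeric(scale(runs_toy$horse_rating))

print(runs_toy)
```

A categorical covariate such as the horse id would typically be wrapped in factor() or as.factor() in the model formula rather than scaled.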

Explanation: (Write your explanation here)

b)[10 marks] Fit the same model in INLA (i.e. Bayesian linear regression with Gaussian likelihood,

mean speed is the response variable, and the same covariates used with scaling for

the non-categorical covariates). Set a Gamma (0.1,0.1) prior for the precision, and Gaussian

priors with mean zero and variance 1000000 for all of the regression coefficients (including the

intercept).

Print out the summary of the INLA model. Compute the posterior mean of the variance parameter $\sigma^2$. Plot the posterior density for the variance parameter $\sigma^2$. Compute the negative sum log CPO (NSLCPO) and DIC values for this model (smaller values indicate better fit).

Compute the standard deviation of the mean residuals (i.e. the differences between the posterior

mean of the fitted values and the true response variable).

Discuss the results.
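
For reference, the two summary quantities mentioned above can be computed as in the following base R sketch. The vectors cpo, y_true and fitted_mean are made-up toy values. With a fitted INLA model, to the best of our understanding, the CPO values are stored in the cpo$cpo component of the model object when control.compute = list(cpo = TRUE, dic = TRUE) is set, and the posterior means of the fitted values are in summary.fitted.values$mean.

```r
# Toy CPO values (made up); each value is the predictive density of one
# observation given all the other observations
cpo <- c(0.8, 0.5, 1.2, 0.9)

# Negative sum log CPO: smaller values indicate better predictive fit
nslcpo <- -sum(log(cpo))

# Toy response values and posterior means of the fitted values (made up)
y_true <- c(16.7, 16.9, 17.1, 16.5)
fitted_mean <- c(16.8, 16.8, 17.0, 16.6)

# Standard deviation of the mean residuals
resid_sd <- sd(y_true - fitted_mean)

print(c(NSLCPO = nslcpo, residual_sd = resid_sd))
```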

Explanation: (Write your explanation here)

c)[10 marks] In this question, we are going to improve the model in b) by using more informative

priors and more columns from the dataset.

First, using some publicly available information from the internet (Hint: use Google search)

find out about the typical speed of race horses in Hong Kong, and use this information to

construct a prior for the intercept. Explain the rationale for your choice.

Second, look through all of the information in the datasets that is available before the race (Hint: you need to read the description horse_racing_data_info.txt for information about the columns. The position, behind, result, won, and time related columns are not available before the race). Discuss your rationale for including some of these in the dataset (make sure to scale them if they are non-categorical).

Feel free to try creating additional covariates such as polynomial or interaction terms (Hint:

this can be done using I() in the formula), and you can also try to use a different likelihood

(such as Student-t distribution).

Fit your new model in INLA (i.e. Bayesian linear regression, mean speed is the response

variable, and scaling done for the non-categorical covariates).

Print out the summary of the INLA model. Compute the negative sum log CPO (NSLCPO)

and DIC values for this model (smaller values indicate better fit).

Compute the standard deviation of the mean residuals (i.e. the differences between the posterior

mean of the fitted values and the true response variable).

Discuss the results and compare your model to the model from b).

Please only include your best performing model in the report.

Explanation: (Write your explanation here)

d)[10 marks] We are going to perform model checks to evaluate the fit of the two models from parts b) and c) on the data.

Compute the studentized residuals for the Bayesian regression models from parts b) and c). Perform a simple Q-Q plot on the studentized residuals. Plot the studentized residuals versus their index, and also plot the studentized residuals against the posterior mean of the fitted value (see Lecture 2). Discuss the results.

Explanation: (Write your explanation here)

e)[10 marks] In this question, we are going to use the model you have constructed in part

c) to predict a new race, i.e. calculate the posterior probabilities of each participating horse

winning that race. First, we load the dataset containing information about the future race.

race_to_predict <- read.csv(file = 'race_to_predict.csv')

race_to_predict

## race_id date venue race_no config surface distance going horse_ratings

## 1 1000 1998-09-18 ST 2 B+2 0 1400 GOOD 40-15

## prize race_class sec_time1 sec_time2 sec_time3 sec_time4 sec_time5 sec_time6

## 1 485000 5 NA NA NA NA NA NA

## sec_time7 time1 time2 time3 time4 time5 time6 time7 place_combination1

## 1 NA NA NA NA NA NA NA NA 5

## place_combination2 place_combination3 place_combination4 place_dividend1

## 1 7 8 NA 27.5

## place_dividend2 place_dividend3 place_dividend4 win_combination1

## 1 43 57 NA 5

## win_dividend1 win_combination2 win_dividend2

## 1 86 NA NA

runs_to_predict <- read.csv(file = 'runs_to_predict.csv')

runs_to_predict

## race_id horse_no horse_id result won lengths_behind horse_age horse_country

## 1 1000 1 3940 NA NA NA 3 NZ

## 2 1000 2 474 NA NA NA 3 NZ

## 3 1000 3 3647 NA NA NA 3 NZ

## 4 1000 4 144 NA NA NA 3 AUS

## 5 1000 5 3712 NA NA NA 3 AUS

## 6 1000 6 3734 NA NA NA 3 AUS

## 7 1000 7 1988 NA NA NA 3 AUS

## 8 1000 8 3247 NA NA NA 3 AUS

## 9 1000 9 4320 NA NA NA 3 NZ

## 10 1000 10 1077 NA NA NA 3 NZ

## 11 1000 11 3916 NA NA NA 3 AUS

## 12 1000 12 768 NA NA NA 3 NZ

## 13 1000 13 3164 NA NA NA 3 SAF

## 14 1000 14 498 NA NA NA 3 AUS

## horse_type horse_rating horse_gear declared_weight actual_weight draw

## 1 Gelding 60 -- 1148 133 7

## 2 Gelding 60 -- 1039 122 4

## 3 Gelding 60 -- 1064 129 5

## 4 Gelding 60 -- 1086 131 2

## 5 Gelding 60 -- 1101 128 6

## 6 Gelding 60 -- 1137 130 8

## 7 Gelding 60 -- 1063 122 11

## 8 Gelding 60 -- 1092 126 10

## 9 Gelding 60 -- 1096 126 13

## 10 Gelding 60 -- 1034 123 9

## 11 Gelding 60 -- 1125 124 1

## 12 Gelding 60 -- 1191 123 3

## 13 Gelding 60 -- 1059 120 14

## 14 Gelding 60 -- 1027 112 12

## position_sec1 position_sec2 position_sec3 position_sec4 position_sec5

## 1 9 6 6 14 NA

## 2 4 4 4 4 NA

## 3 5 3 3 12 NA

## 4 6 8 7 5 NA

## 5 1 2 1 1 NA

## 6 10 9 8 6 NA

## 7 3 1 2 2 NA

## 8 2 5 5 3 NA

## 9 14 13 13 9 NA

## 10 7 7 9 13 NA

## 11 12 11 10 8 NA

## 12 13 14 14 10 NA

## 13 8 10 11 7 NA

## 14 11 12 12 11 NA

## position_sec6 behind_sec1 behind_sec2 behind_sec3 behind_sec4 behind_sec5

## 1 NA 2.75 2.75 3.00 11.00 NA

## 2 NA 1.00 1.75 2.00 2.25 NA

## 3 NA 1.25 1.25 1.50 7.50 NA

## 4 NA 2.00 4.25 3.75 2.75 NA

## 5 NA 0.50 0.15 0.50 1.50 NA

## 6 NA 2.75 4.25 3.75 3.50 NA

## 7 NA 0.75 0.15 0.50 1.50 NA

## 8 NA 0.50 2.50 2.50 2.00 NA

## 9 NA 5.25 6.75 6.75 3.75 NA

## 10 NA 2.25 3.50 4.25 9.50 NA

## 11 NA 3.75 5.50 5.25 3.75 NA

## 12 NA 5.25 7.50 7.25 6.75 NA

## 13 NA 2.75 5.00 5.25 3.50 NA

## 14 NA 3.75 6.25 6.50 7.25 NA

## behind_sec6 time1 time2 time3 time4 time5 time6 finish_time win_odds

## 1 NA NA NA NA NA NA NA NA 55.0

## 2 NA NA NA NA NA NA NA NA 4.6

## 3 NA NA NA NA NA NA NA NA 11.0

## 4 NA NA NA NA NA NA NA NA 3.8

## 5 NA NA NA NA NA NA NA NA 8.6

## 6 NA NA NA NA NA NA NA NA 5.9

## 7 NA NA NA NA NA NA NA NA 12.0

## 8 NA NA NA NA NA NA NA NA 20.0

## 9 NA NA NA NA NA NA NA NA 21.0

## 10 NA NA NA NA NA NA NA NA 57.0

## 11 NA NA NA NA NA NA NA NA 26.0

## 12 NA NA NA NA NA NA NA NA 18.0

## 13 NA NA NA NA NA NA NA NA 27.0

## 14 NA NA NA NA NA NA NA NA 62.0

## place_odds trainer_id jockey_id

## 1 17.0 38 138

## 2 1.7 47 31

## 3 2.8 54 34

## 4 1.5 138 57

## 5 2.7 75 131

## 6 2.2 29 18

## 7 4.3 7 63

## 8 5.7 69 151

## 9 5.3 109 145

## 10 17.0 117 38

## 11 6.2 128 125

## 12 5.4 97 49

## 13 8.0 55 91

## 14 17.0 63 155

Based on your model from part c), compute the posterior probabilities of each of these 14 horses

winning the race. [Hint: you will need to sample from the posterior predictive distribution.]
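
Once posterior predictive samples of the mean speeds are available, winning probabilities can be estimated by counting how often each horse is the fastest. The base R sketch below uses simulated stand-in samples. The matrix speed_samples, the assumed means, and the independence between horses are all hypothetical simplifications; with a real model the samples should be drawn jointly from the posterior predictive distribution (for example starting from inla.posterior.sample).

```r
set.seed(7)
n_samples <- 5000
n_horses <- 14

# Stand-in for posterior predictive samples of mean speed:
# one row per sample, one column per horse (hypothetical means and spread)
mean_speed <- seq(16.4, 16.9, length.out = n_horses)
speed_samples <- sapply(mean_speed,
                        function(m) rnorm(n_samples, mean = m, sd = 0.2))

# In each sampled race, the winner is the horse with the highest mean speed
winner <- max.col(speed_samples, ties.method = "first")

# Winning probability = fraction of sampled races won by each horse
win_prob <- tabulate(winner, nbins = n_horses) / n_samples
print(round(win_prob, 3))
```

Since all horses run the same distance in a given race, the horse with the highest mean speed is also the one with the shortest finish time.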

Explanation: (Write your explanation here)
