Lab Assignment #1 - Estimation
Due: Monday, 9/24 at the beginning of class.
Disclaimer: Labs typically take more time to complete than a regular homework assignment.
In this lab you will:
• Explore bias & variance;
• Numerically calculate MLE’s.
Format:
The lab has two sections which will walk you through each topic and example code (there is a .R file
containing all of code provided in this lab). At the end of each section, you will find a box with a lab
question. This is to be completed for your homework due Monday 9/24. You must include all of your code
(with questions properly labeled and appropriate comments), your answers to any questions (as comments)
and relevant graphs.
Remember: to get the help file for any R function, just type ?functionname in the console.
Section #1: Bias & Variance
Part I
Now that we are all experts at simulating data and writing for loops, we will apply this knowledge to some
dart throwing.
Recall from lecture:
•
ˆθ is a random variable.
It is a function of X1, . . . , Xn and will therefore produce a different estimate from each different sample.
• As a RV, ˆθ has a sampling distribution.
This is the distribution of values of ˆθ produced from all possible samples of the same size from the
population.
• We would like ˆθ to be unbiased.
The sampling distribution of ˆθ is centered at θ.
• We would like ˆθ to have small variance.
The sampling distribution of ˆθ is concentrated as much as possible around θ.
We are going to simulate data to provide estimates of θ that have varying properties (low bias/low variance,
high bias/high variance, low bias/high variance, high bias/low variance).
To begin, we must first create the target. Run the following code:
t = seq(0, 2*pi, length = 1000)
# Circle centered at (0,0) with radius 2
coords = t(rbind(sin(t)*2, cos(t)*2))
plot(coords, type = ’l’, xlab = "", ylab = "", xaxt = ’n’, yaxt = ’n’,
A. Flynt, Department of Mathematics, Bucknell University, MATH 304 1
xlim = c(-3, 3), ylim = c(-3, 3))
# Circle centered at (0,0) with radius 1.75
coords.out = t(rbind(sin(t)*1.75, cos(t)*1.75))
lines(coords.out, type = ’l’)
# Circle centered at (0,0) with radius 1
c.1 = t(rbind(sin(t)*1, cos(t)*1))
lines(c.1, type = ’l’)
# Circle centered at (0,0) with radius .75
c.2 = t(rbind(sin(t)*.75, cos(t)*.75))
lines(c.2, type = ’l’)
# Circle centered at (0,0) with radius .25
coords.in = t(rbind(sin(t)*.25, cos(t)*.25))
lines(coords.in, type = ’l’)
# Circle centered at (0,0) with radius .1
coords.in.in = t(rbind(sin(t)*.1, cos(t)*.1))
# Color in center circle green
polygon(coords.in.in, col = ’green’)
Notice that the outer circle of the target has a radius of 2. When simulating data, we are going to make
draws from two separate Uniform distributions. One that represents our x-coordinate and one that
represents our y-coordinate. If we want to produce estimates centered on the bullseye (low bias) that could
occur at the far edges of the target (high variance), then we will simulate data from two Uniform
distributions with parameters -2 and 2 as follows:
# Initiate vectors for (x, y) coordinates
x.ran = y.ran = NULL
for(i in 1:10){
# Randomly draw a dart from a Uniform(-2, 2).
# Here, left of the bullseye is negative and right of the bullseye is positive.
x.ran[i] = runif(1, -2, 2)
# Here, above the bullseye is positive and below the bullseye is negative.
y.ran[i] = runif(1, -2, 2)
# Draw the dart at the (x,y) coordinates from the previous draw
points(x.ran, y.ran, pch="X", cex=2, col = ’black’)
# Wait 2 seconds between each throw
Sys.sleep(2)
}
A. Flynt, Department of Mathematics, Bucknell University, MATH 304 2
You now have 10 black X’s on your target that represent observations with low bias and high variance.
We can calculate the bias and variance as follows:
# Bias of throws. Just the sum of the average in each (x,y) direction
Bias.black = mean(x.ran) + mean(y.ran)
# Variance of throws. Just the sum of the variance of each (x,y) direction
Variance.black = var(x.ran) + var(y.ran)
Lab Question #1
You will now generate darts for the scenarios of low variance/low bias (color red), low variance/high bias
(color blue), high variance/high bias (color orange). For each set of darts, calculate bias and variance
using similar names. Then add the following legends to your target:
# Legend for bias of each scenario
legend("topleft", title = "Bias", legend=round(c(Bias.black, Bias.red,
Bias.blue, Bias.orange), 2), col = c("black", "red", "blue", "orange"), pch="X")
# Legend for variance of each scenario
legend("topright", title = "Variance", legend = round(c(Variance.black,
Variance.red, Variance.blue, Variance.orange), 2), col = c("black", "red",
"blue", "orange"), pch="X")
Part II
We are now going to run some simulations based on the example that we used in the “Properties of Point
Estimators” packet from lecture.
Let X1, X2, . . . , Xn
iid∼ Uniform(0, θ). Recall, that ˆθMOM = 2X¯ and ˆθMLE∗ =
n+1
n X(n) are two unbiased
estimators for θ.
We are going to run a simulation to see how these two estimators perform.
The idea is to draw random samples of size 25 from a Uniform(0, 12). For each sample, compute 2¯x and
26/25 × max{x1, x2, . . . , xn} and record those values. Repeat this 1000 times.
Lab Question #2
(a) Write and run the code for the simulation described above.
(b) Create two histograms of the 1000 replications, one for each estimate.
Make sure that your histograms have the same x and y axes (this can be done with the arguments
xlim = c(a, b) and ylim = c(d, e), where you have to decide on a, b, d, e).
(c) Which estimator exhibits the smallest amount of variability? Is that what you expected? Why or
why not?
A. Flynt, Department of Mathematics, Bucknell University, MATH 304 3
Section #2: Calculating MLE’s
In this final section, we will calculate maximum likelihood estimators for particular distributions using R.
As always, the first thing to do is write down the log-likelihood function. We have not spent time on creating
functions yet, so I will just show you the syntax for how this works:
# You will first give your function a name.
# Then specify the necessary parameters (pars) and data (object) for your function.
# You will open the function ({)
# Then you will specify the form of your log-likelihood (logl)
# You need to tell the function what it should return when you use it
# Note: the function that we will use to optimize our likelihood only minimizes
# functions, therefore we will return the negative log-likelihood
# Finally, close your function (})
name = function(pars, object){
logl = loglikelihood function
return(-logl)
}
Example
Let X1, X2, . . . , Xn
iid∼ Poisson(λ). Then, the log-likelihood function is given by:
`(λ) = Xni=1xiln(λ) − nλ −Xni=1ln(xi!).
Since the last term does not include the parameter λ, it can be safely ignored for optimization. Thus, the
kernel of the log-likelihood function is:
`(λ) ∝Xni=1xiln(λ) − nλ.
We can program this function using the following syntax:
poisson.lik = function(lambda, x){
logl = sum(x)*log(lambda) - length(x)*lambda
return(-logl)
}
Once the log-likelihood function has been declared, then the nlm command can be invoked. The minimum
specification of this command is nlm(log-likelihood, starting values, data). Here, starting values
is your (vector of) starting value(s) for the optimization of your parameter.
We will now simulate a data set of 1000 observations from a Poisson(λ = 3.25) and numerically find the
maximum likelihood estimate.
# Set the seed so that we all generate the same data set.
set.seed(288)
poiss.data = rpois(1000, 3.25)
A. Flynt, Department of Mathematics, Bucknell University, MATH 304 4
# Note that we are using 1 as the starting value.
out = nlm(poisson.lik, 1, x = poiss.data)
pois.mle = out$estimate
pois.mle
# [1] 3.242
How did we do? Not too bad considering the true value of λ was 3.25. Is this the value that you were
expecting? Why or why not?
Lab Question #3
Recall that on HW #9, I asked you to write down (but not solve) the equations that you would need to
maximize to find the MLE’s for (α, β) from a Gamma distribution. These partial derivatives are not
something that we would want to do by hand (taking the derivative of the gamma function is not too
fun), so instead, we will find the MLE’s numerically.
(a) Write the code for gamma.lik, the log-likelihood function of a Gamma(α, β). Note, that this
function will now have a vector for the pars argument.
(b) Read in “gamma data.txt” from Moodle and optimize this data with starting values of (1, 1) for
(α, β).
Note: if you use read.table(), R will read in the data as a data frame and your likelihood
function will not be able to handle an object of that class. To avoid this issue, you will have to pull
off the x column from the data and pass that into your likelihood function.
(c) What are your MLE’s for (α, β)?
A. Flynt, Department of Mathematics, Bucknell University, MATH 304 5
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp  电子信箱:99515681@qq.com  
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。