联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> C/C++编程C/C++编程

日期:2023-05-31 09:13

STA 141C - Big Data & High Performance Statistical Computing Spring 2022

Homework 4

Lecturer: Bo Y.-C. Ning Due June 02, 2023

Due June 02, 2023 by 11:59pm.

A few notes:

1. Submit your homework using the file name ”LastName FirstName hw4”

2. Answer all questions with complete sentences.

3. Your code should be readable; writing a piece of code should be compared to writing a page of a book.

Adopt the one-statement-per-line rule. Consider splitting a lengthy statement into multiple lines

to improve readability. (You will lose one point for each line that does not follow the one-statementper-line rule)

4. To help understand and maintain code, you should always add comments to explain your code. (homework with no comments will receive 0 points). For a very long comment, break it into multiple lines.

5. Submit your final work with one .pdf (or .html) file to Canvas. I encourage you to use LATEX for

writing equations. Handwriting is acceptable, you have to scan it and then combine it with the coding

part into a single .pdf (or .html) file. Handwriting should be clean and readable.

1 Handwriting recognition

In this homework, we work on a model-based method for handwritten digit recognition. Following figure

shows example bitmaps of handwritten digits from U.S. postal envelopes.

Each digit is represented by a 32×32 bitmap in which each element indicates one pixel with a value of white

or black. Each 32 × 32 bitmap is divided into blocks of 4 × 4, and the number of white pixels are counted

in each block. Therefore each handwritten digit is summarized by a vector x = (x1, . . . , x64) of length 64

where each element is a count between 0 and 16.

By a model-based method, we mean to impose a distribution on the count vector and carry out classification

using probabilities. The goal is to predict handwritten digit. We separate the dataset into training data and

test data. The training set contains 3823 handwritten digits and the test set contains 1797 digits.

A common distribution for count vectors is the multinomial distribution. However, it is not a good model for

handwritten digits. Let’s work on a more flexible model for count vectors, the Dirichlet-multinomial model.

4-1

4-2 Homework 4: Bo Y.-C. Ning

For a multivariate count vector x = (x1, . . . , xd) with batch size |x|=

Pd

j=1 xj , the probability mass function

for Dirichlet-multinomial distribution is

f(x|α) = 

|x|

x

Qd

j=1(αj )xj

(|α|)|x|

,

where (a)k =

Qk−1

i=0 (a + i).

Given independent data points x1, . . . , xn, the log-likelihood is given by

L(α) = Xn

i=1

log 

|xi

|

xi



+

Xn

i=1

X

d

j=1

[log(Γ(αj + xij ) − log(Γ(αj ))] −

Xn

i=1

[log Γ(|α|+|xi

|) − log Γ(|α|)] .

How do you calculate the MLE?

In this exercise, we use Newton’s method. First, the score function is given by

∂αj

L(α) = Xn

i=1

[Ψ(xij + αj ) − Ψ(αj )] −

Xn

i=1

[Ψ(|xi

|+|α|) − Ψ(|α|)] ,

where Ψ(x) = d(log Γ(x)) = Γ0

(x)/Γ(x).

Next, the observed information matrix is given by

−d

2L(α) = D − c1d1

0

d

,

where D is a diagonal matrix,

Djj =

Xn

i=1

[Ψ0

(αj ) − Ψ

0

(xij + αj )] = Xn

i=1

xXij−1

k=0

1

(αj + k)

2

and

c =

Xn

i=1

[Ψ0

(|α|) − Ψ

0

(|xi

|+|α|)] = Xn

i=1

|xXi|−1

k=0

1

(|α|+k)

2

.

Then given an initial value for α

(0) = (α

(0)

1

, . . . , α

(0)

n ), the Newton’s method keep updating

α

(t) = α

(t−1) + [−d

2L(α

(t−1))]−1

dL(α

(t−1))

for t = 1, . . . , T. We stop when |L(α

(t)

) − L(α

(t−1))|≤ , and then take ˆα = α

(t)

.

Note that D is a diagonal matrix, we should use the Sherman-Morrison formula to write

[−d

2L(α)]−1 = D−1 +

1

1/c −

P

j

d

−1

j

D−1110D−1

In the folder uploaded on Piazza, you will find

• Data containing the training data and the testing data

• ‘ddirmult.R’, which evaluates the likelihood function (if log = FALSE) or the log-likelihood function

(if log = TRUE) of the Dirichlet-multinomial density

Homework 4: Bo Y.-C. Ning 4-3

• ‘ddirmult.fit.R‘, which estimates the maximum likelihood estimator (MLE) by Newton’s method

• ‘trainingMLE.R‘, which estimates the MLE based on the training data

Question 1. Open ‘trainingMLE.R’ and obtain MLE estimators for each of the 10 handwriting digits

(0, 1, 2, . . . , 9). (You may need to change the path when loading the data)

Question 2. Read in the testing data. Use the estimated MLE for each digit from training data to predict

handwriting digits for the testing data.

• Hint 1: To predict the handwriting digit, you should use the ‘ddirmult.R’ function. The following code

can be helpful

1 # testDigitProb stores posterior probability of each digit being 0 ,1 ,... ,9

2 testDigitProb <- matrix (0 , dim ( testdata ) [1] , 10)

3 for ( dig in 0:9) {

4 testDigitProb [, dig + 1] <-

5 ddirmult ( testdata [ , -65] , alphahat [ dig + 1, ], log = TRUE )

6 }

7 testDigitProb <- testDigitProb +

8 rep ( log ( digitCount / sum ( digitCount ) ) , each = nrow ( testdata ))

9 digitPredicted <- max . col ( testDigitProb ) - 1

• Hint 2: To summarize the result, you can construct a confusion table using the code

1 table ( testdata [, 65] , digitPredicted )

The output should look like this:

Question 3. Comment on using gradient descent to obtain the MLE (instead of Newton’s method)? (You

do not need to implement this.)

Question 4. What is the advantage and disadvantage of using gradient descent instead of Newton’s method?

Question 5. Do you think the current method is satisfactory for predicting handwriting digits? Do you

know any other method(s) that can achieve a higher accuracy?


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp