
MAT 167 FINAL PROGRAMMING PROJECT WQ 2019

Read Chapter 6, and Chapter 10 up to but not including Section 10.3, in Eldén.

Read Professor Saito’s twenty-first lecture in NS LECTURE 21.pdf.

Write a MATLAB code to perform the following handwritten digit recognition computations.

Step 01 Download the handwritten digit database “USPS.mat” from CANVAS and load this file into your MATLAB session.

(a) This file contains four arrays: train_patterns and test_patterns, each of size 256 × 4649, and train_labels and test_labels, each of size 10 × 4649.

Rename the array train_patterns to training_digits,

training_digits = train_patterns;

rename the array test_patterns to test_digits,

test_digits = test_patterns;

and rename the array train_labels to training_labels:

training_labels = train_labels;

The array test_labels keeps its original name.

You may find it helpful to think of these arrays as matrices. The arrays training_digits and test_digits contain a raster scan of the 16 × 16 gray-level pixel intensities, which have been normalized to lie within the range [−1, 1]. The arrays training_labels and test_labels contain the true identity of each digit image. That is, if the jth handwritten digit image in training_digits truly represents the digit i, then the (i + 1, j)th entry of training_labels is +1, and all the other entries of the jth column of training_labels are −1.
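As a small illustration of this label encoding, the sketch below builds a hypothetical 10 × 3 label matrix for three images of the digits 0, 7 and 9 and recovers the digits from it. It relies only on the ±1 encoding described above; the names labels, row_of_plus_one, true_digits and is_seven are made up for the example.

```matlab
% Hypothetical labels for three images whose true digits are 0, 7 and 9:
% column j holds +1 in row (digit + 1) and -1 everywhere else.
labels = -ones(10, 3);
labels(0 + 1, 1) = 1;
labels(7 + 1, 2) = 1;
labels(9 + 1, 3) = 1;

% The +1 is the largest entry of each column, so the row index of the
% column maximum is digit + 1.
[~, row_of_plus_one] = max(labels, [], 1);
true_digits = row_of_plus_one - 1;   % 1 x N row vector of digits

% The logical mask used throughout the project: which images show a 7?
is_seven = labels(7 + 1, :) == 1;

disp(true_digits)   % values 0 7 9
disp(is_seven)      % values 0 1 0
```

The same comparison, training_labels(k,:) == 1, is what the later hints use to pool the images of digit k − 1.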

(b) Now display the first 16 images in training_digits using the subplot(4,4,k) and imagesc functions in MATLAB. Print out the figure and include it in the LaTeX and PDF files of your Programming Project.

Hint: You need to reshape each column into a matrix of size 16 × 16 and then transpose it in order to display it correctly.
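The need for the transpose follows from how MATLAB stores matrices. The sketch below assumes, as the hint implies, that each column stores its 16 × 16 image one pixel row at a time; because reshape fills its result column by column, the image rows come out as columns until the result is transposed. The column used here is a synthetic stand-in (1:256), not USPS data, and the display loop is left as comments so the sketch runs without a figure window.

```matlab
% Synthetic stand-in for one column of training_digits: pixel p of a
% row-by-row raster scan simply stores the value p.
column = (1:256)';

% reshape fills the 16 x 16 result column by column, so image row 1
% (pixels 1..16) lands in the first COLUMN of the result.
without_transpose = reshape(column, 16, 16);

% Transposing moves each scanned row back into a matrix row.
img = reshape(column, 16, 16)';

disp(img(1, 1:4))                 % values 1 2 3 4
disp(without_transpose(1, 1:4))   % values 1 17 33 49

% With the real data, the Step 01(b) display loop could look like this:
% for k = 1:16
%     subplot(4, 4, k);
%     imagesc(reshape(training_digits(:, k), 16, 16)');
%     colormap(gray); axis image; axis off;
% end
```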

Step 02 Read the description of this step in Section 10.1 of the textbook and/or Professor Saito’s Lecture 21. Compute the mean digits in training_digits and put them in a matrix called training_averages of size 256 × 10, and display these 10 mean digit images using subplot(2,5,k) and imagesc. Print out the figure as a PDF file and include it in your LaTeX and PDF documents.

© 2019 Prof. E. G. Puckett, Revision 3.05, Mon 11th Mar, 2019 at 18:06


Hint: You can gather (or pool) all the images in training_digits corresponding to digit k − 1 (1 ≤ k ≤ 10) using the following MATLAB command:

training_digits(:, training_labels(k,:)==1);
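A minimal sketch of the Step 02 loop, run on a tiny hypothetical data set (4 pixels, 6 images, 3 classes) so that the averages can be checked by hand. With the real data the same loop would run over k = 1:10 on the 256 × 4649 training_digits; the names num_classes, true_class and pooled are illustrative only.

```matlab
% Hypothetical stand-in: 4 pixels, 6 images, 3 classes
% (the project uses 256 pixels, 4649 images and 10 digits).
num_classes = 3;
training_digits = [1 3 10 20 5 7;
                   2 4 30 40 5 9;
                   0 2 10 10 5 7;
                   1 1 20 20 5 5];
true_class = [1 1 2 2 3 3];                      % class k of each column
training_labels = -ones(num_classes, 6);
training_labels(sub2ind(size(training_labels), true_class, 1:6)) = 1;

training_averages = zeros(size(training_digits, 1), num_classes);
for k = 1:num_classes
    % Pool the columns whose label row k is +1, then average across images.
    pooled = training_digits(:, training_labels(k, :) == 1);
    training_averages(:, k) = mean(pooled, 2);
end
disp(training_averages)
```

Passing dimension 2 to mean explicitly matters: if a class happened to contain a single image, mean(pooled) without it would average over the pixels instead of over the images.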

Step 03 Read the description of this step in Section 10.1 of the textbook and/or Professor Saito’s Lecture 21. Now carry out the simplest classification computations as follows.

(a) First, prepare a matrix called test_classification of size 4649 × 10 (see footnote 1) and fill this array by computing the Euclidean distance (or its square) between each image in test_digits and each mean digit image in training_averages.

Hint: The following line of MATLAB code computes the squared Euclidean distances between all of the test digit images and the kth mean digit of the training data set:

sum((test_digits-repmat(training_averages(:,k),[1 4649])).^2);
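The repmat in the hint builds a full copy of the mean digit for every test image. The sketch below, on small hypothetical arrays, fills the distance matrix with bsxfun instead and checks the result against the hint's formula. It stores one row per mean digit and one column per test image (10 × 4649 for the real data), which is the layout the hints in (a) and (b) index; this orientation is an assumption chosen to match those hints.

```matlab
% Hypothetical stand-ins: 3 pixels, 4 test images, 2 mean digits.
test_digits = [0 1 2 3;
               0 1 2 3;
               0 0 1 1];
training_averages = [0 2;
                     0 2;
                     0 1];
num_test = size(test_digits, 2);
num_means = size(training_averages, 2);

% Rows = mean digits, columns = test images (10 x 4649 for USPS).
test_classification = zeros(num_means, num_test);
for k = 1:num_means
    % bsxfun subtracts the k-th mean from every column without the repmat
    % copy; since R2016b, test_digits - training_averages(:, k) does the
    % same through implicit expansion.
    deviations = bsxfun(@minus, test_digits, training_averages(:, k));
    test_classification(k, :) = sum(deviations .^ 2, 1);
end

% Cross-check row 2 against the hint's repmat formula.
k = 2;
via_repmat = sum((test_digits - repmat(training_averages(:, k), [1 num_test])) .^ 2);

disp(test_classification)   % rows [0 2 9 19] and [9 3 0 2]
```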

(b) Compute the classification results by finding the position index of the minimum of each column of test_classification. Put the results in a vector test_classification_res of size 1 × 4649.

Hint: You can find the position index giving the minimum of the jth column of test_classification by

>> [tmp, ind] = min(test_classification(:,j));

The variable ind then contains the position index, an integer between 1 and 10, of the smallest entry of test_classification(:,j).
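Rather than calling min once per column, a single call can scan every column at once. The sketch below uses a hypothetical 3 × 4 distance matrix in the same rows-are-means layout as the hint; it also shows the variant needed if the distances are stored transposed (images × means). When two distances tie, min returns the first index, so ties are broken in favor of the smaller digit.

```matlab
% Hypothetical squared distances: 3 mean digits (rows) x 4 test images (columns).
test_classification = [4 1 9 2;
                       1 5 3 2;
                       6 7 0 8];

% Column-wise minimum for every test image at once; entry j equals the
% ind returned by [tmp, ind] = min(test_classification(:, j)).
[~, test_classification_res] = min(test_classification, [], 1);
disp(test_classification_res)   % values 2 1 3 1 (column 4 ties; first index wins)

% If the distances are stored as (images x means), e.g. 4649 x 10,
% minimize along dimension 2 and transpose to get a 1 x N row.
[~, res_from_transposed] = min(test_classification', [], 2);
res_from_transposed = res_from_transposed';
```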

(c) Finally, compute the confusion matrix test_confusion of size 10 × 10, print out this matrix, and submit your results in the PDF file containing your report.

Hint: First gather the classification results corresponding to the digit k − 1 by

>> tmp=test_classification_res(test_labels(k,:)==1);

This tmp array contains the results of your classification of the test digits whose true digit is k − 1, for 1 ≤ k ≤ 10. In other words, if your classification results were perfect, all the entries of tmp would be k. In reality, this simplest classification algorithm makes mistakes, so tmp contains values other than k. Count how many entries of tmp have the value j, for j = 1:10; these counts form the kth row of the test_confusion matrix.
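The counting procedure above can be sketched as follows on made-up results for 3 classes and 8 test images; rows of the confusion matrix are true classes and columns are predicted classes. The accumarray line is an alternative added only as a cross-check, not part of the assignment's hint.

```matlab
% Hypothetical results for 3 classes and 8 test images.
num_classes = 3;
true_class = [1 1 1 2 2 3 3 3];                 % row k of the +1 label entry
test_classification_res = [1 2 1 2 2 3 1 3];    % predicted class of each image
test_labels = -ones(num_classes, numel(true_class));
test_labels(sub2ind(size(test_labels), true_class, 1:numel(true_class))) = 1;

test_confusion = zeros(num_classes);
for k = 1:num_classes
    % Results for the test images whose true class is k (digit k-1 in USPS).
    tmp = test_classification_res(test_labels(k, :) == 1);
    for j = 1:num_classes
        % Entry (k, j): how many class-k images were classified as class j.
        test_confusion(k, j) = sum(tmp == j);
    end
end

% Cross-check: accumulate a 1 at (true class, predicted class) per image.
same = accumarray([true_class(:), test_classification_res(:)], 1, ...
                  [num_classes num_classes]);
disp(test_confusion)
```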

Step 04 Read the description of this step in Section 10.2 of the textbook and/or Professor Saito’s Lecture 21. Now carry out an SVD-based classification computation.

Footnote 1. Watch out! At one point this was 10 × 4649, which is inconsistent with the dimensions of test_classification in Step 03(b).


(a) Pool all of the images corresponding to the kth digit in the array training_digits, compute the rank 17 SVD of that set of images, i.e., the first 17 singular values and vectors, and put the left singular vectors (the matrix U) of the kth digit into the array left_singular_vectors of size 256 × 17 × 10. For k = 1:10, you can do this with the following code:

left_singular_vectors = zeros(256,17,10);   % preallocate the 256 x 17 x 10 array
for k = 1:10
    % rank 17 SVD of all training images of digit k-1; keep only U
    [left_singular_vectors(:,:,k),~,~] = svds(training_digits(:,training_labels(k,:)==1),17);
end

You do not need the singular values and right singular vectors in this computation.
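To see what svds returns, the sketch below runs the Step 04(a) loop on random stand-in data (40 pixels, 30 images, 3 classes and rank 5 instead of 256, 4649, 10 and 17). Because svds is an iterative solver, the sketch also compares its basis with the leading columns of a dense economy-size svd. The two agree only up to the signs of the columns, so it compares the orthogonal projectors U*U' rather than U itself; the names projector_gap, P_svds and P_dense are illustrative.

```matlab
% Random stand-in data: 40 "pixels", 30 images, 3 classes, rank r = 5.
num_pixels = 40; num_images = 30; num_classes = 3; r = 5;
training_digits = 2 * rand(num_pixels, num_images) - 1;   % entries in [-1, 1]
true_class = repmat(1:num_classes, 1, num_images / num_classes);
training_labels = -ones(num_classes, num_images);
training_labels(sub2ind(size(training_labels), true_class, 1:num_images)) = 1;

left_singular_vectors = zeros(num_pixels, r, num_classes);
projector_gap = zeros(1, num_classes);
for k = 1:num_classes
    pooled = training_digits(:, training_labels(k, :) == 1);   % 40 x 10
    [U, ~, ~] = svds(pooled, r);          % r largest singular triplets
    left_singular_vectors(:, :, k) = U;

    % Dense economy-size SVD of the same pool (svd(A, 0) for a tall A).
    [U_dense, ~, ~] = svd(pooled, 0);
    P_svds = U * U';
    P_dense = U_dense(:, 1:r) * U_dense(:, 1:r)';
    projector_gap(k) = norm(P_svds - P_dense);
end

disp(projector_gap)   % small values: both bases span the same subspace
```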

(b) Compute the expansion coefficients of each test digit image with respect to the 17 singular vectors of each training digit image set. In other words, you need to compute 17 × 10 numbers for each test digit image. Put the results in the 3D array test_svd17 of size 17 × 4649 × 10. This can be done with the commands

for k=1:10
    % coefficients of every test image in the rank 17 basis of digit k-1
    test_svd17(:,:,k) = left_singular_vectors(:,:,k)' * test_digits;
end

(c) Next, compute the error between each original test digit image and its rank 17 approximation using the kth digit images in the training data set. The idea of this classification is that a test digit image should belong to the class of the kth digit if the corresponding rank 17 approximation is the best approximation (i.e., has the smallest error) among the 10 such approximations. Prepare a matrix test_digits_rank_17_approximation of size 10 × 4649, and put those approximation errors into this matrix.

Hint: The rank 17 approximations of test_digits using the 17 left singular vectors of the kth digit training images can be computed by

left_singular_vectors(:,:,k)*test_svd17(:,:,k);

The approximation errors are then the Euclidean norms of the columns of test_digits minus this product.

If this command gives an error such as

“...MATLAB to become unresponsive. See array size limit or preference panel for more information.”

try replacing the command in the Hint above with the following code.

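test_digits_rank_17_approximation = zeros(10,4649);
for k = 1:10
    for j = 1:4649
        % error of the rank 17 approximation of test image j in the basis of digit k-1
        tmp = norm(test_digits(:,j) - ...
                   left_singular_vectors(:,:,k)*test_svd17(:,j,k));
        test_digits_rank_17_approximation(k,j) = tmp;
    end
end
% classify only after all 10 errors of each test image are available
for j = 1:4649
    [tmp,ind] = min(test_digits_rank_17_approximation(:,j));
    svd_classification(1,j) = ind;
end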
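The double loop above evaluates one norm per (k, j) pair. A vectorized alternative, sketched below on random stand-in arrays with hypothetical sizes, forms all residuals for class k at once as test_digits − U(U'·test_digits) and takes column norms. The orthonormal bases here come from qr rather than from svds, purely so the example is self-contained; the sketch classifies only after every error row is filled and checks one entry against the loop formula.

```matlab
% Hypothetical sizes (the project uses 256 pixels, 4649 images, 10 digits, rank 17).
num_pixels = 30; num_test = 8; num_classes = 3; r = 4;
test_digits = 2 * rand(num_pixels, num_test) - 1;
left_singular_vectors = zeros(num_pixels, r, num_classes);
for k = 1:num_classes
    [Q, ~] = qr(rand(num_pixels, r), 0);    % any orthonormal basis will do here
    left_singular_vectors(:, :, k) = Q;
end

test_digits_rank_17_approximation = zeros(num_classes, num_test);
for k = 1:num_classes
    U = left_singular_vectors(:, :, k);
    coeffs = U' * test_digits;              % plays the role of test_svd17(:,:,k)
    residual = test_digits - U * coeffs;    % part of each image outside span(U)
    % Column-wise Euclidean norms, one error per test image.
    test_digits_rank_17_approximation(k, :) = sqrt(sum(residual .^ 2, 1));
end

% Assign each test image to the class with the smallest approximation error.
[~, svd_classification] = min(test_digits_rank_17_approximation, [], 1);

% Cross-check one entry against the per-image loop formula.
j = 5; k = 2;
loop_value = norm(test_digits(:, j) - left_singular_vectors(:, :, k) * ...
                  (left_singular_vectors(:, :, k)' * test_digits(:, j)));
disp(abs(loop_value - test_digits_rank_17_approximation(k, j)))   % round-off size
```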


(d) Finally, compute the confusion matrix for this SVD-based classification method by following the same strategy as in Step 03(b) and Step 03(c) above. Name this confusion matrix test_svd17_confusion. Include this matrix in your report and submit your results.

Step 05 ANALYZE YOUR RESULTS IN A WELL WRITTEN REPORT!

(a) For Step 01, explain your understanding of the data structure in which the images of the digits are stored. In particular, include a brief explanation of the difference between the training data and the test data. (This is a simple example of machine learning. These are most likely the first machine learning algorithms to be widely used in the ‘real world’.)

(b) Give an explanation of what you are doing in Step 02, and why you are doing it. You will find some helpful comments concerning Step 02 in Section 10.1 of Eldén. Include some thoughts to support your comments.

(c) Comment on the intermediate results at the end of Step 03 and at the end of Step 04. How effective is each algorithm, i.e., for that particular algorithm, what percentage of each digit is identified correctly? Which digit is the most difficult to identify correctly? Which digit is the easiest to identify correctly? You can obtain all of this information from the confusion matrices you produced in Step 03 and Step 04. Include some thoughts to support your comments. In particular, explain in YOUR OWN WORDS the theory behind the algorithm in Step 04(a)–(d). (This is discussed in detail in Section 10.2 of Eldén.)
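Per-digit and overall success rates can be read off a confusion matrix whose rows are true classes: the diagonal entry of row k divided by the row sum is the fraction of class k identified correctly. The sketch below uses a made-up 3 × 3 matrix, not actual USPS results; with the 10 × 10 USPS matrices, class k corresponds to digit k − 1.

```matlab
% Hypothetical 3 x 3 confusion matrix (rows = true class, columns = predicted).
test_confusion = [48  1  1;
                   2 45  3;
                   0  5 45];

% Percentage of each true class that lands on the diagonal.
per_class_accuracy = 100 * diag(test_confusion) ./ sum(test_confusion, 2);
overall_accuracy = 100 * trace(test_confusion) / sum(test_confusion(:));

% Easiest and hardest classes (ties go to the first index).
[~, easiest] = max(per_class_accuracy);
[~, hardest] = min(per_class_accuracy);

fprintf('class %d: %.1f%% correct\n', [1:3; per_class_accuracy']);
fprintf('overall: %.1f%%, easiest class %d, hardest class %d\n', ...
        overall_accuracy, easiest, hardest);
% For USPS, report digits as easiest - 1 and hardest - 1.
```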

(d) Summarize all of your results in a separate section at the end. Compare your results from Step 03 and Step 04. Which of the two algorithms yields the better result? Why?

Step 06 Submit a well-documented MATLAB program named

“Digit_Recognition_youremailname.m”

This program should perform all of the tasks in Step 01 to Step 04 above without any user input. It is sufficient for your program to display the various images and tables on the computer screen. In particular, your program does not have to produce a PDF file containing the images of the digits produced in Step 01(b) and Step 02.

Again, here is a description of what is meant by a well-documented MATLAB program.

Do not submit only the MATLAB source code without comments. Furthermore, do not include only the bare minimum of explanation for each subsection of your code. Please consider using an active mind when including comments in your program. In particular, as programmers and highly educated individuals, it is worth your time to describe in your own words what you are doing in each individual segment of the code, i.e., each portion of the code that performs a separate task, even if it is ‘only’ inputting a file. For example: What is the format of the file: binary, text, MATLAB data structures? What is contained in the file? How is it stored? Relate the algorithm(s) back to the theory we have been studying in lecture and in the homework assignments. When you read your own code, you should be able to easily identify what you have learned from writing this program, and how this relates to the themes presented in lectures and in the textbook.

Step 07 You will be graded on the following items:


(a) Your MATLAB program should meet the following specifications.

  • It should not require any user input.
  • Your code should input the file USPS.mat.
  • The TA should be able to run your code on his machine using just your *.m file and his copy of USPS.mat.
  • Your code should run without breaking.
  • The TA should not have to dig in variable explorers to see what you’re talking about. Display all output on the screen clearly and use variable names that make sense, i.e., names that explicitly tell the TA, or any other person who sees your output, what they are looking at.

(b) You’ll be graded on the correctness of the following steps.

1. Step 01(b): Your MATLAB program should display the digits in Step 01(b), and this image of the digits should be included in your report. Typically you create a PDF file and input it using the LaTeX command

\includegraphics

2. Step 02: Print out the figure as a PDF file and include it in your report.

3. Step 03(c): Output your 10 × 10 test confusion matrix when your *.m file is run, and also include it in your report.

4. Step 04(d): Similarly, output the confusion matrix from the SVD algorithm when your *.m file is run, and also include it in your report.

5. Step 05: Clearly label 5(a), 5(b), 5(c) and 5(d), and explain each in detail.

6. Clearly comment each separate part of your code. If the TA has to guess what one or more lines of your code are doing, he is “going to be concerned”. He doesn’t need a long explanation of what’s going on, but a brief explanation of what this line or these lines of code are accomplishing will suffice.

