Northeastern University
College of Professional Studies
ALY6020 – Predictive Analytics
Winter 2020 CPS Quarter – First Half
Assignment: Fashion item recognition with logistic regression.
Due date: 01/23/20
Goal
Implement a basic image recognition system using logistic regression. The data set provided
contains instances of 10 clothing items, each of which is identified with a numeric label (0
through 9).
Label Description
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot
Here’s a sample visualization of the data
Your goal is to train and test an image classification model.
Task
0. Download the training and test sets here. Two files are provided. Each of these files
contains thousands of 28x28 grayscale images.
Each image is encoded as a row of 784 integer values between 0 and 255 indicating the
brightness of each pixel. The label associated with each image is encoded as an integer value
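between 0 and 9. The arrangement of the data is as follows: each row holds the label in the
first column, followed by the 784 pixel values in columns 2 through 785.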
If you want to see the image encoded in each row, you can use the following functions:
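# Rotate a matrix 90 degrees clockwise so that image() draws it upright
rotate <- function(x) {
  return(t(apply(x, 2, rev)))
}

# Plot one image given its 784 pixel values
plot_matrix <- function(vec) {
  q <- matrix(vec, 28, 28, byrow = TRUE)   # reshape to 28x28, filling row by row
  nq <- apply(q, 2, as.numeric)            # make sure every entry is numeric
  image(rotate(nq), col = gray((0:255)/255))
}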
If you want to plot the third image, for example, you can call the plot function as follows:
plot_matrix(mydata[3, 2:785])
1. For each class:
a. Relabel each row with a 1 if it corresponds to the item you are training for, and 0
otherwise (e.g., when training for the “sneaker” class, if the label is 7, then make
it a 1, otherwise make it a 0).
b. Train a logistic regression model that predicts the label as a function of the image
pixels: Label ~ Pixel1+Pixel2+Pixel3+ …+Pixel784
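Step 1 can be sketched roughly as follows. This is only an illustration, not the required solution: the helper name train_one_vs_rest and the toy data frame are made up for the example, and it assumes the label sits in the first column with the pixel columns after it.

```r
# Hypothetical helper: fits one binary logistic regression per class.
# Assumption: column 1 of `train` is the label, the rest are pixel columns.
train_one_vs_rest <- function(train, classes = 0:9) {
  labels <- train[[1]]
  pixels <- train[, -1, drop = FALSE]
  models <- lapply(classes, function(k) {
    # Step 1a: 1 for the class being trained, 0 for every other class
    binary <- data.frame(target = as.integer(labels == k), pixels)
    # Step 1b: regress the binary target on all pixel columns
    glm(target ~ ., data = binary, family = binomial)
  })
  names(models) <- as.character(classes)
  models
}

# Toy data (3 classes, 2 "pixels") just to show the call; not the real data set
set.seed(1)
toy <- data.frame(label  = sample(0:2, 60, replace = TRUE),
                  pixel1 = sample(0:255, 60, replace = TRUE),
                  pixel2 = sample(0:255, 60, replace = TRUE))
toy_models <- train_one_vs_rest(toy, classes = 0:2)
```

On the real images, glm may be slow and may print warnings (for example, when some pixels are constant across all rows), so it can help to try the code on a subset of rows first.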
2. Test your models on the test set (use the function predict to do this; see the example
usage in the logistic regression example we saw in class). The output of each of your
models will be the probability of an image being a particular clothing item. You will
therefore have 10 probabilities per item. Use the softmax function to transform those
unrelated probabilities into a probability distribution over the ten classes. The predicted
class is the one associated with the maximum probability.
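As a rough sketch of step 2, the softmax and the choice of the predicted class could look like the code below. The matrix probs is made-up data standing in for the model outputs (one row per test image, one column per class); with fitted glm models, such values would typically come from predict(model, newdata = test, type = "response").

```r
# Softmax: exponentiate and normalize so the values sum to 1
softmax <- function(x) {
  z <- exp(x - max(x))   # subtracting the max avoids overflow; result is unchanged
  z / sum(z)
}

# Hypothetical model outputs: 2 test images, 3 classes (not real results)
probs <- rbind(c(0.10, 0.80, 0.30),
               c(0.60, 0.20, 0.70))
colnames(probs) <- c("0", "1", "2")

# Apply softmax to each row; every row of `dist` now sums to 1
dist <- t(apply(probs, 1, softmax))

# The predicted class is the column with the largest probability in each row
predicted <- colnames(probs)[max.col(dist, ties.method = "first")]
```

Because softmax preserves the ordering of its inputs, the predicted class is the same one that had the largest raw probability; the transformation only makes the ten values comparable as a single distribution.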
3. Create a confusion matrix with the counts of the correct and incorrect classifications. For
example:
                              Prediction
                      Tshirt   Trouser   ...   Ankle boot
Actuals  Tshirt          890        50   ...           10
         Trouser           0       900   ...           30
         ...             ...       ...   ...          ...
         Ankle boot       78         1   ...          895
Calculate the overall classification accuracy by summing the counts along the main diagonal
and dividing by the total number of test cases.
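One possible way to build the confusion matrix and compute the accuracy is with the base R functions table and diag. The label vectors below are invented for illustration (three classes, six test cases) and are not results from the actual data.

```r
# Hypothetical true and predicted labels for six test images
actual    <- factor(c(0, 0, 1, 1, 2, 2), levels = 0:2)
predicted <- factor(c(0, 1, 1, 1, 2, 0), levels = 0:2)

# Rows are the actual classes, columns are the predicted classes
confusion <- table(Actuals = actual, Prediction = predicted)

# Accuracy: correct predictions (main diagonal) over all test cases
accuracy <- sum(diag(confusion)) / sum(confusion)
```

Giving both factors the same levels keeps the table square even if some class is never predicted, which is what makes the diagonal line up with the correct classifications.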
As with the previous homework, write a report containing your text, code and data.