HW5
November 11, 2018
0.1 Dataset
WikiArt is an amazing resource containing centuries of artwork. Since such datasets are wonderful
for deep learning, Kaggle has hosted a challenge to characterize the ’fingerprints’ of various artists.
The Kaggle dataset contains metadata and also a set of images that have been resized so that the
shorter dimension is 256 pixels. To make this homework reasonably fast even for those without
GPUs, we have further reduced the images to 64 x 64. CNNs, and neural networks in general, need
inputs of a consistent size. To achieve this, we cut the center 256 pixels from the longer dimension
and then shrank the images by a factor of 4. This isn’t a perfect solution, since it cut off a few
heads, as you will see.
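The crop-and-shrink step described above can be sketched in NumPy as follows. This is only an illustration: the handout does not say which resampling method was used, so the block averaging here is an assumption, and `center_crop_and_shrink` is a hypothetical helper name.

```python
import numpy as np

def center_crop_and_shrink(image, crop=256, factor=4):
    """Crop the central crop x crop square, then downsample by block averaging.

    Assumes an H x W x C array whose height and width are both >= crop.
    Block averaging is an assumption; the original resampling is not stated.
    """
    height, width = image.shape[:2]
    top = (height - crop) // 2
    left = (width - crop) // 2
    square = image[top:top + crop, left:left + crop]
    channels = square.shape[2]
    # Split each axis into (blocks, pixels-per-block) and average within blocks.
    blocks = square.reshape(crop // factor, factor, crop // factor, factor, channels)
    return blocks.mean(axis=(1, 3))

# A synthetic image whose shorter side is already 256 pixels.
example = np.zeros((256, 340, 3), dtype=np.float32)
small = center_crop_and_shrink(example)
print(small.shape)
```

Anything outside the central 256 pixels of the longer side is discarded, which is why a tall portrait can lose the top of a head.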
The selected images are portraits and landscapes. No, we’re not talking about the orientation
but rather the content of the images. Thanks to help from Rashmi and Dave, we have a dataset
small enough to give reasonable results in a timely manner, even on just a CPU.
The data were originally divided into a training and a test set. We have further divided the
training set into a train and validation set. In this homework you will be using the training set and
validation set to train and assess your deep learning models. At the final step, you will see how
well your final training worked on the test set. In each of these directories, there is a truth.txt file
that has the image name and whether it is a portrait or landscape scene.
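Reading a truth file might look like the sketch below. The exact layout of truth.txt is not given here, so this assumes one image per line with the name and the label separated by whitespace or a comma; the file names in the sample are made up, and `parse_truth` is a hypothetical helper.

```python
import re

def parse_truth(text):
    """Return a list of (image_name, label) pairs from the contents of a truth file.

    Assumes one "name label" record per line, separated by whitespace or a comma.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, label = re.split(r"[,\s]+", line, maxsplit=1)
        pairs.append((name, label.strip()))
    return pairs

# Made-up sample contents, for illustration only.
sample = "img_0001.jpg portrait\nimg_0002.jpg landscape\n"
pairs = parse_truth(sample)
print(pairs)
```

For a real directory, the text would come from something like `open("truth.txt").read()`, adjusted to the actual path and format.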
0.2 Problem 1
Read in and display the first 5 portraits and the first 5 landscapes. Note that if you are using
the OpenCV tools, the colors may look distorted, because OpenCV loads images in BGR channel
order. The cv2.cvtColor() function with cv2.COLOR_BGR2RGB may be useful. However, it is likely
easier to use the generator and plot_strip example from section.
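As a small illustration of the color issue, the channel swap can be written in plain NumPy. For a three-channel image this reorders the channels the same way `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` does; `bgr_to_rgb` is a hypothetical helper name.

```python
import numpy as np

def bgr_to_rgb(image):
    """Reverse the channel order of an H x W x 3 array (BGR to RGB, or back)."""
    return image[..., ::-1]

# A 2 x 2 image that is pure blue when interpreted in BGR order.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0])
```

Without the swap, a plotting library that expects RGB would show this blue image as red.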
Construct a baseline CNN classifier using Keras, train it on the training set, and assess the
validation set performance at each epoch. The goal is to correctly distinguish portraits from
landscapes. Plot the resulting performance on the training and validation sets as a function of
epoch, using the criterion that you are optimizing. You should run at least 20 epochs for this problem.
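One possible starting point for such a baseline is sketched below. The layer sizes, the optimizer, and the name `build_baseline` are arbitrary choices made for illustration, not a recommended or required architecture, and the import path assumes the Keras bundled with TensorFlow 2.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_baseline(input_shape=(64, 64, 3)):
    """A small CNN for binary classification; all sizes here are illustrative."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(32, activation="relu"),
        # One sigmoid unit: the output is a score between 0 and 1 for one class.
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_baseline()
model.summary()
```

Calling `model.fit(..., validation_data=..., epochs=20)` returns a history object whose `history` dictionary holds one value per epoch under keys such as `"loss"` and `"val_loss"`, which is what the training and validation curves can be plotted from.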
From the pattern of training and validation curves, describe what is good/bad and what you
plan to do next to improve the result.
0.3 Problem 2
This step is where we want you to do most of your personal learning. Your goal is to improve
the network using a combination of architecture choices, parameter tuning, and experimenting
with different optimizers/dropout/regularization/etc. Treat each of these as separate optimization/exploration
steps for now. We would like to see 3 separate steps that cover different areas.
The format of the 3 steps should be as follows:
* State the hypothesis/strategy for how you will improve/explore a particular aspect.
* Describe what types of tests you are running and why (i.e. what range of parameters are you choosing and why).
* Include the code and results.
* State your interpretation of the results.
We’re not looking for research in deep learning, but we want you to gain some hands-on experience
working with Keras and figuring out what works. A good example may be comparing
strategies to overcome overfitting, or comparing a few different CNN architectures in terms of
performance and speed, or comparing data augmentation types and results.
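As one example of an augmentation that could be compared, a random horizontal flip can be sketched in NumPy. This is just one candidate; whether it helps on this data is exactly the kind of hypothesis to test. `random_horizontal_flip` is a hypothetical helper, and Keras provides its own augmentation tools that may be more convenient.

```python
import numpy as np

def random_horizontal_flip(batch, rng, probability=0.5):
    """Flip each image in an N x H x W x C batch left-to-right with the given probability."""
    flip = rng.random(len(batch)) < probability  # one coin toss per image
    out = batch.copy()
    out[flip] = out[flip][:, :, ::-1]  # reverse the width axis of the chosen images
    return out

rng = np.random.default_rng(0)
batch = np.arange(2 * 2 * 3 * 1).reshape(2, 2, 3, 1)
flipped = random_horizontal_flip(batch, rng, probability=1.0)
print(flipped[0, 0, :, 0])
```

A mirrored portrait is still a portrait, so the labels are unchanged; a vertical flip would be a less obviously safe choice for this task.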
0.4 Problem 3
Assess your best model on the test data. Plot the corresponding ROC curve and report its AUC
(since we’ve provided the truth). This was not directly covered in section; it requires making
predictions on images in the same format as the training images. We suggest referring to the Keras
API documentation, or else using a Google search, to find out how to make predictions.
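Once predicted scores are available, the ROC curve and its AUC can be computed as in the sketch below. It assumes labels coded as 0/1 with 1 as the positive class, scores where higher means "more likely class 1", and no tied scores; the function names are hypothetical, and library routines for this also exist.

```python
import numpy as np

def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) arrays from sweeping a threshold down through the scores.

    Assumes 0/1 labels with both classes present and no tied scores.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    sorted_labels = y_true[order]
    # Running counts of true and false positives as the threshold is lowered.
    tps = np.concatenate([[0], np.cumsum(sorted_labels == 1)])
    fps = np.concatenate([[0], np.cumsum(sorted_labels == 0)])
    return fps / fps[-1], tps / tps[-1]

def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

fpr, tpr = roc_curve_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc_value = auc(fpr, tpr)
print(auc_value)
```

Plotting `tpr` against `fpr` gives the ROC curve; a classifier that guesses at random would lie near the diagonal, with an AUC of about 0.5.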
Display the 5 [worst] misclassified images for each class. "Worst" is in brackets since
certain architectures may only make a binary decision rather than produce a score. In that case,
plot any 5 misclassified images of each class.
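When the model does produce a score, one way to pick the "worst" errors is to rank the misclassified images by how far the score is from the true label. The sketch below assumes 0/1 labels, scores that are predicted probabilities of class 1, and a 0.5 decision threshold; `worst_misclassified` is a hypothetical helper.

```python
import numpy as np

def worst_misclassified(y_true, scores, label, count=5, threshold=0.5):
    """Indices of misclassified images whose true class is `label`, worst first."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    predicted = (scores >= threshold).astype(int)
    wrong = np.flatnonzero((y_true == label) & (predicted != label))
    # Distance between the score and the true label: larger means a more confident mistake.
    error = np.abs(scores[wrong] - label)
    return wrong[np.argsort(-error, kind="stable")][:count]

y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.95, 0.1]
worst_ones = worst_misclassified(y_true, scores, label=1)
worst_zeros = worst_misclassified(y_true, scores, label=0)
print(worst_ones, worst_zeros)
```

The returned indices can then be used to look up and display the corresponding test images.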