

Assignment Two: Image classification, time series prediction, sentiment classification

CS909: 2018-2019 (MSc Only)

Submission: 12 pm (midday), Wednesday 20th March 2019

Notes:

(a) Submit a zip file of the completed iPython notebooks to Tabula. Make sure to put comments in your code explaining what you have done and what you have observed from the results.

(b) This assignment will contribute to 35% of your overall mark.

This assignment consists of 3 parts. You can find each part in the labs of CNN for Image Classification, RNN for Time Series Prediction, and Sentiment Classification, respectively. Please download the corresponding ipynb files from the module web page https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs909.

1. Image classification (35 marks)

In the first task, you are given a dataset containing 6,000 pictures of cats and dogs (3,000 cats, 3,000 dogs) and asked to train a classifier built upon Convolutional Neural Networks (ConvNets) to classify images as "dogs" or "cats".

You need to perform the following key steps:

a) Split the dataset by selecting 4,800 pictures for training, 600 for validation, and 600 for testing. (2 marks)
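One way to produce this split is to shuffle the filenames once with a fixed seed and carve out disjoint slices. A minimal sketch (the `cat.<i>.jpg` / `dog.<i>.jpg` naming scheme is an assumption for illustration):

```python
import random

def split_filenames(filenames, n_train=4800, n_val=600, n_test=600, seed=0):
    """Shuffle once, then carve out disjoint train/val/test splits."""
    assert len(filenames) >= n_train + n_val + n_test
    rng = random.Random(seed)           # fixed seed -> reproducible split
    shuffled = filenames[:]
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

# 3,000 cat + 3,000 dog filenames (hypothetical naming scheme)
names = [f"cat.{i}.jpg" for i in range(3000)] + [f"dog.{i}.jpg" for i in range(3000)]
train, val, test = split_filenames(names)
```

The shuffled filenames can then be copied into per-split directories so that Keras image generators can read them.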

b) Train a Convolutional Neural Network (ConvNet) on the training set. The general structure of the ConvNet will be a stack of alternated Conv2D (with relu activation) and MaxPooling2D layers. A Conv2D layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. A MaxPooling2D layer is used to downscale input in both the vertical and horizontal dimensions. (10 marks)

c) Output the training and validation loss curves and accuracy curves. Also output the

classification results (e.g., classification accuracy, confusion matrix, precision-recall curve

and/or ROC curve) on the test set. (5 marks)
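The test-set metrics listed above can be computed directly from the model's sigmoid scores with scikit-learn; the tiny `y_true`/`y_prob` arrays below are stand-in values for illustration, not real model output:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# y_prob would come from model.predict(...) on the test set; these are stand-ins
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.9, 0.3, 0.2])
y_pred = (y_prob > 0.5).astype(int)     # threshold sigmoid scores at 0.5

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)   # rows: true class, cols: predicted class
auc = roc_auc_score(y_true, y_prob)     # ROC AUC uses the raw scores
```

The training/validation curves come from the `history` object returned by `model.fit`, whose `history` dict holds per-epoch `loss`, `val_loss`, `accuracy`, and `val_accuracy` lists ready for plotting.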

d) Explore different network architectures (e.g., stacking 4 Conv2D + MaxPooling2D layers) and various ways of tuning the model parameters to see if you can improve the model performance on the validation set. (10 marks)

e) Apply the trained model on the testing set and output the classification results. (3 marks)

f) Plot the saliency map of an original image to see which part is important for making classification decisions. You can refer to the following blog article on how to generate visualisation results of the filters in the ConvNets. (5 marks)

https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
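The blog article above visualises filters; a simple gradient-based saliency map can be sketched with `tf.GradientTape` (the tiny 8x8 demo model below is a stand-in for the trained ConvNet from part (b)):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def saliency_map(model, image):
    """Gradient of the class score w.r.t. input pixels (simple saliency)."""
    x = tf.convert_to_tensor(image[None, ...], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)                    # track gradients w.r.t. the input
        score = model(x)                 # sigmoid "dog" probability
    grads = tape.gradient(score, x)
    # per-pixel importance: max |gradient| over colour channels
    return np.abs(grads.numpy()[0]).max(axis=-1)

# tiny stand-in model; in the assignment, use the trained ConvNet
demo = models.Sequential([
    layers.Input(shape=(8, 8, 3)),
    layers.Conv2D(4, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(1, activation='sigmoid'),
])
smap = saliency_map(demo, np.random.rand(8, 8, 3).astype('float32'))
```

The resulting 2D array can be displayed with `plt.imshow(smap, cmap='viridis')` next to the original image.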


2. Time series prediction (35 marks)

This is a time series prediction task. You are given a dataset which reports on the weather and the level of pollution each hour for five years, and asked to train Recurrent Neural Networks (RNNs) to predict the hourly pollution level.

You need to perform the following key steps:

a) Load the data from the file. Perform necessary pre-processing (e.g., missing value replacement, uninformative attribute removal, etc.) and visualise the values of various attributes over the five-year period. (5 marks)

b) Frame the task as a supervised learning problem: predict the pollution at the current hour given the pollution measurement and weather conditions at the previous hour. Use the first 4 years' data as the training set and the remaining 1 year's data as the test set. Prepare the training/test sets accordingly. (10 marks)
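The one-hour-lag framing can be built with a pandas `shift`: every feature at hour t-1 becomes an input, and pollution at hour t becomes the target. A minimal sketch (the toy two-column frame stands in for the real pollution dataset):

```python
import pandas as pd

def frame_supervised(df, target='pollution'):
    """Predict `target` at hour t from all measurements at hour t-1."""
    lagged = df.shift(1).add_suffix('(t-1)')   # features: previous hour
    framed = pd.concat([lagged, df[[target]]], axis=1)
    return framed.dropna()                     # first row has no t-1 values

# toy stand-in for the real hourly pollution/weather data
df = pd.DataFrame({'pollution': [120.0, 130.0, 90.0, 100.0],
                   'temp': [10.0, 11.0, 12.0, 13.0]})
framed = frame_supervised(df)
```

Splitting the framed rows chronologically (first 4 years vs. last year) then gives the train/test sets; a random split would leak future information.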

c) Train a Recurrent Neural Network (RNN) on the training set. You can split the training set further by using 10% of the data as the validation set and the remaining for training. (5 marks)
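A minimal sketch of such an RNN, with the 10% validation split handled by `validation_split` in `model.fit` (the 50-unit LSTM, feature count, and random stand-in tensors are assumptions for illustration):

```python
import numpy as np
from tensorflow.keras import layers, models

n_features = 8           # pollution + weather attributes (assumed count)
model = models.Sequential([
    layers.Input(shape=(1, n_features)),   # one lagged time step per sample
    layers.LSTM(50),
    layers.Dense(1),                       # pollution level (regression)
])
model.compile(optimizer='adam', loss='mae')

# tiny random stand-in for the real training tensors
X = np.random.rand(100, 1, n_features).astype('float32')
y = np.random.rand(100).astype('float32')
history = model.fit(X, y, epochs=1, batch_size=32,
                    validation_split=0.1, verbose=0)  # 10% held out
```

RNN layers in Keras expect 3D input of shape `(samples, timesteps, features)`, which is why the framed rows are reshaped with a timesteps axis of 1.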

d) Output the prediction results such as Root Mean Squared Error (RMSE) on the test set. Remember that after the forecasts have been made, we need to invert the transforms to return the values back to the original scale. This is needed so that we can calculate error scores. Plot the predicted values vs. the actual values. (5 marks)
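If the data were scaled with `MinMaxScaler` before training, inverting the transform before scoring looks like this (the toy values and the constant 0.05 "prediction error" are stand-ins for real model output):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

raw = np.array([[120.0], [130.0], [90.0], [100.0]])   # toy pollution values
scaler = MinMaxScaler()
scaled = scaler.fit_transform(raw)

# pretend these are model predictions in the scaled [0, 1] space
pred_scaled = scaled + 0.05
pred = scaler.inverse_transform(pred_scaled)          # back to original units
actual = scaler.inverse_transform(scaled)

# RMSE in the original units, so the error is interpretable
rmse = np.sqrt(np.mean((pred - actual) ** 2))
```

Because the scaler's range here is 40, a constant 0.05 error in scaled space inverts to exactly 2.0 pollution units, which is why scoring must happen after the inverse transform.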

e) Explore different network architectures (e.g., stacked LSTM layers) and various ways of tuning the model parameters to see if you can improve the model performance on the test set. (5 marks)

f) Explore an alternative prediction setup by predicting the pollution for the next hour based on the weather conditions and pollution over the last 3 days. (5 marks)
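For the 3-day variant, each sample becomes a window of 72 hourly rows. A sliding-window sketch (the toy two-column array stands in for the real data; pollution is assumed to be column 0):

```python
import numpy as np

def window_dataset(data, target_col=0, n_in=72):
    """X[i] = all features over hours [i, i+n_in); y[i] = target at hour i+n_in."""
    X, y = [], []
    for i in range(len(data) - n_in):
        X.append(data[i:i + n_in])           # 72 hours of every attribute
        y.append(data[i + n_in, target_col]) # pollution one hour later
    return np.array(X), np.array(y)

data = np.arange(200.0).reshape(100, 2)      # 100 hours, 2 attributes (toy values)
X, y = window_dataset(data, n_in=72)
```

The resulting `X` already has the `(samples, timesteps, features)` shape an LSTM expects, so no extra reshape is needed.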

3. Sentiment classification (MSc ONLY) (30 marks)

In this task, you will process text data and train neural networks with limited input text data, using pre-trained embeddings for sentiment classification (classifying a review document as "positive" or "negative" based solely on the text content of the review).

You need to perform the following key steps:

a) Download the movie review data as raw text.
First, create a "data" directory, then head to http://ai.stanford.edu/~amaas/data/sentiment/ and download the raw movie review dataset (if the URL isn't working anymore, just Google "IMDB dataset"). Save it into the "data" directory and uncompress it. Store the individual reviews into a list of strings, one string per review, and also collect the review labels (positive/negative) into a separate labels list.
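The uncompressed dataset keeps one review per text file under `pos/` and `neg/` subdirectories, so building the two lists is a directory walk. The sketch below demonstrates the loader on a throwaway temporary directory standing in for the real data folder:

```python
import os
import tempfile

def load_reviews(base_dir):
    """Read one review per file from neg/ and pos/ subdirectories."""
    texts, labels = [], []
    for label, sub in enumerate(['neg', 'pos']):     # neg -> 0, pos -> 1
        folder = os.path.join(base_dir, sub)
        for fname in sorted(os.listdir(folder)):
            if fname.endswith('.txt'):
                with open(os.path.join(folder, fname), encoding='utf-8') as f:
                    texts.append(f.read())
                labels.append(label)
    return texts, labels

# demo on a throwaway directory standing in for the downloaded dataset
with tempfile.TemporaryDirectory() as d:
    for sub, text in [('pos', 'great film'), ('neg', 'dull film')]:
        os.makedirs(os.path.join(d, sub))
        with open(os.path.join(d, sub, '0.txt'), 'w', encoding='utf-8') as f:
            f.write(text)
    texts, labels = load_reviews(d)
```

In the assignment itself, `base_dir` would point at the train split of the uncompressed download inside the "data" directory.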

b) Pre-process the review documents. (2 marks)
Pre-process the review documents by tokenisation and split the data into the training and testing sets. You can restrict the training data to the first 1,000 reviews and only consider the top 5,000 words in the dataset. You can also cut reviews after 100 words (that is, each review contains a maximum of 100 words).
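The Keras `Tokenizer` plus `pad_sequences` covers both restrictions, keeping the top 5,000 words and cutting/padding every review to 100 tokens (the two short strings below are stand-ins for real reviews):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

max_words, maxlen = 5000, 100
texts = ['a great film', 'a dull film']          # stand-ins for real reviews

tokenizer = Tokenizer(num_words=max_words)       # keep only the top 5,000 words
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)  # words -> integer indices
data = pad_sequences(sequences, maxlen=maxlen)   # cut/pad each review to 100
```

`tokenizer.word_index` built here is the reference word index the embedding-matrix step below relies on; `pad_sequences` pads at the front by default, so short reviews end with their real tokens.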

c) Download the GloVe word embeddings and map each word in the dataset into its pre-trained GloVe word embedding. (3 marks)
First go to https://nlp.stanford.edu/projects/glove/ and download the pre-trained embeddings from 2014 English Wikipedia into the "data" directory. It's an 822 MB zip file named glove.6B.zip, containing 100-dimensional embedding vectors for 400,000 words (or non-word tokens). Unzip it. Parse the unzipped file (it's a txt file) to build an index mapping words (as strings) to their vector representations (as number vectors).
Build an embedding matrix that will be loaded into an Embedding layer later. It must be a matrix of shape (max_words, embedding_dim), where each entry i contains the embedding_dim-dimensional vector for the word of index i in our reference word index (built during tokenisation). Note that the index 0 is not supposed to stand for any word or token -- it's a placeholder.
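Each line of the GloVe file is a word followed by its vector components, so parsing and matrix-building reduce to a few loops. The sketch below uses two made-up 4-dimensional lines in the same format (the real file has 100 dimensions and would be read line by line with `open(...)`):

```python
import numpy as np

embedding_dim, max_words = 4, 10   # real task: 100 and 10,000

# two lines in the glove.6B txt format (tiny made-up vectors)
glove_lines = ["good 0.1 0.2 0.3 0.4", "bad -0.1 -0.2 -0.3 -0.4"]

embeddings_index = {}
for line in glove_lines:
    values = line.split()
    embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')

# stand-in for tokenizer.word_index; index 0 is the reserved placeholder
word_index = {'good': 1, 'bad': 2, 'unseen': 3}
embedding_matrix = np.zeros((max_words, embedding_dim))
for word, i in word_index.items():
    if i < max_words:
        vector = embeddings_index.get(word)
        if vector is not None:               # words not in GloVe stay all-zero
            embedding_matrix[i] = vector
```

Row 0 stays all-zero because index 0 is the placeholder, and so does any word the GloVe vocabulary does not cover.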

d) Build and train a simple Sequential model. (10 marks)
Define a model which contains an Embedding layer with the maximum number of tokens set to 10,000 and embedding dimensionality of 100. Initialise the Embedding layer with the pre-trained GloVe word vectors. Set the maximum length of each review to 100. Flatten the 3D embedding output to 2D and add a Dense layer as the classifier. Train the model with an 'rmsprop' optimiser. You need to freeze the embedding layer by setting its trainable attribute to False so that its weights will not be updated during training.
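A minimal sketch of this model (the 32-unit hidden Dense layer is an assumption; the all-zero matrix below stands in for the real GloVe embedding matrix from part (c)):

```python
import numpy as np
from tensorflow.keras import layers, models

max_words, embedding_dim, maxlen = 10000, 100, 100
embedding_matrix = np.zeros((max_words, embedding_dim))  # stand-in GloVe matrix

emb = layers.Embedding(max_words, embedding_dim)
model = models.Sequential([
    layers.Input(shape=(maxlen,)),          # each review: 100 word indices
    emb,
    layers.Flatten(),                       # (batch, 100, 100) -> (batch, 10000)
    layers.Dense(32, activation='relu'),    # assumed hidden size
    layers.Dense(1, activation='sigmoid'),  # positive vs. negative
])
emb.set_weights([embedding_matrix])         # load the pre-trained vectors
emb.trainable = False                       # freeze: weights not updated
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
```

Freezing must happen before (re-)compiling; otherwise the optimiser will still update the embedding weights.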

e) Plot the training and validation loss and accuracies and evaluate the trained model on the test set. (5 marks)

f) Add an LSTM layer into the simple neural network architecture and re-train the model on the training set. Plot the training loss/accuracies; also evaluate the trained model on the test set and report the results. (10 marks)
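The LSTM variant replaces the Flatten step with a recurrent layer that reads the embedded word sequence in order (the 32-unit LSTM size is an assumption; the embedding would be initialised and frozen as in part (d)):

```python
from tensorflow.keras import layers, models

max_words, embedding_dim, maxlen = 10000, 100, 100
emb = layers.Embedding(max_words, embedding_dim)
model = models.Sequential([
    layers.Input(shape=(maxlen,)),
    emb,                                    # initialise/freeze as in part (d)
    layers.LSTM(32),                        # replaces Flatten: order-aware
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
```

Unlike Flatten, the LSTM summarises the 100 embedded words into a single 32-dimensional state, so word order can influence the prediction.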

