Project 3
Classification and inference with machine learning
This notebook is arranged in cells. Texts are usually written in the markdown cells, and here you can use html
tags (make it bold, italic, colored, etc). You can double click on this cell to see the formatting.
Ellipses (...) mark where you are expected to write your solution, but feel free to change the template
(within reason) if this style is not to your taste.
Hit "Shift-Enter" on a code cell to evaluate it. Double click a Markdown cell to edit.
Link Okpy
In [1]:
Imports
from client.api.notebook import Notebook
ok = Notebook('Project3_U.ok')
_ = ok.auth(inline = True)
In [2]:
Problem 1 - Using Keras - MNIST
The goal of this notebook is to introduce deep neural networks (DNNs) and convolutional neural networks
(CNNs) using the high-level Keras package and to become familiar with how to choose its architecture, cost
function, and optimizer in Keras. We will also learn how to train neural networks.
We will once again work with the MNIST dataset of hand written digits introduced in HW8. The goal is to find a
statistical model which recognizes and distinguishes between the ten handwritten digits (0-9).
The MNIST dataset comprises handwritten digits, each of which comes in a square image, divided into a
28 × 28 pixel grid. Every pixel can take on 256 nuances of the gray color, interpolating between white and
black, and hence each data point assumes any value in the set {0, 1, …, 255}. Since there are 10 categories
in the problem, corresponding to the ten digits, this problem represents a generic classification task.
In this Notebook, we show how to use the Keras python package to tackle the MNIST problem with the help of
deep neural networks.
import numpy as np
from scipy.integrate import quad
#For plotting
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
Creating DNNs with Keras
Constructing a Deep Neural Network to solve ML problems is a multiple-stage process. Quite generally, one
can identify the key steps as follows:
step 1: Load and process the data
step 2: Define the model and its architecture
step 3: Choose the optimizer and the cost function
step 4: Train the model
step 5: Evaluate the model performance on the unseen test data
step 6: Modify the hyperparameters to optimize performance for the specific data set
We would like to emphasize that, while it is always possible to view steps 1-5 as independent of the particular
task we are trying to solve, it is only when they are put together in step 6 that the real gain of using Deep
Learning is revealed, compared to less sophisticated methods such as the regression models. With this remark
in mind, we shall focus predominantly on steps 1-5 below. We show how one can use grid search methods to
find optimal hyperparameters in step 6.
Step 1: Load and Process the Data
Keras can download the MNIST data from the web automatically. All we need to do is import the mnist
module and call its load_data() function, which creates the training and test data sets for us.
The MNIST set has pre-defined test and training sets, in order to facilitate the comparison of the performance
of different models on the data.
Once we have loaded the data, we need to format it in the correct shape (N_samples, N_features).
The size of each sample, i.e. the number of bare features used, is N_features (which is 784 because we have a
28 × 28 pixel grid), while the number of potential classification categories is num_classes (which is 10, the
number of digits).
Each pixel contains a greyscale value quantified by an integer between 0 and 255. To standardize the dataset,
we normalize the input data to the interval [0, 1].
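The flattening and rescaling described above can be illustrated without Keras. The following is a minimal sketch in plain Python; the 2 × 2 toy image and the helper name flatten_and_scale are assumptions made for illustration and are not part of the notebook.

```python
# A tiny 2 x 2 "image" stands in for a 28 x 28 MNIST digit (illustrative data).
image = [[0, 255],
         [51, 102]]

def flatten_and_scale(img, max_value=255):
    """Flatten a 2D grid of integer pixels into one row of floats in [0, 1]."""
    return [pixel / max_value for row in img for pixel in row]

features = flatten_and_scale(image)
n_features_mnist = 28 * 28  # each real MNIST image flattens to 784 features

print(features)
print(n_features_mnist)
```

In the notebook itself the same two operations are done on whole arrays with reshape() and division by 255.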
In [3]:
1. Make a plot of one MNIST digit (2D plot using X data - make sure to reshape it into a 28 × 28 matrix) and
label it (which digit does it correspond to?).
Using TensorFlow backend.
from __future__ import print_function
import keras,sklearn
# suppress tensorflow compilation warnings
import os
import tensorflow as tf
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
seed=0
np.random.seed(seed) # fix random seed
tf.set_random_seed(seed)
from keras.datasets import mnist
# input image dimensions
num_classes = 10 # 10 digits
img_rows, img_cols = 28, 28 # number of pixels
# the data, shuffled and split between train and test sets
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_train = X_train[:40000]
Y_train = Y_train[:40000]
# reshape data, depending on Keras backend
X_train = X_train.reshape(X_train.shape[0], img_rows*img_cols)
X_test = X_test.reshape(X_test.shape[0], img_rows*img_cols)
# cast to single-precision floats
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# rescale data in interval [0,1]
X_train /= 255
X_test /= 255
In [4]:
Last, we cast the label vectors y to binary class matrices (a.k.a. one-hot format).
plt.imshow(X_train[0].reshape((28,28)), cmap = plt.cm.gray)
plt.title('Label = %d' %Y_train[0])
plt.show()
In [23]:
Here in this template, we use 40000 training samples and 10000 test samples. Remember that we
preprocessed data into the shape (N_samples, N_features).
In [24]:
before conversion -
y vector : [5 0 4 1 9 2 1 3 1 4]
after conversion -
y vector : [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
X_train shape: (40000, 784)
Y_train shape: (40000, 10)
40000 train samples
10000 test samples
# convert class vectors to binary class matrices
print("before conversion - ")
print("y vector : ", Y_train[0:10])
Y_train = keras.utils.to_categorical(Y_train, num_classes)
Y_test = keras.utils.to_categorical(Y_test, num_classes)
print("after conversion - ")
print("y vector : ", Y_train[0:10])
print('X_train shape:', X_train.shape)
print('Y_train shape:', Y_train.shape)
print()
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
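To make the one-hot format concrete, here is a small dependency-free sketch of what the conversion does. The helpers to_one_hot and from_one_hot are hypothetical names written for this illustration; the notebook itself relies on keras.utils.to_categorical.

```python
def to_one_hot(labels, num_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

def from_one_hot(rows):
    """Invert the encoding by taking the index of the largest entry."""
    return [row.index(max(row)) for row in rows]

labels = [5, 0, 4]  # the first three labels of the y vector in this notebook
encoded = to_one_hot(labels, num_classes=10)
decoded = from_one_hot(encoded)

for row in encoded:
    print(row)
print(decoded)
```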
Step 2: Define the Neural Net and its Architecture
We can now move on to construct our deep neural net. We shall use Keras's Sequential() class to
instantiate a model, and will add different deep layers one by one.
Let us create an instance of Keras' Sequential() class, called model . As the name suggests, this class
allows us to build DNNs layer by layer. (https://keras.io/getting-started/sequential-model-guide/)
In [4]:
We use the add() method to attach layers to our model. For the purposes of our introductory example, it
suffices to focus on Dense layers for simplicity. (https://keras.io/layers/core/)
Every Dense() layer accepts as its first required argument an integer which specifies the number of neurons.
The type of activation function for the layer is defined using the activation optional argument, the input of
which is the name of the activation function in string format. Examples include relu , tanh , elu ,
sigmoid , softmax .
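For reference, the element-wise activation functions named above can be written in a few lines of plain Python. This is only a sketch of the standard textbook definitions (with alpha = 1.0 assumed for elu), not Keras' own implementation; softmax acts on a whole vector and is therefore left out here.

```python
import math

def relu(x):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, x)

def elu(x, alpha=1.0):
    """Exponential linear unit: linear for x > 0, saturates at -alpha for x << 0."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent, maps any real number into (-1, 1)."""
    return math.tanh(x)

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), elu(x), sigmoid(x), tanh(x))
```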
In order for our DNN to work properly, we have to make sure that the numbers of input and output neurons for
each layer match. Therefore, we specify the shape of the input in the first layer of the model explicitly using the
optional argument input_shape=(N_features,) . The sequential construction of the model then allows
Keras to infer the correct input/output dimensions of all hidden layers automatically. Hence, we only need to
specify the size of the softmax output layer to match the number of categories.
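Matching input and output sizes also fixes the number of trainable parameters. As a rough sketch, assuming the layer sizes used in this notebook (784 inputs followed by Dense layers of 400, 100, and 10 neurons), the count per Dense layer is weights plus biases; the helper dense_param_count is a hypothetical name introduced for this example.

```python
def dense_param_count(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output neuron."""
    return n_inputs * n_outputs + n_outputs

layer_sizes = [784, 400, 100, 10]  # input, two hidden layers, softmax output
counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts)
print(total)
```

A Dropout layer adds no trainable parameters, so it does not appear in this count.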
First, add a Dense layer with 400 output neurons and relu activation function.
In [26]:
Add another layer with 100 output neurons. Then, we will apply "dropout," a regularization scheme that has
been widely adopted in the neural networks literature: during the training procedure neurons are randomly
“dropped out” of the neural network with some probability p, giving rise to a thinned network. It prevents
overfitting by reducing spurious correlations between neurons within the network by introducing a
randomization procedure.
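The thinning step can be sketched in plain Python. The version below uses the common "inverted dropout" convention, in which surviving activations are rescaled by 1/(1-p) so that their expected value is unchanged; the function name dropout and the toy input are assumptions for illustration only.

```python
import random

def dropout(activations, p, rng):
    """Zero each value with probability p and rescale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]

rng = random.Random(0)           # fixed seed so the sketch is reproducible
hidden = [1.0] * 1000            # toy layer output
thinned = dropout(hidden, p=0.5, rng=rng)
dropped_fraction = thinned.count(0.0) / len(thinned)

print(dropped_fraction)          # close to p
print(sum(thinned) / len(thinned))  # mean stays close to the original 1.0
```

Dropout is only applied during training; at evaluation time all neurons are kept.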
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
# instantiate model
model = Sequential()
model.add(Dense(400,input_shape=(img_rows*img_cols,), activation='relu'))
In [27]:
Lastly, we need to add a soft-max layer since we have a multi-class output.
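A soft-max layer turns the raw outputs of the last layer into probabilities that sum to one. A minimal sketch in plain Python, with made-up input values, could look like this:

```python
import math

def softmax(logits):
    """Exponentiate and normalize; subtracting the max avoids overflow."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # illustrative values for three classes
print(probs)
print(sum(probs))
```

The largest input always receives the largest probability, so the predicted class is simply the index of the maximum.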
In [28]:
Step 3: Choose the Optimizer and the Cost Function
Next, we choose the loss function according to which to train the DNN. For classification problems, this is the
cross entropy, and since the output data was cast in categorical form, we choose the
categorical_crossentropy defined in Keras' losses module. Depending on the problem of interest
one can pick any other suitable loss function. To optimize the weights of the net, we choose SGD. This
algorithm is already available to use under Keras' optimizers module (https://keras.io/optimizers/),
but we could use Adam() or any other built-in one as well. The parameters for
the optimizer, such as lr (learning rate) or momentum, are passed using the corresponding optional
arguments of the SGD() constructor.
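To see what these two choices compute, here is a small sketch of the categorical cross entropy and of a single SGD update with momentum, written in plain Python. The function names, the toy data, and the default values lr=0.01 and momentum=0.9 are assumptions for illustration; they are not taken from the notebook or from Keras' source.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    total = 0.0
    for target, pred in zip(y_true, y_pred):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))
    return total / len(y_true)

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One scalar update: the velocity accumulates past gradients."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

y_true = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]       # one-hot labels
y_pred = [[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]]     # predicted probabilities
loss = categorical_crossentropy(y_true, y_pred)

w, v = sgd_momentum_step(w=1.0, grad=2.0, velocity=0.0)
print(loss, w, v)
```

Only the probability assigned to the correct class enters the loss, which is why confident wrong predictions are penalized heavily.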
While the loss function and the optimizer are essential for the training procedure, to test the performance of the
model one may want to look at a particular metric of performance. For instance, in categorical tasks one
typically looks at the accuracy , which is defined as the fraction of correctly classified data points.
To complete the definition of our model, we use the compile() method, with optional arguments for the
optimizer , loss , and the validation metric as follows:
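The accuracy metric itself is easy to reproduce by hand. The sketch below, in plain Python with invented predictions, compares the most probable class of each prediction with the one-hot label; argmax and accuracy are helper names chosen for this example.

```python
def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the label."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]
score = accuracy(y_true, y_pred)
print(score)
```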
In [29]:
model.add(Dense(100, activation='relu'))
# apply dropout with rate 0.5
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
# compile the model
model.compile(loss=keras.losses.categorical_crossentropy, optimizer='SGD', metrics=[
Step 4: Train the model
We train our DNN in minibatches. Shuffling the training data during training improves stability of the model.
Thus, we train over a number of training epochs.
(The number of epochs is the number of complete passes through the training dataset, and the batch size is a
number of samples propagated through the network before the model is updated.)
Training the DNN is a one-liner using the fit() method of the Sequential class. The first two required
arguments are the training input and output data. As optional arguments, we specify the mini- batch_size ,
the number of training epochs , and the test or validation data. To monitor the training procedure for every
epoch, we set verbose=True .
Let us set batch_size = 64 and epochs = 10.
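With these settings, the number of weight updates follows directly from the definitions of epoch and batch size. A short sketch, assuming the 40000 training samples used in this template:

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to pass once through the data."""
    return math.ceil(n_samples / batch_size)

n_samples, batch_size, epochs = 40000, 64, 10
per_epoch = updates_per_epoch(n_samples, batch_size)
total_updates = per_epoch * epochs

print(per_epoch)
print(total_updates)
```

A smaller batch size means more, noisier updates per epoch, which is one reason batch size and number of epochs are tuned together.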
In [30]:
Step 5: Evaluate the Model Performance on the Unseen Test Data
Next, we evaluate the model and read off the loss on the test data, and its accuracy, using the evaluate()
method.
Train on 40000 samples, validate on 10000 samples
Epoch 1/10
40000/40000 [==============================] - 4s - loss: 1.2012 - acc: 0.6446 - val_loss: 0.5087 - val_acc: 0.8839
Epoch 2/10
40000/40000 [==============================] - 4s - loss: 0.5895 - acc: 0.8318 - val_loss: 0.3646 - val_acc: 0.9065
Epoch 3/10
40000/40000 [==============================] - 4s - loss: 0.4755 - acc: 0.8646 - val_loss: 0.3081 - val_acc: 0.9193
Epoch 4/10
40000/40000 [==============================] - 3s - loss: 0.4100 - acc: 0.8814 - val_loss: 0.2755 - val_acc: 0.9243
Epoch 5/10
40000/40000 [==============================] - 4s - loss: 0.3716 - acc: 0.8975 - val_loss: 0.2527 - val_acc: 0.9288
Epoch 6/10
40000/40000 [==============================] - 4s - loss: 0.3445 - acc: 0.9030 - val_loss: 0.2338 - val_acc: 0.9342
Epoch 7/10
40000/40000 [==============================] - 4s - loss: 0.3185 - acc: 0.9105 - val_loss: 0.2203 - val_acc: 0.9383
Epoch 8/10
40000/40000 [==============================] - 4s - loss: 0.2991 - acc: 0.9171 - val_loss: 0.2060 - val_acc: 0.9413
Epoch 9/10
40000/40000 [==============================] - 3s - loss: 0.2815 - acc: 0.9213 - val_loss: 0.1972 - val_acc: 0.9437
Epoch 10/10
40000/40000 [==============================] - 4s - loss: 0.2656 - acc: 0.9265 - val_loss: 0.1874 - val_acc: 0.9457
# training parameters
batch_size = 64
epochs = 10
# train DNN and store training info in history
history=model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs,
verbose=1, validation_data=(X_test, Y_test))
In [32]:
9632/10000 [===========================>..] - ETA: 0s
Test loss: 0.18736902387440205
Test accuracy: 0.9457
# evaluate model
score = model.evaluate(X_test, Y_test, verbose=1)
# print performance
print('Test loss:', score[0])
print('Test accuracy:', score[1])
# look into training history
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.ylabel('model accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='best')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.ylabel('model loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='best')
plt.show()
Step 6: Modify the Hyperparameters to Optimize Performance of the Model
Last, we show how to use the grid search option of scikit-learn
(https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)
to optimize the
hyperparameters of our model.
First, define a function for creating a DNN:
In [9]:
With epochs = 1 and batch_size = 64, do grid search over the following optimization schemes: ['SGD',
'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam'].
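Grid search with cross-validation trains each candidate several times, each time holding out a different part of the training data for scoring. The sketch below shows a plain, unshuffled K-fold split in pure Python. The number of folds (4) is an assumption: the cv argument is cut off in this notebook, although 4 folds would be consistent with the 30000-sample fits reported in the training logs. The helper k_fold_indices is a hypothetical name, and scikit-learn's own splitter may behave differently in detail.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs using contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    fold_sizes = [base + (1 if i < extra else 0) for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

# n_splits=4 is an assumed value, see the note above.
splits = list(k_fold_indices(n_samples=40000, n_splits=4))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Each candidate's score is then the mean of its validation scores over the folds.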
In [34]:
def create_DNN(optimizer=keras.optimizers.Adam()):
    model = Sequential()
    model.add(Dense(400,input_shape=(img_rows*img_cols,), activation='relu'))
    model.add(Dense(100, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
batch_size = 64
epochs = 1
model_gridsearch = KerasClassifier(build_fn=create_DNN,
epochs=epochs, batch_size=batch_size, verbose=1)
# list of allowed optional arguments for the optimizer, see `compile_model()`
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 1.3620 - acc: 0.5839
28928/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 1.4106 - acc: 0.5763
28800/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 1.3413 - acc: 0.6017
29056/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 1.3520 - acc: 0.5856
29632/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4102 - acc: 0.8784
28736/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4258 - acc: 0.8723
28608/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4126 - acc: 0.8776
29312/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4097 - acc: 0.8779
29824/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.3794 - acc: 0.8887
29824/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4031 - acc: 0.8816
29184/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.3895 - acc: 0.8866
29824/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.3752 - acc: 0.8911
29632/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.5778 - acc: 0.8325
29184/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.5900 - acc: 0.8278
28864/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.5922 - acc: 0.8268
# define parameter dictionary
param_grid = dict(optimizer=optimizer)
# call scikit grid search module
grid = GridSearchCV(estimator=model_gridsearch, param_grid=param_grid, n_jobs=1, cv=
grid_result = grid.fit(X_train,Y_train)
Show the mean test score of all optimization schemes and determine which scheme gives the best accuracy.
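Once the mean test scores are available, picking the best scheme is a simple ranking. The sketch below uses the mean cross-validated accuracies printed in this notebook's optimizer search; the variable names are chosen for this example only.

```python
# Mean cross-validated accuracy per optimizer, copied from this notebook's printed results.
mean_test_score = {
    'SGD': 0.850700,
    'RMSprop': 0.947125,
    'Adagrad': 0.946550,
    'Adadelta': 0.925900,
    'Adam': 0.947200,
    'Adamax': 0.934825,
    'Nadam': 0.951650,
}

# Sort from best to worst mean accuracy.
ranked = sorted(mean_test_score.items(), key=lambda kv: kv[1], reverse=True)
best_name, best_score = ranked[0]

for name, score in ranked:
    print("%-9s %.6f" % (name, score))
print("Best:", best_name, best_score)
```

Note that these scores come from a single epoch of training, so the ranking mostly reflects how quickly each scheme makes progress early on.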
29184/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.5868 - acc: 0.8266
29952/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4324 - acc: 0.8721
28864/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4337 - acc: 0.8714
29440/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4356 - acc: 0.8717
29568/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4332 - acc: 0.8727
28608/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4958 - acc: 0.8500
29312/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4888 - acc: 0.8595
28928/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4658 - acc: 0.8621
28736/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4754 - acc: 0.8601
30000/30000 [==============================] - 1s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.3584 - acc: 0.8929
28992/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.3585 - acc: 0.8944
29248/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.3631 - acc: 0.8917
29824/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.3624 - acc: 0.8931
28800/30000 [===========================>..] - ETA: 0s
Epoch 1/1
40000/40000 [==============================] - 6s - loss: 0.3168 - acc: 0.9073
In [35]:
2. Create a DNN with one Dense layer having 200 output neurons. Do the grid search over any 5 different
activation functions from https://keras.io/activations/. Let epochs = 1, batch_size =
64, p_dropout=0.5, and optimizer=keras.optimizers.Adam(). Make sure to print the mean test score of each
case and determine which activation function gives the best accuracy.
Doing the grid search requires quite a bit of memory. Please restart the kernel ("Kernel"-"Restart") and re-load
the data before doing a new grid search.
In [10]:
Best: 0.951650 using {'optimizer': 'Nadam'}
0.850700 (0.013746) with: {'optimizer': 'SGD'}
0.947125 (0.001248) with: {'optimizer': 'RMSprop'}
0.946550 (0.003741) with: {'optimizer': 'Adagrad'}
0.925900 (0.002684) with: {'optimizer': 'Adadelta'}
0.947200 (0.001200) with: {'optimizer': 'Adam'}
0.934825 (0.002807) with: {'optimizer': 'Adamax'}
0.951650 (0.000865) with: {'optimizer': 'Nadam'}
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
model = Sequential()
def create_DNN(activation):
    model = Sequential()
    model.add(Dense(200,input_shape=(img_rows*img_cols,), activation=activation))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer='Adam',
                  metrics=['accuracy'])
    return model
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
batch_size = 64
epochs = 1
model_gridsearch = KerasClassifier(build_fn=create_DNN,
epochs=epochs, batch_size=batch_size, verbose=1)
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.5012 - acc: 0.8506
29120/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4967 - acc: 0.8492
29184/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.4963 - acc: 0.8530
29376/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.4939 - acc: 0.8559
29440/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.4717 - acc: 0.8611
29120/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.4722 - acc: 0.8604
30000/30000 [==============================] - 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.4762 - acc: 0.8587
29760/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.4679 - acc: 0.8637
29376/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.4831 - acc: 0.8567
29952/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.4629 - acc: 0.8612
28864/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4722 - acc: 0.8579
28992/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4746 - acc: 0.8580
29248/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.7890 - acc: 0.7656
28480/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.7785 - acc: 0.7720
28992/30000 [===========================>..] - ETA: 0s
Epoch 1/1
# list of activation functions to search over
activation = ['relu', 'tanh', 'elu', 'sigmoid', 'softmax']
# define parameter dictionary
param_grid = dict(activation=activation)
# call scikit grid search module
grid = GridSearchCV(estimator=model_gridsearch, param_grid=param_grid, n_jobs=1, cv=
grid_result = grid.fit(X_train,Y_train)
In [11]:
30000/30000 [==============================] - 3s - loss: 0.7647 - acc: 0.7724
29312/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.7746 - acc: 0.7727
29184/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 1.9335 - acc: 0.5028
29760/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 1.9408 - acc: 0.5048
28928/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 1.9342 - acc: 0.4760
29824/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 1.9398 - acc: 0.4810
10000/10000 [==============================] - 0s
28928/30000 [===========================>..] - ETA: 0s
Epoch 1/1
40000/40000 [==============================] - 4s - loss: 0.4484 - acc: 0.8681
Best: 0.932725 using {'activation': 'relu'}
0.932725 (0.002930) with: {'activation': 'relu'}
0.912850 (0.003827) with: {'activation': 'tanh'}
0.913725 (0.005063) with: {'activation': 'elu'}
0.895375 (0.005523) with: {'activation': 'sigmoid'}
0.839475 (0.016595) with: {'activation': 'softmax'}
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
3. Now, do the grid search over different combinations of batch sizes (10, 30, 50, 100) and numbers of epochs (1,
2, 5). Make sure to print the mean test score of each case and determine which combination gives the
best accuracy. Here, you have the freedom to create your own DNN (assume an arbitrary number of Dense layers,
optimization scheme, etc).
Doing the grid search requires quite a bit of memory. Please restart the kernel ("Kernel"-"Restart") and re-load
the data before doing a new grid search.
Hint: To do the grid search over both batch_size and epochs, you can do:
param_grid = dict(batch_size=batch_size, epochs=epochs)
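A grid over two hyperparameters is the Cartesian product of their value lists. The following sketch enumerates the combinations in plain Python with itertools.product; it only illustrates which cases the search has to cover and is not how scikit-learn is invoked.

```python
from itertools import product

batch_sizes = [10, 30, 50, 100]
epoch_counts = [1, 2, 5]
param_grid = dict(batch_size=batch_sizes, epochs=epoch_counts)

# Build one dictionary per combination, iterating keys in sorted order.
keys = sorted(param_grid)
combos = [dict(zip(keys, values))
          for values in product(*(param_grid[k] for k in keys))]

print(len(combos))
for combo in combos:
    print(combo)
```

Every combination is trained once per cross-validation fold, so the total cost grows with the product of the list lengths.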
In [13]:
Epoch 1/1
30000/30000 [==============================] - 14s - loss: 0.3851 - acc: 0.8832
29830/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 15s - loss: 0.3897 - acc: 0.8836
29960/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 15s - loss: 0.3945 - acc: 0.8820
29750/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 17s - loss: 0.3827 - acc: 0.8821
29980/30000 [============================>.] - ETA: 0s
Epoch 1/2
30000/30000 [==============================] - 16s - loss: 0.3935 - acc: 0.8827
Epoch 2/2
30000/30000 [==============================] - 15s - loss: 0.2171 - acc: 0.9341
29930/30000 [============================>.] - ETA: 0s
Epoch 1/2
model = Sequential()
def create_DNN():
    model = Sequential()
    model.add(Dense(200,input_shape=(img_rows*img_cols,), activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer='Adam',
                  metrics=['accuracy'])
    return model
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
batch_size = 64
epochs = 1
model_gridsearch = KerasClassifier(build_fn=create_DNN,
epochs=epochs, batch_size=batch_size, verbose=1)
# batch sizes and numbers of epochs to search over
batch_size = [10,30,50,100]
epochs = [1,2,5]
# define parameter dictionary
param_grid = dict(batch_size=batch_size, epochs=epochs)
# call scikit grid search module
grid = GridSearchCV(estimator=model_gridsearch, param_grid=param_grid, n_jobs=1, cv=
grid_result = grid.fit(X_train,Y_train)
In [14]:
4. Do the grid search over the number of neurons in the Dense layer and make a plot of the mean test score as a
function of num_neurons. Again, you have the freedom to create your own DNN.
Doing the grid search requires quite a bit of memory. Please restart the kernel ("Kernel"-"Restart") and re-load
the data before doing a new grid search.
In [8]:
Best: 0.967475 using {'batch_size': 10, 'epochs': 5}
0.944350 (0.001052) with: {'batch_size': 10, 'epochs': 1}
0.956850 (0.002544) with: {'batch_size': 10, 'epochs': 2}
0.967475 (0.000476) with: {'batch_size': 10, 'epochs': 5}
0.939225 (0.003904) with: {'batch_size': 30, 'epochs': 1}
0.951700 (0.002840) with: {'batch_size': 30, 'epochs': 2}
0.967075 (0.000536) with: {'batch_size': 30, 'epochs': 5}
0.933700 (0.002487) with: {'batch_size': 50, 'epochs': 1}
0.949700 (0.001885) with: {'batch_size': 50, 'epochs': 2}
0.965475 (0.002815) with: {'batch_size': 50, 'epochs': 5}
0.927900 (0.002847) with: {'batch_size': 100, 'epochs': 1}
0.942250 (0.001410) with: {'batch_size': 100, 'epochs': 2}
0.960925 (0.002025) with: {'batch_size': 100, 'epochs': 5}
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
model = Sequential()
def create_DNN(number):
    model = Sequential()
    model.add(Dense(number,input_shape=(img_rows*img_cols,), activation='relu'))
    model.add(Dense(100, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer='Adam',
                  metrics=['accuracy'])
    return model
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
batch_size = 64
epochs = 1
model_gridsearch = KerasClassifier(build_fn=create_DNN,
    epochs=epochs, batch_size=batch_size, verbose=1)
# numbers of neurons to search over
number = [100, 200, 300, 400, 500, 600, 700, 800]
# define parameter dictionary
param_grid = dict(number=number)
# call scikit grid search module
grid = GridSearchCV(estimator=model_gridsearch, param_grid=param_grid, n_jobs=1, cv=
grid_result = grid.fit(X_train,Y_train)
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.5748 - acc: 0.8241
29696/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.5448 - acc: 0.8359
28672/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.5499 - acc: 0.8358
28864/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 2s - loss: 0.5480 - acc: 0.8343
30000/30000 [==============================] - 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4889 - acc: 0.8569
29376/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4796 - acc: 0.8577
28544/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4856 - acc: 0.8552
28928/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4777 - acc: 0.8576
29696/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4517 - acc: 0.8657
28928/30000 [===========================>..] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4479 - acc: 0.8656
30000/30000 [==============================] - 1s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4639 - acc: 0.8632
29824/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 3s - loss: 0.4553 - acc: 0.8660
29376/30000 [============================>.] - ETA: 0s
Epoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4427 - acc: 0.8665
28928/30000 [===========================>..] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4247 - acc
: 0.8745
29504/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4323 - acc
: 0.8713
29632/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 4s - loss: 0.4199 - acc
: 0.8742
29184/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 5s - loss: 0.4214 - acc
: 0.8740
29952/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 5s - loss: 0.4254 - acc
: 0.8748
29376/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 5s - loss: 0.4274 - acc
: 0.8746
30000/30000 [==============================] - 1s
Epoch 1/1
30000/30000 [==============================] - 5s - loss: 0.4251 - acc
: 0.8745
29760/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 6s - loss: 0.4095 - acc
: 0.8773
29824/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 6s - loss: 0.3991 - acc
: 0.8820
29504/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 6s - loss: 0.4172 - acc
: 0.8774
29504/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 6s - loss: 0.4014 - acc
: 0.8802
29120/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 7s - loss: 0.4092 - acc
: 0.8774
29504/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 7s - loss: 0.3995 - acc
: 0.8804
29248/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 7s - loss: 0.3929 - acc
: 0.8843
29632/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 7s - loss: 0.3926 - acc
: 0.8868
29632/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 7s - loss: 0.4041 - acc
: 0.8798
29440/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 7s - loss: 0.4018 - acc
: 0.8804
In [14]:
29824/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 7s - loss: 0.3985 - acc
: 0.8860
29376/30000 [============================>.] - ETA: 0sEpoch 1/1
30000/30000 [==============================] - 7s - loss: 0.3870 - acc
: 0.8854
29248/30000 [============================>.] - ETA: 0sEpoch 1/1
40000/40000 [==============================] - 10s - loss: 0.3510 - ac
c: 0.8956
Best: 0.950675 using {'number': 700}
0.933625 (0.003532) with: {'number': 100}
0.938175 (0.001894) with: {'number': 200}
0.942675 (0.003076) with: {'number': 300}
0.945950 (0.002427) with: {'number': 400}
0.950275 (0.003060) with: {'number': 500}
0.948750 (0.002145) with: {'number': 600}
0.950675 (0.002833) with: {'number': 700}
0.949900 (0.001416) with: {'number': 800}
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
xx = number
yy = []
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
    yy.append(mean)
In [15]:
Creating CNNs with Keras
Please restart the kernel ("Kernel"-"Restart") and re-load the data.
We have so far considered each MNIST data sample as a $(28\times 28,)$-long 1D vector. This approach neglects
any spatial structure in the image. However, we do know that every one of the hand-written digits
has local spatial correlations between its pixels, which we would like to exploit to improve the
accuracy of our classification model. To this end, we first need to reshape the training and test input data as
follows:
plt.plot(xx,yy)
plt.xlabel('The number of neurons')
plt.ylabel('Mean test score')
plt.show()
In [22]:
One can ask the question of whether a neural net can learn to recognize such local patterns. This can be
achieved by using convolutional layers. Luckily, all we need to do is change the architecture of our DNN.
After we instantiate the model, add the first convolutional layer with 10 filters, which is the dimensionality of
the output space (https://keras.io/layers/convolutional/). Here, we will be
concerned with local spatial filters that take as inputs a small spatial patch of the previous layer at all depths.
We consider a three-dimensional kernel of size $5\times 5\times 1$. Check out this visualization of the convolution
procedure for a square input of unit depth:
https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
The convolution consists of running this filter over all locations in the spatial plane. After computing the
filter, the output is passed through a nonlinearity, a ReLU.
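The sliding-filter operation described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the Keras implementation: the helper name `conv2d_valid_relu` is hypothetical, the image is random stand-in data, and the filter weights are random rather than learned. As in Keras' `Conv2D`, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def conv2d_valid_relu(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (no padding, stride 1) and apply a ReLU."""
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    out = np.empty((out_rows, out_cols))
    for r in range(out_rows):
        for c in range(out_cols):
            patch = image[r:r + k_rows, c:c + k_cols]
            out[r, c] = np.sum(patch * kernel) + bias
    return np.maximum(out, 0.0)  # ReLU nonlinearity

rng = np.random.default_rng(0)
image = rng.random((28, 28))          # stand-in for one MNIST image
kernel = rng.standard_normal((5, 5))  # one filter; a real layer learns these weights
feature_map = conv2d_valid_relu(image, kernel)
print(feature_map.shape)  # (24, 24)
```

A 5 × 5 filter fits into a 28 × 28 image at 24 positions along each axis, which is why the feature map shrinks to 24 × 24.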
X_train shape: (40000, 28, 28, 1)
Y_train shape: (40000,)
40000 train samples
10000 test samples
# reshape data, depending on Keras backend
if keras.backend.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
print('X_train shape:', X_train.shape)
print('Y_train shape:', Y_train.shape)
print()
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
In [23]:
Subsequently, add a 2D pooling layer (https://keras.io/layers/pooling/). This
pooling layer coarse-grains spatial information by performing a subsampling at each depth. Here, we use
the max pool operation. In a max pool, the spatial dimensions are coarse-grained by replacing a small region
(say 2 × 2 neurons) by a single neuron whose output is the maximum value of the output in the region.
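As a minimal sketch of the max pool operation, assuming non-overlapping 2 × 2 regions, the following NumPy function replaces each region by its maximum. The helper name `max_pool_2x2` and the small input array are made up for illustration.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Replace each non-overlapping 2 x 2 block by its maximum value."""
    rows, cols = feature_map.shape
    # Drop a trailing row/column if the size is odd, then group into 2 x 2 blocks.
    trimmed = feature_map[:rows - rows % 2, :cols - cols % 2]
    blocks = trimmed.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 0, 2],
              [3, 5, 4, 1]])
pooled = max_pool_2x2(x)
print(pooled)
# [[4 8]
#  [9 4]]
```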
In [24]:
Add another convolutional layer with 20 filters and apply dropout. Then, add another pooling layer and flatten
the data. You can then add dense (fully connected) layers and compile the model.
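To see how many features the flattened data contains, one can track the spatial size of the image through the layers. The sketch below assumes unpadded, stride-1 convolutions with 5 × 5 kernels and non-overlapping 2 × 2 pooling (the Keras defaults for padding and strides); the helper names are hypothetical.

```python
def conv_output_size(size, kernel):
    """Spatial size after an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_output_size(size, pool):
    """Spatial size after non-overlapping pooling."""
    return size // pool

size = 28                          # MNIST images are 28 x 28
size = conv_output_size(size, 5)   # first 5 x 5 convolution  -> 24
size = pool_output_size(size, 2)   # first 2 x 2 max pool     -> 12
size = conv_output_size(size, 5)   # second 5 x 5 convolution -> 8
size = pool_output_size(size, 2)   # second 2 x 2 max pool    -> 4
n_filters = 20
flat_features = n_filters * size * size
print(size, flat_features)  # 4 320
```

Under these assumptions the flattened vector has 20 × 4 × 4 = 320 entries, which is one natural choice for the width of the first dense layer.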
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
model = Sequential()
model.add(Conv2D(10, kernel_size=(5, 5),
activation='relu',
input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
In [25]:
Lastly, train your CNN and evaluate the model.
In [27]:
----------------------------------------------------------------------
# add second convolutional layer with 20 filters
model.add(Conv2D(20, (5, 5), activation='relu'))
# apply dropout with rate 0.5
model.add(Dropout(0.5))
# add 2D pooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))
# flatten data
model.add(Flatten())
# add a dense all-to-all relu layer
model.add(Dense(20*4*4, activation='relu'))
# apply dropout with rate 0.5
model.add(Dropout(0.5))
# soft-max layer
model.add(Dense(num_classes, activation='softmax'))
# compile the model
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer='Adam',
metrics=['accuracy'])
# training parameters
batch_size = 64
epochs = 10
# train CNN
model.fit(X_train, Y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(X_test, Y_test))
# evaluate model
score = model.evaluate(X_test, Y_test, verbose=1)
# print performance
print()
print('Test loss:', score[0])
print('Test accuracy:', score[1])
-----
ValueError Traceback (most recent call
last)
<ipython-input-27-4c64ca8f5efa> in <module>
9 epochs=epochs,
10 verbose=1,
---> 11 validation_data=(X_test, Y_test))
12
13 # evaliate model
/srv/app/venv/lib/python3.6/site-packages/keras/models.py in fit(self,
x, y, batch_size, epochs, verbose, callbacks, validation_split, valida
tion_data, shuffle, class_weight, sample_weight, initial_epoch, **kwar
gs)
865 class_weight=class_weight,
866 sample_weight=sample_weight,
--> 867 initial_epoch=initial_epoch)
868
869 def evaluate(self, x, y, batch_size=32, verbose=1,
/srv/app/venv/lib/python3.6/site-packages/keras/engine/training.py in
fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_spl
it, validation_data, shuffle, class_weight, sample_weight, initial_epo
ch, steps_per_epoch, validation_steps, **kwargs)
1520 class_weight=class_weight,
1521 check_batch_axis=False,
-> 1522 batch_size=batch_size)
1523 # Prepare validation data.
1524 do_validation = False
/srv/app/venv/lib/python3.6/site-packages/keras/engine/training.py in
_standardize_user_data(self, x, y, sample_weight, class_weight, check_
batch_axis, batch_size)
1380 output_shapes,
1381 check_batch_axis=False,
-> 1382 exception_prefix='target')
1383 sample_weights = _standardize_sample_weights(sample_we
ight,
1384
self._feed_output_names)
/srv/app/venv/lib/python3.6/site-packages/keras/engine/training.py in
_standardize_input_data(data, names, shapes, check_batch_axis, excepti
on_prefix)
142 ' to have shape ' + str(shapes[i])
+
143 ' but got array with shape ' +
--> 144 str(array.shape))
145 return arrays
146
ValueError: Error when checking target: expected dense_6 to have shape
(None, 10) but got array with shape (40000, 1)
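The ValueError above indicates that the targets hold one integer label per sample (shape (40000, 1)), whereas the softmax output layer trained with categorical cross-entropy expects one-hot rows of shape (None, 10). One likely remedy is to one-hot encode the labels before calling fit. The NumPy sketch below shows the idea on a few made-up labels; the helper name `one_hot` is hypothetical, and in the notebook itself `keras.utils.to_categorical(Y_train, num_classes)` should serve the same purpose.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    labels = np.asarray(labels, dtype=int).ravel()
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

labels = np.array([3, 0, 9])          # made-up stand-in for a few MNIST labels
Y = one_hot(labels, num_classes=10)
print(Y.shape)  # (3, 10)
```

The same encoding would have to be applied to both Y_train and Y_test, since the test labels are passed as validation data.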
5. Do the grid search over any 3 different optimization schemes and 2 activation functions. Suppose that we
have two convolutional layers with 10 neurons. Let p_dropout = 0.5, epochs = 1, and batch_size = 64.
Determine which combination of optimization scheme, activation function, and number of neurons gives the
best accuracy.
Doing the grid search requires quite a bit of memory. Please restart the kernel ("Kernel"-"Restart") and re-load
the data before doing a new grid search.
In [ ]:
6. Create an arbitrary DNN (you are free to choose any activation function, optimization scheme, etc) and
evaluate its performance. Then, add two convolutional layers and pooling layers and evaluate its performance
again. How do they compare?
In [ ]:
Problem 2 - Using Tensorflow - Ising Model
You should restart the kernel for Problem 2.
Next, we show how one can use deep neural nets to classify the states of the 2D Ising model according to their
phase. This should be compared with the use of logistic-regression in HW8.
The Hamiltonian for the classical Ising model is given by
$$H = -J\sum_{\langle ij\rangle} S_i S_j, \qquad S_j \in \{\pm 1\},$$
where the lattice site indices $i, j$ run over all nearest neighbors of a 2D square lattice, and $J$ is some arbitrary
interaction energy scale. We adopt periodic boundary conditions. Onsager proved that this model undergoes a
phase transition in the thermodynamic limit from an ordered ferromagnet with all spins aligned to a disordered
phase at the critical temperature $T_c/J = 2/\log(1+\sqrt{2}) \approx 2.26$. For any finite system size, this critical point
is expanded to a critical region around $T_c$.
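As an illustration of the Hamiltonian, the following NumPy sketch evaluates the energy of a spin configuration on an L × L lattice with periodic boundary conditions, together with the critical temperature quoted in the text. It assumes the ferromagnetic sign convention (energy equal to minus J times the sum of nearest-neighbour spin products); the function name `ising_energy` and the choice J = 1 are assumptions made for the example.

```python
import numpy as np

def ising_energy(spins, J=1.0):
    """Energy -J * sum over nearest-neighbour pairs of S_i S_j (periodic boundaries)."""
    # Pair every site with its neighbour below and to the right,
    # so that each nearest-neighbour bond is counted exactly once.
    neighbours = np.roll(spins, -1, axis=0) + np.roll(spins, -1, axis=1)
    return -J * np.sum(spins * neighbours)

L = 40
ordered = np.ones((L, L), dtype=int)      # all spins aligned
print(ising_energy(ordered))              # -2 * L**2 = -3200.0

T_c = 2.0 / np.log(1.0 + np.sqrt(2.0))    # critical temperature in units of J
print(round(T_c, 3))  # 2.269
```

The fully aligned state has the lowest possible energy, two bonds per site, which is the ordered ferromagnet referred to in the text.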
...
...
Step 1: Load and Process the Data
We begin by writing a DataSet class and two functions read_data_sets and load_data to process
the 2D Ising data.
The DataSet class performs checks on the data shape and casts the data into the correct data type for the
calculation. It contains a method called next_batch which shuffles the data and returns a minibatch
of a pre-defined size. This structure is particularly useful for the training procedure in TensorFlow.
In [5]:
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function
import numpy as np
seed=12
np.random.seed(seed)
import sys, os, argparse
import tensorflow as tf
from tensorflow.python.framework import dtypes
# suppress tflow compilation warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
tf.set_random_seed(seed)
In [6]:
class DataSet(object):
    def __init__(self,data_X,data_Y,dtype=dtypes.float32):
        """Checks data and casts it into correct data type. """
        dtype = dtypes.as_dtype(dtype).base_dtype
        if dtype not in (dtypes.uint8, dtypes.float32):
            raise TypeError('Invalid dtype %r, expected uint8 or float32' % dtype)
        assert data_X.shape[0] == data_Y.shape[0], ('data_X.shape: %s data_Y.shape: %s'
        self.num_examples = data_X.shape[0]
        if dtype == dtypes.float32:
            data_X = data_X.astype(np.float32)
        self.data_X = data_X
        self.data_Y = data_Y
        self.epochs_completed = 0
        self.index_in_epoch = 0
    def next_batch(self, batch_size, seed=None):
        """Return the next `batch_size` examples from this data set."""
        if seed:
            np.random.seed(seed)
        start = self.index_in_epoch
        self.index_in_epoch += batch_size
        if self.index_in_epoch > self.num_examples:
            # Finished epoch
            self.epochs_completed += 1
            # Shuffle the data
            perm = np.arange(self.num_examples)
            np.random.shuffle(perm)
            self.data_X = self.data_X[perm]
            self.data_Y = self.data_Y[perm]
            # Start next epoch
            start = 0
            self.index_in_epoch = batch_size
            assert batch_size <= self.num_examples
        end = self.index_in_epoch
        return self.data_X[start:end], self.data_Y[start:end]
Now we load the Ising dataset and split it into three subsets: ordered, critical and disordered, depending on the
temperature which sets the distribution they are drawn from. Once again, we use the ordered and disordered
data to create a training and a test data set for the problem. Classifying the states in the critical region is
expected to be harder and we only use this data to test the performance of our model in the end.
In [7]:
import pickle
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
import collections
L=40 # linear system size
# load data
fac = 25
file_name = "Ising2DFM_reSample_L40_T=All.pkl" # this file contains 16*10000 samples taken in T=np.arange(0.25,4.0001,0.25)
data = pickle.load(open(file_name,'rb')) # pickle reads the file and returns the Python object (1D array, compressed bits)
data = data[::fac]
data = np.unpackbits(data).reshape(-1, 1600) # Decompress array and reshape for convenience
data=data.astype('int')
data[np.where(data==0)]=-1 # map 0 state to -1 (Ising variable can take values +/-1)
file_name = "Ising2DFM_reSample_L40_T=All_labels.pkl" # this file contains 16*10000 samples taken in T=np.arange(0.25,4.0001,0.25)
labels = pickle.load(open(file_name,'rb')) # pickle reads the file and returns the Python object (here just a 1D array with the binary labels)
# divide data into ordered, critical and disordered
X_ordered=data[:int(70000/fac),:]
Y_ordered=labels[:70000][::fac]
X_critical=data[int(70000/fac):int(100000/fac),:]
Y_critical=labels[70000:100000][::fac]
X_disordered=data[int(100000/fac):,:]
Y_disordered=labels[100000:][::fac]
del data,labels
# define training and test data sets
X=np.concatenate((X_ordered,X_disordered)) #np.concatenate((X_ordered,X_critical,X_disordered))
Y=np.concatenate((Y_ordered,Y_disordered)) #np.concatenate((Y_ordered,Y_critical,Y_disordered))
del X_ordered, X_disordered, Y_ordered, Y_disordered
In [8]:
You can load the training data in the following way: (Dataset.train.data_X, Dataset.train.data_Y).
Steps 2+3: Define the Neural Net and its Architecture, Choose the Optimizer and
the Cost Function
We can now move on to construct our deep neural net using TensorFlow.
A feature specific to TensorFlow is the creation of placeholders for the variables of the model, such as the feed-in data X and
Y or the dropout probability dropout_keepprob (which has to be set to unity explicitly during testing).
Another peculiarity is the use of the with scope to give names to the most important operators. While we do not
discuss this here, TensorFlow also allows one to visualise the computational graph for the model (see the package
documentation at https://www.tensorflow.org/).
The shape of X is only partially defined. We know that it will be a matrix, with instances along the first
dimension and features along the second dimension, and we know that the number of features is going to be
$L^2 = 40\times 40$ (one per lattice site), but we don't know yet how many instances each training batch will contain.
So the shape of X is (None, n_feats). Similarly, we know that Y will hold one one-hot row per instance, but again we don't
know the size of the training batch, so its shape is (None, n_categories).
# pick random data points from ordered and disordered states to create the training and test sets
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,train_size=0.6)
# make data categorical
Y_train=to_categorical(Y_train)
Y_test=to_categorical(Y_test)
Y_critical=to_categorical(Y_critical)
# create data sets
train = DataSet(X_train, Y_train, dtype=dtypes.float32)
test = DataSet(X_test, Y_test, dtype=dtypes.float32)
critical = DataSet(X_critical, Y_critical, dtype=dtypes.float32)
Datasets = collections.namedtuple('Datasets', ['train', 'test', 'critical'])
Dataset = Datasets(train=train, test=test, critical=critical)
In [9]:
To classify whether a given spin configuration is in the ordered or disordered phase, we construct a
minimalistic model for a DNN with a single hidden layer containing $N_\mathrm{neurons}$ neurons (which is kept variable so we can
try out the performance of different sizes for the hidden layer).
Let us use a neuron_layer() function to create layers in the neural nets.
1. First, create a name scope using the name of the layer.
2. Get the number of inputs by looking up the input matrix's shape and getting the size of the second
dimension.
3. Create a variable $W$ which holds the weight matrix (i.e. kernel). Initialize it randomly, using a truncated
normal distribution.
4. Create a variable $b$ for biases, initialized to 0.
5. Create a subgraph to compute $Z = XW + b$.
6. Use the activation function if provided.
In [10]:
L=40 # system linear size
n_feats=L**2 # 40x40 square lattice
n_categories=2 # 2 Ising phases: ordered and disordered
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 2
with tf.name_scope('data'):
X=tf.placeholder(tf.float32, shape=(None,n_feats))
Y=tf.placeholder(tf.float32, shape=(None,n_categories))
dropout_keepprob=tf.placeholder(tf.float32)
def neuron_layer(X, n_neuron, name, activation = None):
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        stddev = 2 / np.sqrt(n_inputs + n_neuron)
        init = tf.truncated_normal((n_inputs, n_neuron), stddev = stddev)
        W = tf.Variable(init, name = "kernel")
        b = tf.Variable(tf.zeros([n_neuron]), name = "bias")
        Z = tf.matmul(X, W) + b
        if activation is not None:
            return activation(Z)
        else:
            return Z
Using a neuron_layer() function, create two hidden layers and an output layer. The first hidden layer takes X as
its input, and the second takes the output of the first hidden layer as its input. Finally, the output layer takes the
output of the second hidden layer as its input.
In [11]:
Then, define the cost function that we will use to train the neural net model. Here, use the cross entropy to
penalize models that estimate a low probability for the target class.
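To make the cost function concrete, here is a minimal NumPy sketch of the softmax cross entropy for one-hot targets. It is not TensorFlow's implementation; the function name `softmax_cross_entropy` and the toy logits are assumptions for illustration. It shows that a model placing low probability on the target class is penalized more than a confident, correct one.

```python
import numpy as np

def softmax_cross_entropy(logits, targets):
    """Mean cross entropy between one-hot `targets` and softmax(`logits`)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(targets * log_probs, axis=1))

targets = np.array([[1.0, 0.0], [0.0, 1.0]])
confident = np.array([[4.0, -4.0], [-4.0, 4.0]])   # high probability on the target class
uncertain = np.zeros((2, 2))                       # probability 1/2 on each class
print(softmax_cross_entropy(confident, targets) < softmax_cross_entropy(uncertain, targets))  # True
```

For two classes, a model that always predicts probability 1/2 has a loss of log 2 ≈ 0.693, which is a useful reference point when reading the loss values of an untrained network.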
In [12]:
Then, define a GradientDescentOptimizer that will tweak the model parameters to minimize the cost function.
Now, set learning_rate = 1e-6.
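The update rule behind plain gradient descent is theta ← theta − learning_rate × gradient. The sketch below applies it to a toy quadratic f(theta) = theta², an assumed stand-in rather than the notebook's loss, to illustrate how little a learning rate of 1e-6 moves the parameter in 100 steps compared with a larger rate.

```python
def gradient_descent(theta, learning_rate, n_steps):
    """Minimise f(theta) = theta**2 with plain gradient descent."""
    for _ in range(n_steps):
        grad = 2.0 * theta                      # derivative of theta**2
        theta = theta - learning_rate * grad    # gradient descent update
    return theta

slow = gradient_descent(1.0, learning_rate=1e-6, n_steps=100)
fast = gradient_descent(1.0, learning_rate=1e-1, n_steps=100)
print(slow, fast)
```

With the small rate the parameter barely leaves its starting value, while the larger rate brings it very close to the minimum at zero.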
In [13]:
Lastly, specify how to evaluate the model. Let us simply use accuracy as our performance measure.
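As a minimal NumPy sketch of this performance measure, assuming one-hot targets, accuracy is the fraction of samples whose largest logit coincides with the target class. The function name `accuracy_score_onehot` and the toy arrays are made up for illustration.

```python
import numpy as np

def accuracy_score_onehot(logits, targets):
    """Fraction of rows where the largest logit matches the one-hot target."""
    return np.mean(np.argmax(logits, axis=1) == np.argmax(targets, axis=1))

logits = np.array([[2.0, 0.1], [0.3, 0.9], [1.5, 1.0], [0.2, 0.4]])
targets = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])
print(accuracy_score_onehot(logits, targets))  # 0.75
```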
In [14]:
with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, activation = tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden2, activation = tf.nn.relu)
    logits = tf.layers.dense(hidden2, n_outputs)
with tf.name_scope('loss'):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = Y, logits = logits)
    loss = tf.reduce_mean(xentropy)
learning_rate = 1e-6
with tf.name_scope('optimiser'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
with tf.name_scope('accuracy'):
    correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(logits, 1))
    correct_prediction = tf.cast(correct_prediction, tf.float64) # change data type
    # correct_prediction = tf.nn.in_top_k(logits, Y, 1)
    accuracy = tf.reduce_mean(correct_prediction)
Steps 4+5: Train the Model and Evaluate its Performance
We train our DNN using mini-batches of size 100 over a total of 100 epochs, which we define first. We then
set up the optimizer parameter dictionary opt_params , and use it to create a DNN model.
Running TensorFlow requires opening up a Session which we abbreviate as sess for short. All operations
are performed in this session by calling the run method. First, we initialize the global variables in
TensorFlow's computational graph by running the global_variables_initializer . To train the DNN,
we loop over the number of epochs. In each epoch, we use the next_batch function of the DataSet
class we defined above to create a mini-batch. The forward and backward passes through the weights are
performed by running the loss and optimizer methods. To pass the mini-batch as well as any other
external parameters, we use the feed_dict dictionary. Similarly, we evaluate the model performance by
getting the accuracy on the same mini-batch data. Note that the dropout probability for testing is set to unity.
Once we have exhausted all training epochs, we test the final performance on the entire training, test and
critical data sets. This is done in the same way as above.
Last, we return the loss and accuracy for each of the training, test and critical data sets.
In [15]:
train loss/accuracy: 0.87729853 0.5048076923076923
test loss/accuracy: 0.8700542 0.5192307692307693
crtitical loss/accuracy: 0.8785669 0.4975
training_epochs=100
batch_size=100
with tf.Session() as sess:
    # initialize the necessary variables, in this case, w and b
    sess.run(tf.global_variables_initializer())
    # train the DNN
    for epoch in range(training_epochs):
        batch_X, batch_Y = Dataset.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={X: batch_X,Y: batch_Y,dropout_keepprob: 0.5})
    # test DNN performance on entire train test and critical data sets
    train_loss, train_accuracy = sess.run([loss, accuracy],
                                          feed_dict={X: Dataset.train.data_X,
                                                     Y: Dataset.train.data_Y,
                                                     dropout_keepprob: 0.5}
                                          )
    print("train loss/accuracy:", train_loss, train_accuracy)
    test_loss, test_accuracy = sess.run([loss, accuracy],
                                        feed_dict={X: Dataset.test.data_X,
                                                   Y: Dataset.test.data_Y,
                                                   dropout_keepprob: 1.0}
                                        )
    print("test loss/accuracy:", test_loss, test_accuracy)
    critical_loss, critical_accuracy = sess.run([loss, accuracy],
                                                feed_dict={X: Dataset.critical.data_X,
                                                           Y: Dataset.critical.data_Y,
                                                           dropout_keepprob: 1.0}
                                                )
    print("crtitical loss/accuracy:", critical_loss, critical_accuracy)
Step 6: Modify the Hyperparameters to Optimize Performance of the Model
To study the dependence of our DNN on some of the hyperparameters, we do a grid search over the number of
neurons (initially set as 100) in the hidden layer, and different SGD learning rates (initially set as 1e-6). These
searches are best done over logarithmically-spaced points.
To do this, define a function for creating a DNN model: create_DNN and for evaluating the performance:
evaluate_model .
The function grid_search outputs 2D heat maps that show how the accuracy changes with the learning rate and the
number of neurons.
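As a small sketch of why logarithmic spacing is preferred, the following compares np.logspace with np.linspace over the range of learning rates from 1e-6 to 1e-1 that this grid search scans; the variable names are chosen for illustration.

```python
import numpy as np

# Six learning rates from 1e-6 to 1e-1, one per decade.
learning_rates = np.logspace(-6, -1, 6)
print(learning_rates)

# A linear grid with the same end points puts all but the first point above 1e-2,
# so the small learning rates would hardly be sampled at all.
linear_rates = np.linspace(1e-6, 1e-1, 6)
print(linear_rates)
```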
In [16]:
def create_DNN(n_hidden1=100, n_hidden2=100, learning_rate=1e-6):
    with tf.name_scope('data'):
        X=tf.placeholder(tf.float32, shape=(None,n_feats))
        Y=tf.placeholder(tf.float32, shape=(None,n_categories))
        dropout_keepprob=tf.placeholder(tf.float32)
    with tf.name_scope("dnn"):
        hidden1 = tf.layers.dense(X, n_hidden1, activation = tf.nn.relu)
        hidden2 = tf.layers.dense(hidden1, n_hidden2, activation = tf.nn.relu)
        logits = tf.layers.dense(hidden2, n_outputs)
    with tf.name_scope('loss'):
        xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = Y, logits = logits)
        loss = tf.reduce_mean(xentropy)
    with tf.name_scope('optimiser'):
        optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    with tf.name_scope('accuracy'):
        correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(logits, 1))
        correct_prediction = tf.cast(correct_prediction, tf.float64) # change data type
        # correct_prediction = tf.nn.in_top_k(logits, Y, 1)
        accuracy = tf.reduce_mean(correct_prediction)
    return X, Y, dropout_keepprob, loss, optimizer, accuracy
In [17]:
def evaluate_model(neurons,lr):
    training_epochs=100
    batch_size=100
    X, Y, dropout_keepprob, loss, optimizer, accuracy = create_DNN(n_hidden1=neurons
    with tf.Session() as sess:
        # initialize the necessary variables, in this case, w and b
        sess.run(tf.global_variables_initializer())
        # train the DNN
        for epoch in range(training_epochs):
            batch_X, batch_Y = Dataset.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={X: batch_X,Y: batch_Y,dropout_keepprob: 0.5})
        # test DNN performance on entire train test and critical data sets
        train_loss, train_accuracy = sess.run([loss, accuracy],
                                              feed_dict={X: Dataset.train.data_X,
                                                         Y: Dataset.train.data_Y,
                                                         dropout_keepprob: 0.5}
                                              )
        print("train loss/accuracy:", train_loss, train_accuracy)
        test_loss, test_accuracy = sess.run([loss, accuracy],
                                            feed_dict={X: Dataset.test.data_X,
                                                       Y: Dataset.test.data_Y,
                                                       dropout_keepprob: 1.0}
                                            )
        print("test loss/accuracy:", test_loss, test_accuracy)
        critical_loss, critical_accuracy = sess.run([loss, accuracy],
                                                    feed_dict={X: Dataset.critical.data_X,
                                                               Y: Dataset.critical.data_Y,
                                                               dropout_keepprob: 1.0}
                                                    )
        print("crtitical loss/accuracy:", critical_loss, critical_accuracy)
    return train_loss,train_accuracy,test_loss,test_accuracy,critical_loss,critical_accuracy
In [18]:
def grid_search():
    """This function performs a grid search over a set of different learning rates
    and a number of hidden layer neurons."""
    # perform grid search over learning rate and number of hidden neurons
    N_neurons=[100, 200, 300, 400, 500]
    learning_rates=np.logspace(-6,-1,6)
    # pre-allocate variables to store accuracy and loss data
    train_loss=np.zeros((len(N_neurons),len(learning_rates)),dtype=np.float64)
    train_accuracy=np.zeros_like(train_loss)
    test_loss=np.zeros_like(train_loss)
    test_accuracy=np.zeros_like(train_loss)
    critical_loss=np.zeros_like(train_loss)
    critical_accuracy=np.zeros_like(train_loss)
    # do grid search
    for i, neurons in enumerate(N_neurons):
        for j, lr in enumerate(learning_rates):
            print("training DNN with %4d neurons and SGD lr=%0.6f." %(neurons,lr) )
            train_loss[i,j],train_accuracy[i,j],\
            test_loss[i,j],test_accuracy[i,j],\
            critical_loss[i,j],critical_accuracy[i,j] = evaluate_model(neurons,lr)
    plot_data(learning_rates,N_neurons,train_accuracy, "training data")
    plot_data(learning_rates,N_neurons,test_accuracy, "test data")
    plot_data(learning_rates,N_neurons,critical_accuracy, "critical data")
In [19]:
In [20]:
training DNN with 100 neurons and SGD lr=0.000001.
train loss/accuracy: 0.8261823 0.5150641025641025
test loss/accuracy: 0.82548255 0.5264423076923077
crtitical loss/accuracy: 0.823629 0.5083333333333333
training DNN with 100 neurons and SGD lr=0.000010.
train loss/accuracy: 0.7448887 0.5432692307692307
%matplotlib notebook
import matplotlib.pyplot as plt
def plot_data(x,y,data, title):
    # plot results
    fontsize=16
    fig = plt.figure()
    ax = fig.add_subplot(111)
    cax = ax.matshow(data, interpolation='nearest', vmin=0, vmax=1)
    fig.colorbar(cax)
    # put text on matrix elements
    for i, x_val in enumerate(np.arange(len(x))):
        for j, y_val in enumerate(np.arange(len(y))):
            c = "${0:.1f}\\%$".format( 100*data[j,i])
            ax.text(x_val, y_val, c, va='center', ha='center')
    # convert axis values to string labels
    x=[str(i) for i in x]
    y=[str(i) for i in y]
    ax.set_xticklabels(['']+x)
    ax.set_yticklabels(['']+y)
    ax.set_xlabel('$\\mathrm{learning\\ rate}$',fontsize=fontsize)
    ax.set_ylabel('$\\mathrm{hidden\\ neurons}$',fontsize=fontsize)
    ax.set_title(title,fontsize=fontsize)
    plt.tight_layout()
    plt.show()
grid_search()
test loss/accuracy: 0.7627646 0.5201923076923077
crtitical loss/accuracy: 0.70823133 0.57
training DNN with 100 neurons and SGD lr=0.000100.
train loss/accuracy: 0.7658607 0.5166666666666667
test loss/accuracy: 0.77821404 0.5153846153846153
crtitical loss/accuracy: 0.7970336 0.49083333333333334
training DNN with 100 neurons and SGD lr=0.001000.
train loss/accuracy: 0.66140896 0.5939102564102564
test loss/accuracy: 0.6775087 0.5822115384615385
crtitical loss/accuracy: 0.72388476 0.5258333333333334
training DNN with 100 neurons and SGD lr=0.010000.
train loss/accuracy: 0.22135256 0.9724358974358974
test loss/accuracy: 0.26849487 0.9456730769230769
crtitical loss/accuracy: 0.48565084 0.7841666666666667
training DNN with 100 neurons and SGD lr=0.100000.
train loss/accuracy: 0.0065746326 1.0
test loss/accuracy: 0.032350723 0.99375
crtitical loss/accuracy: 0.46474758 0.8083333333333333
training DNN with 200 neurons and SGD lr=0.000001.
train loss/accuracy: 0.81956315 0.48205128205128206
test loss/accuracy: 0.83524406 0.47307692307692306
crtitical loss/accuracy: 0.7584602 0.5358333333333334
training DNN with 200 neurons and SGD lr=0.000010.
train loss/accuracy: 1.0758603 0.5407051282051282
test loss/accuracy: 1.1356921 0.5158653846153847
crtitical loss/accuracy: 0.84251535 0.65
training DNN with 200 neurons and SGD lr=0.000100.
train loss/accuracy: 0.81050164 0.5112179487179487
test loss/accuracy: 0.82746327 0.5033653846153846
crtitical loss/accuracy: 0.7245159 0.5866666666666667
training DNN with 200 neurons and SGD lr=0.001000.
train loss/accuracy: 0.68042535 0.6128205128205129
test loss/accuracy: 0.69521517 0.5947115384615385
crtitical loss/accuracy: 0.6811237 0.6025
training DNN with 200 neurons and SGD lr=0.010000.
train loss/accuracy: 0.1754437 0.9875
test loss/accuracy: 0.223725 0.95625
crtitical loss/accuracy: 0.4759194 0.7783333333333333
training DNN with 200 neurons and SGD lr=0.100000.
train loss/accuracy: 0.0050047515 1.0
test loss/accuracy: 0.016949823 0.9980769230769231
crtitical loss/accuracy: 0.37466472 0.8475
training DNN with 300 neurons and SGD lr=0.000001.
train loss/accuracy: 1.0666369 0.4512820512820513
test loss/accuracy: 1.0449538 0.45913461538461536
crtitical loss/accuracy: 1.1571132 0.365
training DNN with 300 neurons and SGD lr=0.000010.
train loss/accuracy: 1.2051065 0.4532051282051282
test loss/accuracy: 1.1894196 0.4625
crtitical loss/accuracy: 1.4901551 0.3383333333333333
training DNN with 300 neurons and SGD lr=0.000100.
train loss/accuracy: 0.7737522 0.49455128205128207
test loss/accuracy: 0.7733282 0.4932692307692308
crtitical loss/accuracy: 0.7616569 0.5058333333333334
training DNN with 300 neurons and SGD lr=0.001000.
train loss/accuracy: 0.66356003 0.6256410256410256
test loss/accuracy: 0.68873733 0.604326923076923
crtitical loss/accuracy: 0.69861454 0.5808333333333333
training DNN with 300 neurons and SGD lr=0.010000.
train loss/accuracy: 0.17167042 0.9887820512820513
test loss/accuracy: 0.21518518 0.9629807692307693
crtitical loss/accuracy: 0.4357977 0.8208333333333333
training DNN with 300 neurons and SGD lr=0.100000.
train loss/accuracy: 0.0050944365 1.0
test loss/accuracy: 0.014383625 0.9990384615384615
crtitical loss/accuracy: 0.3503038 0.8433333333333334
training DNN with 400 neurons and SGD lr=0.000001.
train loss/accuracy: 1.2070509 0.45
test loss/accuracy: 1.172588 0.4735576923076923
crtitical loss/accuracy: 1.4149262 0.33666666666666667
training DNN with 400 neurons and SGD lr=0.000010.
train loss/accuracy: 0.8229826 0.492948717948718
test loss/accuracy: 0.83668387 0.4894230769230769
crtitical loss/accuracy: 0.83969796 0.46416666666666667
training DNN with 400 neurons and SGD lr=0.000100.
train loss/accuracy: 0.7795468 0.5051282051282051
test loss/accuracy: 0.77486354 0.5081730769230769
crtitical loss/accuracy: 0.74270254 0.5308333333333334
training DNN with 400 neurons and SGD lr=0.001000.
train loss/accuracy: 0.6276471 0.6432692307692308
test loss/accuracy: 0.65891 0.6048076923076923
crtitical loss/accuracy: 0.6667839 0.6041666666666666
training DNN with 400 neurons and SGD lr=0.010000.
train loss/accuracy: 0.16308242 0.9881410256410257
test loss/accuracy: 0.1974595 0.9735576923076923
crtitical loss/accuracy: 0.42789352 0.82
training DNN with 400 neurons and SGD lr=0.100000.
train loss/accuracy: 0.0045278403 1.0
test loss/accuracy: 0.013034376 0.9985576923076923
crtitical loss/accuracy: 0.36393955 0.845
training DNN with 500 neurons and SGD lr=0.000001.
train loss/accuracy: 0.8089994 0.49230769230769234
test loss/accuracy: 0.8481708 0.4673076923076923
crtitical loss/accuracy: 0.74246925 0.5383333333333333
training DNN with 500 neurons and SGD lr=0.000010.
train loss/accuracy: 0.7743357 0.5041666666666667
test loss/accuracy: 0.7632512 0.5221153846153846
crtitical loss/accuracy: 0.7739148 0.5258333333333334
training DNN with 500 neurons and SGD lr=0.000100.
train loss/accuracy: 0.77521557 0.5048076923076923
test loss/accuracy: 0.78184414 0.5004807692307692
crtitical loss/accuracy: 0.74329823 0.5258333333333334
training DNN with 500 neurons and SGD lr=0.001000.
train loss/accuracy: 0.6239121 0.6592948717948718
test loss/accuracy: 0.62496156 0.6615384615384615
crtitical loss/accuracy: 0.6445728 0.6316666666666667
training DNN with 500 neurons and SGD lr=0.010000.
train loss/accuracy: 0.14266635 0.9942307692307693
test loss/accuracy: 0.18611857 0.9764423076923077
crtitical loss/accuracy: 0.43260914 0.8108333333333333
training DNN with 500 neurons and SGD lr=0.100000.
train loss/accuracy: 0.0048845564 1.0
test loss/accuracy: 0.012268903 0.9990384615384615
crtitical loss/accuracy: 0.35715875 0.825
1. Do the grid search over 5 different types of activation functions
(https://www.tensorflow.org/api_guides/python/nn#Activation_Functions). Evaluate the performance for each
case and determine which gives the best accuracy. You can assume an arbitrary DNN. Show results for training,
test, and critical data.
In [33]:
def create_DNN(activation):
with tf.name_scope('data'):
X=tf.placeholder(tf.float32, shape=(None,n_feats))
Y=tf.placeholder(tf.float32, shape=(None,n_categories))
dropout_keepprob=tf.placeholder(tf.float32)
with tf.name_scope("dnn"):
if activation == 0:
hidden1 = tf.layers.dense(X, 100, activation = tf.nn.relu)
hidden2 = tf.layers.dense(hidden1, 100, activation = tf.nn.relu)
elif activation == 1:
hidden1 = tf.layers.dense(X, 100, activation = tf.nn.relu6)
hidden2 = tf.layers.dense(hidden1, 100, activation = tf.nn.relu6)
elif activation == 2:
hidden1 = tf.layers.dense(X, 100, activation = tf.nn.crelu)
hidden2 = tf.layers.dense(hidden1, 100, activation = tf.nn.crelu)
elif activation == 3:
hidden1 = tf.layers.dense(X, 100, activation = tf.nn.elu)
hidden2 = tf.layers.dense(hidden1, 100, activation = tf.nn.elu)
elif activation == 4:
hidden1 = tf.layers.dense(X, 100, activation = tf.nn.tanh)
hidden2 = tf.layers.dense(hidden1, 100, activation = tf.nn.tanh)
logits = tf.layers.dense(hidden2, n_outputs)
with tf.name_scope('loss'):
xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = Y, logits = logits
loss = tf.reduce_mean(xentropy)
with tf.name_scope('optimiser'):
optimizer = tf.train.GradientDescentOptimizer(1e-6).minimize(loss)
with tf.name_scope('accuracy'):
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(logits, 1))
correct_prediction = tf.cast(correct_prediction, tf.float64) # change data type
# correct_prediction = tf.nn.in_top_k(logits, Y, 1)
accuracy = tf.reduce_mean(correct_prediction)
return X, Y, dropout_keepprob, loss, optimizer, accuracy
def evaluate_model(activation):
training_epochs=100
batch_size=100
X, Y, dropout_keepprob, loss, optimizer, accuracy = create_DNN(activation)
with tf.Session() as sess:
# initialize the necessary variables, in this case, w and b
sess.run(tf.global_variables_initializer())
# train the DNN
for epoch in range(training_epochs):
batch_X, batch_Y = Dataset.train.next_batch(batch_size)
sess.run(optimizer, feed_dict={X: batch_X,Y: batch_Y,dropout_keepprob: 0.5
# test DNN performance on entire train test and critical data sets
train_loss, train_accuracy = sess.run([loss, accuracy],
feed_dict={X: Dataset.train.data_X
Y: Dataset.train.data_Y
dropout_keepprob: 0.5
)
print("train loss/accuracy:", train_loss, train_accuracy)
test_loss, test_accuracy = sess.run([loss, accuracy],
feed_dict={X: Dataset.test.data_X
Y: Dataset.test.data_Y
dropout_keepprob: 1.0
)
print("test loss/accuracy:", test_loss, test_accuracy)
critical_loss, critical_accuracy = sess.run([loss, accuracy],
feed_dict={X: Dataset.critical.data_X
Y: Dataset.critical.data_Y
dropout_keepprob: 1.0
)
print("crtitical loss/accuracy:", critical_loss, critical_accuracy)
return train_loss,train_accuracy,test_loss,test_accuracy,critical_loss,critical_accuracy
activation=['relu', 'relu6', 'crelu', 'elu', 'tanh']
def grid_search():
"""This function performs a grid search over a set of different activation
functions."""
# perform grid search over activation functions
activation=['relu', 'relu6', 'crelu', 'elu', 'tanh']
# pre-allocate variables to store accuracy and loss data
train_loss=np.zeros(len(activation))
train_accuracy=np.zeros_like(train_loss)
test_loss=np.zeros_like(train_loss)
test_accuracy=np.zeros_like(train_loss)
critical_loss=np.zeros_like(train_loss)
critical_accuracy=np.zeros_like(train_loss)
# do grid search
for i in range(np.size(activation)):
print("training DNN with %s: " %(activation[i]) )
train_loss[i],train_accuracy[i],test_loss[i],test_accuracy[i],critical_loss[
print()
temp = critical_accuracy.tolist()
temp = temp.index(max(temp))
print(activation[temp], 'gives the best accuracy, and the best accuracy is: ',critical_accuracy
In [34]:
2. Do the grid search over 5 different numbers of epochs and batch sizes. Make a 2D heat map as shown in the
example. You can assume an arbitrary DNN. Show results for training, test, and critical data.
In [43]:
training DNN with relu:
train loss/accuracy: 0.759728 0.5301282051282051
test loss/accuracy: 0.7460091 0.5379807692307692
crtitical loss/accuracy: 0.75928587 0.5266666666666666
training DNN with relu6:
train loss/accuracy: 0.8069537 0.525
test loss/accuracy: 0.8278656 0.49759615384615385
crtitical loss/accuracy: 0.7280958 0.5775
training DNN with crelu:
train loss/accuracy: 1.2907476 0.5477564102564103
test loss/accuracy: 1.3735441 0.5254807692307693
crtitical loss/accuracy: 1.0095135 0.6658333333333334
training DNN with elu:
train loss/accuracy: 0.90077466 0.5060897435897436
test loss/accuracy: 0.9300092 0.48846153846153845
crtitical loss/accuracy: 0.85720867 0.525
training DNN with tanh:
train loss/accuracy: 0.82476604 0.4987179487179487
test loss/accuracy: 0.8346278 0.49615384615384617
crtitical loss/accuracy: 0.7826468 0.5225
crelu gives the best accuracy, and the best accuracy is: 0.6658333333333334
grid_search()
#question 2
def create_DNN():
with tf.name_scope('data'):
X=tf.placeholder(tf.float32, shape=(None,n_feats))
Y=tf.placeholder(tf.float32, shape=(None,n_categories))
dropout_keepprob=tf.placeholder(tf.float32)
with tf.name_scope("dnn"):
hidden1 = tf.layers.dense(X, 100, activation = tf.nn.relu)
hidden2 = tf.layers.dense(hidden1, 100, activation = tf.nn.relu)
logits = tf.layers.dense(hidden2, n_outputs)
with tf.name_scope('loss'):
xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = Y, logits = logits
loss = tf.reduce_mean(xentropy)
with tf.name_scope('optimiser'):
optimizer = tf.train.GradientDescentOptimizer(1e-6).minimize(loss)
with tf.name_scope('accuracy'):
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(logits, 1))
correct_prediction = tf.cast(correct_prediction, tf.float64) # change data type
# correct_prediction = tf.nn.in_top_k(logits, Y, 1)
accuracy = tf.reduce_mean(correct_prediction)
return X, Y, dropout_keepprob, loss, optimizer, accuracy
def evaluate_model(training_epochs,batch_size):
X, Y, dropout_keepprob, loss, optimizer, accuracy = create_DNN()
with tf.Session() as sess:
# initialize the necessary variables, in this case, w and b
sess.run(tf.global_variables_initializer())
# train the DNN
for epoch in range(training_epochs):
batch_X, batch_Y = Dataset.train.next_batch(batch_size)
sess.run(optimizer, feed_dict={X: batch_X,Y: batch_Y,dropout_keepprob: 0.5
# test DNN performance on entire train test and critical data sets
train_loss, train_accuracy = sess.run([loss, accuracy],
feed_dict={X: Dataset.train.data_X
Y: Dataset.train.data_Y
dropout_keepprob: 0.5
)
print("train loss/accuracy:", train_loss, train_accuracy)
test_loss, test_accuracy = sess.run([loss, accuracy],
feed_dict={X: Dataset.test.data_X
Y: Dataset.test.data_Y
dropout_keepprob: 1.0
)
print("test loss/accuracy:", test_loss, test_accuracy)
critical_loss, critical_accuracy = sess.run([loss, accuracy],
feed_dict={X: Dataset.critical.data_X
Y: Dataset.critical.data_Y
dropout_keepprob: 1.0
)
print("crtitical loss/accuracy:", critical_loss, critical_accuracy)
return train_loss,train_accuracy,test_loss,test_accuracy,critical_loss,critical_accuracy
def grid_search():
"""This function performs a grid search over the number of training epochs
and the SGD batch size."""
# perform grid search over number of training epochs and batch size
training_epochs = [100, 200, 300, 400, 500, 600]
batch_size = [100, 200, 300, 400, 500, 600]
# pre-allocate variables to store accuracy and loss data
train_loss=np.zeros((len(training_epochs),len(batch_size)),dtype=np.float64)
train_accuracy=np.zeros_like(train_loss)
test_loss=np.zeros_like(train_loss)
test_accuracy=np.zeros_like(train_loss)
critical_loss=np.zeros_like(train_loss)
critical_accuracy=np.zeros_like(train_loss)
# do grid search
for i, trainingepochs in enumerate(training_epochs):
for j, batchsize in enumerate(batch_size):
print("training DNN with %d training epochs and SGD batch size=%d." %(trainingepochs
train_loss[i,j],train_accuracy[i,j],\
test_loss[i,j],test_accuracy[i,j],\
critical_loss[i,j],critical_accuracy[i,j] = evaluate_model(trainingepochs
print()
plot_data(batch_size,training_epochs,train_accuracy, "training data")
plot_data(batch_size,training_epochs,test_accuracy, "test data")
plot_data(batch_size,training_epochs,critical_accuracy, "critical data")
%matplotlib notebook
import matplotlib.pyplot as plt
def plot_data(x,y,data, title):
# plot results
fontsize=16
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(data, interpolation='nearest', vmin=0, vmax=1)
fig.colorbar(cax)
In [44]:
training DNN with 100 training epochs and SGD batch size=100.
train loss/accuracy: 0.9047832 0.521474358974359
test loss/accuracy: 0.92370594 0.5033653846153846
crtitical loss/accuracy: 0.71819884 0.6408333333333334
training DNN with 100 training epochs and SGD batch size=200.
train loss/accuracy: 0.75898373 0.5176282051282052
test loss/accuracy: 0.7683781 0.5100961538461538
crtitical loss/accuracy: 0.7331881 0.5391666666666667
training DNN with 100 training epochs and SGD batch size=300.
train loss/accuracy: 0.8795474 0.5637820512820513
test loss/accuracy: 0.8970626 0.541826923076923
crtitical loss/accuracy: 0.7265906 0.6533333333333333
training DNN with 100 training epochs and SGD batch size=400.
train loss/accuracy: 1.0428126 0.4721153846153846
test loss/accuracy: 1.0210873 0.47115384615384615
crtitical loss/accuracy: 1.175693 0.36
# put text on matrix elements
for i, x_val in enumerate(np.arange(len(x))):
for j, y_val in enumerate(np.arange(len(y))):
c = "${0:.1f}\\%$".format( 100*data[j,i])
ax.text(x_val, y_val, c, va='center', ha='center')
# convert axis values to string labels
x=[str(i) for i in x]
y=[str(i) for i in y]
ax.set_xticklabels(['']+x)
ax.set_yticklabels(['']+y)
ax.set_xlabel('$\\mathrm{batch\\ size}$',fontsize=fontsize)
ax.set_ylabel('$\\mathrm{training\\ epochs}$',fontsize=fontsize)
ax.set_title(title,fontsize=fontsize)
plt.tight_layout()
plt.show()
grid_search()
Problem 3 - SDSS galaxies
You should restart the kernel for Problem 3.
The data is provided in the file "specz_data.txt". The file has 13 columns:
spectroscopic redshift ('zspec'); RA; DEC; magnitudes in 5 bands, u, g, r, i, z (denoted 'mu', 'mg', 'mr', 'mi',
'mz' respectively); exponential and de Vaucouleurs model magnitude fits ('logExp' and 'logDev',
http://www.sdss.org/dr12/algorithms/magnitudes/); the zebra
fit ('pz_zebra'); and the neural network fit ('pz_NN') with its error estimate ('pz_NN_Err').
We will undertake 2 exercises -
Regression
We will use the magnitudes of an object in different bands ('mu', 'mg', 'mr', 'mi', 'mz') and do a regression
exercise to estimate the redshift of the object. Hence our feature space is 5-dimensional.
The correct redshift is given by 'zspec', which is the spectroscopic redshift of the object. We will use
this for training and testing purposes.
Sidenote: Photometry vs. Spectroscopy
The amount of energy we receive from celestial objects, in the form of radiation, is called the flux,
and an astronomical technique of measuring the flux is photometry. Flux is usually measured over
broad wavelength bands, and with an estimate of the distance to an object, we can infer the object's
luminosity, temperature, size, etc. Usually light is passed through colored filters, and we measure the
intensity of the filtered light.
On the other hand, spectroscopy deals with the spectrum of the emitted light. This tells us what the
object is made of, how it is moving, the pressure of the material in it, etc. Note that for faint objects,
making photometric observations is much easier.
A photometric redshift (photoz) is an estimate of the distance to the object using photometry.
A spectroscopic redshift is obtained by observing the object's spectral lines and measuring their shifts due to the
Doppler effect to infer the distance.
Classification
We will use the same magnitudes and now also the redshift of the object ('zspec') to classify the
object as either elliptical or spiral. Hence our feature space is now 6-dimensional.
The correct class is given by comparing 'logExp' and 'logDev', which are the fits for the exponential and
de Vaucouleurs profiles. If logExp > logDev, it is a spiral, and vice versa. We will use this for training and
testing purposes. Since the classes are not explicitly given, generate a column for those. (Classes can
be ±1. If it is 0, the object does not belong to either class.)
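As a minimal sketch of the labelling rule above, the class column can be generated from the sign of the difference between the two fits. The arrays `log_exp` and `log_dev` below are hypothetical stand-ins, not values from specz_data.txt.

```python
import numpy as np

# Hypothetical fit values for four objects (not taken from specz_data.txt).
log_exp = np.array([-1.0, -5.0, -2.0, -3.0])
log_dev = np.array([-4.0, -2.0, -2.0, -3.5])

# np.sign gives +1 where logExp > logDev (spiral), -1 where logExp < logDev
# (elliptical), and 0 where the two fits are equal (neither class).
classes = np.sign(log_exp - log_dev)

# Keep only the objects that belong to one of the two classes.
keep = classes != 0
labels = classes[keep].astype(int)
print(labels)
```

The third object has equal fits, so it gets class 0 and is dropped by the `keep` mask.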
In [ ]:
Cleaning
Read in the files to create the data (X and Y) for both regression and classification.
You will have to clean the data -
Drop the entries that are nan or infinite
Drop the unrealistic numbers such as 999, -999; and magnitudes that are unrealistic. Since these are
apparent magnitudes, they should be positive and high. Let's choose a magnitude limit of 15 as a safe bet.
For classification, drop the entries that do not belong to either class.
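The cleaning rules above can be combined into a single boolean row mask. The following is only a sketch on a made-up array; the `sentinel` cutoff of 900 is an assumption chosen to catch 999-like placeholder values, and `mlim` is the magnitude limit from the text.

```python
import numpy as np

# Hypothetical magnitudes in two bands for six objects; 999 and -999 stand in
# for the placeholder values mentioned above.
mags = np.array([
    [18.2, 17.5],
    [np.nan, 17.0],
    [999.0, 16.8],
    [14.1, 15.5],
    [19.0, 18.3],
    [-999.0, np.inf],
])
mlim = 15.0       # lower magnitude limit from the text
sentinel = 900.0  # assumed cutoff for 999-like placeholders

# A row is kept only if every band is finite and lies inside [mlim, sentinel).
finite = np.isfinite(mags).all(axis=1)
with np.errstate(invalid="ignore"):
    in_range = ((mags >= mlim) & (mags < sentinel)).all(axis=1)
good = finite & in_range
clean = mags[good]
print(clean)
```

Requiring every band to pass (an "all" over columns) is the same as dropping a row when any band fails, which is the intent of the cleaning step.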
In [4]:
dict_keys(['zspec', 'RA', 'DEC', 'mu', 'mg', 'mr', 'mi', 'mz', 'logExp', 'logDev', 'pz_zebra', 'pz_NN', 'pz_NN_Err'])
#Read in and create data
fname = 'specz_data.txt'
spec_dat=np.genfromtxt(fname,names=True)
print(spec_dat.dtype.fields.keys())
#convenience variable
zspec = spec_dat['zspec']
pzNN = spec_dat['pz_NN']
#some NN redshifts are not defined
pzNN[pzNN < 0] = np.nan
#For Regression
bands = ['u', 'g', 'r','i', 'z' ]
mlim = 15
xdata = np.concatenate([[spec_dat['m%s'%i] for i in bands]]).T
bad = (xdata[:, 0] < mlim) | (xdata[:, 1] < mlim) | (xdata[:, 2] < mlim) & (xdata[:,
xdata = xdata[~bad]
xdata[xdata<0] = 0
ydata = zspec[~bad]
#For classification
classes = np.sign(spec_dat['logExp'] - spec_dat['logDev'])
tmp = np.concatenate([[spec_dat['m%s'%i] for i in bands]]).T
xxdata = np.concatenate([tmp, zspec.reshape(-1, 1)], axis=1)
bad = (classes==0) | (xxdata[:, 0] < mlim) | (xxdata[:, 1] < mlim) | (xxdata[:, 2] <
xxdata = xxdata[~bad]
classes = classes[~bad]
For regression, the X and Y data (called "xdata" and "ydata") are the cleaned magnitudes (5 features)
and the spectroscopic redshifts, respectively. For classification, the X and Y data (called "xxdata" and "classes")
are the cleaned magnitudes plus spectroscopic redshifts (6 features) and the classes,
respectively.
In [5]:
Visualization
The next step should be to visualize the data.
For regression
Make a histogram for the distribution of the data (spectroscopic redshift).
Make 5 2D histograms of the distribution of the magnitude as a function of redshift (Hint:
https://matplotlib.org/devdocs/api/_as_gen/matplotlib.axes.Axes.hist2d.html)
For classification
Make 6 1D histograms for the distribution of the data (6 features: zspec and 5 magnitudes) for both class
1 and -1 separately
1. Make histograms for both regression and classification.
In [6]:
For Regression:
Before: Size of datasets is 5338
After: Size of datasets is 4535
For Classification:
Before: Size of datasets is 5338
After: Size of datasets is 4147
print('For Regression:')
print('Before: Size of datasets is ', zspec.shape[0])
print('After: Size of datasets is ', xdata.shape[0])
print('')
print('For Classification:')
print('Before: Size of datasets is ', zspec.shape[0])
print('After: Size of datasets is ', xxdata.shape[0])
plt.hist(ydata, bins = 50, rwidth = 0.8)
plt.xlabel('Redshift')
plt.ylabel('Amount')
plt.title('Distribution of the spectroscopic redshift')
plt.show()
plt.hist2d(ydata, xdata[:,0], bins = 100)
plt.xlabel('Redshift')
plt.ylabel('Mu')
plt.title('Distribution of the Mu-Redshift')
plt.show()
plt.hist2d(ydata, xdata[:,1], bins = 100)
plt.xlabel('Redshift')
plt.ylabel('Mg')
plt.title('Distribution of the Mg-Redshift')
plt.show()
plt.hist2d(ydata, xdata[:,2], bins = 100)
plt.xlabel('Redshift')
plt.ylabel('Mr')
plt.title('Distribution of the Mr-Redshift')
plt.show()
plt.hist2d(ydata, xdata[:,3], bins = 100)
plt.xlabel('Redshift')
plt.ylabel('Mi')
plt.title('Distribution of the Mi-Redshift')
plt.show()
plt.hist2d(ydata, xdata[:,4], bins = 100)
plt.xlabel('Redshift')
plt.ylabel('Mz')
plt.title('Distribution of the Mz-Redshift')
plt.show()
In [7]:
In [8]:
# split the feature rows by class label using boolean masks
separate0 = xxdata[classes == -1]
separate1 = xxdata[classes == 1]
fig, axes = plt.subplots(2,3,figsize = (15,10))
ax = axes[0,0]
ax.hist(separate0[:,0], bins = 50, rwidth = 0.8)
ax.set_xlabel('Mu')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the Mu')
ax = axes[0,1]
ax.hist(separate0[:,1], bins = 50, rwidth = 0.8)
ax.set_xlabel('Mg')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the Mg')
ax = axes[0,2]
ax.hist(separate0[:,2], bins = 50, rwidth = 0.8)
ax.set_xlabel('Mr')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the Mr')
ax = axes[1,0]
ax.hist(separate0[:,3], bins = 50, rwidth = 0.8)
ax.set_xlabel('Mi')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the Mi')
ax = axes[1,1]
ax.hist(separate0[:,4], bins = 50, rwidth = 0.8)
ax.set_xlabel('Mz')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the Mz')
ax = axes[1,2]
ax.hist(separate0[:,5], bins = 50, rwidth = 0.8)
ax.set_xlabel('Redshift')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the spectroscopic redshift')
fig.suptitle('Distribution of the data for class -1') # figure-level title; plt.title would overwrite the last subplot's title
plt.show()
In [9]:
fig, axes = plt.subplots(2,3,figsize = (15,10))
ax = axes[0,0]
ax.hist(separate1[:,0], bins = 50, rwidth = 0.8)
ax.set_xlabel('Mu')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the Mu')
ax = axes[0,1]
ax.hist(separate1[:,1], bins = 50, rwidth = 0.8)
ax.set_xlabel('Mg')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the Mg')
ax = axes[0,2]
ax.hist(separate1[:,2], bins = 50, rwidth = 0.8)
ax.set_xlabel('Mr')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the Mr')
ax = axes[1,0]
ax.hist(separate1[:,3], bins = 50, rwidth = 0.8)
ax.set_xlabel('Mi')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the Mi')
ax = axes[1,1]
ax.hist(separate1[:,4], bins = 50, rwidth = 0.8)
ax.set_xlabel('Mz')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the Mz')
ax = axes[1,2]
ax.hist(separate1[:,5], bins = 50, rwidth = 0.8)
ax.set_xlabel('Redshift')
ax.set_ylabel('Amount')
ax.set_title('Distribution of the spectroscopic redshift')
fig.suptitle('Distribution of the data for class 1') # figure-level title; plt.title would overwrite the last subplot's title
plt.show()
2. Do the following preprocessing:
Preprocessing:
Next, split the sample into training data and testing data. We will use the training data to train
different algorithms and then compare their performance on the testing data. In this project, keep 80% of the
data as training data and use the remaining 20% for testing.
Often, the data can be ordered in a specific manner, hence shuffle the data prior to splitting it into training
and testing samples.
Many algorithms are also not scale invariant, and hence scale the data (different features to a uniform
scale). All this comes under preprocessing the data. (http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing)
Use StandardScaler from sklearn (or write your own routine) to center the data to 0 mean and 1 variance.
Note that you fit the scaler on the training data only and then use its mean and variance to scale the testing data
before using it.
Hint: How to get scaled training data:
1. Let the training data be: train = ("training X data", "training Y data")
2. You can first define a StandardScaler:
scale_xdata, scale_ydata = preprocessing.StandardScaler(), preprocessing.StandardScaler()
3. Then, do the fit:
for regression: scale_xdata.fit(train_regression[0]), scale_ydata.fit(train_regression[1].reshape(-1, 1))
for classification: scale_xdata.fit(train_classification[0])
Here, there is no need to fit the y data for classification (it is either +1 or -1, so it is already scaled).
4. Next, transform:
for regression: scaled_train_data = (scale_xdata.fit_transform(train_regression[0]),
scale_ydata.fit_transform(train_regression[1].reshape(-1, 1)))
for classification: scaled_train_data = (scale_xdata.fit_transform(train_classification[0]),
train_classification[1])
Again, y data is already scaled for classification.
Do this for the test data as well, but call transform rather than fit_transform so that the test data is scaled
with the mean and variance of the training data.
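The split-then-scale steps can be sketched as follows for the regression case. The data here is synthetic, and the variable names (`X_train`, `scaled_test`, and so on) are illustrative assumptions rather than the names expected by the template.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the regression data: 200 objects, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(loc=20.0, scale=2.0, size=(200, 5))
y = rng.uniform(0.0, 1.0, size=200)

# Shuffle, then keep 80% for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0
)

# Fit the scalers on the training data only.
scale_x = preprocessing.StandardScaler().fit(X_train)
scale_y = preprocessing.StandardScaler().fit(y_train.reshape(-1, 1))

# Apply the training mean and variance to both sets.
scaled_train = (scale_x.transform(X_train), scale_y.transform(y_train.reshape(-1, 1)))
scaled_test = (scale_x.transform(X_test), scale_y.transform(y_test.reshape(-1, 1)))
```

After this, the scaled training features have zero mean and unit variance per column, while the scaled test features are only approximately standardized, because they reuse the training statistics.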
In [10]:
from sklearn import preprocessing
In [11]:
Metrics
The last remaining preparatory step is to write metrics for gauging the performance of the algorithms. Write a
function to calculate the 'RMS' error given (y_predict, y_truth) to gauge regression and another function to
evaluate the accuracy of classification.
In addition, for classification, we will also use the confusion matrix.
Below is an example you can use. Feel free to write your own.
In [72]:
Out[11]:
StandardScaler(copy=True, with_mean=True, with_std=True)
from sklearn.model_selection import train_test_split
X_train_regression, X_test_regression, Y_train_regression, Y_test_regression = train_test_split
X_train_classification, X_test_classification, Y_train_classification, Y_test_classification
scale_xdata, scale_ydata = preprocessing.StandardScaler(),preprocessing.StandardScaler
scale_xdata.fit(X_train_regression)
scale_ydata.fit(Y_train_regression.reshape(-1, 1))
from sklearn.metrics import confusion_matrix
def rms(x, y, scale1=None, scale2=None):
    '''Calculate the rms error given the truth and the prediction.
    If scalers are given, x and y are first transformed back to the original units.
    '''
    mask = np.isfinite(x[:]) & np.isfinite(y[:])
    if scale1 is not None:
        x = scale1.inverse_transform(x)
    if scale2 is not None:
        y = scale2.inverse_transform(y)
    return np.sqrt(np.mean((x[mask] - y[mask]) ** 2))
def acc(x, y):
    '''Calculate the accuracy given the truth and the prediction,
    ignoring entries that are not finite.
    '''
    mask = np.isfinite(x[:]) & np.isfinite(y[:])
    return (x[mask] == y[mask]).sum() / mask.sum()
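To illustrate what the confusion matrix contains, here is a small NumPy sketch for ±1 labels. It follows the same layout as sklearn.metrics.confusion_matrix (rows are the true class, columns the predicted class, labels in sorted order); the `truth` and `pred` arrays are made up for the example.

```python
import numpy as np

# Hypothetical truth and prediction for eight objects (labels are +1 or -1).
truth = np.array([1, 1, -1, -1, 1, -1, 1, -1])
pred = np.array([1, -1, -1, -1, 1, 1, 1, -1])

labels = [-1, 1]
# cm[i, j] counts objects whose true class is labels[i] and whose predicted class is labels[j].
cm = np.array([[np.sum((truth == t) & (pred == p)) for p in labels] for t in labels])

# The diagonal holds the correct predictions, so accuracy is trace / total.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```

The off-diagonal entries show which class is being confused with which, information that the single accuracy number hides.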
Hyperparameter method
Now, we will vary hyperparameters to get the best model and build some intuition. There are various
ways to do this, and we will use the grid search methodology (as you did in Problems 1 and 2), which simply tries all
the combinations along with some cross-validation scheme. For the most part, we will use 4-fold cross-validation.
Sklearn provides GridSearchCV functionality for this purpose.
It is recommended to spend some time going through the output format of GridSearchCV and to write some utility
functions to make the recurring plots for every parameter.
Grid search returns a dictionary with self-explanatory keys for the most part. Mostly, the keys correspond to
(masked) numpy arrays of size = #(all possible combinations of parameters). The value of an individual parameter in
every combination is given in arrays with keys starting with 'param_*', and this should help you match each
combination with the corresponding scores.
For masked arrays, you can access the data values by using *.data
Do not overwrite these grid-searched variables (and not only their results), since we will compare all the models
together in the end.
In [73]:
Method 1. k Nearest Neighbors
For regression, let us play with grid search using kNN to tune hyperparameters. (https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html)
Consider the following 3
hyperparameters -
Number of neighbors ([2, 3, 5, 10, 15, 20, 25, 50, 100])
Weights of leaves (uniform or inverse-distance weighting)
Distance metric (Euclidean or Manhattan distance - parameter 'p')
1. Do a grid search on these parameters. List the combination of hyperparameters you tried and evaluate the
accuracy (mean test score) and its standard deviation. Which gives the highest accuracy value?
In [74]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
# http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
# http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html
from sklearn.neighbors import KNeighborsRegressor
Hint: (Read the documentations carefully for more detail.)
First, define the hyperparameters: parameters = {'n_neighbors':[2, 3, 5, 10, 15, 20, 25, 50, 100], 'weights':
['uniform', 'distance'], 'p':[1, 2]}
Specify the algorithm you want to use: e.g. knnr = KNeighborsRegressor()
Then, Do a grid search on these parameters using 4 fold cross validation: gcknn = GridSearchCV(knnr,
parameters, cv=4)
Do the fit: gcknn.fit(*scaled_training_data)
(Let "scaled_training_data" be the training data, where scaled_training_data = ("train X data", "train Y data").)
Get results: results = gcknn.cv_results_
cv_results_ has the following keys: "rank_test_score," "mean_test_score," "std_test_score," and
"params" (See http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)
Then, you can evaluate the models based on "rank_test_score" and print out their "params," along with their
"mean_test_score" and "std_test_score".
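The hint above can be sketched end to end on synthetic data. The grid is reduced to keep the run short, and the data and the name `order` are assumptions for illustration. Note that for a regressor the default score reported by GridSearchCV is R², not a classification accuracy.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the scaled training data: 120 objects, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = X[:, 0] + 0.1 * rng.normal(size=120)

# A reduced grid to keep the sketch fast; the full grid is given in the hint.
parameters = {'n_neighbors': [2, 5, 10], 'weights': ['uniform', 'distance'], 'p': [1, 2]}
gcknn = GridSearchCV(KNeighborsRegressor(), parameters, cv=4)
gcknn.fit(X, y)

results = gcknn.cv_results_
# Indices of the combinations, ordered from best (rank 1) to worst.
order = np.argsort(results['rank_test_score'])
for idx in order:
    print(results['rank_test_score'][idx], results['params'][idx],
          results['mean_test_score'][idx], results['std_test_score'][idx])

# Fitting and scoring times are reported per combination as well.
fit_times = results['mean_fit_time']
score_times = results['mean_score_time']
```

Sorting by rank once and reusing `order` makes it easy to print the parameters, scores, and timings in the same order.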
In [ ]:
2. Also print out fitting and scoring times for all hyperparameter combinations.
Plot timings for fitting and scoring
Hint: Assume that you got results from: results = gcknn.cv_results_
Then, get the scoring time: results['mean_score_time']
and the fitting time: results['mean_fit_time']
In [ ]:
...
...
3. Based on the results you obtained in Part 1 and 2, answer the following questions
Is it always better to use more neighbors?
Is it better to weigh the leaves? If yes, which distance metric performs better?
GridSearchCV returns fitting and scoring time for every combination. You will find that scoring time is higher than
training time. Why do you think that is the case?
Answer:
4. Which parameters seem to affect the performance most? To better answer this question, make plots of the
mean test score for each hyperparameter.
Hint: Suppose you have two types of hyperparameters: A and B. Let A = [1, 2] and B = [1, 2, 4, 7, 10].
Then, you have 10 different combinations of hyperparameters.
Let A = 1. Then, you can try (A,B) = (1,1), (1,2), (1,4), (1,7), (1,10). Suppose that the mean scores you got for the
above combinations are [0.7, 0.72, 0.75, 0.77, 0.8]. Similarly, for A = 2, you tried (A,B) = (2,1), (2,2), (2,4), (2,7),
(2,10) and obtained the mean scores [0.8, 0.82, 0.85, 0.87, 0.9].
To better see how changing the value of parameter A affects the performance, you can make the following plot:
In [75]:
This is the plot of the mean test score for A marginalizing over B.
Similarly, make a plot of the mean test score for each kNN hyperparameter.
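One possible way to marginalize is to average the mean test score over all combinations that share a given hyperparameter value. The sketch below uses a hand-built dictionary laid out like cv_results_, filled with the A/B scores from the hint; the helper name `marginal_scores` is hypothetical.

```python
import numpy as np

# Hypothetical grid-search output laid out like GridSearchCV.cv_results_:
# one entry per (A, B) combination, using the scores from the hint above.
results = {
    'param_A': np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2]),
    'param_B': np.array([1, 2, 4, 7, 10, 1, 2, 4, 7, 10]),
    'mean_test_score': np.array([0.7, 0.72, 0.75, 0.77, 0.8,
                                 0.8, 0.82, 0.85, 0.87, 0.9]),
}

def marginal_scores(results, name):
    """Average the mean test score over all other hyperparameters."""
    values = np.asarray(results['param_' + name])
    scores = np.asarray(results['mean_test_score'])
    unique = np.unique(values)
    return unique, np.array([scores[values == v].mean() for v in unique])

a_values, a_scores = marginal_scores(results, 'A')
b_values, b_scores = marginal_scores(results, 'B')
print(a_values, a_scores)
print(b_values, b_scores)
```

The returned pairs can be passed directly to plt.plot. With real GridSearchCV output the param_* entries are masked arrays, so their values may need to be read through .data first.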
In [ ]:
5. You have determined the best combination of hyperparameters and CV schemes. Predict the test y data
using the GridSearchCV method. Use the "rms" metric function we defined earlier and calculate the rms error
on the test data.
Hint: To determine the rms error, you need:
Truth: given from data (test_data[1])
Prediction: gridsearch.predict(test_data[0]) (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)
A_1 = [0.7, 0.72, 0.75, 0.77, 0.8]
A_2 = [0.8, 0.82, 0.85, 0.87, 0.9]
plt.plot(A_1, label = "A=1")
plt.plot(A_2, label = "A=2")
plt.ylabel("mean test score")
plt.legend()
plt.show()
...
In [ ]:
Classification
In [ ]:
Here we will look at 4 different types of cross-validation schemes -
Kfold
Stratified Kfold
Shuffle Split
Stratified Shuffle Split
6. Assuming the list of hyperparameters from Part 1, do 4 different grid searches, one per CV scheme. From Part 1, take the top 5
combinations of hyperparameters that give you the highest accuracy value. Rank the performance of the CV
schemes for each combination.
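As a rough illustration of how the four splitters differ in use, the sketch below scores one fixed kNN classifier under each scheme with cross_val_score. The data is synthetic and the chosen hyperparameters are arbitrary assumptions; in the exercise the splitters would be passed to GridSearchCV through the cv argument instead.

```python
import numpy as np
from sklearn.model_selection import (KFold, StratifiedKFold, ShuffleSplit,
                                     StratifiedShuffleSplit, cross_val_score)
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data standing in for the scaled classification set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=200) > 0, 1, -1)

# shuffle=True is needed for random_state to take effect in the K-fold splitters.
schemes = {
    'KFold': KFold(4, shuffle=True, random_state=100),
    'StratifiedKFold': StratifiedKFold(4, shuffle=True, random_state=100),
    'ShuffleSplit': ShuffleSplit(4, test_size=0.1, random_state=100),
    'StratifiedShuffleSplit': StratifiedShuffleSplit(4, test_size=0.1, random_state=100),
}

knnc = KNeighborsClassifier(n_neighbors=10, weights='distance', p=2)
summary = {}
for name, cv in schemes.items():
    # One accuracy value per split; store the mean and the spread.
    scores = cross_val_score(knnc, X, y, cv=cv)
    summary[name] = (scores.mean(), scores.std())
    print(name, summary[name])
```

The K-fold variants test on every object exactly once, whereas the shuffle-split variants draw 4 independent 10% test sets, so their standard deviations are not directly comparable.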
In [ ]:
...
from sklearn.neighbors import KNeighborsClassifier
# http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
from sklearn.model_selection import KFold, StratifiedKFold, ShuffleSplit, StratifiedShuffleSplit
In [ ]:
7. Answer the following questions:
Are the conclusions different for any parameter from the regression case?
Does the mean accuracy change for different CV scheme?
Does the standard deviation in mean accuracy change?
In [ ]:
Answer:
8. Using the best combination of hyperparameters and CV schemes you have found, compute the confusion
matrix (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)
and evaluate the accuracy.
Hint: To get a confusion matrix, you need both truth (available from data) and prediction (can be computed
using the .predict function from GridSearchCV (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)).
parameters = {'n_neighbors':[2, 3, 5, 10, 15, 20, 25, 50, 100], 'weights':['uniform'
knnc = KNeighborsClassifier()
#Grid Search
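# random_state only takes effect with shuffle=True (recent scikit-learn versions raise an error otherwise)
gc = GridSearchCV(knnc, parameters, cv=KFold(4, shuffle=True, random_state=100))
#Do the fit
...
gc2 = GridSearchCV(knnc, parameters, cv=StratifiedKFold(4, shuffle=True, random_state = 100))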
#Do the fit
...
gc3 = GridSearchCV(knnc, parameters, cv=ShuffleSplit(4, 0.1, random_state = 100))
#Do the fit
...
gc4 = GridSearchCV(knnc, parameters, cv=StratifiedShuffleSplit(4, 0.1, random_state
#Do the fit
...
...
In [ ]:
Method 2. Random Forests
The most important hyperparameter of the random forest is the number of trees in the ensemble. We will also play with
the maximum depth of the trees.
Try:
n_estimators = [10, 50, 150, 200, 300]
max_depth = [10, 50, 100]
In [ ]:
1. Do the grid search over n_estimators and max_depth. List the combination of hyperparameters you tried and
evaluate the accuracy (mean test score) and its standard deviation. Which gives the highest accuracy value?
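A minimal sketch of this grid search on synthetic data is shown below. The grid is deliberately smaller than the suggested one so that it runs quickly, and fixing random_state on the forest is an assumption made only for reproducibility.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the scaled regression training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=150)

# A reduced grid so the sketch runs quickly; the text suggests larger values.
parameters = {'n_estimators': [10, 50], 'max_depth': [10, 50]}
rf = RandomForestRegressor(random_state=0)
gcrf = GridSearchCV(rf, parameters, cv=4)
gcrf.fit(X, y)

results = gcrf.cv_results_
# List every combination with its mean test score and standard deviation.
for params, mean, std in zip(results['params'],
                             results['mean_test_score'],
                             results['std_test_score']):
    print(params, mean, std)
print('best:', gcrf.best_params_)
```

Comparing the differences between mean scores with the printed standard deviations gives a first indication of whether the combinations are statistically distinguishable.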
In [ ]:
2. Which parameters seem to affect the performance most? To better answer this question, make plots of the
mean test score for each hyperparameter (for example, plot the mean test score as a function of n_estimators,
marginalizing over max_depth, and vice versa).
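Marginalizing here can be read as averaging the mean test score over all values of the other hyperparameters. The sketch below shows one way to do that from cv_results_; the helper marginal_scores is hypothetical (not part of scikit-learn), and the data and grid are small stand-ins chosen for speed.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data and a small grid (assumptions for illustration).
X_demo, y_demo = make_regression(n_samples=200, n_features=6, noise=10.0,
                                 random_state=0)
demo_grid = {"n_estimators": [5, 20], "max_depth": [2, 6]}
demo_search = GridSearchCV(RandomForestRegressor(random_state=0), demo_grid, cv=3)
demo_search.fit(X_demo, y_demo)


def marginal_scores(search, name):
    """Average mean_test_score over the other hyperparameters for each value of `name`."""
    values = [p[name] for p in search.cv_results_["params"]]
    scores = search.cv_results_["mean_test_score"]
    out = {}
    for v in sorted(set(values)):
        out[v] = float(np.mean([s for val, s in zip(values, scores) if val == v]))
    return out


by_trees = marginal_scores(demo_search, "n_estimators")
by_depth = marginal_scores(demo_search, "max_depth")
print("marginal over max_depth:", by_trees)
print("marginal over n_estimators:", by_depth)
```

The keys and values of each dictionary can be passed directly to a plotting call as the x and y coordinates.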
In [ ]:
...
from sklearn.ensemble import RandomForestRegressor
# http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
rf = RandomForestRegressor()
parameters = ...
gcrf = GridSearchCV(rf, parameters, cv=5)
...
...
3. Based on the results you obtained in Part 1, answer the following questions:
Are the scores of these models statistically different? Based on this, which architecture will you choose for
your model?
For every parameter, plot the fitting time. Based on this and the previous question, how many
trees do you recommend keeping in the ensemble?
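The following sketch shows where the fitting times live in cv_results_ and one rough way to judge whether two scores differ. The comparison rule (the gap to the best score against the quadrature sum of the two fold standard deviations) is a heuristic assumed for this example, not a formal statistical test, and the data and grid are small stand-ins.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data and a small grid (assumptions for illustration).
X_demo, y_demo = make_regression(n_samples=200, n_features=6, noise=10.0,
                                 random_state=0)
demo_grid = {"n_estimators": [5, 20, 40]}
demo_search = GridSearchCV(RandomForestRegressor(max_depth=6, random_state=0),
                           demo_grid, cv=5)
demo_search.fit(X_demo, y_demo)

res = demo_search.cv_results_
best = demo_search.best_index_

rows = []
for i, params in enumerate(res["params"]):
    # Gap between the best mean score and this setting's mean score.
    gap = float(res["mean_test_score"][best] - res["mean_test_score"][i])
    # Combined fold-to-fold scatter of the two settings (heuristic).
    spread = float(np.hypot(res["std_test_score"][best], res["std_test_score"][i]))
    within_noise = bool(gap <= spread)
    rows.append((params["n_estimators"], float(res["mean_fit_time"][i]), gap, within_noise))
    print(params, f"fit_time={res['mean_fit_time'][i]:.3f}s",
          f"gap={gap:.3f}", f"spread={spread:.3f}",
          "comparable to best" if within_noise else "worse than best")
```

If several settings are comparable to the best within the scatter, the one with the shortest fitting time is a natural candidate.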
In [ ]:
Answer:
4. You have determined the best combination of hyperparameters. Predict the test y data using the
GridSearchCV method. Use the "rms" metric function we defined earlier and calculate the rms error on the test
data.
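A minimal sketch of this step is shown below. The notebook's own "rms" function is defined earlier and is not reproduced here, so rms_sketch is a hypothetical stand-in assumed to compute the root-mean-square error; the data and grid are likewise illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split


def rms_sketch(y_true, y_pred):
    """Root-mean-square error; a stand-in for the notebook's own rms function."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data (an assumption; replace with the notebook's data).
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

demo_grid = {"n_estimators": [10, 30], "max_depth": [4, 8]}  # illustrative grid
demo_search = GridSearchCV(RandomForestRegressor(random_state=0), demo_grid, cv=5)
demo_search.fit(X_train, y_train)

# .predict uses the best estimator, refit on the whole training set.
y_pred = demo_search.predict(X_test)
print("best parameters:", demo_search.best_params_)
print("test rms:", rms_sketch(y_test, y_pred))
```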
In [ ]:
Classification
In [ ]:
In [ ]:
...
...
from sklearn.ensemble import RandomForestClassifier
# http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
# Grid search (this will take a few minutes)
rfc = RandomForestClassifier()
parameters = ...
gcrfc = GridSearchCV(rfc, parameters, cv=StratifiedShuffleSplit(4, 0.1, random_state
...
5. Assuming the list of hyperparameters from Part 1, do the grid search using the StratifiedShuffleSplit CV scheme.
List the combination of hyperparameters you tried and evaluate the accuracy (mean test score) and its standard
deviation. Which gives the highest accuracy value?
In [ ]:
6. Using the best combination of hyperparameters, compute the confusion matrix
(https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)
and evaluate the accuracy.
In [ ]:
To Submit
Execute the following cell to submit. If you make changes, execute the cell again to resubmit the final copy of
the notebook; submissions are not updated automatically.
We recommend that all of the above cells be executed (with their output visible) in the notebook at the
time of submission.
Only the final submission before the deadline will be graded.
In [ ]:
...
...
_ = ok.submit()