
Date: 2018-12-13 10:16

Project 3

Classification and inference with machine learning

This notebook is arranged in cells. Text is usually written in Markdown cells, where you can use HTML tags (to make it bold, italic, colored, etc.). You can double-click this cell to see the formatting.

Ellipses (...) mark the places where you are expected to write your solution, but feel free to change the template (within reason) if this style is not to your taste.

Hit "Shift-Enter" on a code cell to evaluate it. Double click a Markdown cell to edit.

Link Okpy

In [1]:

Imports

=====================================================================

Assignment: Project 3

OK, version v1.12.5

=====================================================================

Open the following URL:

https://okpy.org/client/login/

After logging in, copy the code from the web page and paste it into the box.

Then press the "Enter" key on your keyboard.

Paste your code here: KozofyhR7YkKXUwCK3ycaIJ6Nubck9

Successfully logged in as ljma@berkeley.edu

from client.api.notebook import Notebook

ok = Notebook('Project3_U.ok')

_ = ok.auth(inline = True)

In [2]:

Problem 1 - Using Keras - MNIST

The goal of this notebook is to introduce deep neural networks (DNNs) and convolutional neural networks (CNNs) using the high-level Keras package, and to become familiar with how to choose a network's architecture, cost function, and optimizer in Keras. We will also learn how to train neural networks.

We will once again work with the MNIST dataset of handwritten digits introduced in HW8. The goal is to find a statistical model which recognizes and distinguishes between the ten handwritten digits (0-9).

The MNIST dataset comprises handwritten digits, each of which comes in a square image, divided into a $28 \times 28$ pixel grid. Every pixel can take on $256$ nuances of the gray color, interpolating between white and black, and hence each data point assumes any value in the set $\{0, 1, \ldots, 255\}$. Since there are $10$ categories in the problem, corresponding to the ten digits, this problem represents a generic classification task.

In this notebook, we show how to use the Keras Python package to tackle the MNIST problem with the help of deep neural networks.

import numpy as np

from scipy.integrate import quad

#For plotting

import matplotlib.pyplot as plt

%matplotlib inline

import warnings

warnings.filterwarnings('ignore')

Creating DNNs with Keras

Constructing a Deep Neural Network to solve ML problems is a multiple-stage process. Quite generally, one

can identify the key steps as follows:

step 1: Load and process the data

step 2: Define the model and its architecture

step 3: Choose the optimizer and the cost function

step 4: Train the model

step 5: Evaluate the model performance on the unseen test data

step 6: Modify the hyperparameters to optimize performance for the specific data set

We would like to emphasize that, while it is always possible to view steps 1-5 as independent of the particular task we are trying to solve, it is only when they are put together in step 6 that the real gain of using deep learning is revealed, compared to less sophisticated methods such as regression models. With this remark in mind, we shall focus predominantly on steps 1-5 below, and then show how one can use grid search methods to find optimal hyperparameters in step 6.

Step 1: Load and Process the Data

Keras can download the MNIST data from the web automatically. All we need to do is import the mnist module and call its load_data() function, and it will create the training and test data sets for us.

The MNIST set has pre-defined test and training sets, in order to facilitate the comparison of the performance

of different models on the data.

Once we have loaded the data, we need to format it in the correct shape, $(N_\text{samples}, N_\text{features})$.

The size of each sample, i.e. the number of bare features used, is N_features (which is 784 because we have a $28 \times 28$ pixel grid), while the number of potential classification categories is "num_classes" (which is 10, the number of digits).

Each pixel contains a greyscale value quantified by an integer between 0 and 255. To standardize the dataset, we normalize the input data to the interval [0, 1].
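The reshaping and rescaling described here can be tried on a small synthetic array before touching the real data. This is a minimal NumPy sketch, not the notebook's own cell: the helper name `flatten_and_rescale` and the random stand-in images are assumptions made for illustration.

```python
import numpy as np

def flatten_and_rescale(images):
    """Flatten (N, rows, cols) uint8 images to (N, rows*cols) float32 values in [0, 1]."""
    n_samples = images.shape[0]
    flat = images.reshape(n_samples, -1).astype("float32")
    return flat / 255.0

# Synthetic stand-in for MNIST: 5 random 28x28 greyscale images (not real digits).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

X = flatten_and_rescale(fake_images)
print(X.shape, X.dtype)  # each row is one sample with 784 features
```

Dividing by 255 maps the largest possible pixel value to exactly 1, so every feature ends up in [0, 1].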

In [3]:

1. Make a plot of one MNIST digit (2D plot using X data - make sure to reshape it into a $28 \times 28$ matrix) and label it (which digit does it correspond to?).

Using TensorFlow backend.

from __future__ import print_function

import keras,sklearn

# suppress tensorflow compilation warnings

import os

import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

seed=0

np.random.seed(seed) # fix random seed

tf.set_random_seed(seed)

from keras.datasets import mnist

# input image dimensions

num_classes = 10 # 10 digits

img_rows, img_cols = 28, 28 # number of pixels

# the data, shuffled and split between train and test sets

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

X_train = X_train[:40000]

Y_train = Y_train[:40000]

# reshape data, depending on Keras backend

X_train = X_train.reshape(X_train.shape[0], img_rows*img_cols)

X_test = X_test.reshape(X_test.shape[0], img_rows*img_cols)


# cast floats to single precision

X_train = X_train.astype('float32')

X_test = X_test.astype('float32')

# rescale data in interval [0,1]

X_train /= 255

X_test /= 255

In [4]:

Last, we cast the label vectors y to binary class matrices (a.k.a. one-hot format).
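To make the one-hot format concrete, here is a minimal NumPy sketch of the conversion. The helper `one_hot` is a hypothetical stand-in written for illustration; it is not the Keras utility itself, only a picture of the encoding it produces.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0  # set column `label` in each row
    return encoded

y = [5, 0, 4, 1, 9]             # first five labels shown in the notebook output
Y = one_hot(y, num_classes=10)
print(Y[0])                     # row for digit 5: the 1 sits at index 5
recovered = Y.argmax(axis=1)    # argmax inverts the encoding
```

Taking the argmax along each row recovers the original integer labels, which is also how predicted class probabilities are turned back into digits.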

plt.imshow(X_train[0].reshape((28,28)), cmap = plt.cm.gray)

plt.title('Label = %d' %Y_train[0])

plt.show()

In [23]:

Here in this template, we use 40000 training samples and 10000 test samples. Remember that we preprocessed the data into the shape $(N_\text{samples}, N_\text{features})$.

In [24]:

before conversion -

y vector : [5 0 4 1 9 2 1 3 1 4]

after conversion -

y vector : [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]

[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]

[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]

[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]

[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]

[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]

X_train shape: (40000, 784)

Y_train shape: (40000, 10)

40000 train samples

10000 test samples

# convert class vectors to binary class matrices

print("before conversion - ")

print("y vector : ", Y_train[0:10])

Y_train = keras.utils.to_categorical(Y_train, num_classes)

Y_test = keras.utils.to_categorical(Y_test, num_classes)

print("after conversion - ")

print("y vector : ", Y_train[0:10])

print('X_train shape:', X_train.shape)

print('Y_train shape:', Y_train.shape)

print()

print(X_train.shape[0], 'train samples')

print(X_test.shape[0], 'test samples')

Step 2: Define the Neural Net and its Architecture

We can now move on to construct our deep neural net. We shall use Keras's Sequential() class to

instantiate a model, and will add different deep layers one by one.

Let us create an instance of Keras' Sequential() class, called model. As the name suggests, this class allows us to build DNNs layer by layer (https://keras.io/getting-started/sequential-model-guide/).

In [4]:

We use the add() method to attach layers to our model. For the purposes of our introductory example, it suffices to focus on Dense layers for simplicity (https://keras.io/layers/core/).

Every Dense() layer accepts as its first required argument an integer which specifies the number of neurons.

The type of activation function for the layer is defined using the activation optional argument, the input of

which is the name of the activation function in string format. Examples include relu , tanh , elu ,

sigmoid , softmax .

In order for our DNN to work properly, we have to make sure that the numbers of input and output neurons for

each layer match. Therefore, we specify the shape of the input in the first layer of the model explicitly using the

optional argument input_shape=(N_features,) . The sequential construction of the model then allows

Keras to infer the correct input/output dimensions of all hidden layers automatically. Hence, we only need to

specify the size of the softmax output layer to match the number of categories.
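The way layer dimensions chain together can be checked with simple arithmetic. The sketch below assumes the layer sizes used in this notebook (784 inputs, then 400, 100, and 10 neurons); the helper `dense_param_counts` is a hypothetical name, and the count relies on the standard fact that a fully connected layer has one weight per input-output pair plus one bias per neuron.

```python
def dense_param_counts(n_inputs, layer_sizes):
    """Return (n_in, n_out, n_params) for each fully connected layer in a stack."""
    counts = []
    n_in = n_inputs
    for n_out in layer_sizes:
        # n_in * n_out weights plus n_out biases
        counts.append((n_in, n_out, n_in * n_out + n_out))
        n_in = n_out  # this layer's output size is the next layer's input size
    return counts

layers = dense_param_counts(784, [400, 100, 10])
for n_in, n_out, n_params in layers:
    print(f"{n_in:4d} -> {n_out:4d}: {n_params} parameters")
total = sum(n_params for _, _, n_params in layers)
print("total:", total)
```

Only the first input size (784) has to be given explicitly; every later input size follows from the previous layer, which mirrors what Keras infers for the hidden layers.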

First, add a Dense layer with 400 output neurons and relu activation function.

In [26]:

Add another layer with 100 output neurons. Then, we will apply "dropout," a regularization scheme that has been widely adopted in the neural networks literature: during the training procedure neurons are randomly “dropped out” of the neural network with some probability $p$, giving rise to a thinned network. This randomization prevents overfitting by reducing spurious correlations between neurons within the network.
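As a rough illustration of what dropping neurons with probability $p$ means numerically, here is a NumPy sketch of "inverted" dropout, in which the surviving activations are rescaled by $1/(1-p)$ during training so that their expected value is unchanged. The function name and the array of ones are assumptions for illustration; this is a sketch of the idea, not the library's implementation.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero each unit with probability `rate`; rescale survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return activations  # at test time the layer is the identity
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True where the unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((1000, 100))                      # toy activations, all equal to 1
h_train = dropout(h, rate=0.5, rng=rng)       # entries are either 0 or 2
h_test = dropout(h, rate=0.5, rng=rng, training=False)
print(h_train.mean())                         # close to 1 on average
```

Because of the rescaling, nothing has to change at evaluation time: the network is simply used without the mask.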

from keras.models import Sequential

from keras.layers import Dense, Dropout, Flatten

from keras.layers import Conv2D, MaxPooling2D

# instantiate model

model = Sequential()

model.add(Dense(400,input_shape=(img_rows*img_cols,), activation='relu'))

In [27]:

Lastly, we need to add a soft-max layer since we have a multi-class output.
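For reference, the softmax function turns a vector of raw scores into non-negative numbers that sum to one, so they can be read as class probabilities. A minimal NumPy sketch follows; the input values are arbitrary, and subtracting the row maximum is a standard trick to avoid overflow that does not change the result.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1000.0, 1000.0]])  # second row would overflow without the shift
probs = softmax(logits)
print(probs.sum(axis=1))  # each row sums to 1
```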

In [28]:

Step 3: Choose the Optimizer and the Cost Function

Next, we choose the loss function according to which to train the DNN. For classification problems, this is the cross entropy, and since the output data was cast in categorical form, we choose the categorical_crossentropy defined in Keras' losses module. Depending on the problem of interest, one can pick any other suitable loss function. To optimize the weights of the net, we choose stochastic gradient descent (SGD). This algorithm is already available under Keras' optimizers module (https://keras.io/optimizers/), but we could use Adam() or any other built-in one as well. The parameters for the optimizer, such as lr (learning rate) or momentum, are passed using the corresponding optional arguments of the SGD() constructor.
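To see what the learning rate and momentum control, here is a minimal pure-Python sketch of a gradient descent update with classical momentum, applied to a one-dimensional toy function. The function `sgd_step`, the toy objective, and the parameter values are all assumptions chosen for illustration; a real optimizer applies the same kind of update to every weight, using gradients estimated on a minibatch.

```python
def sgd_step(w, grad, velocity, lr=0.01, momentum=0.0):
    """One update: the velocity accumulates past gradients, then moves the weight."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(300):
    grad = 2.0 * (w - 3.0)
    w, v = sgd_step(w, grad, v, lr=0.1, momentum=0.9)
print(w)  # approaches the minimum at w = 3
```

With momentum set to zero this reduces to plain gradient descent, `w - lr * grad`.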

While the loss function and the optimizer are essential for the training procedure, to test the performance of the model one may want to look at a particular performance metric. For instance, in categorical tasks one typically looks at the accuracy, which is defined as the fraction of correctly classified data points (often quoted as a percentage).
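Both quantities can be computed by hand on a tiny example. The sketch below assumes one-hot targets and predicted probabilities stored as NumPy arrays; the three-sample data set is made up for illustration, and the clipping constant `eps` is an assumption used only to avoid taking the logarithm of zero.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the true class."""
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

# Toy data: three samples, three classes (illustrative values only).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]])

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

The third sample is misclassified, so it lowers the accuracy and also contributes the largest term to the loss; the loss additionally penalizes correct but unconfident predictions, which the accuracy ignores.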

To complete the definition of our model, we use the compile() method, with optional arguments for the

optimizer , loss , and the validation metric as follows:

In [29]:

model.add(Dense(100, activation='relu'))

# apply dropout with rate 0.5

model.add(Dropout(0.5))

model.add(Dense(num_classes, activation='softmax'))

# compile the model

model.compile(loss=keras.losses.categorical_crossentropy, optimizer='SGD', metrics=[

Step 4: Train the model

We train our DNN in minibatches over a number of training epochs. Shuffling the training data during training improves the stability of the model.

(The number of epochs is the number of complete passes through the training dataset, and the batch size is the number of samples propagated through the network before the model is updated.)
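The relationship between samples, batch size, and epochs can be made explicit with a few lines of standard-library Python. The numbers are the ones used in this notebook (40000 training samples, batch size 64, 10 epochs); the generator `minibatch_indices` is a hypothetical helper that only illustrates how one shuffled pass is split into batches.

```python
import math
import random

def minibatch_indices(n_samples, batch_size, seed=None):
    """Yield lists of shuffled sample indices, one list per minibatch (one epoch)."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

n_samples, batch_size, epochs = 40000, 64, 10
updates_per_epoch = math.ceil(n_samples / batch_size)  # weight updates in one pass
total_updates = updates_per_epoch * epochs

batches = list(minibatch_indices(n_samples, batch_size, seed=0))
print(updates_per_epoch, total_updates, len(batches))
```

Every sample appears in exactly one batch per epoch, so smaller batches mean more (and noisier) weight updates for the same amount of data.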

Training the DNN is a one-liner using the fit() method of the Sequential class. The first two required

arguments are the training input and output data. As optional arguments, we specify the mini- batch_size ,

the number of training epochs , and the test or validation data. To monitor the training procedure for every

epoch, we set verbose=True .

Let us set batch_size = 64 and epochs = 10.

In [30]:

Step 5: Evaluate the Model Performance on the Unseen Test Data

Next, we evaluate the model and read off the loss on the test data, and its accuracy, using the evaluate() method.

Train on 40000 samples, validate on 10000 samples

Epoch 1/10

40000/40000 [==============================] - 4s - loss: 1.2012 - acc: 0.6446 - val_loss: 0.5087 - val_acc: 0.8839

Epoch 2/10

40000/40000 [==============================] - 4s - loss: 0.5895 - acc: 0.8318 - val_loss: 0.3646 - val_acc: 0.9065

Epoch 3/10

40000/40000 [==============================] - 4s - loss: 0.4755 - acc: 0.8646 - val_loss: 0.3081 - val_acc: 0.9193

Epoch 4/10

40000/40000 [==============================] - 3s - loss: 0.4100 - acc: 0.8814 - val_loss: 0.2755 - val_acc: 0.9243

Epoch 5/10

40000/40000 [==============================] - 4s - loss: 0.3716 - acc: 0.8975 - val_loss: 0.2527 - val_acc: 0.9288

Epoch 6/10

40000/40000 [==============================] - 4s - loss: 0.3445 - acc: 0.9030 - val_loss: 0.2338 - val_acc: 0.9342

Epoch 7/10

40000/40000 [==============================] - 4s - loss: 0.3185 - acc: 0.9105 - val_loss: 0.2203 - val_acc: 0.9383

Epoch 8/10

40000/40000 [==============================] - 4s - loss: 0.2991 - acc: 0.9171 - val_loss: 0.2060 - val_acc: 0.9413

Epoch 9/10

40000/40000 [==============================] - 3s - loss: 0.2815 - acc: 0.9213 - val_loss: 0.1972 - val_acc: 0.9437

Epoch 10/10

40000/40000 [==============================] - 4s - loss: 0.2656 - acc: 0.9265 - val_loss: 0.1874 - val_acc: 0.9457

# training parameters

batch_size = 64

epochs = 10

# train DNN and store training info in history

history=model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs,

verbose=1, validation_data=(X_test, Y_test))

In [32]:

Test loss: 0.18736902387440205

Test accuracy: 0.9457

# evaluate model

score = model.evaluate(X_test, Y_test, verbose=1)

# print performance

print('Test loss:', score[0])

print('Test accuracy:', score[1])

# look into training history

# summarize history for accuracy

plt.plot(history.history['acc'])

plt.plot(history.history['val_acc'])

plt.ylabel('model accuracy')

plt.xlabel('epoch')

plt.legend(['train', 'test'], loc='best')

plt.show()

# summarize history for loss

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylabel('model loss')

plt.xlabel('epoch')

plt.legend(['train', 'test'], loc='best')

plt.show()

Step 6: Modify the Hyperparameters to Optimize Performance of the Model

Last, we show how to use the grid search option of scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) to optimize the hyperparameters of our model.

First, define a function for creating a DNN:

In [9]:

With epochs = 1 and batch_size = 64, do grid search over the following optimization schemes: ['SGD',

'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam'].
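It helps to picture what a cross-validated grid search does with these candidates: each one is trained and scored once per fold, and the best candidate is then refit on the full training set (the default behaviour of GridSearchCV). The sketch below uses plain contiguous folds, which is a simplification, and the fold count of 4 is an assumption: the `cv=` argument is cut off in this copy of the notebook, and 4 is chosen only because the training log further down shows 30000 of the 40000 samples being used per fit. The helper `kfold_indices` is hypothetical.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

optimizers = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
n_folds = 4  # assumption, see the note above

splits = list(kfold_indices(40000, n_folds))
n_fits = len(optimizers) * n_folds + 1  # one fit per (candidate, fold), plus the final refit
print(len(splits[0][0]), len(splits[0][1]), n_fits)
```

The "mean test score" reported for each candidate is the average of its validation scores over the folds, and the value in parentheses is their standard deviation.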

In [34]:

def create_DNN(optimizer=keras.optimizers.Adam()):
    # build the network: two hidden layers, dropout, softmax output
    model = Sequential()
    model.add(Dense(400, input_shape=(img_rows*img_cols,), activation='relu'))
    model.add(Dense(100, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    # compile with the optimizer under test
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model

from sklearn.model_selection import GridSearchCV

from keras.wrappers.scikit_learn import KerasClassifier

batch_size = 64

epochs = 1

model_gridsearch = KerasClassifier(build_fn=create_DNN,

epochs=epochs, batch_size=batch_size, verbose=1)

# candidate optimizers, passed by name to `create_DNN()`

optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']

Epoch 1/1

30000/30000 [==============================] - 3s - loss: 1.3620 - acc

: 0.5839

28928/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 1.4106 - acc

: 0.5763

28800/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 1.3413 - acc

: 0.6017

29056/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 1.3520 - acc

: 0.5856

29632/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4102 - acc

: 0.8784

28736/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4258 - acc

: 0.8723

28608/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4126 - acc

: 0.8776

29312/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4097 - acc

: 0.8779

29824/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.3794 - acc

: 0.8887

29824/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4031 - acc

: 0.8816

29184/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.3895 - acc

: 0.8866

29824/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.3752 - acc

: 0.8911

29632/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.5778 - acc

: 0.8325

29184/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.5900 - acc

: 0.8278

28864/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.5922 - acc

: 0.8268


# define parameter dictionary

param_grid = dict(optimizer=optimizer)

# call scikit grid search module

grid = GridSearchCV(estimator=model_gridsearch, param_grid=param_grid, n_jobs=1, cv=

grid_result = grid.fit(X_train,Y_train)

Show the mean test score of all optimization schemes and determine which scheme gives the best accuracy.

29184/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.5868 - acc

: 0.8266

29952/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4324 - acc

: 0.8721

28864/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4337 - acc

: 0.8714

29440/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4356 - acc

: 0.8717

29568/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4332 - acc

: 0.8727

28608/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4958 - acc

: 0.8500

29312/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4888 - acc

: 0.8595

28928/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4658 - acc

: 0.8621

28736/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4754 - acc

: 0.8601

30000/30000 [==============================] - 1s

Epoch 1/1

30000/30000 [==============================] - 4s - loss: 0.3584 - acc

: 0.8929

28992/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.3585 - acc

: 0.8944

29248/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.3631 - acc

: 0.8917

29824/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.3624 - acc

: 0.8931

28800/30000 [===========================>..] - ETA: 0sEpoch 1/1

40000/40000 [==============================] - 6s - loss: 0.3168 - acc

: 0.9073

In [35]:

2. Create a DNN with one Dense layer having 200 output neurons. Do the grid search over any 5 different activation functions from https://keras.io/activations/. Let epochs = 1, batch_size = 64, p_dropout = 0.5, and optimizer = keras.optimizers.Adam(). Make sure to print the mean test score of each case and determine which activation function gives the best accuracy.

Doing the grid search requires quite a bit of memory. Please restart the kernel ("Kernel"-"Restart") and re-load

the data before doing a new grid search.

In [10]:

Best: 0.951650 using {'optimizer': 'Nadam'}

0.850700 (0.013746) with: {'optimizer': 'SGD'}

0.947125 (0.001248) with: {'optimizer': 'RMSprop'}

0.946550 (0.003741) with: {'optimizer': 'Adagrad'}

0.925900 (0.002684) with: {'optimizer': 'Adadelta'}

0.947200 (0.001200) with: {'optimizer': 'Adam'}

0.934825 (0.002807) with: {'optimizer': 'Adamax'}

0.951650 (0.000865) with: {'optimizer': 'Nadam'}

# summarize results

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

means = grid_result.cv_results_['mean_test_score']

stds = grid_result.cv_results_['std_test_score']

params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

model = Sequential()

def create_DNN(activation):
    # one hidden layer whose activation function is the hyperparameter under test
    model = Sequential()
    model.add(Dense(200, input_shape=(img_rows*img_cols,), activation=activation))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer='Adam',
                  metrics=['accuracy'])
    return model

from sklearn.model_selection import GridSearchCV

from keras.wrappers.scikit_learn import KerasClassifier

batch_size = 64

epochs = 1

model_gridsearch = KerasClassifier(build_fn=create_DNN,

epochs=epochs, batch_size=batch_size, verbose=1)

Epoch 1/1

30000/30000 [==============================] - 3s - loss: 0.5012 - acc

: 0.8506

29120/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4967 - acc

: 0.8492

29184/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 0.4963 - acc

: 0.8530

29376/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 0.4939 - acc

: 0.8559

29440/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 0.4717 - acc

: 0.8611

29120/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 0.4722 - acc

: 0.8604

30000/30000 [==============================] - 0s

Epoch 1/1

30000/30000 [==============================] - 2s - loss: 0.4762 - acc

: 0.8587

29760/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 0.4679 - acc

: 0.8637

29376/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 0.4831 - acc

: 0.8567

29952/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 0.4629 - acc

: 0.8612

28864/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4722 - acc

: 0.8579

28992/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4746 - acc

: 0.8580

29248/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.7890 - acc

: 0.7656

28480/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.7785 - acc

: 0.7720

28992/30000 [===========================>..] - ETA: 0sEpoch 1/1

# candidate activation functions, passed by name to `create_DNN()`

activation = ['relu', 'tanh', 'elu', 'sigmoid', 'softmax']

# define parameter dictionary

param_grid = dict(activation=activation)

# call scikit grid search module

grid = GridSearchCV(estimator=model_gridsearch, param_grid=param_grid, n_jobs=1, cv=

grid_result = grid.fit(X_train,Y_train)

In [11]:

30000/30000 [==============================] - 3s - loss: 0.7647 - acc

: 0.7724

29312/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.7746 - acc

: 0.7727

29184/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 1.9335 - acc

: 0.5028

29760/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 1.9408 - acc

: 0.5048

28928/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 1.9342 - acc

: 0.4760

29824/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 1.9398 - acc

: 0.4810

10000/10000 [==============================] - 0s

28928/30000 [===========================>..] - ETA: 0sEpoch 1/1

40000/40000 [==============================] - 4s - loss: 0.4484 - acc

: 0.8681

Best: 0.932725 using {'activation': 'relu'}

0.932725 (0.002930) with: {'activation': 'relu'}

0.912850 (0.003827) with: {'activation': 'tanh'}

0.913725 (0.005063) with: {'activation': 'elu'}

0.895375 (0.005523) with: {'activation': 'sigmoid'}

0.839475 (0.016595) with: {'activation': 'softmax'}

# summarize results

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

means = grid_result.cv_results_['mean_test_score']

stds = grid_result.cv_results_['std_test_score']

params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

3. Now, do the grid search over different combinations of batch sizes (10, 30, 50, 100) and numbers of epochs (1, 2, 5). Make sure to print the mean test score of each case and determine which combination gives the best accuracy. Here, you are free to create your own DNN (assume an arbitrary number of Dense layers, optimization scheme, etc).

Doing the grid search requires quite a bit of memory. Please restart the kernel ("Kernel"-"Restart") and re-load

the data before doing a new grid search.

Hint: To do the grid search over both batch_size and epochs, you can do:

param_grid = dict(batch_size=batch_size, epochs=epochs)
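A parameter grid with two entries is searched over every combination of the listed values. The standard-library sketch below enumerates those combinations for the grid in this problem; `expand_grid` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
from itertools import product

param_grid = dict(batch_size=[10, 30, 50, 100], epochs=[1, 2, 5])

def expand_grid(grid):
    """Return one dict per combination of the listed parameter values."""
    names = sorted(grid)  # fixed key order so the enumeration is reproducible
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]

candidates = expand_grid(param_grid)
print(len(candidates))   # 4 batch sizes x 3 epoch counts
print(candidates[0], candidates[-1])
```

The cost of the search grows with the product of the list lengths (and again with the number of cross-validation folds), which is worth keeping in mind before adding more values to either list.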

In [13]:

Epoch 1/1

30000/30000 [==============================] - 14s - loss: 0.3851 - ac

c: 0.8832

29830/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 15s - loss: 0.3897 - ac

c: 0.8836

29960/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 15s - loss: 0.3945 - ac

c: 0.8820

29750/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 17s - loss: 0.3827 - ac

c: 0.8821

29980/30000 [============================>.] - ETA: 0sEpoch 1/2

30000/30000 [==============================] - 16s - loss: 0.3935 - ac

c: 0.8827

Epoch 2/2

30000/30000 [==============================] - 15s - loss: 0.2171 - ac

c: 0.9341

29930/30000 [============================>.] - ETA: 0sEpoch 1/2

model = Sequential()

def create_DNN():
    # fixed architecture; batch size and epochs are varied by the grid search
    model = Sequential()
    model.add(Dense(200, input_shape=(img_rows*img_cols,), activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer='Adam',
                  metrics=['accuracy'])
    return model

from sklearn.model_selection import GridSearchCV

from keras.wrappers.scikit_learn import KerasClassifier

batch_size = 64

epochs = 1

model_gridsearch = KerasClassifier(build_fn=create_DNN,

epochs=epochs, batch_size=batch_size, verbose=1)

# candidate batch sizes and numbers of epochs for the grid search

batch_size = [10,30,50,100]

epochs = [1,2,5]

# define parameter dictionary

param_grid = dict(batch_size=batch_size, epochs=epochs)

# call scikit grid search module

grid = GridSearchCV(estimator=model_gridsearch, param_grid=param_grid, n_jobs=1, cv=

grid_result = grid.fit(X_train,Y_train)

In [14]:

4. Do the grid search over the number of neurons in the Dense layer and make a plot of the mean test score as a function of num_neurons. Again, you are free to create your own DNN.

Doing the grid search requires quite a bit of memory. Please restart the kernel ("Kernel"-"Restart") and re-load

the data before doing a new grid search.

In [8]:

Best: 0.967475 using {'batch_size': 10, 'epochs': 5}

0.944350 (0.001052) with: {'batch_size': 10, 'epochs': 1}

0.956850 (0.002544) with: {'batch_size': 10, 'epochs': 2}

0.967475 (0.000476) with: {'batch_size': 10, 'epochs': 5}

0.939225 (0.003904) with: {'batch_size': 30, 'epochs': 1}

0.951700 (0.002840) with: {'batch_size': 30, 'epochs': 2}

0.967075 (0.000536) with: {'batch_size': 30, 'epochs': 5}

0.933700 (0.002487) with: {'batch_size': 50, 'epochs': 1}

0.949700 (0.001885) with: {'batch_size': 50, 'epochs': 2}

0.965475 (0.002815) with: {'batch_size': 50, 'epochs': 5}

0.927900 (0.002847) with: {'batch_size': 100, 'epochs': 1}

0.942250 (0.001410) with: {'batch_size': 100, 'epochs': 2}

0.960925 (0.002025) with: {'batch_size': 100, 'epochs': 5}

# summarize results

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

means = grid_result.cv_results_['mean_test_score']

stds = grid_result.cv_results_['std_test_score']

params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

model = Sequential()

def create_DNN(number):
    # `number` is the size of the first Dense layer, the hyperparameter under test
    model = Sequential()
    model.add(Dense(number, input_shape=(img_rows*img_cols,), activation='relu'))
    model.add(Dense(100, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer='Adam',
                  metrics=['accuracy'])
    return model

from sklearn.model_selection import GridSearchCV

from keras.wrappers.scikit_learn import KerasClassifier

batch_size = 64

epochs = 1

Epoch 1/1

30000/30000 [==============================] - 2s - loss: 0.5748 - acc

: 0.8241

29696/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 0.5448 - acc

: 0.8359

28672/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 0.5499 - acc

: 0.8358

28864/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 2s - loss: 0.5480 - acc

: 0.8343

30000/30000 [==============================] - 0s

Epoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4889 - acc

: 0.8569

29376/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4796 - acc

: 0.8577

28544/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4856 - acc

: 0.8552

28928/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4777 - acc

: 0.8576

29696/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4517 - acc

: 0.8657

28928/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4479 - acc

: 0.8656

30000/30000 [==============================] - 1s

Epoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4639 - acc

: 0.8632

29824/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 3s - loss: 0.4553 - acc

: 0.8660

29376/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4427 - acc

epochs = 1

model_gridsearch = KerasClassifier(build_fn=create_DNN,

epochs=epochs, batch_size=batch_size, verbose=1)

# candidate numbers of neurons in the first hidden layer (the `number` argument of `create_DNN`)

number = [100, 200, 300, 400, 500, 600, 700, 800]

# define parameter dictionary

param_grid = dict(number=number)

# call scikit grid search module

grid = GridSearchCV(estimator=model_gridsearch, param_grid=param_grid, n_jobs=1, cv=

grid_result = grid.fit(X_train,Y_train)

: 0.8665

28928/30000 [===========================>..] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4247 - acc

: 0.8745

29504/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4323 - acc

: 0.8713

29632/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 4s - loss: 0.4199 - acc

: 0.8742

29184/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 5s - loss: 0.4214 - acc

: 0.8740

29952/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 5s - loss: 0.4254 - acc

: 0.8748

29376/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 5s - loss: 0.4274 - acc

: 0.8746

30000/30000 [==============================] - 1s

Epoch 1/1

30000/30000 [==============================] - 5s - loss: 0.4251 - acc

: 0.8745

29760/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 6s - loss: 0.4095 - acc

: 0.8773

29824/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 6s - loss: 0.3991 - acc

: 0.8820

29504/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 6s - loss: 0.4172 - acc

: 0.8774

29504/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 6s - loss: 0.4014 - acc

: 0.8802

29120/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 7s - loss: 0.4092 - acc

: 0.8774

29504/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 7s - loss: 0.3995 - acc

: 0.8804

29248/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 7s - loss: 0.3929 - acc

: 0.8843

29632/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 7s - loss: 0.3926 - acc

: 0.8868

29632/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 7s - loss: 0.4041 - acc

: 0.8798

29440/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 7s - loss: 0.4018 - acc

: 0.8804

In [14]:

29824/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 7s - loss: 0.3985 - acc

: 0.8860

29376/30000 [============================>.] - ETA: 0sEpoch 1/1

30000/30000 [==============================] - 7s - loss: 0.3870 - acc

: 0.8854

29248/30000 [============================>.] - ETA: 0sEpoch 1/1

40000/40000 [==============================] - 10s - loss: 0.3510 - ac

c: 0.8956

Best: 0.950675 using {'number': 700}

0.933625 (0.003532) with: {'number': 100}

0.938175 (0.001894) with: {'number': 200}

0.942675 (0.003076) with: {'number': 300}

0.945950 (0.002427) with: {'number': 400}

0.950275 (0.003060) with: {'number': 500}

0.948750 (0.002145) with: {'number': 600}

0.950675 (0.002833) with: {'number': 700}

0.949900 (0.001416) with: {'number': 800}

# summarize results

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

means = grid_result.cv_results_['mean_test_score']

stds = grid_result.cv_results_['std_test_score']

params = grid_result.cv_results_['params']

xx = number

yy = []

for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
    yy.append(mean)

In [15]:

Creating CNNs with Keras

Please restart the kernel ("Kernel"-"Restart") and re-load the data.

We have so far considered each MNIST data sample as a (28 × 28,)-long 1D vector. This approach neglects
any spatial structure in the image. On the other hand, we do know that in every one of the handwritten digits
there are local spatial correlations between the pixels, which we would like to take advantage of to improve the
accuracy of our classification model. To this end, we first need to reshape the training and test input data as
follows.

plt.plot(xx,yy)

plt.xlabel('The number of neurons')

plt.ylabel('Mean test score')

plt.show()

In [22]:

One can ask the question of whether a neural net can learn to recognize such local patterns. This can be
achieved by using convolutional layers. Luckily, all we need to do is change the architecture of our DNN.

After we instantiate the model, add the first convolutional layer with 10 filters, which is the dimensionality of
the output space (https://keras.io/layers/convolutional/). Here, we will be concerned with local spatial filters
that take as inputs a small spatial patch of the previous layer at all depths. We consider a three-dimensional
kernel of size 5 × 5 × 1. Check out this visualization of the convolution procedure for a square input of unit
depth: https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md

The convolution consists of running this filter over all locations in the spatial plane. After computing the
filter, the output is passed through a nonlinearity, here a ReLU.
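To make the sliding-filter picture concrete, here is a minimal NumPy sketch of a single 5 × 5 filter applied to a 28 × 28 image without padding, followed by a ReLU. The helper name `conv2d_valid` and the random input are illustrative assumptions and not part of the assignment. Keras' `Conv2D` does the equivalent for many filters at once and, like most deep-learning libraries, computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding, then apply a ReLU."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # weighted sum of the local patch under the filter
            patch = image[r:r + kh, c:c + kw]
            out[r, c] = np.sum(patch * kernel)
    return np.maximum(out, 0.0)  # ReLU nonlinearity

rng = np.random.default_rng(0)
image = rng.random((28, 28))          # stand-in for one MNIST digit
kernel = rng.standard_normal((5, 5))  # one (untrained) 5 x 5 filter
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (24, 24)
```

Each 5 × 5 filter therefore shrinks a 28 × 28 input to a 24 × 24 feature map when no padding is used.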

X_train shape: (40000, 28, 28, 1)

Y_train shape: (40000,)

40000 train samples

10000 test samples

# reshape data, depending on Keras backend

if keras.backend.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)


print('X_train shape:', X_train.shape)

print('Y_train shape:', Y_train.shape)

print()

print(X_train.shape[0], 'train samples')

print(X_test.shape[0], 'test samples')

In [23]:

Subsequently, add a 2D pooling layer (https://keras.io/layers/pooling/). This pooling layer coarse-grains
spatial information by performing a subsampling at each depth. Here, we use the max pool operation. In a max
pool, the spatial dimensions are coarse-grained by replacing a small region (say 2 × 2 neurons) by a single
neuron whose output is the maximum value of the output in the region.
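The coarse-graining described above can be sketched in a few lines of NumPy. The function name `max_pool_2x2` and the small example array are hypothetical and chosen only for illustration; in the notebook itself the operation is provided by Keras' `MaxPooling2D`.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Replace every non-overlapping 2 x 2 block by its maximum."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]   # drop an odd last row/column
    blocks = trimmed.reshape(h2, 2, w2, 2)    # blocks[i, a, j, b] = trimmed[2i+a, 2j+b]
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 2, 0, 1],
               [3, 4, 1, 0],
               [0, 0, 5, 6],
               [1, 0, 7, 8]])
pooled = max_pool_2x2(fm)
print(pooled)
# [[4 1]
#  [1 8]]
```

A 2 × 2 max pool halves each spatial dimension, so a 24 × 24 feature map becomes 12 × 12.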

In [24]:

Add another convolutional layer with 20 filters and apply dropout. Then, add another pooling layer and flatten
the data. You can then add fully connected (dense) layers and compile the model.
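It can help to track the spatial size of the data through this stack before choosing the width of the dense layer. The sketch below assumes the Keras defaults (stride 1 and no padding for `Conv2D`, stride equal to the pool size for `MaxPooling2D`); the helper names `conv_out` and `pool_out` are made up for this illustration.

```python
def conv_out(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, pool):
    """Spatial size after non-overlapping pooling."""
    return size // pool

size = 28                  # MNIST images are 28 x 28
size = conv_out(size, 5)   # first 5 x 5 convolution  -> 24
size = pool_out(size, 2)   # first 2 x 2 max pool     -> 12
size = conv_out(size, 5)   # second 5 x 5 convolution -> 8
size = pool_out(size, 2)   # second 2 x 2 max pool    -> 4

n_filters = 20             # filters in the second convolutional layer
n_flat = n_filters * size * size
print(size, n_flat)  # 4 320
```

Under these assumptions the flattened vector has 20 × 4 × 4 = 320 entries, which matches the `Dense(20*4*4, ...)` layer used in the code. Dropout does not change any of these shapes.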

from keras.models import Sequential

from keras.layers import Dense, Dropout, Flatten

from keras.layers import Conv2D, MaxPooling2D

model = Sequential()

model.add(Conv2D(10, kernel_size=(5, 5),
                 activation='relu',
                 input_shape=input_shape))

model.add(MaxPooling2D(pool_size=(2, 2)))

In [25]:

Lastly, train your CNN and evaluate the model.

In [27]:

----------------------------------------------------------------------

# add second convolutional layer with 20 filters

model.add(Conv2D(20, (5, 5), activation='relu'))

# apply dropout with rate 0.5

model.add(Dropout(0.5))

# add 2D pooling layer

model.add(MaxPooling2D(pool_size=(2, 2)))

# flatten data

model.add(Flatten())

# add a dense all-to-all relu layer

model.add(Dense(20*4*4, activation='relu'))

# apply dropout with rate 0.5

model.add(Dropout(0.5))

# soft-max layer

model.add(Dense(num_classes, activation='softmax'))

# compile the model

model.compile(loss=keras.losses.categorical_crossentropy,

optimizer='Adam',

metrics=['accuracy'])

# training parameters

batch_size = 64

epochs = 10

# train CNN

model.fit(X_train, Y_train,

batch_size=batch_size,

epochs=epochs,

verbose=1,

validation_data=(X_test, Y_test))

# evaluate model
score = model.evaluate(X_test, Y_test, verbose=1)

# print performance

print()

print('Test loss:', score[0])

print('Test accuracy:', score[1])

-----

ValueError Traceback (most recent call

last)

<ipython-input-27-4c64ca8f5efa> in <module>

9 epochs=epochs,

10 verbose=1,

---> 11 validation_data=(X_test, Y_test))

12

13 # evaliate model

/srv/app/venv/lib/python3.6/site-packages/keras/models.py in fit(self,

x, y, batch_size, epochs, verbose, callbacks, validation_split, valida

tion_data, shuffle, class_weight, sample_weight, initial_epoch, **kwar

gs)

865 class_weight=class_weight,

866 sample_weight=sample_weight,

--> 867 initial_epoch=initial_epoch)

868

869 def evaluate(self, x, y, batch_size=32, verbose=1,

/srv/app/venv/lib/python3.6/site-packages/keras/engine/training.py in

fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_spl

it, validation_data, shuffle, class_weight, sample_weight, initial_epo

ch, steps_per_epoch, validation_steps, **kwargs)

1520 class_weight=class_weight,

1521 check_batch_axis=False,

-> 1522 batch_size=batch_size)

1523 # Prepare validation data.

1524 do_validation = False

/srv/app/venv/lib/python3.6/site-packages/keras/engine/training.py in

_standardize_user_data(self, x, y, sample_weight, class_weight, check_

batch_axis, batch_size)

1380 output_shapes,

1381 check_batch_axis=False,

-> 1382 exception_prefix='target')

1383 sample_weights = _standardize_sample_weights(sample_we

ight,

1384

self._feed_output_names)

/srv/app/venv/lib/python3.6/site-packages/keras/engine/training.py in

_standardize_input_data(data, names, shapes, check_batch_axis, excepti

on_prefix)

142 ' to have shape ' + str(shapes[i])

+

143 ' but got array with shape ' +

--> 144 str(array.shape))

145 return arrays

146

ValueError: Error when checking target: expected dense_6 to have shape

(None, 10) but got array with shape (40000, 1)
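The error message says that the softmax output has shape (None, 10) while the targets have shape (40000, 1). This is what happens when integer class labels are passed to a model compiled with categorical cross-entropy. One likely fix, sketched here under the assumption that Y_train and Y_test hold integer labels 0-9, is to one-hot encode the labels before calling fit; in Keras this is done by keras.utils.to_categorical. The NumPy helper `one_hot` below is a hypothetical stand-in that only shows what the conversion produces.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels of shape (N,) or (N, 1) into an (N, num_classes) array."""
    labels = np.asarray(labels, dtype=int).ravel()
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0  # one 1 per row, at the label index
    return encoded

y = np.array([3, 0, 9])          # three example digit labels
Y = one_hot(y, num_classes=10)
print(Y.shape)            # (3, 10)
print(Y.argmax(axis=1))   # [3 0 9]
```

An alternative that keeps the integer labels is to compile the model with the sparse_categorical_crossentropy loss instead.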

5. Do the grid search over any 3 different optimization schemes and 2 activation functions. Suppose that we

have 2 convolutional layers with 10 neurons. Let p_dropout = 0.5, epochs = 1, and batch_size = 64.

Determine which combination of optimization scheme, activation function, and number of neurons gives the

best accuracy.

Doing the grid search requires quite a bit of memory. Please restart the kernel ("Kernel"-"Restart") and re-load

the data before doing a new grid search.

In [ ]:

6. Create an arbitrary DNN (you are free to choose any activation function, optimization scheme, etc) and

evaluate its performance. Then, add two convolutional layers and pooling layers and evaluate its performance

again. How do they compare?

In [ ]:

Problem 2 - Using Tensorflow - Ising Model

You should restart the kernel for Problem 2.

Next, we show how one can use deep neural nets to classify the states of the 2D Ising model according to their

phase. This should be compared with the use of logistic-regression in HW8.

The Hamiltonian for the classical Ising model is given by

$$H = -J \sum_{\langle ij \rangle} S_i S_j, \qquad S_j \in \{\pm 1\},$$

where the lattice site indices $i, j$ run over all nearest neighbors of a 2D square lattice, and $J$ is some
arbitrary interaction energy scale. We adopt periodic boundary conditions. Onsager proved that this model
undergoes a phase transition in the thermodynamic limit from an ordered ferromagnet with all spins aligned to a
disordered phase at the critical temperature $T_c/J = 2/\log(1+\sqrt{2}) \approx 2.26$. For any finite system
size, this critical point is expanded to a critical region around $T_c$.
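As a quick check of the definitions above, the following NumPy sketch evaluates the energy of a spin configuration with periodic boundary conditions and the critical temperature. It assumes the ferromagnetic sign convention H = -J Σ S_i S_j over nearest-neighbor pairs with J = 1; the function name `ising_energy` is hypothetical.

```python
import numpy as np

def ising_energy(spins, J=1.0):
    """Energy of a 2D spin configuration, each nearest-neighbor bond counted once."""
    right = np.roll(spins, -1, axis=1)  # right neighbor, wrapping around the edge
    down = np.roll(spins, -1, axis=0)   # lower neighbor, wrapping around the edge
    return -J * np.sum(spins * right + spins * down)

L = 40
aligned = np.ones((L, L), dtype=int)    # fully ordered ferromagnetic state
print(ising_energy(aligned))            # -3200.0 (2 bonds per site, 1600 sites)

T_c = 2.0 / np.log(1.0 + np.sqrt(2.0))  # Onsager's critical temperature in units of J
print(round(T_c, 3))                    # 2.269
```

Using np.roll implements the periodic boundary conditions, so every site has exactly four neighbors.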

...

...

Step 1: Load and Process the Data

We begin by writing a DataSet class and two functions read_data_sets and load_data to process

the 2D Ising data.

The DataSet class performs checks on the data shape and casts the data into the correct data type for the

calculation. It contains a method called next_batch, which returns a mini-batch of a pre-defined size and
reshuffles the data whenever an epoch is completed. This structure is particularly useful for the training
procedure in TensorFlow.

In [5]:

# -*- coding: utf-8 -*-

from __future__ import absolute_import, division, print_function

import numpy as np

seed=12

np.random.seed(seed)

import sys, os, argparse

import tensorflow as tf

from tensorflow.python.framework import dtypes

# suppress tflow compilation warnings

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

tf.set_random_seed(seed)

In [6]:

class DataSet(object):

    def __init__(self,data_X,data_Y,dtype=dtypes.float32):
        """Checks data and casts it into correct data type. """
        dtype = dtypes.as_dtype(dtype).base_dtype
        if dtype not in (dtypes.uint8, dtypes.float32):
            raise TypeError('Invalid dtype %r, expected uint8 or float32' % dtype)
        assert data_X.shape[0] == data_Y.shape[0], ('data_X.shape: %s data_Y.shape: %s'
        self.num_examples = data_X.shape[0]
        if dtype == dtypes.float32:
            data_X = data_X.astype(np.float32)
        self.data_X = data_X
        self.data_Y = data_Y
        self.epochs_completed = 0
        self.index_in_epoch = 0

    def next_batch(self, batch_size, seed=None):
        """Return the next `batch_size` examples from this data set."""
        if seed:
            np.random.seed(seed)
        start = self.index_in_epoch
        self.index_in_epoch += batch_size
        if self.index_in_epoch > self.num_examples:
            # Finished epoch
            self.epochs_completed += 1
            # Shuffle the data
            perm = np.arange(self.num_examples)
            np.random.shuffle(perm)
            self.data_X = self.data_X[perm]
            self.data_Y = self.data_Y[perm]
            # Start next epoch
            start = 0
            self.index_in_epoch = batch_size
            assert batch_size <= self.num_examples
        end = self.index_in_epoch
        return self.data_X[start:end], self.data_Y[start:end]

Now, load the Ising dataset and split it into three subsets: ordered, critical and disordered, depending on the

temperature which sets the distribution they are drawn from. Once again, we use the ordered and disordered

data to create a training and a test data set for the problem. Classifying the states in the critical region is

expected to be harder and we only use this data to test the performance of our model in the end.

In [7]:

import pickle

from sklearn.model_selection import train_test_split

from keras.utils import to_categorical

import collections

L=40 # linear system size

# load data

fac = 25

file_name = "Ising2DFM_reSample_L40_T=All.pkl" # this file contains 16*10000 samples taken in T=np.arange(0.25,4.0001,0.25)

data = pickle.load(open(file_name,'rb')) # pickle reads the file and returns the Python object (1D array, compressed bits)

data = data[::fac]

data = np.unpackbits(data).reshape(-1, 1600) # Decompress array and reshape for convenience

data=data.astype('int')

data[np.where(data==0)]=-1 # map 0 state to -1 (Ising variable can take values +/-1)

file_name = "Ising2DFM_reSample_L40_T=All_labels.pkl" # this file contains 16*10000 samples taken in T=np.arange(0.25,4.0001,0.25)

labels = pickle.load(open(file_name,'rb')) # pickle reads the file and returns the Python object (here just a 1D array with the binary labels)

# divide data into ordered, critical and disordered

X_ordered=data[:int(70000/fac),:]

Y_ordered=labels[:70000][::fac]

X_critical=data[int(70000/fac):int(100000/fac),:]

Y_critical=labels[70000:100000][::fac]

X_disordered=data[int(100000/fac):,:]

Y_disordered=labels[100000:][::fac]

del data,labels

# define training and test data sets

X=np.concatenate((X_ordered,X_disordered)) #np.concatenate((X_ordered,X_critical,X_disordered))

Y=np.concatenate((Y_ordered,Y_disordered)) #np.concatenate((Y_ordered,Y_critical,Y_disordered))

del X_ordered, X_disordered, Y_ordered, Y_disordered

In [8]:

You can load the training data in the following way: (Dataset.train.data_X, Dataset.train.data_Y).

Steps 2+3: Define the Neural Net and its Architecture, Choose the Optimizer and

the Cost Function

We can now move on to construct our deep neural net using TensorFlow.

Unique for TensorFlow is creating placeholders for the variables of the model, such as the feed-in data X and

Y or the dropout probability dropout_keepprob (which has to be set to unity explicitly during testing).

Another peculiarity is using the with scope to give names to the most important operators. While we do not

discuss this here, TensorFlow also allows one to visualise the computational graph for the model (see package

documentation on https://www.tensorflow.org/ (https://www.tensorflow.org/)).

The shape of X is only partially defined. We know that it will be a matrix, with instances along the first
dimension and features along the second dimension, and we know that the number of features is going to be
L × L = 40 × 40 = 1600 (one per lattice site), but we don't know yet how many instances each training batch
will contain. So the shape of X is (None, n_feats). Similarly, Y holds one one-hot label row per instance, but
again we don't know the size of the training batch, so its shape is (None, n_categories).

# pick random data points from ordered and disordered states to create the training and test sets

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,train_size=0.6)

# make data categorical

Y_train=to_categorical(Y_train)

Y_test=to_categorical(Y_test)

Y_critical=to_categorical(Y_critical)

# create data sets

train = DataSet(X_train, Y_train, dtype=dtypes.float32)

test = DataSet(X_test, Y_test, dtype=dtypes.float32)

critical = DataSet(X_critical, Y_critical, dtype=dtypes.float32)

Datasets = collections.namedtuple('Datasets', ['train', 'test', 'critical'])

Dataset = Datasets(train=train, test=test, critical=critical)

In [9]:

To classify whether a given spin configuration is in the ordered or disordered phase, we construct a
minimalistic model for a DNN with a single hidden layer containing N_neurons neurons (which is kept variable so
we can try out the performance of different sizes for the hidden layer).

Let us use a neuron_layer() function to create layers in the neural nets.

1. First, create a name scope using the name of the layer.
2. Get the number of inputs by looking up the input matrix's shape and getting the size of the second
dimension.
3. Create a variable W which holds the weight matrix (i.e. kernel). Initialize it randomly, using a truncated
normal distribution.
4. Create a variable b for biases, initialized to 0.
5. Create a subgraph to compute Z = XW + b.
6. Use activation function if provided.

In [10]:

L=40 # system linear size

n_feats=L**2 # 40x40 square lattice

n_categories=2 # 2 Ising phases: ordered and disordered

n_hidden1 = 300

n_hidden2 = 100

n_outputs = 2

with tf.name_scope('data'):
    X=tf.placeholder(tf.float32, shape=(None,n_feats))
    Y=tf.placeholder(tf.float32, shape=(None,n_categories))
    dropout_keepprob=tf.placeholder(tf.float32)

def neuron_layer(X, n_neuron, name, activation = None):
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        stddev = 2 / np.sqrt(n_inputs + n_neuron)
        init = tf.truncated_normal((n_inputs, n_neuron), stddev = stddev)
        W = tf.Variable(init, name = "kernel")
        b = tf.Variable(tf.zeros([n_neuron]), name = "bias")
        Z = tf.matmul(X, W) + b
        if activation is not None:
            return activation(Z)
        else:
            return Z

Using a neuron_layer() function, create two hidden layers and an output layer. The first hidden layer takes X as

its input, and the second takes the output of the first hidden layer as its input. Finally, the output layer takes the

output of the second hidden layer as its input.

In [11]:

Then, define the cost function that we will use to train the neural net model. Here, use the cross entropy to

penalize models that estimate a low probability for the target class.
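For reference, the quantity being minimized can be written out in plain NumPy. This is only a sketch of what a softmax cross-entropy on logits computes for one-hot labels; the function name `softmax_cross_entropy` and the toy numbers are assumptions for illustration, and the notebook itself relies on TensorFlow's built-in op.

```python
import numpy as np

def softmax_cross_entropy(logits, labels_one_hot):
    """Per-sample cross entropy between softmax(logits) and one-hot labels."""
    # subtract the row maximum for numerical stability (does not change the softmax)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick out -log p(target class) for every sample
    return -(labels_one_hot * log_probs).sum(axis=1)

logits = np.array([[2.0, 0.0],    # confident and correct: small loss
                   [0.0, 0.0]])   # undecided: loss = log(2)
labels = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
losses = softmax_cross_entropy(logits, labels)
print(losses)         # per-sample losses
print(losses.mean())  # the scalar that the optimizer would minimize
```

The loss grows as the probability assigned to the target class shrinks, which is the penalty described above.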

In [12]:

Then, define a GradientDescentOptimizer that will tweak the model parameters to minimize the cost function.

Now, set learning_rate = 1e-6.

In [13]:

Lastly, specify how to evaluate the model. Let us simply use accuracy as our performance measure.

In [14]:

with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, activation = tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden2, activation = tf.nn.relu)
    logits = tf.layers.dense(hidden2, n_outputs)

with tf.name_scope('loss'):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = Y, logits = logits)
    loss = tf.reduce_mean(xentropy)

learning_rate = 1e-6

with tf.name_scope('optimiser'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

with tf.name_scope('accuracy'):
    correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(logits, 1))
    correct_prediction = tf.cast(correct_prediction, tf.float64) # change data type
    # correct_prediction = tf.nn.in_top_k(logits, Y, 1)
    accuracy = tf.reduce_mean(correct_prediction)

Steps 4+5: Train the Model and Evaluate its Performance

We train our DNN using mini-batches of size 100 over a total of 100 epochs, which we define first. We then
set up the optimizer parameter dictionary opt_params, and use it to create a DNN model.

Running TensorFlow requires opening up a Session, which we abbreviate as sess. All operations
are performed in this session by calling the run method. First, we initialize the global variables in
TensorFlow's computational graph by running the global_variables_initializer. To train the DNN,
we loop over the number of epochs. In each epoch, we use the next_batch function of the DataSet
class we defined above to create a mini-batch. The forward and backward passes through the weights are
performed by running the loss and optimizer methods. To pass the mini-batch as well as any other
external parameters, we use the feed_dict dictionary. Similarly, we evaluate the model performance by
getting accuracy on the same minibatch data. Note that the dropout probability for testing is set to unity.

Once we have exhausted all training epochs, we test the final performance on the entire training, test and
critical data sets. This is done in the same way as above.

Last, we return the loss and accuracy for each of the training, test and critical data sets.

In [15]:

train loss/accuracy: 0.87729853 0.5048076923076923

test loss/accuracy: 0.8700542 0.5192307692307693

crtitical loss/accuracy: 0.8785669 0.4975

training_epochs=100

batch_size=100

with tf.Session() as sess:
    # initialize the necessary variables, in this case, w and b
    sess.run(tf.global_variables_initializer())

    # train the DNN
    for epoch in range(training_epochs):
        batch_X, batch_Y = Dataset.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={X: batch_X,Y: batch_Y,dropout_keepprob: 0.5})

    # test DNN performance on entire train test and critical data sets
    train_loss, train_accuracy = sess.run([loss, accuracy],
                                          feed_dict={X: Dataset.train.data_X,
                                                     Y: Dataset.train.data_Y,
                                                     dropout_keepprob: 0.5}
                                          )
    print("train loss/accuracy:", train_loss, train_accuracy)

    test_loss, test_accuracy = sess.run([loss, accuracy],
                                        feed_dict={X: Dataset.test.data_X,
                                                   Y: Dataset.test.data_Y,
                                                   dropout_keepprob: 1.0}
                                        )
    print("test loss/accuracy:", test_loss, test_accuracy)

    critical_loss, critical_accuracy = sess.run([loss, accuracy],
                                                feed_dict={X: Dataset.critical.data_X,
                                                           Y: Dataset.critical.data_Y,
                                                           dropout_keepprob: 1.0}
                                                )
    print("crtitical loss/accuracy:", critical_loss, critical_accuracy)

Step 6: Modify the Hyperparameters to Optimize Performance of the Model

To study the dependence of our DNN on some of the hyperparameters, we do a grid search over the number of

neurons (initially set as 100) in the hidden layer, and different SGD learning rates (initially set as 1e-6). These

searches are best done over logarithmically-spaced points.
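As a small illustration of such a grid, the sketch below builds logarithmically spaced learning rates with np.logspace and pairs them with a list of hidden-layer sizes. The values mirror the ones used in this notebook (where the neuron counts happen to be linearly spaced); the variable names are chosen for this example only.

```python
import itertools
import numpy as np

# six learning rates, one per decade: 1e-6, 1e-5, ..., 1e-1
learning_rates = np.logspace(-6, -1, 6)
print(learning_rates)

# hidden-layer sizes to try
n_neurons = [100, 200, 300, 400, 500]

# every (neurons, learning rate) combination that the grid search visits
grid = list(itertools.product(n_neurons, learning_rates))
print(len(grid))  # 30
```

Logarithmic spacing covers several orders of magnitude with few points, which is useful when the right scale of a hyperparameter is not known in advance.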

To do this, define a function for creating a DNN model: create_DNN and for evaluating the performance:

evaluate_model .

The function grid_search will output a 2D heat map to show how accuracy changes with learning rate and
number of neurons.

In [16]:

def create_DNN(n_hidden1=100, n_hidden2=100, learning_rate=1e-6):
    with tf.name_scope('data'):
        X=tf.placeholder(tf.float32, shape=(None,n_feats))
        Y=tf.placeholder(tf.float32, shape=(None,n_categories))
        dropout_keepprob=tf.placeholder(tf.float32)

    with tf.name_scope("dnn"):
        hidden1 = tf.layers.dense(X, n_hidden1, activation = tf.nn.relu)
        hidden2 = tf.layers.dense(hidden1, n_hidden2, activation = tf.nn.relu)
        logits = tf.layers.dense(hidden2, n_outputs)

    with tf.name_scope('loss'):
        xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = Y, logits = logits)
        loss = tf.reduce_mean(xentropy)

    with tf.name_scope('optimiser'):
        optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    with tf.name_scope('accuracy'):
        correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(logits, 1))
        correct_prediction = tf.cast(correct_prediction, tf.float64) # change data type
        # correct_prediction = tf.nn.in_top_k(logits, Y, 1)
        accuracy = tf.reduce_mean(correct_prediction)

    return X, Y, dropout_keepprob, loss, optimizer, accuracy

In [17]:

def evaluate_model(neurons,lr):


training_epochs=100

batch_size=100

X, Y, dropout_keepprob, loss, optimizer, accuracy = create_DNN(n_hidden1=neurons

with tf.Session() as sess:

# initialize the necessary variables, in this case, w and b

sess.run(tf.global_variables_initializer())

# train the DNN

for epoch in range(training_epochs):

batch_X, batch_Y = Dataset.train.next_batch(batch_size)

sess.run(optimizer, feed_dict={X: batch_X,Y: batch_Y,dropout_keepprob: 0.5

# test DNN performance on entire train test and critical data sets

train_loss, train_accuracy = sess.run([loss, accuracy],

feed_dict={X: Dataset.train.data_X

Y: Dataset.train.data_Y

dropout_keepprob: 0.5

)

print("train loss/accuracy:", train_loss, train_accuracy)

test_loss, test_accuracy = sess.run([loss, accuracy],

feed_dict={X: Dataset.test.data_X

Y: Dataset.test.data_Y

dropout_keepprob: 1.0

)

print("test loss/accuracy:", test_loss, test_accuracy)

critical_loss, critical_accuracy = sess.run([loss, accuracy],

feed_dict={X: Dataset.critical.data_X

Y: Dataset.critical.data_Y

dropout_keepprob: 1.0

)

print("crtitical loss/accuracy:", critical_loss, critical_accuracy)

return train_loss,train_accuracy,test_loss,test_accuracy,critical_loss,critical_accuracy

In [18]:

def grid_search():
    """This function performs a grid search over a set of different learning rates
    and a number of hidden layer neurons."""

    # perform grid search over learning rate and number of hidden neurons
    N_neurons=[100, 200, 300, 400, 500]
    learning_rates=np.logspace(-6,-1,6)

    # pre-allocate variables to store accuracy and loss data
    train_loss=np.zeros((len(N_neurons),len(learning_rates)),dtype=np.float64)
    train_accuracy=np.zeros_like(train_loss)
    test_loss=np.zeros_like(train_loss)
    test_accuracy=np.zeros_like(train_loss)
    critical_loss=np.zeros_like(train_loss)
    critical_accuracy=np.zeros_like(train_loss)

    # do grid search
    for i, neurons in enumerate(N_neurons):
        for j, lr in enumerate(learning_rates):
            print("training DNN with %4d neurons and SGD lr=%0.6f." %(neurons,lr) )
            train_loss[i,j],train_accuracy[i,j],\
            test_loss[i,j],test_accuracy[i,j],\
            critical_loss[i,j],critical_accuracy[i,j] = evaluate_model(neurons,lr)

    plot_data(learning_rates,N_neurons,train_accuracy, "training data")
    plot_data(learning_rates,N_neurons,test_accuracy, "test data")
    plot_data(learning_rates,N_neurons,critical_accuracy, "critical data")

In [19]:

In [20]:

training DNN with 100 neurons and SGD lr=0.000001.

train loss/accuracy: 0.8261823 0.5150641025641025

test loss/accuracy: 0.82548255 0.5264423076923077

crtitical loss/accuracy: 0.823629 0.5083333333333333

training DNN with 100 neurons and SGD lr=0.000010.

train loss/accuracy: 0.7448887 0.5432692307692307

%matplotlib notebook

import matplotlib.pyplot as plt

def plot_data(x,y,data, title):
    # plot results
    fontsize=16

    fig = plt.figure()
    ax = fig.add_subplot(111)
    cax = ax.matshow(data, interpolation='nearest', vmin=0, vmax=1)
    fig.colorbar(cax)

    # put text on matrix elements
    for i, x_val in enumerate(np.arange(len(x))):
        for j, y_val in enumerate(np.arange(len(y))):
            c = "${0:.1f}\\%$".format( 100*data[j,i])
            ax.text(x_val, y_val, c, va='center', ha='center')

    # convert axis values to string labels
    x=[str(i) for i in x]
    y=[str(i) for i in y]

    ax.set_xticklabels(['']+x)
    ax.set_yticklabels(['']+y)

    ax.set_xlabel('$\\mathrm{learning\\ rate}$',fontsize=fontsize)
    ax.set_ylabel('$\\mathrm{hidden\\ neurons}$',fontsize=fontsize)

    ax.set_title(title,fontsize=fontsize)

    plt.tight_layout()
    plt.show()

grid_search()

test loss/accuracy: 0.7627646 0.5201923076923077

crtitical loss/accuracy: 0.70823133 0.57

training DNN with 100 neurons and SGD lr=0.000100.

train loss/accuracy: 0.7658607 0.5166666666666667

test loss/accuracy: 0.77821404 0.5153846153846153

crtitical loss/accuracy: 0.7970336 0.49083333333333334

training DNN with 100 neurons and SGD lr=0.001000.

train loss/accuracy: 0.66140896 0.5939102564102564

test loss/accuracy: 0.6775087 0.5822115384615385

crtitical loss/accuracy: 0.72388476 0.5258333333333334

training DNN with 100 neurons and SGD lr=0.010000.

train loss/accuracy: 0.22135256 0.9724358974358974

test loss/accuracy: 0.26849487 0.9456730769230769

crtitical loss/accuracy: 0.48565084 0.7841666666666667

training DNN with 100 neurons and SGD lr=0.100000.

train loss/accuracy: 0.0065746326 1.0

test loss/accuracy: 0.032350723 0.99375

crtitical loss/accuracy: 0.46474758 0.8083333333333333

training DNN with 200 neurons and SGD lr=0.000001.

train loss/accuracy: 0.81956315 0.48205128205128206

test loss/accuracy: 0.83524406 0.47307692307692306

crtitical loss/accuracy: 0.7584602 0.5358333333333334

training DNN with 200 neurons and SGD lr=0.000010.

train loss/accuracy: 1.0758603 0.5407051282051282

test loss/accuracy: 1.1356921 0.5158653846153847

crtitical loss/accuracy: 0.84251535 0.65

training DNN with 200 neurons and SGD lr=0.000100.

train loss/accuracy: 0.81050164 0.5112179487179487

test loss/accuracy: 0.82746327 0.5033653846153846

crtitical loss/accuracy: 0.7245159 0.5866666666666667

training DNN with 200 neurons and SGD lr=0.001000.

train loss/accuracy: 0.68042535 0.6128205128205129

test loss/accuracy: 0.69521517 0.5947115384615385

crtitical loss/accuracy: 0.6811237 0.6025

training DNN with 200 neurons and SGD lr=0.010000.

train loss/accuracy: 0.1754437 0.9875

test loss/accuracy: 0.223725 0.95625

crtitical loss/accuracy: 0.4759194 0.7783333333333333

training DNN with 200 neurons and SGD lr=0.100000.

train loss/accuracy: 0.0050047515 1.0

test loss/accuracy: 0.016949823 0.9980769230769231

crtitical loss/accuracy: 0.37466472 0.8475

training DNN with 300 neurons and SGD lr=0.000001.

train loss/accuracy: 1.0666369 0.4512820512820513

test loss/accuracy: 1.0449538 0.45913461538461536

crtitical loss/accuracy: 1.1571132 0.365

training DNN with 300 neurons and SGD lr=0.000010.

train loss/accuracy: 1.2051065 0.4532051282051282

test loss/accuracy: 1.1894196 0.4625

crtitical loss/accuracy: 1.4901551 0.3383333333333333

training DNN with 300 neurons and SGD lr=0.000100.

train loss/accuracy: 0.7737522 0.49455128205128207

test loss/accuracy: 0.7733282 0.4932692307692308

crtitical loss/accuracy: 0.7616569 0.5058333333333334

training DNN with 300 neurons and SGD lr=0.001000.

train loss/accuracy: 0.66356003 0.6256410256410256

test loss/accuracy: 0.68873733 0.604326923076923

crtitical loss/accuracy: 0.69861454 0.5808333333333333

training DNN with 300 neurons and SGD lr=0.010000.

train loss/accuracy: 0.17167042 0.9887820512820513

test loss/accuracy: 0.21518518 0.9629807692307693

crtitical loss/accuracy: 0.4357977 0.8208333333333333

training DNN with 300 neurons and SGD lr=0.100000.

train loss/accuracy: 0.0050944365 1.0

test loss/accuracy: 0.014383625 0.9990384615384615

crtitical loss/accuracy: 0.3503038 0.8433333333333334

training DNN with 400 neurons and SGD lr=0.000001.

train loss/accuracy: 1.2070509 0.45

test loss/accuracy: 1.172588 0.4735576923076923

crtitical loss/accuracy: 1.4149262 0.33666666666666667

training DNN with 400 neurons and SGD lr=0.000010.

train loss/accuracy: 0.8229826 0.492948717948718

test loss/accuracy: 0.83668387 0.4894230769230769

crtitical loss/accuracy: 0.83969796 0.46416666666666667

training DNN with 400 neurons and SGD lr=0.000100.

train loss/accuracy: 0.7795468 0.5051282051282051

test loss/accuracy: 0.77486354 0.5081730769230769

crtitical loss/accuracy: 0.74270254 0.5308333333333334

training DNN with 400 neurons and SGD lr=0.001000.

train loss/accuracy: 0.6276471 0.6432692307692308

test loss/accuracy: 0.65891 0.6048076923076923

crtitical loss/accuracy: 0.6667839 0.6041666666666666

training DNN with 400 neurons and SGD lr=0.010000.

train loss/accuracy: 0.16308242 0.9881410256410257

test loss/accuracy: 0.1974595 0.9735576923076923

crtitical loss/accuracy: 0.42789352 0.82

training DNN with 400 neurons and SGD lr=0.100000.

train loss/accuracy: 0.0045278403 1.0

test loss/accuracy: 0.013034376 0.9985576923076923

crtitical loss/accuracy: 0.36393955 0.845

training DNN with 500 neurons and SGD lr=0.000001.

train loss/accuracy: 0.8089994 0.49230769230769234

test loss/accuracy: 0.8481708 0.4673076923076923

crtitical loss/accuracy: 0.74246925 0.5383333333333333

training DNN with 500 neurons and SGD lr=0.000010.

train loss/accuracy: 0.7743357 0.5041666666666667

test loss/accuracy: 0.7632512 0.5221153846153846

crtitical loss/accuracy: 0.7739148 0.5258333333333334

training DNN with 500 neurons and SGD lr=0.000100.

train loss/accuracy: 0.77521557 0.5048076923076923

test loss/accuracy: 0.78184414 0.5004807692307692

crtitical loss/accuracy: 0.74329823 0.5258333333333334

training DNN with 500 neurons and SGD lr=0.001000.

train loss/accuracy: 0.6239121 0.6592948717948718

test loss/accuracy: 0.62496156 0.6615384615384615

crtitical loss/accuracy: 0.6445728 0.6316666666666667

training DNN with 500 neurons and SGD lr=0.010000.

train loss/accuracy: 0.14266635 0.9942307692307693

test loss/accuracy: 0.18611857 0.9764423076923077

crtitical loss/accuracy: 0.43260914 0.8108333333333333

training DNN with 500 neurons and SGD lr=0.100000.

train loss/accuracy: 0.0048845564 1.0

test loss/accuracy: 0.012268903 0.9990384615384615

crtitical loss/accuracy: 0.35715875 0.825

1. Do the grid search over 5 different types of activation functions (https://www.tensorflow.org/api_guides/python/nn#Activation_Functions). Evaluate the performance for each case and determine which gives the best accuracy. You can assume an arbitrary DNN. Show results for training, test, and critical data.
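For orientation, the five activations used in the solution below (relu, relu6, crelu, elu, tanh) can be written in a few lines of NumPy. This is only an illustrative sketch, not the TensorFlow implementation, and the helper names are our own. Note that crelu concatenates relu(x) and relu(-x), so it doubles the width of a layer's output.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu6(x):
    # relu clipped at 6
    return np.minimum(np.maximum(x, 0.0), 6.0)

def crelu(x):
    # concatenated relu: [relu(x), relu(-x)] along the last axis
    return np.concatenate([relu(x), relu(-x)], axis=-1)

def elu(x):
    # exp(x) - 1 for negative inputs, identity otherwise
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))

x = np.array([-2.0, 0.0, 3.0, 8.0])
for name, f in [("relu", relu), ("relu6", relu6), ("crelu", crelu),
                ("elu", elu), ("tanh", np.tanh)]:
    print(name, f(x))
```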

In [33]:

def create_DNN(activation):
    # activation is an index into the list searched over by grid_search()
    activations = [tf.nn.relu, tf.nn.relu6, tf.nn.crelu, tf.nn.elu, tf.nn.tanh]
    act = activations[activation]
    with tf.name_scope('data'):
        X=tf.placeholder(tf.float32, shape=(None,n_feats))
        Y=tf.placeholder(tf.float32, shape=(None,n_categories))
        dropout_keepprob=tf.placeholder(tf.float32)
    with tf.name_scope("dnn"):
        hidden1 = tf.layers.dense(X, 100, activation = act)
        hidden2 = tf.layers.dense(hidden1, 100, activation = act)
        logits = tf.layers.dense(hidden2, n_outputs)

    with tf.name_scope('loss'):
        xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = Y, logits = logits)
        loss = tf.reduce_mean(xentropy)

    with tf.name_scope('optimiser'):
        optimizer = tf.train.GradientDescentOptimizer(1e-6).minimize(loss)
    with tf.name_scope('accuracy'):
        correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(logits, 1))
        correct_prediction = tf.cast(correct_prediction, tf.float64) # change data type
        # correct_prediction = tf.nn.in_top_k(logits, Y, 1)
        accuracy = tf.reduce_mean(correct_prediction)

    return X, Y, dropout_keepprob, loss, optimizer, accuracy

def evaluate_model(activation):

    training_epochs=100
    batch_size=100
    X, Y, dropout_keepprob, loss, optimizer, accuracy = create_DNN(activation)
    with tf.Session() as sess:
        # initialize the necessary variables, in this case, w and b
        sess.run(tf.global_variables_initializer())
        # train the DNN (each loop iteration performs one mini-batch update)
        for epoch in range(training_epochs):
            batch_X, batch_Y = Dataset.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={X: batch_X,Y: batch_Y,dropout_keepprob: 0.5})
        # test DNN performance on entire train test and critical data sets
        train_loss, train_accuracy = sess.run([loss, accuracy],
                                              feed_dict={X: Dataset.train.data_X,
                                                         Y: Dataset.train.data_Y,
                                                         dropout_keepprob: 0.5}
                                              )
        print("train loss/accuracy:", train_loss, train_accuracy)
        test_loss, test_accuracy = sess.run([loss, accuracy],
                                            feed_dict={X: Dataset.test.data_X,
                                                       Y: Dataset.test.data_Y,
                                                       dropout_keepprob: 1.0}
                                            )
        print("test loss/accuracy:", test_loss, test_accuracy)
        critical_loss, critical_accuracy = sess.run([loss, accuracy],
                                                    feed_dict={X: Dataset.critical.data_X,
                                                               Y: Dataset.critical.data_Y,
                                                               dropout_keepprob: 1.0}
                                                    )
        print("crtitical loss/accuracy:", critical_loss, critical_accuracy)
    return train_loss,train_accuracy,test_loss,test_accuracy,critical_loss,critical_accuracy

activation=['relu', 'relu6', 'crelu', 'elu', 'tanh']

def grid_search():
    """This function performs a grid search over a set of activation functions."""
    # perform grid search over the activation function
    activation=['relu', 'relu6', 'crelu', 'elu', 'tanh']

    # pre-allocate variables to store accuracy and loss data
    train_loss=np.zeros(len(activation))
    train_accuracy=np.zeros_like(train_loss)
    test_loss=np.zeros_like(train_loss)
    test_accuracy=np.zeros_like(train_loss)
    critical_loss=np.zeros_like(train_loss)
    critical_accuracy=np.zeros_like(train_loss)

    # do grid search
    for i in range(np.size(activation)):
        print("training DNN with %s: " %(activation[i]) )
        train_loss[i],train_accuracy[i],test_loss[i],test_accuracy[i],critical_loss[i],critical_accuracy[i] = evaluate_model(i)
        print()

    # report the activation with the highest accuracy on the critical data
    temp = critical_accuracy.tolist()
    temp = temp.index(max(temp))
    print(activation[temp], 'gives the best accuracy, and the best accuracy is: ',critical_accuracy[temp])

In [34]:

2. Do the grid search over 5 different numbers of epochs and batch sizes. Make a 2D heat map as shown in the

example. You can assume an arbitrary DNN. Show results for training, test, and critical data.
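When setting up this grid, it helps to be explicit about what one "epoch" means. In the usual convention, one epoch is a full pass over the training set, i.e. ceil(N / batch_size) mini-batch updates. The sketch below only illustrates that arithmetic; the training-set size `n_train` is a made-up value and `updates_per_run` is a hypothetical helper. The solution further down instead performs a single mini-batch update per loop iteration.

```python
import math
from itertools import product

n_train = 6000  # hypothetical training-set size
epochs_grid = [100, 200, 300]
batch_grid = [100, 200, 300]

def updates_per_run(n_samples, epochs, batch_size):
    """Number of gradient updates when every epoch is a full pass over the data."""
    return epochs * math.ceil(n_samples / batch_size)

# one entry per (epochs, batch size) point of the grid
steps = {(e, b): updates_per_run(n_train, e, b)
         for e, b in product(epochs_grid, batch_grid)}
for (e, b), s in steps.items():
    print("epochs=%d batch_size=%d -> %d updates" % (e, b, s))
```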

In [43]:

training DNN with relu:

train loss/accuracy: 0.759728 0.5301282051282051

test loss/accuracy: 0.7460091 0.5379807692307692

crtitical loss/accuracy: 0.75928587 0.5266666666666666

training DNN with relu6:

train loss/accuracy: 0.8069537 0.525

test loss/accuracy: 0.8278656 0.49759615384615385

crtitical loss/accuracy: 0.7280958 0.5775

training DNN with crelu:

train loss/accuracy: 1.2907476 0.5477564102564103

test loss/accuracy: 1.3735441 0.5254807692307693

crtitical loss/accuracy: 1.0095135 0.6658333333333334

training DNN with elu:

train loss/accuracy: 0.90077466 0.5060897435897436

test loss/accuracy: 0.9300092 0.48846153846153845

crtitical loss/accuracy: 0.85720867 0.525

training DNN with tanh:

train loss/accuracy: 0.82476604 0.4987179487179487

test loss/accuracy: 0.8346278 0.49615384615384617

crtitical loss/accuracy: 0.7826468 0.5225

crelu gives the best accuracy, and the best accuracy is: 0.6658333333333334

grid_search()

#question 2
def create_DNN():
    with tf.name_scope('data'):
        X=tf.placeholder(tf.float32, shape=(None,n_feats))
        Y=tf.placeholder(tf.float32, shape=(None,n_categories))
        dropout_keepprob=tf.placeholder(tf.float32)
    with tf.name_scope("dnn"):
        hidden1 = tf.layers.dense(X, 100, activation = tf.nn.relu)
        hidden2 = tf.layers.dense(hidden1, 100, activation = tf.nn.relu)
        logits = tf.layers.dense(hidden2, n_outputs)

    with tf.name_scope('loss'):
        xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = Y, logits = logits)
        loss = tf.reduce_mean(xentropy)

    with tf.name_scope('optimiser'):
        optimizer = tf.train.GradientDescentOptimizer(1e-6).minimize(loss)
    with tf.name_scope('accuracy'):
        correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(logits, 1))
        correct_prediction = tf.cast(correct_prediction, tf.float64) # change data type
        # correct_prediction = tf.nn.in_top_k(logits, Y, 1)
        accuracy = tf.reduce_mean(correct_prediction)

    return X, Y, dropout_keepprob, loss, optimizer, accuracy

def evaluate_model(training_epochs,batch_size):

    X, Y, dropout_keepprob, loss, optimizer, accuracy = create_DNN()
    with tf.Session() as sess:
        # initialize the necessary variables, in this case, w and b
        sess.run(tf.global_variables_initializer())
        # train the DNN (each loop iteration performs one mini-batch update)
        for epoch in range(training_epochs):
            batch_X, batch_Y = Dataset.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={X: batch_X,Y: batch_Y,dropout_keepprob: 0.5})
        # test DNN performance on entire train test and critical data sets
        train_loss, train_accuracy = sess.run([loss, accuracy],
                                              feed_dict={X: Dataset.train.data_X,
                                                         Y: Dataset.train.data_Y,
                                                         dropout_keepprob: 0.5}
                                              )
        print("train loss/accuracy:", train_loss, train_accuracy)
        test_loss, test_accuracy = sess.run([loss, accuracy],
                                            feed_dict={X: Dataset.test.data_X,
                                                       Y: Dataset.test.data_Y,
                                                       dropout_keepprob: 1.0}
                                            )
        print("test loss/accuracy:", test_loss, test_accuracy)
        critical_loss, critical_accuracy = sess.run([loss, accuracy],
                                                    feed_dict={X: Dataset.critical.data_X,
                                                               Y: Dataset.critical.data_Y,
                                                               dropout_keepprob: 1.0}
                                                    )
        print("crtitical loss/accuracy:", critical_loss, critical_accuracy)
    return train_loss,train_accuracy,test_loss,test_accuracy,critical_loss,critical_accuracy

def grid_search():
    """This function performs a grid search over the number of training epochs
    and the batch size."""
    # perform grid search over number of training epochs and batch size
    training_epochs = [100, 200, 300, 400, 500, 600]
    batch_size = [100, 200, 300, 400, 500, 600]
    # pre-allocate variables to store accuracy and loss data
    train_loss=np.zeros((len(training_epochs),len(batch_size)),dtype=np.float64)
    train_accuracy=np.zeros_like(train_loss)
    test_loss=np.zeros_like(train_loss)
    test_accuracy=np.zeros_like(train_loss)
    critical_loss=np.zeros_like(train_loss)
    critical_accuracy=np.zeros_like(train_loss)
    # do grid search
    for i, trainingepochs in enumerate(training_epochs):
        for j, batchsize in enumerate(batch_size):
            print("training DNN with %d training epochs and SGD batch size=%d." %(trainingepochs, batchsize))
            train_loss[i,j],train_accuracy[i,j],\
            test_loss[i,j],test_accuracy[i,j],\
            critical_loss[i,j],critical_accuracy[i,j] = evaluate_model(trainingepochs, batchsize)

            print()
    plot_data(batch_size,training_epochs,train_accuracy, "training data")
    plot_data(batch_size,training_epochs,test_accuracy, "test data")
    plot_data(batch_size,training_epochs,critical_accuracy, "critical data")

%matplotlib notebook
import matplotlib.pyplot as plt

def plot_data(x,y,data, title):
    # plot results
    fontsize=16
    fig = plt.figure()
    ax = fig.add_subplot(111)
    cax = ax.matshow(data, interpolation='nearest', vmin=0, vmax=1)
    fig.colorbar(cax)
    # put text on matrix elements
    for i, x_val in enumerate(np.arange(len(x))):
        for j, y_val in enumerate(np.arange(len(y))):
            c = "${0:.1f}\\%$".format( 100*data[j,i])
            ax.text(x_val, y_val, c, va='center', ha='center')
    # convert axis values to string labels
    x=[str(i) for i in x]
    y=[str(i) for i in y]
    ax.set_xticklabels(['']+x)
    ax.set_yticklabels(['']+y)
    ax.set_xlabel('$\\mathrm{batch\\ size}$',fontsize=fontsize)
    ax.set_ylabel('$\\mathrm{training\\ epochs}$',fontsize=fontsize)

    ax.set_title(title,fontsize=fontsize)
    plt.tight_layout()
    plt.show()

In [44]:

grid_search()

training DNN with 100 training epochs and SGD batch size=100.

train loss/accuracy: 0.9047832 0.521474358974359

test loss/accuracy: 0.92370594 0.5033653846153846

crtitical loss/accuracy: 0.71819884 0.6408333333333334

training DNN with 100 training epochs and SGD batch size=200.

train loss/accuracy: 0.75898373 0.5176282051282052

test loss/accuracy: 0.7683781 0.5100961538461538

crtitical loss/accuracy: 0.7331881 0.5391666666666667

training DNN with 100 training epochs and SGD batch size=300.

train loss/accuracy: 0.8795474 0.5637820512820513

test loss/accuracy: 0.8970626 0.541826923076923

crtitical loss/accuracy: 0.7265906 0.6533333333333333

training DNN with 100 training epochs and SGD batch size=400.

train loss/accuracy: 1.0428126 0.4721153846153846

test loss/accuracy: 1.0210873 0.47115384615384615

crtitical loss/accuracy: 1.175693 0.36

Problem 3 - SDSS galaxies

You should restart the kernel for Problem 3.

The data is provided in the file "specz_data.txt". The 13 columns of the file correspond to: spectroscopic redshift ('zspec'), RA, DEC, magnitudes in 5 bands u, g, r, i, z (denoted 'mu', 'mg', 'mr', 'mi', 'mz' respectively); exponential and de Vaucouleurs model magnitude fits ('logExp' and 'logDev', see http://www.sdss.org/dr12/algorithms/magnitudes/); zebra fit ('pz_zebra'); neural network fit ('pz_NN') and its error estimate ('pz_NN_Err').

We will undertake 2 exercises:

Regression

We will use the magnitudes of an object in the different bands ('mu', 'mg', 'mr', 'mi', 'mz') and do a regression exercise to estimate the redshift of the object. Hence our feature space is 5-dimensional.

The correct redshift is given by 'zspec', which is the spectroscopic redshift of the object. We will use this for training and testing purposes.

Sidenote: Photometry vs. Spectroscopy

The amount of energy we receive from celestial objects, in the form of radiation, is called the flux, and the astronomical technique of measuring the flux is photometry. Flux is usually measured over broad wavelength bands, and combined with an estimate of the distance to an object, it lets us infer the object's luminosity, temperature, size, etc. Usually light is passed through colored filters, and we measure the intensity of the filtered light.

On the other hand, spectroscopy deals with the spectrum of the emitted light. This tells us what the

object is made of, how it is moving, the pressure of the material in it, etc. Note that for faint objects

making photometric observation is much easier.

Photometric redshift (photoz) is an estimate of the distance to the object using photometry.

Spectroscopic redshift observes the object’s spectral lines and measures their shifts due to the

Doppler effect to infer the distance.
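As a small worked example of the spectroscopic measurement described in this sidenote: the redshift follows from the shift of a known spectral line as z = (lambda_observed - lambda_rest) / lambda_rest. The sketch below uses the H-alpha line (rest wavelength about 656.3 nm); the observed wavelength is an invented number, and `redshift` is a hypothetical helper name.

```python
def redshift(lambda_observed, lambda_rest):
    """Redshift z = (lambda_observed - lambda_rest) / lambda_rest."""
    return (lambda_observed - lambda_rest) / lambda_rest

# H-alpha rest wavelength in nm; the observed value is made up for illustration
z = redshift(721.9, 656.3)
print(round(z, 3))
```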

Classification

We will use the same magnitudes and now also the redshift of the object ('zspec') to classify the object as either elliptical or spiral. Hence our feature space is now 6-dimensional.

The correct class is given by comparing 'logExp' and 'logDev', which are the fits for the exponential and de Vaucouleurs profiles. If logExp > logDev, it is a spiral, and vice versa. We will use this for training and testing purposes. Since the classes are not explicitly given, generate a column for those. (Classes can be ±1. If it is 0, the object does not belong to either class.)

In [ ]:

Cleaning

Read in the files to create the data (X and Y) for both regression and classification.

You will have to clean the data -

Drop the entries that are nan or infinite

Drop the unrealistic numbers such as 999, -999; and magnitudes that are unrealistic. Since these are apparent magnitudes, they should be positive and high. Let's choose a magnitude limit of 15 as a safe bet.

For classification, drop the entries that do not belong to either of the class
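The cleaning rules above can be sketched as a single boolean mask. This is a minimal illustration on a toy array, not the course solution: the array values are invented, and treating a row as bad when any band violates a rule is our assumption.

```python
import numpy as np

mlim = 15  # magnitude limit suggested in the text
# toy stand-in for the five magnitude columns (rows = objects)
mags = np.array([
    [19.2, 18.1, 17.5, 17.2, 17.0],    # good
    [np.nan, 18.0, 17.4, 17.1, 16.9],  # NaN entry
    [999.0, 18.3, 17.9, 17.6, 17.3],   # sentinel value
    [12.0, 18.3, 17.9, 17.6, 17.3],    # below the magnitude limit
    [20.1, np.inf, 18.8, 18.5, 18.2],  # infinite entry
])
zspec = np.array([0.10, 0.20, 0.15, 0.05, 0.30])

with np.errstate(invalid="ignore"):  # comparisons with NaN simply give False
    bad = (
        ~np.isfinite(mags).all(axis=1)       # nan or inf in any band
        | (np.abs(mags) >= 999).any(axis=1)  # sentinel values such as 999 / -999
        | (mags < mlim).any(axis=1)          # unrealistically small magnitudes
        | ~np.isfinite(zspec)                # undefined redshift
    )

X = mags[~bad]
y = zspec[~bad]
print(bad, X.shape)
```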

In [4]:

dict_keys(['zspec', 'RA', 'DEC', 'mu', 'mg', 'mr', 'mi', 'mz', 'logExp', 'logDev', 'pz_zebra', 'pz_NN', 'pz_NN_Err'])

#Read in and create data

fname = 'specz_data.txt'

spec_dat=np.genfromtxt(fname,names=True)

print(spec_dat.dtype.fields.keys())

#convenience variable

zspec = spec_dat['zspec']

pzNN = spec_dat['pz_NN']

#some NN redshifts are not defined (flagged by negative values)

pzNN[pzNN < 0] = np.nan

#For Regression

bands = ['u', 'g', 'r','i', 'z' ]

mlim = 15

xdata = np.concatenate([[spec_dat['m%s'%i] for i in bands]]).T

bad = (xdata[:, 0] < mlim) | (xdata[:, 1] < mlim) | (xdata[:, 2] < mlim) & (xdata[:,

xdata = xdata[~bad]

xdata[xdata<0] = 0

ydata = zspec[~bad]

#For classification

classes = np.sign(spec_dat['logExp'] - spec_dat['logDev'])

tmp = np.concatenate([[spec_dat['m%s'%i] for i in bands]]).T

xxdata = np.concatenate([tmp, zspec.reshape(-1, 1)], axis=1)

bad = (classes==0) | (xxdata[:, 0] < mlim) | (xxdata[:, 1] < mlim) | (xxdata[:, 2] <

xxdata = xxdata[~bad]

classes = classes[~bad]

For regression, the X data (called "xdata") are the cleaned magnitudes (5 features) and the Y data (called "ydata") are the spectroscopic redshifts. For classification, the X data (called "xxdata") are the cleaned magnitudes plus the spectroscopic redshift (6 features) and the Y data (called "classes") are the class labels.

In [5]:

Visualization

The next step should be to visualize the data.

For regression

Make a histogram for the distribution of the data (spectroscopic redshift).

Make 5 2D histograms of the distribution of the magnitude as a function of redshift (Hint: https://matplotlib.org/devdocs/api/_as_gen/matplotlib.axes.Axes.hist2d.html)

For classification

Make 6 1D histograms of the distribution of the data (6 features: zspec and 5 magnitudes) for class 1 and class -1 separately

1. Make histograms for both regression and classification.

In [6]:

For Regression:

Before: Size of datasets is 5338

After: Size of datasets is 4535

For Classification:

Before: Size of datasets is 5338

After: Size of datasets is 4147

print('For Regression:')

print('Before: Size of datasets is ', zspec.shape[0])

print('After: Size of datasets is ', xdata.shape[0])

print('')

print('For Classification:')

print('Before: Size of datasets is ', zspec.shape[0])

print('After: Size of datasets is ', xxdata.shape[0])

plt.hist(ydata, bins = 50, rwidth = 0.8)

plt.xlabel('Redshift')

plt.ylabel('Amount')

plt.title('Distribution of the spectroscopic redshift')

plt.show()


# 2D histogram of each band magnitude against redshift
for k, band in enumerate(bands):
    plt.hist2d(ydata, xdata[:,k], bins = 100)
    plt.xlabel('Redshift')
    plt.ylabel('M%s' % band)
    plt.title('Distribution of the M%s-Redshift' % band)
    plt.show()


In [7]:

In [8]:

# split the cleaned sample by class using boolean masks
separate0 = xxdata[classes == -1]
separate1 = xxdata[classes == 1]

# one panel per feature: five magnitudes and the spectroscopic redshift
xlabels = ['Mu', 'Mg', 'Mr', 'Mi', 'Mz', 'Redshift']
names = ['Mu', 'Mg', 'Mr', 'Mi', 'Mz', 'spectroscopic redshift']
fig, axes = plt.subplots(2,3,figsize = (15,10))
for k, ax in enumerate(axes.flat):
    ax.hist(separate0[:,k], bins = 50, rwidth = 0.8)
    ax.set_xlabel(xlabels[k])
    ax.set_ylabel('Amount')
    ax.set_title('Distribution of the ' + names[k])
# fig.suptitle labels the whole figure; plt.title would overwrite the last panel's title
fig.suptitle('Distribution of the data for class -1')
plt.show()

In [9]:

# one panel per feature: five magnitudes and the spectroscopic redshift
xlabels = ['Mu', 'Mg', 'Mr', 'Mi', 'Mz', 'Redshift']
names = ['Mu', 'Mg', 'Mr', 'Mi', 'Mz', 'spectroscopic redshift']
fig, axes = plt.subplots(2,3,figsize = (15,10))
for k, ax in enumerate(axes.flat):
    ax.hist(separate1[:,k], bins = 50, rwidth = 0.8)
    ax.set_xlabel(xlabels[k])
    ax.set_ylabel('Amount')
    ax.set_title('Distribution of the ' + names[k])
# fig.suptitle labels the whole figure; plt.title would overwrite the last panel's title
fig.suptitle('Distribution of the data for class 1')
plt.show()


2. Do the following preprocessing:

Preprocessing:

Next, split the sample into training data and testing data. We will use the training data to train different algorithms and then compare their performance on the testing data. In this project, keep 80% of the data as training data and use the remaining 20% for testing.

Often the data are ordered in a specific manner, so shuffle the data before splitting it into training and testing samples.

Many algorithms are also not scale invariant, so scale the data (bring the different features to a uniform scale). All this comes under preprocessing the data: http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing

Use StandardScaler from sklearn (or write your own routine) to center the data to 0 mean and 1 variance. Note that you fit the scaler on the training data only, and then use that mean and variance to scale the testing data before using it.

Hint: How to get scaled training data:

1. Let the training data be: train = ("training X data", "training Y data")

2. You can first define a StandardScaler:

scale_xdata, scale_ydata = preprocessing.StandardScaler(), preprocessing.StandardScaler()

3. Then, do the fit:

for regression: scale_xdata.fit(train_regression[0]), scale_ydata.fit(train_regression[1].reshape(-1, 1))

for classification: scale_xdata.fit(train_classification[0])

Here, there is no need to fit the y data for classification (it is either +1 or -1, so already scaled).

4. Next, transform:

for regression: scaled_train_data = (scale_xdata.fit_transform(train_regression[0]), scale_ydata.fit_transform(train_regression[1].reshape(-1, 1)))

for classification: scaled_train_data = (scale_xdata.fit_transform(train_classification[0]), train_classification[1])

Again, y data is already scaled for classification.

Do this for test data as well, using transform (not fit_transform) so that the training mean and variance are reused.
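For the "write your own routine" option, the shuffle, 80/20 split, and scaling steps can be sketched in plain NumPy. The data below are random stand-ins, and all variable names are our own; the key point is that the mean and standard deviation come from the training set only and are then applied to the test set.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=20.0, scale=2.0, size=(100, 5))  # toy magnitudes
y = rng.uniform(0.0, 0.5, size=100)                 # toy redshifts

# shuffle, then keep 80% for training and 20% for testing
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train_idx, test_idx = idx[:n_train], idx[n_train:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# statistics are computed from the training set only
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std  # reuse the training mean and std

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

The population standard deviation (NumPy's default, `ddof=0`) is used here, which matches what StandardScaler computes.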

In [10]:

from sklearn import preprocessing

In [11]:

Metrics

The last remaining preparatory step is to write metrics for gauging the performance of the algorithms. Write a function to calculate the 'RMS' error given (y_predict, y_truth) to gauge regression and another function to evaluate the accuracy of classification.

In addition, for classification, we will also use the confusion matrix.

Below is an example you can use. Feel free to write your own.

In [72]:

Out[11]:

StandardScaler(copy=True, with_mean=True, with_std=True)

from sklearn.model_selection import train_test_split

X_train_regression, X_test_regression, Y_train_regression, Y_test_regression = train_test_split

X_train_classification, X_test_classification, Y_train_classification, Y_test_classification

scale_xdata, scale_ydata = preprocessing.StandardScaler(),preprocessing.StandardScaler()

scale_xdata.fit(X_train_regression)

scale_ydata.fit(Y_train_regression.reshape(-1, 1))

from sklearn.metrics import confusion_matrix

def rms(x, y, scale1=None, scale2=None):
    '''Calculate the rms error given the truth and the prediction
    '''
    mask = np.isfinite(x[:]) & np.isfinite(y[:])
    if scale1 is not None:
        x= scale1.inverse_transform(x)
    if scale2 is not None:
        y = scale2.inverse_transform(y)
    return np.sqrt(np.mean((x[mask] - y[mask]) ** 2))

def acc(x, y):
    '''Calculate the accuracy given the truth and the prediction
    '''
    # only compare entries where both truth and prediction are finite
    mask = np.isfinite(x[:]) & np.isfinite(y[:])
    return (x[mask] == y[mask]).sum()/mask.sum()
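To make the confusion matrix mentioned above concrete, here is a hand-rolled NumPy version for the ±1 labels used in this problem. It is a sketch for illustration (the function name `confusion` and the toy labels are our own); it follows the convention that rows index the true class and columns the predicted class, which is also the convention of sklearn's confusion_matrix.

```python
import numpy as np

def confusion(truth, pred, labels=(-1, 1)):
    """Rows index the true class, columns the predicted class."""
    truth, pred = np.asarray(truth), np.asarray(pred)
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for i, t in enumerate(labels):
        for j, p in enumerate(labels):
            cm[i, j] = np.sum((truth == t) & (pred == p))
    return cm

truth = np.array([1, 1, -1, -1, 1, -1])
pred = np.array([1, -1, -1, 1, 1, -1])
cm = confusion(truth, pred)
accuracy = np.trace(cm) / cm.sum()  # correct predictions sit on the diagonal
print(cm, accuracy)
```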

Hyperparameter method

Now, we will be varying hyperparameters to get the best model and build some intuition. There are various ways to do this, and we will use the grid search methodology (as you did in Problems 1 and 2), which simply tries all the combinations along with some cross-validation scheme. For the most part, we will use 4-fold cross validation. Sklearn provides the GridSearchCV functionality for this purpose.

It is recommended to spend some time going through the output format of GridSearchCV and to write some utility functions to make the recurring plots for every parameter.

Grid search returns a dictionary with, for the most part, self-explanatory keys. Mostly, the keys correspond to (masked) numpy arrays of size = #(all possible combinations of parameters). The value of each individual parameter in every combination is given in arrays with keys starting with 'param_*', and this should help you match the combination with the corresponding scores.

For masked arrays, you can access the data values by using *.data

Do not overwrite the fitted grid search variables (the objects themselves, not only their results), since we will compare all the models together in the end.
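One possible utility of the kind suggested above sorts the combinations by rank and returns their scores. The sketch below is tested on a hand-made dictionary that only mimics the relevant keys of cv_results_; the function name `ranked_results` and the numbers are invented for illustration.

```python
import numpy as np

def ranked_results(results, top=5):
    """Return (rank, mean, std, params) tuples sorted by rank_test_score."""
    order = np.argsort(results["rank_test_score"])
    return [
        (int(results["rank_test_score"][k]),
         float(results["mean_test_score"][k]),
         float(results["std_test_score"][k]),
         results["params"][k])
        for k in order[:top]
    ]

# hand-made stand-in with the same keys as GridSearchCV.cv_results_
fake_results = {
    "rank_test_score": np.array([3, 1, 2]),
    "mean_test_score": np.array([0.70, 0.82, 0.78]),
    "std_test_score": np.array([0.02, 0.01, 0.03]),
    "params": [{"n_neighbors": 2}, {"n_neighbors": 10}, {"n_neighbors": 50}],
}
for rank, mean, std, params in ranked_results(fake_results):
    print("rank %d: %.3f +/- %.3f %s" % (rank, mean, std, params))
```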

In [73]:

Method 1. k Nearest Neighbors

For regression, let us play with grid search using kNN to tune hyperparameters (https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html).

Consider the following 3 hyperparameters:

Number of neighbors ([2, 3, 5, 10, 15, 20, 25, 50, 100])

Weights of neighbors (uniform or inverse-distance weighting)

Distance metric (Euclidean or Manhattan distance, parameter 'p')

1. Do a grid search on these parameters. List the combinations of hyperparameters you tried and evaluate the accuracy (mean test score) and its standard deviation. Which gives the highest accuracy value?

In [74]:

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

# http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

from sklearn.neighbors import KNeighborsRegressor

Hint: (Read the documentation carefully for more detail.)

First, define the hyperparameters: parameters = {'n_neighbors':[2, 3, 5, 10, 15, 20, 25, 50, 100], 'weights':['uniform', 'distance'], 'p':[1, 2]}

Specify the algorithm you want to use: e.g. knnr = KNeighborsRegressor()

Then, do a grid search on these parameters using 4-fold cross validation: gcknn = GridSearchCV(knnr, parameters, cv=4)

Do the fit: gcknn.fit(*scaled_training_data)

(Let "scaled_training_data" be the training data, where scaled_training_data = ("train X data", "train Y data").)

Get results: results = gcknn.cv_results_

cv_results_ has the following keys: "rank_test_score", "mean_test_score", "std_test_score", and "params" (see http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).

Then, you can evaluate the models based on "rank_test_score" and print out their "params", along with their "mean_test_score" and "std_test_score".
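Put together, the steps of this hint look roughly as follows on synthetic data. This is a sketch, not the expected solution: the toy features and target are random stand-ins for the scaled training data. Note that for a regressor the default "test score" reported by GridSearchCV is the R² coefficient rather than a classification accuracy.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))              # toy scaled features
y = X[:, 0] + 0.1 * rng.normal(size=400)   # toy target

parameters = {'n_neighbors': [2, 3, 5, 10, 15, 20, 25, 50, 100],
              'weights': ['uniform', 'distance'],
              'p': [1, 2]}
gcknn = GridSearchCV(KNeighborsRegressor(), parameters, cv=4)
gcknn.fit(X, y)

results = gcknn.cv_results_
best = np.argmin(results['rank_test_score'])  # index of the top-ranked combination
print(results['params'][best],
      results['mean_test_score'][best],
      results['std_test_score'][best])
```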

In [ ]:

2. Also print out fitting and scoring times for all hyperparameter combinations.

Plot timings for fitting and scoring

Hint: Assume that you got results from: results = gcknn.cv_results_

Then, get the scoring time: results['mean_score_time']

and the fitting time: results['mean_fit_time']

In [ ]:

...

...

3. Based on the results you obtained in Part 1 and 2, answer the following questions

Is it always better to use more neighbors?

Is it better to weight the neighbors? If yes, which distance metric performs better?

GridSearchCV returns the fitting and scoring time for every combination. You will find that the scoring time is higher than the training time. Why do you think that is the case?

Answer:

4. Which parameters seem to affect the performance most? To better answer this question, make plots of the

mean test score for each hyperparameter.

Hint: Suppose you have two types of hyperparameters: A and B. Let A = [1, 2] and B = [1, 2, 4, 7, 10]. Then, you have 10 different combinations of hyperparameters.

Let A = 1. Then, you can try (A,B) = (1,1), (1,2), (1,4), (1,7), (1,10). Suppose that the mean scores you got for the above combinations are [0.7, 0.72, 0.75, 0.77, 0.8]. Similarly, for A = 2, you tried (A,B) = (2,1), (2,2), (2,4), (2,7), (2,10) and obtained the mean scores [0.8, 0.82, 0.85, 0.87, 0.9].

To better see how changing the value of parameter A affects the performance, you can make the following plot:

In [75]:

This is the plot of the mean test score for A marginalizing over B.

Similarly, make a plot of the mean test score for each kNN hyperparameter.
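One way to read "marginalizing over B" numerically is to average the mean test score over all combinations that share a given value of A. The sketch below does this for the toy numbers from the hint; `marginal_mean` is a hypothetical helper. With real GridSearchCV output, the parameter values would come from the 'param_*' masked arrays (via `.data`).

```python
import numpy as np

# toy stand-in for cv_results_: one entry per (A, B) combination
param_A = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
param_B = np.array([1, 2, 4, 7, 10, 1, 2, 4, 7, 10])
mean_test_score = np.array([0.7, 0.72, 0.75, 0.77, 0.8,
                            0.8, 0.82, 0.85, 0.87, 0.9])

def marginal_mean(param_values, scores):
    """Average the score over all combinations sharing each parameter value."""
    return {v: float(scores[param_values == v].mean())
            for v in np.unique(param_values).tolist()}

print(marginal_mean(param_A, mean_test_score))
print(marginal_mean(param_B, mean_test_score))
```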

In [ ]:

5. You have determined the best combination of hyperparameters and CV schemes. Predict the test y data

using the GridSearchCV method. Use the "rms" metric function we defined earlier and calculate the rms error

on the test data.

Hint: To determine the rms error, you need:

Truth: given from data (test_data[1])

Prediction: gridsearch.predict(test_data[0]) (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)

A_1 = [0.7, 0.72, 0.75, 0.77, 0.8]

A_2 = [0.8, 0.82, 0.85, 0.87, 0.9]

plt.plot(A_1, label = "A=1")

plt.plot(A_2, label = "A=2")

plt.ylabel("mean test score")

plt.legend()

plt.show()

...

In [ ]:

Classification

In [ ]:

Here we will look at 4 different types of cross-validation schemes:

Kfold

Stratified Kfold

Shuffle Split

Stratified Shuffle Split
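To see how these four schemes differ, it can help to look at the class balance of the test folds they produce. The sketch below uses toy, deliberately ordered labels (80 objects of class -1 followed by 20 of class +1); the data and the `test_size=0.2` choice are assumptions made for illustration. The stratified schemes keep the 20% fraction of class +1 in every test fold, while plain KFold without shuffling does not.

```python
import numpy as np
from sklearn.model_selection import (KFold, StratifiedKFold,
                                     ShuffleSplit, StratifiedShuffleSplit)

# toy labels: 80 objects of class -1 followed by 20 of class +1
y = np.array([-1] * 80 + [1] * 20)
X = np.zeros((100, 1))

schemes = {
    "KFold": KFold(n_splits=4),
    "StratifiedKFold": StratifiedKFold(n_splits=4),
    "ShuffleSplit": ShuffleSplit(n_splits=4, test_size=0.2, random_state=100),
    "StratifiedShuffleSplit": StratifiedShuffleSplit(n_splits=4, test_size=0.2,
                                                     random_state=100),
}
fractions = {}
for name, cv in schemes.items():
    # fraction of class +1 in each test fold
    fractions[name] = [float(np.mean(y[test] == 1)) for _, test in cv.split(X, y)]
    print(name, fractions[name])
```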

6. Assuming the list of hyperparameters from Part 1, do 4 different grid searches. From Part 1, take the top 5 combinations of hyperparameters that give the highest accuracy values. Rank the performance of the CV schemes for each combination.

In [ ]:

...

from sklearn.neighbors import KNeighborsClassifier

# http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

from sklearn.model_selection import KFold, StratifiedKFold, ShuffleSplit, StratifiedShuffleSplit

In [ ]:

7. Answer the following questions:

Are the conclusions different for any parameter from the regression case?

Does the mean accuracy change for different CV schemes?

Does the standard deviation in mean accuracy change?

In [ ]:

Answer:

8. Using the best combination of hyperparameters and CV schemes you have found, compute the confusion

matrix (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)

and evaluate the accuracy.

Hint: To get a confusion matrix, you need both the truth (available from the data) and the prediction (can be computed

using the .predict function of GridSearchCV: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).
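As a small illustration of the hint, the sketch below builds a confusion matrix from toy labels and reads the accuracy off its diagonal. The label arrays are made up; in the notebook they would be the true test labels and the output of the grid search's .predict call.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels standing in for the truth and for the grid search prediction.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Accuracy is the fraction of samples on the diagonal of the matrix.
accuracy = np.trace(cm) / cm.sum()
print("accuracy from the confusion matrix:", accuracy)
print("accuracy_score:", accuracy_score(y_true, y_pred))
```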

parameters = {'n_neighbors':[2, 3, 5, 10, 15, 20, 25, 50, 100], 'weights':['uniform'

knnc = KNeighborsClassifier()

#Grid Search

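# Recent scikit-learn versions require shuffle=True when random_state is set

gc = GridSearchCV(knnc, parameters, cv=KFold(4, shuffle=True, random_state=100))

#Do the fit

...

gc2 = GridSearchCV(knnc, parameters, cv=StratifiedKFold(4, shuffle=True, random_state=100))

#Do the fit

...

# Recent scikit-learn versions require test_size to be passed by keyword

gc3 = GridSearchCV(knnc, parameters, cv=ShuffleSplit(4, test_size=0.1, random_state=100))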

#Do the fit

...

gc4 = GridSearchCV(knnc, parameters, cv=StratifiedShuffleSplit(4, 0.1, random_state

#Do the fit

...

...

In [ ]:

Method 2. Random Forests

The most important hyperparameter of the random forest is the number of trees in the ensemble. We will also vary

the maximum depth of the trees.

Try:

n_estimators = [10, 50, 150, 200, 300]

max_depth = [10, 50, 100]

In [ ]:

1. Do the grid search over n_estimators and max_depth. List the combination of hyperparameters you tried and

evaluate the accuracy (mean test score) and its standard deviation. Which gives the highest accuracy value?
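A sketch of how the combinations and their scores might be listed from cv_results_ follows. It uses synthetic data and a reduced grid so that it runs quickly; both are assumptions, and the full lists given above would be used in the notebook. For a regressor, the default GridSearchCV score is the R² value.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data and a reduced grid so that the sketch runs quickly.
X, y = make_regression(n_samples=150, n_features=5, noise=5.0, random_state=0)
parameters = {"n_estimators": [10, 50], "max_depth": [10, 50]}

gcrf = GridSearchCV(RandomForestRegressor(random_state=0), parameters, cv=5)
gcrf.fit(X, y)

# cv_results_ holds one entry per hyperparameter combination.
results = gcrf.cv_results_
for params, mean, std in zip(
    results["params"], results["mean_test_score"], results["std_test_score"]
):
    print(f"{params}: {mean:.3f} +/- {std:.3f}")

print("best:", gcrf.best_params_, f"score = {gcrf.best_score_:.3f}")
```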

In [ ]:

2. Which parameters seem to affect the performance most? To better answer this question, make plots of the

mean test score for each hyperparameter. (Plot the mean test score for n_estimators, marginalizing over

max_depth, and so on.)
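One possible way to marginalize is to group the mean test scores by the value of a single hyperparameter and average within each group. The helper marginal_mean below is hypothetical, and it is demonstrated on a hand-made dictionary with made-up scores that mimics the "params" and "mean_test_score" entries of GridSearchCV.cv_results_.

```python
from collections import defaultdict

import numpy as np


def marginal_mean(cv_results, name):
    """Average mean_test_score over all other hyperparameters for each value of `name`."""
    groups = defaultdict(list)
    for params, score in zip(cv_results["params"], cv_results["mean_test_score"]):
        groups[params[name]].append(score)
    return {value: float(np.mean(scores)) for value, scores in groups.items()}


# Hand-made dictionary with two of the keys found in cv_results_; scores are made up.
fake_results = {
    "params": [
        {"n_estimators": 10, "max_depth": 10},
        {"n_estimators": 10, "max_depth": 50},
        {"n_estimators": 50, "max_depth": 10},
        {"n_estimators": 50, "max_depth": 50},
    ],
    "mean_test_score": [0.80, 0.82, 0.90, 0.92],
}

by_trees = marginal_mean(fake_results, "n_estimators")
by_depth = marginal_mean(fake_results, "max_depth")
print("by n_estimators:", by_trees)
print("by max_depth:", by_depth)
```

The keys and values of each returned dictionary can be passed directly to plt.plot to draw the marginalized curve for that hyperparameter.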

In [ ]:

...

from sklearn.ensemble import RandomForestRegressor

# http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

rf = RandomForestRegressor()

parameters = ...

gcrf = GridSearchCV(rf, parameters, cv=5)

...

...

3. Based on the results you obtained in Part 1, answer the following questions:

Are the scores of these models statistically different? Based on this, which architecture will you choose for

your model?

For every parameter, plot the fitting time. Based on this and the previous question, how many

trees do you recommend keeping in the ensemble?
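One rough way to approach these questions is to compare the gap in mean score with the spread across CV folds, and then prefer the cheapest model among those that are compatible with the best one. The sketch below uses made-up numbers arranged like the mean_test_score, std_test_score and mean_fit_time entries of cv_results_. The compatibility criterion (gap smaller than the two standard deviations added in quadrature) is a heuristic assumed here, not a formal statistical test.

```python
import numpy as np

# Made-up numbers in the layout of cv_results_; not results from the project.
n_estimators = np.array([10, 50, 150, 200, 300])
mean_test_score = np.array([0.900, 0.915, 0.920, 0.921, 0.922])
std_test_score = np.array([0.010, 0.009, 0.009, 0.010, 0.009])
mean_fit_time = np.array([0.2, 1.0, 3.1, 4.0, 6.2])  # seconds

best = np.argmax(mean_test_score)

# Heuristic: a model is "compatible" with the best one if the gap in mean score
# is smaller than the two standard deviations added in quadrature.
gap = mean_test_score[best] - mean_test_score
combined_std = np.sqrt(std_test_score ** 2 + std_test_score[best] ** 2)
compatible = gap < combined_std

# Among the compatible models, prefer the one that is cheapest to fit.
cheapest = np.argmin(np.where(compatible, mean_fit_time, np.inf))
print("compatible with the best model:", n_estimators[compatible])
print("cheapest compatible choice:", n_estimators[cheapest])
```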

In [ ]:

Answer:

4. You have determined the best combination of hyperparameters. Predict the test y data using the

GridSearchCV method. Use the "rms" metric function we defined earlier and calculate the rms error on the test

data.

In [ ]:

Classification

In [ ]:

In [ ]:

...

...

from sklearn.ensemble import RandomForestClassifier

# http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

# Grid search (this will take a few minutes)

rfc = RandomForestClassifier()

parameters = ...

gcrfc = GridSearchCV(rfc, parameters, cv=StratifiedShuffleSplit(4, 0.1, random_state

...

5. Assuming the list of hyperparameters from Part 1, do the grid search using the StratifiedShuffleSplit CV scheme.

List the combination of hyperparameters you tried and evaluate the accuracy (mean test score) and its standard

deviation. Which gives the highest accuracy value?
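For the classification case, a sketch with StratifiedShuffleSplit might look like the following. The synthetic data and the reduced grid are assumptions made so that the example runs quickly. For a classifier, the default GridSearchCV score is the accuracy, and recent scikit-learn versions require test_size to be passed by keyword.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# Synthetic stand-in data and a reduced grid; the project's data and grid differ.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)
parameters = {"n_estimators": [10, 50], "max_depth": [10, 50]}

# Four random splits, each holding out 10% of the samples with class ratios preserved.
cv = StratifiedShuffleSplit(n_splits=4, test_size=0.1, random_state=100)
gcrfc = GridSearchCV(RandomForestClassifier(random_state=0), parameters, cv=cv)
gcrfc.fit(X, y)

results = gcrfc.cv_results_
for params, mean, std in zip(
    results["params"], results["mean_test_score"], results["std_test_score"]
):
    print(f"{params}: accuracy = {mean:.3f} +/- {std:.3f}")
print("best:", gcrfc.best_params_)
```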

In [ ]:

6. Using the best combination of hyperparameters, compute the confusion matrix

(https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)

and evaluate the accuracy.

In [ ]:

To Submit

Execute the following cell to submit. If you make changes, execute the cell again to resubmit the final copy of

the notebook; submissions are not updated automatically.

We recommend that all of the cells above be executed (with their output visible) in the notebook at the

time of submission.

Only the final submission before the deadline will be graded.

In [ ]:

...

...

_ = ok.submit()

