Date: 2023-09-26 11:06

CMSC 421 Assignment One

Neural Networks and Optimization

September 12, 2023

General Instructions. Please submit TWO (2) files to ELMS:

(1) a PDF file that is the report of your experimental results and answers to the questions.

(2) a codebase submission in form of a zip file including only the code folders/files you modified and

the Questions folder. Please do not submit the Data Folder we provided. The code should contain

your implementations of the experiments and code for producing visualizations of the results.

The project is due at 11:59 pm on September 26 (Monday), 2023.

Please read through this document before starting your implementation and experiments. Your score

will be mostly dependent on the completion of experiments, the effectiveness of the reported results,

visualizations, the consistency between the experimental results and analysis, and the clarity of the

report. Neatness and clarity count! Good visualization helps!

Because you will need PyTorch for the last section of the programming task (Section 4, Convolutional Neural Networks, 15 points), we have included links to some tutorials and documentation to help you get started with PyTorch:

• Official Pytorch Documentation

• Quickstart Guide

• Tensors

• Data Loading

• Building models in Pytorch

Implementation Details

For each problem, you’ll need to code both the training and application phases of the neural network.

During training, you’ll adjust the network’s weights and biases using gradient descent. Use a single

parameter, η, to control the step size during gradient descent. The updated weights and biases will be

calculated as the old values minus the gradient multiplied by the step size.
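The update rule described above can be sketched as follows. This is a minimal illustration, assuming parameters and gradients are stored as NumPy arrays in parallel lists. The function name `gradient_step` and the toy objective are hypothetical and are not part of the provided code.

```python
import numpy as np

def gradient_step(params, grads, eta):
    """Return new parameters: old value minus step size times gradient."""
    return [p - eta * g for p, g in zip(params, grads)]

# Toy objective (an assumption for illustration): f(w, b) = 0.5 * (||w||^2 + b^2),
# whose gradient with respect to each parameter is the parameter itself.
w = np.array([1.0, -2.0])
b = np.array([0.5])
eta = 0.1
for _ in range(100):
    w, b = gradient_step([w, b], [w, b], eta)

print(w, b)  # both shrink toward zero
```

In your own code the gradients would come from backpropagation rather than from a closed-form expression.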

We will provide code snippets and datasets for some parts of the assignment. You are required to read the comments in the code files and fill in the missing pieces so that these files execute correctly. Please make sure that you read through all the code files we provide. These will be available in the CMSC421 - Fall2023 GitHub repository.


Part 1: Programming Task - (50 Points)

Objective

The goal of this assignment is to build a neural network from scratch, focusing on implementing the

backpropagation algorithm. You’ll apply your neural network to simple, synthetic datasets to gain

hands-on experience in tuning network parameters.

Language and Libraries

Python is mandatory for this assignment. Use numpy for all linear algebra operations. Do not use

machine learning libraries like PyTorch or TensorFlow for Questions 1,2 & 3; only numpy, matplotlib,

and Python built-in libraries are permitted.

1 Simple Linear Regression Model - (10 Points)

1.1 Network Architecture

• The network consists of an input layer, a hidden layer with one unit, a bias layer, and an output

layer with one unit.

• The output is a linear combination of the input, represented as $a^1 = Xw^0 + a^0 + b^1$.

1.2 Loss Function

Use a regression loss for training, defined as

$$\frac{1}{2} \sum_{i=1}^{n} \left( y_i - a^1(x_i) \right)^2$$
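The loss and its gradients can be sketched in NumPy as below. This is only an illustration under stated assumptions: the model is taken to be a plain linear map `X @ w + b`, the data is synthetic and noiseless, and the names `predict` and `half_squared_error` are hypothetical rather than part of the provided template.

```python
import numpy as np

def predict(X, w, b):
    """Linear model: one output per row of X."""
    return X @ w + b

def half_squared_error(X, y, w, b):
    """Loss 0.5 * sum_i (y_i - a(x_i))^2 and its gradients w.r.t. w and b."""
    residual = predict(X, w, b) - y          # shape (n,)
    loss = 0.5 * np.sum(residual ** 2)
    grad_w = X.T @ residual                  # shape (d,)
    grad_b = np.sum(residual)                # scalar
    return loss, grad_w, grad_b

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 0.3
y = X @ true_w + true_b                      # noiseless synthetic targets

w, b, eta = np.zeros(3), 0.0, 0.01
losses = []
for _ in range(500):
    loss, grad_w, grad_b = half_squared_error(X, y, w, b)
    losses.append(loss)
    w -= eta * grad_w
    b -= eta * grad_b

print(losses[0], losses[-1])
```

Because the loss is a sum over samples rather than a mean, the gradient grows with the number of samples, so a step size that works for 50 points may be too large for 5000.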

1.3 Implementation

Using the template_for_solitions file, write code to train this network and apply it to both 1D data (in q1_a) and higher-dimensional data (in q1_b).

• Data Preparation: Use the q1_<a/b> function from the Data.generator module to generate

training and testing data. The data module has both a and b so use the appropriate function

call to fetch the right data for each experiment.

• Network Setup: Use the net_setup method in the Trainer class to initialize the network, loss

layer, and optimizer.

• Training: Use the train method in the Trainer class to train the network. Plot the training

loss over iterations.

• Testing: Use the test data to evaluate the model’s performance. Plot the actual vs. predicted

values and compute evaluation metrics.

Tests and Experiments

1.4 Hyperparameters

• The main hyperparameters are the step size (η) and the number of gradient descent iterations.

• You may also have implicit hyperparameters like weight and bias initialization.

Hyperparameter Tuning

Discuss the difficulty level in finding an appropriate set of hyperparameters.
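One way to explore the step size is a small sweep, sketched below. This is an assumption-laden toy: the data is synthetic 1D data generated here (not the output of Data.generator), the model is a plain linear map, and `final_loss` is a hypothetical helper name.

```python
import numpy as np

def final_loss(eta, n_iters, X, y):
    """Run gradient descent on 0.5 * sum (Xw + b - y)^2 and return the last loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iters):
        r = X @ w + b - y
        w -= eta * (X.T @ r)
        b -= eta * r.sum()
    r = X @ w + b - y
    return 0.5 * np.sum(r ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] - 1.0          # synthetic targets, an assumption for illustration

results = {eta: final_loss(eta, 200, X, y) for eta in (1e-5, 1e-3, 1e-2, 3e-2)}
for eta, loss in results.items():
    print(f"eta={eta:g}  final loss={loss:.3e}")
```

On this toy problem the smallest step size barely moves within 200 iterations, the middle values converge, and the largest one diverges, which is the kind of behavior worth documenting in the report.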


2 A Shallow Network - (10 Points)

The goal of this assignment is to implement a fully connected neural network with a single hidden

layer and a ReLU (Rectified Linear Unit) activation function. The network should be flexible enough

to accommodate any number of units in the hidden layer and any size of input, while having just one

output unit.

2.1 Network Architecture

The network consists of an input layer, a hidden layer with an arbitrary number of units and a ReLU activation, bias terms, and an output layer with one unit.

• Input Layer: $a^0_1, a^0_2, \ldots, a^0_d$

• Hidden Layer: $z^1_j = \sum_{k=1}^{d} w^1_{jk} a^0_k + b^1_j$

• ReLU Activation: $a^1_j = \max(0, z^1_j)$

• Output Layer: $a^2 = \sum_{k=1}^{d} w^2_k a^1_k + b^2$
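The forward pass of such a network can be sketched as follows. This is a minimal sketch, assuming weights are stored as matrices in a dictionary and that small random initial weights are acceptable. The names `init_shallow` and `forward_shallow` are hypothetical and do not come from the provided code.

```python
import numpy as np

def init_shallow(d, h, rng):
    """Small random weights, zero biases (one possible initialization)."""
    return {
        "W1": rng.normal(scale=0.1, size=(d, h)),
        "b1": np.zeros(h),
        "W2": rng.normal(scale=0.1, size=(h, 1)),
        "b2": np.zeros(1),
    }

def forward_shallow(X, p):
    """Return the output a2 of shape (n,) and the hidden activations a1."""
    z1 = X @ p["W1"] + p["b1"]                 # (n, h) pre-activations
    a1 = np.maximum(0.0, z1)                   # ReLU
    a2 = (a1 @ p["W2"] + p["b2"]).ravel()      # single linear output unit
    return a2, a1

rng = np.random.default_rng(0)
params = init_shallow(d=4, h=8, rng=rng)
X = rng.normal(size=(5, 4))
a2, a1 = forward_shallow(X, params)
print(a2.shape, a1.shape)  # (5,) (5, 8)
```

Keeping the hidden activations around is convenient because backpropagation needs them to compute the weight gradients.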

2.2 Loss Function

Continue to use a regression loss for training the network, defined as

$$\sum_{i=1}^{n} \frac{1}{2} \left( y_i - a^2(x_i) \right)^2$$

where $a^2(x_i)$ is the network output for input $x_i$.

2.3 Implementation

Using the template_for_solitions file, write code to train this network and apply it to both 1D data (in q2_a.py) and higher-dimensional data (in q2_b.py).

• Data Preparation: Use the q2_<a/b> function from the Data.generator module to generate

training and testing data. The data module has both a and b so use the appropriate function

call to fetch the right data for each experiment.

• Network Setup: Use the net_setup method in the Trainer class to initialize the network, loss

layer, and optimizer.

• Training: Use the train method in the Trainer class to train the network. Plot the training

loss over iterations.

• Testing: Use the test data to evaluate the model’s performance. Plot the actual vs. predicted

values and compute evaluation metrics.

Tests and Experiments

2.4 Hyperparameters

You now have an additional hyperparameter: the number of hidden units.

Hyperparameter Tuning:

• Discuss the difficulty in finding an appropriate set of hyperparameters.

• Compare the difficulty level between solving the 1D problem and the higher-dimensional problem.


3 General Deep Learning - (15 Points)

The goal of this section of the assignment is to extend your neural network to handle fully-connected networks of arbitrary depth. It will be just like the network in Problem 2, but with more layers. Each layer will use a ReLU activation function, except for the final layer.
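A compact way to organize an arbitrary-depth network is a list of (W, b) pairs with one backward loop. The sketch below is one possible layout, not the structure of the provided Trainer class. The names `init_mlp`, `forward`, and `loss_and_grads`, the He-style initialization, and the random data are all assumptions made for illustration. The finite-difference comparison at the end is a common sanity check for backpropagation code.

```python
import numpy as np

def init_mlp(sizes, rng):
    """sizes = [d_in, h_1, ..., h_k, 1]; returns a list of (W, b) pairs."""
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(X, layers):
    """Return the list of activations; ReLU on all layers except the last."""
    acts = [X]
    for i, (W, b) in enumerate(layers):
        z = acts[-1] @ W + b
        acts.append(z if i == len(layers) - 1 else np.maximum(0.0, z))
    return acts

def loss_and_grads(X, y, layers):
    """Half squared error and its gradients via backpropagation."""
    acts = forward(X, layers)
    out = acts[-1].ravel()
    loss = 0.5 * np.sum((out - y) ** 2)
    delta = (out - y)[:, None]                 # dLoss/dz at the output layer
    grads = []
    for i in reversed(range(len(layers))):
        W, _ = layers[i]
        grads.append((acts[i].T @ delta, delta.sum(axis=0)))
        if i > 0:
            # Propagate through W, then through the ReLU of the layer below.
            delta = (delta @ W.T) * (acts[i] > 0)
    return loss, grads[::-1]

rng = np.random.default_rng(0)
layers = init_mlp([3, 5, 5, 5, 1], rng)        # three hidden layers
X, y = rng.normal(size=(20, 3)), rng.normal(size=20)
loss, grads = loss_and_grads(X, y, layers)

# Finite-difference check on a single weight of the first layer.
eps = 1e-6
W0 = layers[0][0]
W0[0, 0] += eps
loss_plus, _ = loss_and_grads(X, y, layers)
W0[0, 0] -= 2 * eps
loss_minus, _ = loss_and_grads(X, y, layers)
W0[0, 0] += eps                                # restore the original value
numeric = (loss_plus - loss_minus) / (2 * eps)
analytic = grads[0][0][0, 0]
print(abs(numeric - analytic))                 # should be close to zero
```

Changing the `sizes` list is all that is needed to compare 1, 3, and 5 hidden layers.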

Tests and Experiments

• Test your network with the same training data that you used in Problem 2 (A Shallow Network), using both 1D and higher-dimensional data. Experiment with 3 and 5 hidden layers. Evaluate the accuracy of your solutions in the same way as in Problem 2.

• Conduct and report on experiments to determine whether the depth of a network has any significant effect on how quickly your network can converge to a good solution. Include at least one

plot to justify your conclusions.

Again ensure your files are saved as q3_a.py and q3_b.py.

EXTRA CREDIT (EC): Cross Entropy Loss (10 Points). Modify your network from Problem 3 (General Deep Learning) to perform classification tasks using a cross-entropy loss and a logistic activation function in the output layer.

If you are submitting the EC, save the code files as qec_a.py and qec_b.py.

3.1 Network Architecture

• Input Layer: Arbitrary size

• Hidden Layers: ReLU activation, arbitrary depth

• Output Layer: Logistic activation function defined as $a^L_1 = \frac{1}{1 + e^{-z^L_1}}$

3.2 Loss Function

Use a cross-entropy loss defined as:

$$-\sum_{i=1}^{n} \left[ y_i \log\left(a^L_1(x_i)\right) + (1 - y_i) \log\left(1 - a^L_1(x_i)\right) \right]$$

Here, $y_i$ is assumed to be a binary value (0 or 1).

3.3 Note on Numerical Stability

Be cautious when exponentiating numbers in the sigmoid function to avoid overflow. Utilize np.maximum

and np.minimum for a concise implementation.
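One possible way to follow this hint is sketched below. It is not necessarily the intended solution. The sigmoid only ever exponentiates non-positive numbers, and the cross-entropy is computed directly from the pre-activation z instead of from log(sigmoid(z)). The function names `stable_sigmoid` and `bce_from_logits` are hypothetical.

```python
import numpy as np

def stable_sigmoid(z):
    """Logistic function that never exponentiates a positive number."""
    z = np.asarray(z, dtype=float)
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z))
    return np.exp(np.minimum(z, 0.0)) / (1.0 + np.exp(-np.abs(z)))

def bce_from_logits(z, y):
    """Sum over samples of -[y log s(z) + (1 - y) log(1 - s(z))]."""
    z = np.asarray(z, dtype=float)
    return np.sum(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z))))

z = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
probs = stable_sigmoid(z)
loss = bce_from_logits(z, y)
print(probs)   # finite values in [0, 1]
print(loss)    # finite
```

With this formulation the gradient of the loss with respect to z is simply `stable_sigmoid(z) - y`, which is also well behaved for large |z|.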

Tests and Experiments

3.4 Test Scenarios

1. 1D Data Tests:

• Linearly Separable Data:

– Vary the margin between points and the number of layers.

– Investigate the difficulty in finding hyperparameters based on the margin.

– Examine the speed of convergence based on the margin. Include plots.

• Non-Linearly Separable Data:

– Note the differences you observe when the data is not linearly separable.


2. Higher-Dimensional Data Tests:

• Repeat the experiments with higher-dimensional data.

• Use both linearly separable and non-linearly separable data sets.

• Include data to support your conclusions.


4 Convolutional Neural Networks - 15 Points

In this section, you are required to implement a Convolutional Neural Network (CNN) using PyTorch to classify images from the provided CINIC-10 dataset.

Requirements

Your CNN model should meet the following criteria:

(A) Utilize dropout for regularization. Mathematically, dropout sets a fraction p of the input units

to 0 at each update during training time, which helps to prevent overfitting.

(B) Be trained with the RMSprop and Adam optimizers separately. The update rule for RMSprop is given by:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon} \cdot g_t$$

where $\theta$ are the parameters, $\eta$ is the learning rate, $v_t$ is the moving average of the squared gradient, $\epsilon$ is a smoothing term to avoid division by zero, and $g_t$ is the gradient.

For Adam, the update rule is:

$$\theta_{t+1} = \theta_t - \frac{\eta \cdot \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $\hat{m}_t$ and $\hat{v}_t$ are bias-corrected estimates of the first and second moments of the gradients.

Report on how each optimizer performed.

(C) Include at least 3 convolutional layers and 2 fully connected layers. The convolution operation can be represented as:

$$(f * g)(t) = \sum_{\tau} f(\tau) \cdot g(t - \tau)$$

(D) Use wandb for visualization of the training loss $L$, which could be the cross-entropy loss for classification:

$$L = -\sum_{i} y_i \log(\hat{y}_i)$$
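Requirements (A) and (B) can be illustrated in plain NumPy, independent of PyTorch. The sketch below is only meant to make the formulas concrete. In the actual model you would use the PyTorch counterparts torch.nn.Dropout, torch.optim.RMSprop, and torch.optim.Adam. Several details are assumptions not stated in the text: survivors of dropout are rescaled by 1 / (1 - p) (the "inverted dropout" convention), epsilon is added outside the square root, and the decay rates `beta`, `beta1`, and `beta2` and the function names are hypothetical.

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Zero each entry with probability p; rescale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def rmsprop_step(theta, g, v, eta=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update; v is the moving average of the squared gradient."""
    v = beta * v + (1.0 - beta) * g ** 2
    theta = theta - eta * g / (np.sqrt(v) + eps)
    return theta, v

def adam_step(theta, g, m, v, t, eta=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected moments (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g ** 2
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

rng = np.random.default_rng(0)
dropped = dropout(np.ones((1000, 100)), p=0.3, rng=rng)
print((dropped == 0).mean(), dropped.mean())   # roughly 0.3 and roughly 1.0

# Toy problem (an assumption for illustration): minimize 0.5 * ||theta||^2,
# whose gradient is theta itself.
theta_r, v_r = np.array([1.0, -2.0]), np.zeros(2)
theta_a, m_a, v_a = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
for t in range(1, 1001):
    theta_r, v_r = rmsprop_step(theta_r, theta_r, v_r)
    theta_a, m_a, v_a = adam_step(theta_a, theta_a, m_a, v_a, t)
print(np.abs(theta_r).max(), np.abs(theta_a).max())
```

Both optimizers normalize the step by a running estimate of the gradient magnitude, so early steps have a size close to the learning rate regardless of the gradient scale.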

Experimental Results

In addition to reporting the Test Accuracy and plotting the figure of Training Loss over iterations, the

following experimental results should also be reported for a comprehensive evaluation of the model’s

performance:

1. Validation Accuracy and Loss: Monitor and report the accuracy and loss on a separate

validation set to assess the model’s generalization capability.

2. Confusion Matrix: Include a confusion matrix to identify which classes the model is having

difficulty distinguishing between.

3. Precision, Recall, and F1-Score: Calculate and report these metrics to provide a more

nuanced view of the model’s performance. The F1-Score is the harmonic mean of Precision and

Recall and is defined as:

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

4. Model Size: Report the number of parameters and the memory footprint of the model.

5. Hyperparameter Tuning: If hyperparameter tuning is performed, report the performance

under different hyperparameter settings, such as learning rate, batch size, etc.

6. Class-wise Accuracy: Report the accuracy for each individual class to show how well the

model performs on different categories.
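Several of these quantities can be derived from a single confusion matrix. The sketch below assumes integer class labels and uses made-up predictions. The function names are hypothetical, and class-wise accuracy is taken to mean the fraction of each class's samples that are classified correctly, which coincides with per-class recall.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def per_class_metrics(cm):
    """Precision, recall, and F1 for each class; 0 where a ratio is undefined."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)      # column sums: samples predicted as each class
    actual = cm.sum(axis=1)         # row sums: samples belonging to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Made-up labels for a 3-class problem, for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
precision, recall, f1 = per_class_metrics(cm)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(precision, recall, f1, accuracy)
```

For CINIC-10 the same functions apply with `n_classes=10`, with `y_pred` taken as the argmax of the model outputs over the test set.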


Part 2: Theoretical Questions - (50 Points + 3 Bonus Points)

1. Please answer the following questions about the activation function: - (9 Points)

(A) Why do we need activation functions in neural networks? (1 point)

(B) Write down the formula of the Sigmoid function and its derivative. What are the pros and cons

of using the Sigmoid function in neural networks? (4 points)

(C) Write down the formula of the ReLU function and its derivative. What are the pros and cons of

using the ReLU function in neural networks? (4 points)

2. When we optimize the neural networks, we usually use gradient descent to update

the weights of neural networks. To obtain well-trained neural networks, one of the most

important hyperparameters is the learning rate. Please answer the following questions

about learning rate: - (6 Points)

(A) What is the role of the learning rate in the gradient descent algorithm? (2 points)

(B) What happens to the neural network if the Learning Rate is too low or too high? (4 points)

3. After we train a neural network, we need to evaluate the model performance by determining if the model is underfitting or overfitting. Please answer the following questions

about underfitting or overfitting: - (12 Points)

(A) Explain the concept of underfitting and overfitting in your own words. And explain how to

determine whether a model is overfitting or underfitting based on the model performance on the

training set and validation set. (4 points)

(B) Please write down four methods that can be used to prevent the overfitting of a neural network.

(4 points)

(C) Please write down four methods that can be used to prevent the underfitting of a neural network.

(4 points)

4. Computer Vision (CV) and Natural Language Processing (NLP) are two primary application areas of neural networks. In CV, CNN models are often used to extract information from images and videos, while RNNs and Transformers are often used in NLP to handle text data. - (9 Points + 3 Bonus Points)

(A) The key components of a CNN architecture include convolutional layers, pooling layers, and fully

connected layers. Provide a brief description of the function of each component. (4.5 points)

(B) Explain the concept of Hidden State, Time Steps and Weight Sharing in the design of RNN. (4.5

points)

(C) Bonus Question: Batch Normalization (BN) is important in real-world practice. Please describe what BN does and explain why we need BN in neural networks. (3 points)

5. Convolutional to Multi-layer Perceptron - (14 Points)

A convolution operation is a linear operation, and therefore convolutional layers can be represented in the form of matrix multiplication, or in other words, by a multi-layer perceptron. More precisely, if we denote the convolution operation as $c(x, \theta_w, \theta_b, \gamma)$, where $\theta_w$ are the filter weights, $\theta_b$ are the filter biases, and $\gamma$ are the padding and stride parameters, we want to convert the filters to a weight matrix so that

$$\text{flatten}(c(x, \theta_w, \theta_b, \gamma)) = W \, \text{flatten}(x) + b, \tag{1}$$

where flatten(·) takes in a tensor of size $(d_1, d_2, d_3)$ and outputs a 1-D vector of size $d_1 \times d_2 \times d_3$. For example,

$$\text{flatten}(\text{Filter1}) = (i_{1,1}, i_{1,2}, i_{1,3}, i_{2,1}, i_{2,2}, i_{2,3}, i_{3,1}, i_{3,2}, i_{3,3}, j_{1,1}, j_{1,2}, j_{1,3}, j_{2,1}, j_{2,2}, j_{2,3}, j_{3,1}, j_{3,2}, j_{3,3})$$

The converted weights and biases $W$ and $b$ depend on the convolution filters $\theta_w$, $\theta_b$ and also on $\gamma$ (padding and stride).

Suppose the input is a 2 × 2 × 3 (C × H × W) image, and we have a convolutional layer with two filters as shown in Figure 1, where the filter size is 3 × 3, the padding is 1 (filled with zeros), and the stride is 1. The bias terms for the two convolutional filters in Filter 1 (Filter 2) are $b_1$ and $b_2$ ($b_3$ and $b_4$), respectively. For one filter, we convolve it with every sliding window of the input image, and each such operation over one sliding window generates one output of this convolutional layer. For one filter, there are 6 sliding windows in total, which correspond to the 6 outputs of that filter. For every sliding window, we can think of the output as generated by a dot product of a weight vector and the flattened input image, where the non-zero entries of the weight vector have exactly the same values as the filter, and their positions depend on the sliding window. Once we have the weight vector for each sliding window, we can simply stack them together to get the converted weight matrix $W$. The bias part is simple: for one filter, we add the same bias to every sliding window output. Write out the weight matrix $W$ and bias $b$ in terms of the filter weights and biases. Convince yourself that you get exactly the same output (flattened) as the original convolution.

[Figure: a two-channel input image (1st Channel, 2nd Channel) with zero padding and a sliding window, together with the 3 × 3 weights of Filter 1 ($i_{r,c}$ and $j_{r,c}$) and Filter 2 ($k_{r,c}$ and $l_{r,c}$).]

Figure 1: Input image and filters. Note that the sliding window slides in row-major order, i.e., it first slides right until it reaches the end of the first row and then moves to the first position of the second row. The white region around the input image is the zero padding.
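The last step, convincing yourself that the two forms agree, can be done numerically. The sketch below builds W and b for arbitrary numeric filters and compares the result against a direct sliding-window computation. It uses random numbers rather than the symbolic entries asked for in the problem. The function names are hypothetical, the sliding-window dot product is implemented without flipping the filter, and the sketch assumes a single bias per filter, which may differ from the bias labelling in the problem statement.

```python
import numpy as np

def conv2d_direct(x, weight, bias, pad=1, stride=1):
    """Sliding-window dot products. x: (C, H, W); weight: (F, C, kH, kW); bias: (F,)."""
    C, H, W = x.shape
    F, _, kH, kW = weight.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H_out = (H + 2 * pad - kH) // stride + 1
    W_out = (W + 2 * pad - kW) // stride + 1
    out = np.zeros((F, H_out, W_out))
    for f in range(F):
        for r in range(H_out):
            for c in range(W_out):
                window = xp[:, r * stride:r * stride + kH, c * stride:c * stride + kW]
                out[f, r, c] = np.sum(window * weight[f]) + bias[f]
    return out

def conv_as_matrix(weight, bias, in_shape, pad=1, stride=1):
    """Build W_mat and b_vec with flatten(conv(x)) == W_mat @ flatten(x) + b_vec."""
    C, H, W = in_shape
    F, _, kH, kW = weight.shape
    H_out = (H + 2 * pad - kH) // stride + 1
    W_out = (W + 2 * pad - kW) // stride + 1
    W_mat = np.zeros((F * H_out * W_out, C * H * W))
    for f in range(F):
        for r in range(H_out):
            for c in range(W_out):
                row = (f * H_out + r) * W_out + c          # row-major output index
                for ch in range(C):
                    for i in range(kH):
                        for j in range(kW):
                            y_in = r * stride + i - pad
                            x_in = c * stride + j - pad
                            if 0 <= y_in < H and 0 <= x_in < W:   # skip zero padding
                                col = (ch * H + y_in) * W + x_in
                                W_mat[row, col] = weight[f, ch, i, j]
    b_vec = np.repeat(bias, H_out * W_out)                 # same bias for every window
    return W_mat, b_vec

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 2, 3))               # C x H x W, as in the problem
weight = rng.normal(size=(2, 2, 3, 3))       # two filters, two channels, 3 x 3
bias = rng.normal(size=2)                    # assumption: one bias per filter
W_mat, b_vec = conv_as_matrix(weight, bias, x.shape)
matches = np.allclose(W_mat @ x.ravel() + b_vec, conv2d_direct(x, weight, bias).ravel())
print(W_mat.shape, matches)
```

Each row of `W_mat` corresponds to one sliding window of one filter, and the zero entries mark input positions that the window either does not cover or covers only with padding.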
