
COMPSCI 589 Homework 4 - Spring 2023

Due May 6, 2023, 11:55pm Eastern Time

1 Instructions

This homework assignment consists of a programming portion. While you may discuss problems with your peers, you must answer the questions on your own and implement all solutions independently. In your submission, explicitly list all students with whom you discussed this assignment.

We strongly recommend that you use LaTeX to prepare your submission. The assignment should be submitted on Gradescope as a PDF with marked answers via the Gradescope interface. The source code should be submitted via the Gradescope programming assignment as a .zip file. Include with your source code instructions for how to run your code.

We strongly encourage you to use Python 3 for your homework code. You may use other languages. In either case, you must provide us with clear instructions on how to run your code and reproduce your experiments.

You may not use any machine learning-specific libraries in your code, e.g., TensorFlow, PyTorch, or any machine learning algorithms implemented in scikit-learn. You may use libraries like numpy and matplotlib. If you are not certain whether a specific library is allowed, do ask us.

All submissions will be checked for plagiarism using two independent plagiarism-detection tools. Renaming variable or function names, moving code within a file, etc., are all strategies that do not fool the plagiarism-detection tools we use. If you get caught, all penalties mentioned in the syllabus will be applied, which may include directly failing the course with a letter grade of "F".

The tex file for this homework (which you can use if you decide to write your solution in LaTeX), as well as the datasets, can be found here.

The automated system will not accept assignments after 11:55pm on May 6.


Programming Section (100 Points Total)

In this homework, you will be implementing the backpropagation algorithm to train a neural network. Notice that you may not use existing machine learning code for this problem: you must implement the learning algorithm entirely on your own and from scratch.

In this assignment, you will:

Implement the backpropagation algorithm to train a neural network. Your implementation should support networks with an adjustable number of layers and neurons. For example, it should support training a neural network with two hidden layers (the first containing 5 neurons and the second containing 3 neurons), or a neural network with three hidden layers, each containing 4 neurons.

Implement the regularization mechanism discussed in class, which will be used here to try to mitigate overfitting.

Evaluate your neural network using the stratified cross-validation strategy that you implemented as part of a previous homework (one way to construct such folds is sketched at the end of this list of tasks). We recommend using k = 10 folds. Out of these, k − 1 folds will be used to train the neural network. The remaining fold (the one not used during training) will be used to evaluate the neural network's performance. You will then repeat this process, thereby generating (and evaluating) k neural networks. The final performance is the average of the performances computed for each of the folds.

Evaluate the performance of your neural network on two datasets as a function of three design decisions: the number of layers of the network; the number of neurons in each of the layers; and the regularization parameter.
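
For concreteness, here is a minimal sketch of one way to build stratified folds with numpy (which the assignment permits). The function name and the round-robin assignment strategy are illustrative choices, not a prescribed design:

import numpy as np

def stratified_folds(y, k=10, seed=0):
    """Split instance indices into k folds, preserving class proportions.

    y: 1-D array of class labels. Returns a list of k index arrays.
    (Illustrative helper; names and structure are one possible design.)
    """
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        rng.shuffle(idx)
        # Deal this class's indices round-robin across the folds so each
        # fold receives (approximately) the same class proportions.
        for i, j in enumerate(idx):
            folds[i % k].append(j)
    return [np.array(f) for f in folds]

Each fold then serves once as the evaluation set, with the remaining k − 1 folds concatenated to form the training set.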

Each student is free to decide whether to implement the "standard" (non-vectorized) version of backpropagation, or the vectorized version, in which all quantities relevant to training (i.e., predicted outputs, the value of the cost function J, and gradients) are computed using matrix multiplication operations. We recommend that students solve this assignment by using vectorization techniques. This will significantly reduce the time needed to train your neural network, and it will also result in a more straightforward implementation of the backpropagation algorithm.
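
As a rough illustration of the vectorized style, the sketch below computes forward activations and accumulates regularized gradients with matrix products. It assumes sigmoid activations, the logistic (cross-entropy) cost with L2 regularization on non-bias weights as discussed in class, and weight matrices whose first column holds the bias weights (matching the benchmark-file convention described in Section 2.1). It is one possible design, not the required one:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(thetas, x):
    """Propagate one instance through the network.

    thetas: list of weight matrices; thetas[l] has shape
            (neurons in layer l+1, neurons in layer l + 1), with the
            bias weight stored in the first column.
    x: 1-D input vector. Returns the list of activations per layer.
    """
    a = np.concatenate(([1.0], x))          # prepend the bias input
    activations = [a]
    for theta in thetas:
        z = theta @ a
        a = np.concatenate(([1.0], sigmoid(z)))   # bias for the next layer
        activations.append(a)
    activations[-1] = activations[-1][1:]   # the output layer has no bias
    return activations

def backprop(thetas, X, Y, lam):
    """Accumulate regularized gradients over a training set.

    A sketch, assuming the logistic cost with L2 regularization on
    non-bias weights; variable names are illustrative.
    """
    n = X.shape[0]
    grads = [np.zeros_like(t) for t in thetas]
    for x, y in zip(X, Y):
        acts = forward(thetas, x)
        delta = acts[-1] - y                          # output-layer delta
        for l in range(len(thetas) - 1, -1, -1):
            grads[l] += np.outer(delta, acts[l])
            if l > 0:
                # Propagate delta backwards; drop the bias component.
                a = acts[l]
                delta = ((thetas[l].T @ delta) * a * (1.0 - a))[1:]
    for l, theta in enumerate(thetas):
        reg = lam * theta
        reg[:, 0] = 0.0                  # bias weights are not regularized
        grads[l] = (grads[l] + reg) / n
    return grads

A further refinement is to stack all instances into a single matrix and propagate the whole batch at once; the per-instance loop above already keeps each layer's computation as a single matrix-vector product.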

We are providing students with files describing, for two simple neural networks, all the relevant quantities computed by the backpropagation algorithm, step by step. These are intended to help you debug your implementation of the backpropagation algorithm. These files describe, for example, the activation of each neuron when the network is presented with a particular training instance; the final output/prediction of the network given a particular training instance; and the δ value associated with each neuron, as well as the corresponding gradients of each of the weights of the network. Students should use the information contained in these files to ensure that the quantities computed by their implementation of backpropagation match all expected values produced by a correct implementation of this learning algorithm. If, for example, the gradients computed by a student's implementation do not match the expected/correct ones, the student will be able to quickly identify the first point during the execution of their algorithm where one of the quantities produced by their solution does not match the correct/expected one. This should facilitate debugging.


2 Deliverables and Experiments

The main objective of this homework is to study how the performance of a trained neural network is affected by (i) the neural network architecture (i.e., by the number of layers and neurons), and (ii) the regularization parameter.

When implementing your neural network, do not forget to add a bias input to each neuron, and to ensure that updates to bias weights are performed correctly if regularization is used. For more information, please see the slides of the corresponding lecture. Also, do not forget to properly normalize the attributes of training and test instances whenever appropriate/necessary; one possible normalization is sketched below.
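
As an example, here is a minimal min-max normalization sketch, assuming numerical attributes stored in numpy arrays; computing the ranges on the training fold only (and then applying them to the test fold) is the important detail:

import numpy as np

def normalize(train_X, test_X):
    """Min-max normalize each attribute (column) to [0, 1].

    The ranges are computed on the training fold only and then applied
    to the test fold, so no test information leaks into training.
    (One common choice; other normalizations are equally acceptable.)
    """
    lo = train_X.min(axis=0)
    span = train_X.max(axis=0) - lo
    span[span == 0] = 1.0          # guard against constant attributes
    return (train_X - lo) / span, (test_X - lo) / span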

You are free to choose between two simple stopping criteria. You may, for instance, (1) stop if (after presenting all training instances to the network and updating its weights) the cost function J improves by less than some small user-adjusted constant ε; or (2) stop after a constant, pre-defined number k of iterations, where each iteration consists of presenting all instances of the training set to the network, computing gradients, and updating the network's weights. You are free to choose which criterion you would like to use. In all questions below, do not forget to mention explicitly which criterion you used and what its corresponding hyper-parameter was (i.e., ε or k). We encourage you to try different possibilities to identify the setting that produces the best possible performance for your algorithm. A minimal training loop illustrating criterion (1) is sketched below.
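
The sketch assumes a hypothetical cost(thetas, X, Y, lam) helper that evaluates J, plus the backprop(...) sketch shown earlier; epsilon and max_epochs are illustrative hyper-parameter names:

def train(thetas, X, Y, alpha, lam, epsilon=1e-6, max_epochs=5000):
    """Gradient descent with stopping criterion (1): halt once the cost J
    improves by less than a small constant epsilon between epochs.

    cost(...) is a hypothetical helper evaluating J on the training set.
    """
    prev_J = cost(thetas, X, Y, lam)
    for epoch in range(max_epochs):
        grads = backprop(thetas, X, Y, lam)
        for theta, grad in zip(thetas, grads):
            theta -= alpha * grad            # in-place weight update
        J = cost(thetas, X, Y, lam)
        if prev_J - J < epsilon:             # criterion (1)
            break
        prev_J = J
    return thetas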

After verifying and ensuring the correctness of your solution (by comparing the outputs produced by your backpropagation implementation with the step-by-step examples we provided; please see Section 2.1), you will be analyzing two datasets:

(1) The Wine Dataset. The goal here is to predict the type of a wine based on its chemical contents. The dataset is composed of 178 instances. Each instance is described by 13 numerical attributes, and there are 3 classes.

(2) The 1984 United States Congressional Voting Dataset. The goal here is to predict the party (Democrat or Republican) of each U.S. House of Representatives Congressperson. The dataset is composed of 435 instances. Each instance is described by 16 categorical attributes, and there are 2 classes.

2.1 Correctness Verification

You will first have to verify that your implementation of backpropagation is correct; that is, that the gradients it computes are accurate. To do this, we are providing each student with two files: backprop_example1.txt and backprop_example2.txt. Each of these files describes a particular neural network architecture and one (minimal) training set. Each file shows, in detail, all intermediate quantities computed by the backpropagation algorithm, which you will then compare with those produced by your algorithm. These quantities include information such as the activation of each neuron when the network is presented with a given input, the final predicted output produced by the network (and the corresponding expected output), the error/cost function, J, of the network, the δ values of each neuron, and the gradients of all weights.

You must ensure that your implementation of backpropagation is correct. To do so, you should write a function capable of reproducing the quantities described in each of the two benchmark files. This function should produce textual output (e.g., by using simple print statements) indicating all relevant quantities computed by your implementation of backpropagation, similarly to how they are presented in the provided benchmark files. In other words, as part of your solution to this assignment, you must allow us to call this function (that is, the function that shows that your code can reproduce the outputs presented in backprop_example1.txt and backprop_example2.txt), so that we can verify the correctness of your code. The output you generate does not have to be identical to that shown in the benchmark files, but it should include at least, for each training instance and for both datasets: the activation of each neuron; the final predicted output of the network; the expected output; the cost, J, associated with that particular instance; the δ values of each neuron and the gradients of all weights after the network is presented with that training instance; and the final (regularized) gradients after the backpropagation algorithm is done processing all instances in the training set.

You should then (i) include in your report instructions describing how we (the instructor, staff, and TAs) should run the function you implemented to demonstrate the correctness of your backpropagation algorithm, i.e., the function that produces all results/quantities described above; and (ii) present in your report the textual output produced by this function, for both provided benchmark files. This part of the assignment is extremely important, and a non-trivial amount of your final grade will depend on us being able to verify that you implemented the backpropagation training algorithm correctly. We strongly recommend that you only start working on the next part of this assignment after you have verified that your implementation of backpropagation is correct. Otherwise, it may be hard (or impossible) for your neural network to properly learn solutions to the proposed problems.

The two benchmark files we provide (backprop_example1.txt and backprop_example2.txt) are structured as follows. The first line of each file specifies the regularization parameter, λ, that should be used. After that, you will find a line describing the network structure, i.e., the number of neurons in each layer. Recall that the first layer corresponds to the network's input layer and that the last layer corresponds to the output layer. In the notation we adopted here, the bias neuron is not included in the neuron counts shown in this line. As an example, consider a neural network with two inputs, one hidden layer with 3 neurons, and one output. In this case, the second line of the provided files would define the network's structure as [2 3 1]. Notice, however, that (as always) your implementation should include bias neurons/weights. We do not typically count them when defining/specifying the architecture of a network because it is implicit that they are always being used. After this section of the file, you will encounter a description of the initial/current weights of the network. Here, each row represents the weights of a given neuron in one particular layer; the weights of the i-th neuron are stored in the i-th row of the corresponding table/matrix. Recall that the first weight of each neuron corresponds to the bias weight connected to that neuron. Consider the following simple neural network as an example:

Training set:

{x(1) = 0.13, y(1) = 0.90}
{x(2) = 0.42, y(2) = 0.23}

The weights θ^(l=1) (Theta1) would be shown as follows:

0.4  0.1
0.3  0.2


The weights θ^(l=2) (Theta2) would be shown as follows:

0.7  0.5  0.6

After this section of the file, you will find a description of the inputs (x) and expected outputs (y) of each instance in a training set. The subsequent parts of the file show the intermediate quantities computed by the forward propagation process when performed on each of the training instances. In particular, this part of the file shows the z and a values of each neuron (i.e., their activations), as well as the network's final prediction, f(x), and the cost function, J, associated with that particular instance. The final section of each file presents all intermediate quantities computed by the algorithm during the backpropagation phase: the δ values of each neuron after processing a particular training instance, and the gradients of all weights after processing that instance. After that, you will find the final (regularized) gradients of all weights, computed based on all training instances. As discussed in class, these gradients correspond to the gradients based on which we should update the network's weights so that it (on average) makes better predictions for all training examples in the dataset.
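
As a sanity check of the format above, the following snippet runs a forward pass through the example network using exactly the weights and the first training instance shown earlier; the intermediate z and a values it prints can be compared against the corresponding entries in the benchmark file (the final prediction comes out to roughly 0.794):

import numpy as np

# Weights and first training instance exactly as in the example above:
# a [1 2 1] network (one input, one hidden layer of 2 neurons, one
# output), with the bias weight in the first column of each matrix.
theta1 = np.array([[0.4, 0.1],
                   [0.3, 0.2]])
theta2 = np.array([[0.7, 0.5, 0.6]])

x1 = np.array([0.13])                    # instance 1: x = 0.13, y = 0.90

a1 = np.concatenate(([1.0], x1))         # input layer with bias: [1.0, 0.13]
z2 = theta1 @ a1                         # hidden-layer z: [0.413, 0.326]
a2 = np.concatenate(([1.0], 1.0 / (1.0 + np.exp(-z2))))
z3 = theta2 @ a2                         # output-layer z
f = 1.0 / (1.0 + np.exp(-z3))            # prediction f(x), roughly 0.794
print(z2, a2, z3, f)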

2.2 Experiments and Analyses

For each dataset, you should:

1. Train a neural network and evaluate it using the stratified cross-validation technique discussed in class. You should train neural networks using different values for the regularization parameter, λ, and using different architectures. You can decide which architectures you will evaluate. As an example, consider testing networks with 1, 2, 3, and 4 hidden layers, and with various numbers of neurons per layer: e.g., 2, 4, 8, 16.

2. For each trained neural network (i.e., for each combination of architecture and regularization that you tested), you should measure the resulting model's accuracy and F1 score.

3. Based on these experiments, you should create, for each dataset and for each of the metrics described above, a table summarizing the corresponding results (i.e., you should show the value of each performance metric for each type of neural network that you tested, on both datasets). You should test at least 6 neural network architectures on each dataset.

4. Discuss (at a high level) what contributed the most to improving performance: changing the regularization parameter; adding more layers; designing deeper networks with many layers but few neurons per layer; or designing networks with few layers but many neurons per layer? Discuss any patterns that you may have encountered. Also, discuss whether there is a point where constructing and training more "sophisticated"/complex networks (i.e., larger networks) no longer improves performance, or even worsens it.

5. Based on the analyses above, discuss which neural network architecture you would select if you had to deploy such a classifier in real life. Explain your reasoning.

6. After identifying the best neural network architecture for each one of the datasets, train it once again on the corresponding dataset and create a learning curve where the y axis shows the network's performance (J) on a test set, and where the x axis indicates the number of training samples given to the network. Intuitively, this graph answers questions such as: after showing 5 training examples to the network, what is its performance J on the test set? After showing 10 training examples, what is its performance J on the test set? If the network's parameters and architecture are well-adjusted, this graph should indicate that the network's performance improves as the number of training examples grows; in particular, that J decreases as a function of the number of training instances presented to the network. Please also report the step size value, α, you used when training this network.

7. Although this last step/task is not required, we recommend that you train the network using the mini-batch gradient descent approach; one possible mini-batch loop is sketched after this list. You can manually select and adjust the mini-batch size that you would like to use. Training a neural network in this way significantly accelerates the training process.
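
Regarding item 7, here is a sketch of one possible mini-batch loop, assuming the backprop(...) function sketched earlier; the batch size and epoch count are illustrative hyper-parameters:

import numpy as np

def minibatch_train(thetas, X, Y, alpha, lam, batch_size=16,
                    epochs=200, seed=0):
    """Mini-batch gradient descent: shuffle the training set each epoch
    and update the weights after every batch rather than after a full
    pass over the data. batch_size is a hyper-parameter you tune by hand.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            grads = backprop(thetas, X[batch], Y[batch], lam)
            for theta, grad in zip(thetas, grads):
                theta -= alpha * grad
    return thetas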

3 Some Hints

1. Initialize the network's weights with small random numbers from -1 to +1, or with random numbers sampled from a Gaussian distribution with zero mean and variance equal to 1 (see the sketch after this list).

2. When trying to identify effective network architectures, start your investigation by testing networks with few layers (e.g., try, first, networks with just one hidden layer). Increase the number of layers, and/or the number of neurons per layer, only when necessary.

3. When training a given neural network architecture, try, first, to use larger step sizes, α. If you set α too small, the training process may become very slow or prematurely converge to a bad local minimum (depending on the stopping criterion you use). By contrast, if the network's performance oscillates significantly during training when using large values of α, you might be overshooting. In this case, decrease the value of α. Repeat this process until you find the largest step size that can be effectively used, i.e., the value that allows the weights to be updated fairly quickly but that does not cause weight instability or divergence.

4. After you identify a value of α that causes the network's performance to improve in a reasonably consistent way, check the performance (value of J) to which the network converges. If it is not satisfactory, your network architecture might be too simple (i.e., the network might be underfitting). In this case, try to increase the number of layers and/or the number of neurons per layer. Then, repeat the analyses above regarding how to evaluate different step sizes, α.
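
For hint 1, a minimal initialization sketch using uniform weights in [-1, +1]; the layout convention (biases not counted, one extra bias column per matrix) follows the benchmark files:

import numpy as np

def init_weights(layout, seed=0):
    """Initialize weights with small uniform values in [-1, 1].

    layout: neurons per layer, e.g. [13, 8, 3] (biases not counted, as
    in the benchmark files); each matrix gets an extra bias column.
    """
    rng = np.random.default_rng(seed)
    return [rng.uniform(-1.0, 1.0, size=(n_out, n_in + 1))
            for n_in, n_out in zip(layout[:-1], layout[1:])]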

There are four ways in which you may receive extra credit in this homework.

(Extra Points #1: 13 Points) Implement the vectorized form of backpropagation. As previously mentioned, this is not mandatory, but you can get extra credit if you do it. Most modern implementations of neural networks use vectorization, so it is useful for you to familiarize yourself with this type of approach.

(Extra Points #2: 13 Points) Analyze a third dataset: the Breast Cancer Dataset. The goal here is to classify whether tissue removed via a biopsy indicates that a person may or may not have breast cancer. There are 699 instances in this dataset. Each instance is described by 9 numerical attributes, and there are 2 classes. You should present the same analyses and graphs as discussed above. This dataset can be found in the same zip file as the two main datasets.

(Extra Points #3: 13 Points) Analyze a fourth, more challenging dataset: the Contraceptive Method Choice Dataset. The goal here is to predict the type of contraceptive method used by a person based on many attributes describing that person. This dataset is more challenging because it combines both numerical and categorical attributes. There are 1473 instances in this dataset. Each instance is described by 9 attributes, and there are 3 classes. The dataset can be downloaded here. You should present the same analyses and graphs discussed above.

(Extra Points #4: 13 Points) Implement the method (discussed in class) that allows you to numerically check the gradients computed by backpropagation. This will allow you to further ensure that all gradients computed by your implementation of backpropagation are correct; that way, you can be confident in your implementation of the neural network training procedure. You should present in your report the estimated gradients for the two neural networks described in the provided benchmark files. First, estimate the gradients using ε = 0.1, and then ε = 0.000001. If you choose to work on this extra-credit task, please include the corresponding source code along with your Gradescope submission.
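
A sketch of the symmetric finite-difference check follows, assuming the same hypothetical cost(...) helper used in the earlier training-loop sketch; each weight is perturbed by ±ε in turn, and the resulting estimates can be compared entry by entry against your backpropagation gradients:

import numpy as np

def numerical_gradients(thetas, X, Y, lam, eps=1e-6):
    """Estimate every gradient by symmetric finite differences:
    (J(theta_ij + eps) - J(theta_ij - eps)) / (2 * eps).

    cost(...) is a hypothetical helper evaluating J. Run with
    eps = 0.1 and eps = 0.000001, as the task asks.
    """
    num_grads = [np.zeros_like(t) for t in thetas]
    for theta, grad in zip(thetas, num_grads):
        for idx in np.ndindex(theta.shape):
            original = theta[idx]
            theta[idx] = original + eps
            J_plus = cost(thetas, X, Y, lam)
            theta[idx] = original - eps
            J_minus = cost(thetas, X, Y, lam)
            theta[idx] = original                  # restore the weight
            grad[idx] = (J_plus - J_minus) / (2 * eps)
    return num_grads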

