STAT GR5241 Spring 2023 Final
May 5, 2023
Students are allowed two pages of handwritten notes (front and back). Blank pages at the end can be used for scrap work. Calculators are allowed and will be useful. Tablets, phones, computers and other equivalent forms of technology are strictly prohibited. Students are not allowed to communicate with their neighbors during the exam. Please have your student ID available. Cheating will result in a score of zero. Good luck!
Section |
Problem |
Points |
Score |
Section I: Decision Trees and Boosting |
|
|
|
AdaBoost |
P1.i P1.ii |
10 5 |
|
Binary Decision Tree |
P2.i P2.ii P2.iii P2.iv P2.v |
3 5 6 3 3 |
|
Section II: Neural Networks |
|
|
|
Fully-connected Neural Networks |
P3.i P3.ii P3.iii |
5 5 10 |
|
Section III: Support Vector Machine |
|
|
|
Support Vector Machine |
P4.i P4.ii |
4 3 |
|
Section IV: Clustering |
|
|
|
Hierarchical Clustering |
P5.i P5.ii P5.iii P5.iv |
2 2 2 2 |
|
Section V: Theory |
|
|
|
Penalized Logistic Regression |
P6 |
10 |
|
EM Algorithm and Clustering |
P7 |
10 |
|
Section VI: Quiz |
|
|
|
TRUE/FALSE |
P8 |
10 |
|
Total |
|
100 |
|
Problem 1: AdaBoost [15 points]
In this problem we perform. the adaboost classification algorithm described in the lecture notes and homework 5. The dataset of interest consists of n = 100 training cases, p = 2 continuous features x1 , x2 , and dichotomous response Y labeled Y = −1 and Y = 1. The training data is displayed in Figure 1, which shows x2 versus x1 split by Y. Figure 1 also displays the location of a single test case xtest and training case x17 .
Figure 1: Problem 1 training data
The weak learner is a decision stump, i.e., for the bth boosting iteration:
The decision stump is trained by minimizing weighted 0-1 loss:
Our adaboost classifier is constructed from B = 8 weak learners. For boosting iterations b = 1, 2, . . . , 8, the trained decision stumps c1 , c2 ,..., c8 are:
The weighted errors ϵb are:
Solve problems (1.i)-(1.ii):
1.i [10 points] Classify the test case xtest = (0.600 0.395, based on the trained adaboost model. Use all B = 8 weak learners to estimate Y(ˆ)test. Show the details of your calculation for full credit.
1.ii [5 points] Does the structure of the training data (Figure 1) provide an advantage or disadvantage when using adaboost with decision stumps? Describe your answer in a few sentences.
Problem 2: Binary Decision Tree [20 points]
Consider the decision tree shown below in Figure 2 for a binary classification problem. Assume the classes are denoted as + and -, respectively. Suppose there are four binary attributes in the data, A, B , C, and D. The counts shown in the leaf nodes of the tree correspond to the number of training examples assigned to the nodes. Assume that the decision tree classifier assigns the majority class of training examples as the class label of each leaf node.
Figure 2: Unpruned decision tree for Problem 2
Solve problems (2.i)-(2.v):
2.i [3 points] Calculate the training error rate of the decision tree shown in Figure 2.
2.ii [5 points] Calculate the generalization error rate of the decision tree using the validation set given below in Figure 3. Note that the wildcard * shown in the table means the value could be either 0 or 1.
Figure 3: Validation set for Problem 2
2.iii [6 points] Calculate the training and generalization error estimate of the pruned decision tree shown in Figure 4.
Figure 4: Pruned decision tree for Problem 2
2.iv [3 points] Based on your estimate of generalization error, which tree should be preferred (the unpruned tree given in Figure 2 or the pruned tree given in Figure 4). Justify your choice.
2.v [3 points] Draw the confusion matrix for the tree on the training data. A confusion matrix is a table that summarizes the number of examples correctly or incorrectly predicted by the model. For example:
Figure 5: An example confusion matrix for Problem 2
In the above table, n+− is the number of positive examples incorrectly predicted as negative class
Problem 3: Fully-connected Neural Networks [20 points]
In this problem we fit a neural network to perform classification on the same n = 100 training data from Problem 1. The dichotomous response Y is labeled Y = 0 and Y = 1, instead of −1 and 1. Our neural network consists of a single hidden layer with d = 2 derived features, sigmoid activation function and softmax output function. The schematic of this model is displayed below:
The objective or total cost Q(θ) is cross-entropy:
The neural network is trained by minimizing Q(θ) with respect to θ, where parameter θ is the collection of weights and biases W[1] , b[1] , W [2] and b[2] . The quantity Qi (θ) represents the ith training case’scontribu- tion to total cross-entropy Q(θ). The neural network is trained via gradient descent, yielding the estimated weights:
Solve problems (3.i)-(3.iii):
The following problems can be simplified using matrix multiplication. Show the details of your calcu- lation for full credit.
3.i [5 points] Classify the test case xtest = (0.600 0.395, based on the trained neural network.
3.ii [5 points] The 17th case of the training data is x17 = (0.4474 0.8764, with label y17 = 0. Compute the cost of case 17, i.e., compute Q17 (θ(ˆ)).
3.iii [10 points] Note that the gradient of Q(θ) can be expressed
Use the backpropagation algorithm to compute ΔQ17 (θ(ˆ)). This quantity represents the 17th case’s
contribution to the full gradient of Q(θ), evaluated at the point θ = θ(ˆ) . Show the details of your calculation for full credit.
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。