Department of Informatics, King's College London
Pattern Recognition (6CCS3PRE/7CCSMPNN)
Assignment: Support Vector Machines (SVMs) and Ensemble
Methods
This coursework is assessed. A type-written report needs to be submitted online through KEATS by the deadline specified on the module's KEATS webpage. In this coursework, the questions before Q8 consider a classification problem with 3 classes: a multi-class SVM-based classifier formed by multiple SVMs is designed to deal with the classification problem. The questions from Q8 onwards (Q8 included) use your own created dataset to investigate classification performance using the techniques of Bagging and Boosting: some simple "weak" classifiers are designed and combined to achieve improved classification performance on a two-class classification problem.
Q1. Write down your 7-digit student ID denoted as s1s2s3s4s5s6s7. (5 Marks)
Q2. Find R1, which is the remainder when … is divided by 4. Table 1 shows the multi-class method to be used corresponding to the value of R1 obtained. (5 Marks)
R1 Method
0 One against one
1 One against all
2 Binary decision tree
3 Binary coded
Table 1: R1 and its corresponding multi-class method.
Q3. Create a linearly separable two-dimensional dataset of your own, consisting of 3 classes. List the dataset in the format shown in Table 2. Each class should contain at least 10 samples, and all three classes should have the same number of samples. Note: this is your own created dataset, so the chance of the same dataset appearing in other submissions is slim. Do not share your dataset with others, to avoid any plagiarism/collusion issues. (10 Marks)
Table 2: Samples of three classes.
Q4. Plot the dataset from Q3 to show that the samples are linearly separable. Explain why your dataset is linearly separable. Hint: the Matlab built-in function plot can be used; show some example hyperplanes which linearly separate the classes, and identify which hyperplane separates which classes. (20 Marks)
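For illustration, a minimal Matlab sketch of such a plot, assuming a small hypothetical 3-class dataset (your own dataset from Q3 should be used instead), might look like the following:

    % Hypothetical 3-class dataset (replace with your own from Q3).
    X1 = [1 1; 1 2; 2 1];            % class 1 samples (x1, x2)
    X2 = [5 5; 5 6; 6 5];            % class 2 samples
    X3 = [9 1; 9 2; 10 1];           % class 3 samples
    figure; hold on;
    plot(X1(:,1), X1(:,2), 'ro');    % class 1
    plot(X2(:,1), X2(:,2), 'gs');    % class 2
    plot(X3(:,1), X3(:,2), 'b^');    % class 3
    x = 0:0.1:11;
    plot(x, -x + 7, 'k-');           % example line separating class 1 from class 2
    plot(x, x - 3.5, 'k--');         % example line separating class 2 from class 3
    xlabel('x_1'); ylabel('x_2');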
Q5. According to the method obtained in Q2, draw a block diagram at SVM level to show the structure of the multi-class classifier constructed from linear SVMs. Explain the design (e.g., number of inputs, number of outputs, number of SVMs used, class label assignment, etc.) and describe how this multi-class classifier works.
Remark: A block diagram is a diagram used to show, say, a concept or a structure. Here, the diagram should show the structure of the multi-class SVM classifier, i.e., how the binary SVM classifiers are put together to work as a multi-class SVM classifier. For example, Q5 of Tutorial 9 is an example of a block diagram at SVM level; a neural network diagram shows a network's structure at neuron level; the block diagrams in Lecture 9 show the architecture of ensemble classifiers. (20 Marks)
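As a concrete illustration of just one of the Table 1 methods (your structure must follow your own R1), a one-against-all design for 3 classes uses 3 binary SVMs, each trained as "class k versus the rest", and assigns the class whose SVM output g_k(x) = w_k x' + b_k is largest. The sketch below uses placeholder w_k and b_k values consistent with the hypothetical dataset of the Q4 sketch:

    % One-against-all combination of 3 binary linear SVMs (placeholder values).
    W = [-1 -1;                      % w_1: SVM 1 separates class 1 from the rest
          0  1;                      % w_2: SVM 2 separates class 2 from the rest
          1  0];                     % w_3: SVM 3 separates class 3 from the rest
    b = [5; -3.5; -7.5];             % bias terms b_1, b_2, b_3
    x = [5 6];                       % one test sample
    g = W * x' + b;                  % one output per binary SVM
    [~, label] = max(g);             % assign the class whose SVM output is largest
    fprintf('Predicted class: %d\n', label);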
Q6. According to your dataset in Q3 and the design of your multi-class classifier in Q5,
identify the support vectors of the linear SVMs by “inspection” and design their
hyperplanes by hand. Show the calculations and explain the details of your design.
(20 Marks)
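A worked sketch of the hand design, assuming two hypothetical support vectors chosen by inspection (one on each side of a binary SVM), could be written as follows; the scaling of w enforces the canonical conditions w x' + b = +1 and w x' + b = -1 on the two support vectors:

    % Hand design of one hyperplane from two hypothetical support vectors.
    xp = [2 2];                              % support vector on the +1 side
    xm = [1 1];                              % support vector on the -1 side
    w  = 2 * (xp - xm) / norm(xp - xm)^2;    % gives w*xp' + b = +1, w*xm' + b = -1
    b  = 1 - w * xp';
    fprintf('w = [%g %g], b = %g, margin = %g\n', w(1), w(2), b, 2/norm(w));

For these hypothetical points this yields w = [1 1], b = -3, and margin 2/||w|| = sqrt(2).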
Q7. Produce a test dataset by averaging the samples in each row of Table 2, i.e., (sample of class 1 + sample of class 2 + sample of class 3)/3. Summarise the results in the form of Table 3, where N is the number of SVMs in your design and "Classification" is the class determined by your multi-class classifier. Explain how the "Classification" column is obtained using one test sample, and show the calculations for one or two samples to demonstrate how the contents of the table are obtained. (20 Marks)
Table 3: Summary of classification accuracy.
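A minimal Matlab sketch of this procedure, reusing the hypothetical samples and one-against-all placeholders from the Q4/Q5 sketches above, might be:

    % Row-wise average of the three classes gives the test samples (Q7).
    C1 = [1 1; 1 2];  C2 = [5 5; 5 6];  C3 = [9 1; 9 2];   % hypothetical samples
    T  = (C1 + C2 + C3) / 3;                 % one test sample per table row
    W = [-1 -1; 0 1; 1 0];  b = [5; -3.5; -7.5];           % placeholder SVMs
    for i = 1:size(T,1)
        g = W * T(i,:)' + b;                 % outputs of the binary SVMs
        [~, label] = max(g);                 % one-against-all decision rule
        fprintf('test sample (%g, %g) -> class %d\n', T(i,1), T(i,2), label);
    end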
Marking: The learning outcomes of this assignment are that the student understands the fundamental principle and theory of the support vector machine (SVM) classifier, is able to design a multi-class SVM classifier for a linearly separable dataset, and knows how to determine the classification of test samples with the designed classifier. The assessment will look into knowledge and understanding of the topic. When answering the questions, show/explain/describe the steps/design/concepts clearly with reference to the equations/theory/algorithms stated in the lecture slides. When making comments (if necessary), support your statements with the results obtained.
Purposes of Assignment: This assignment covers the overall classification workflow, from samples to design to classification. It helps you clarify the concepts, working principle, theory, classification of samples, design procedure and multi-class classification techniques for SVMs.
Q8. Create a non-linearly separable dataset consisting of at least 20 two-dimensional data points. Each data point is characterised by two coordinates x1 ∈ [−10, 10] and x2 ∈ [−10, 10] and is associated with a class y ∈ {−1, +1}. List the data in a table in the format shown in Table 1, where the first column is for the data points of class "−1" and the second column is for the data points of class "+1". (20 Marks)
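For illustration, a hypothetical XOR-style arrangement (class +1 in quadrants 1 and 3, class −1 in quadrants 2 and 4) is non-linearly separable; a Matlab sketch of such data (build your own 20+ points rather than copying these) is:

    % Hypothetical XOR-style dataset: no single straight line separates it.
    Xpos = [ 3  3;  4  2;  2  4; -3 -3; -4 -2; -2 -4];   % class +1
    Xneg = [-3  3; -4  2; -2  4;  3 -3;  4 -2;  2 -4];   % class -1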
Q9. Plot the dataset (x axis is x1, y axis is x2) and show that the dataset is non-linearly separable. Represent class "−1" and class "+1" using "×" and "○", respectively. Explain why your dataset is non-linearly separable. Hint: the Matlab built-in function plot can be used. (20 Marks)
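Continuing with the hypothetical Xpos/Xneg variables from the Q8 sketch above, the plot could be produced as follows:

    figure; hold on;
    plot(Xneg(:,1), Xneg(:,2), 'x');     % class -1 plotted as crosses
    plot(Xpos(:,1), Xpos(:,2), 'o');     % class +1 plotted as circles
    xlabel('x_1'); ylabel('x_2');
    axis([-10 10 -10 10]);
    legend('class -1', 'class +1');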
Q10. Design Bagging classifiers consisting of 3, 4 and 5 weak classifiers using the steps shown in Appendix 1. A linear classifier should be used as the weak classifier. Explain and show the design of the hyperplanes of the weak classifiers, and list the parameters of the designed hyperplanes.
After designing the weak classifiers, apply the designed weak classifiers and the bagging classifier to all the samples in Table 1. Present the classification results in a table in the format shown in Table 2. The columns "Weak classifier 1" to "Weak classifier n" list the output class ({−1, +1}) of the corresponding weak classifiers. The column "Overall classifier" lists the output class ({−1, +1}) of the bagging classifier. The last row lists the classification accuracy in percentage for all classifiers, i.e., accuracy = (number of correctly classified samples / total number of samples) × 100%.
Explain how to determine the class (for each weak classifier and the overall classifier) using one test sample. You will have 3 tables (for 3, 4 and 5 weak classifiers) for this question. Comment on the results in terms of classification performance when different numbers of weak classifiers are used. (30 Marks)
Table 2: Classification results using the Bagging technique combining n weak classifiers. The first row "Data" lists the samples (from both classes "−1" and "+1") in Table 1.
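A minimal sketch of the majority-voting decision, assuming hypothetical hand-designed hyperplanes (one row of W and one entry of b per weak classifier), is shown below; note that with an even number of weak classifiers a tie-breaking rule must also be defined:

    % Bagging decision for one sample by majority vote (placeholder hyperplanes).
    W = [1 0; 0 1; 1 1];  b = [0; 0; -1];   % n = 3 weak linear classifiers
    x = [3 3];                              % one sample from the dataset table
    votes = sign(W * x' + b);               % each weak classifier outputs -1 or +1
    overall = sign(sum(votes));             % majority vote; 0 would mean a tie
    fprintf('votes: %s -> overall class %d\n', mat2str(votes'), overall);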
Q11. Design a Boosting classifier consisting of 3 weak classifiers using the steps shown in Appendix 2. A linear classifier should be used as the weak classifier. Explain and show the design of the hyperplanes of the weak classifiers, and list the parameters of the designed hyperplanes. After designing the weak classifiers, apply the designed weak classifiers and the boosting classifier to all the samples in Table 1. Present the classification results in a table in the format shown in Table 2. Explain how to determine the class (for each weak classifier and the boosting classifier) using one test sample. Comment on the results of the overall classifier in terms of classification performance when comparing with the 1st, 2nd and 3rd weak classifiers, and with the bagging classifier with 3 weak classifiers in Q10. (30 Marks)
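A sketch of the Appendix 2 decision rule for one test sample, with hypothetical weak-classifier outputs c1, c2, c3 ∈ {−1, +1}, is:

    % Boosting decision rule: the first two classifiers decide if they agree,
    % otherwise the third classifier decides (placeholder outputs).
    c1 = +1; c2 = -1; c3 = -1;
    if c1 == c2
        y = c1;
    else
        y = c3;
    end
    fprintf('boosting output: %d\n', y);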
Appendix 1: Bagging (details can be found in the section "Bagging" in the lecture notes)
Step 1: Start with a dataset D of n samples.
Step 2: Generate M datasets D1, D2, ..., DM. Each dataset is created by drawing n′ < n samples from D with replacement, so some samples can appear more than once while others do not appear at all (a sketch follows this appendix).
Step 3: Learn a weak classifier fi(x) for each dataset Di, i = 1, 2, ..., M.
Step 4: Combine all weak classifiers using a majority voting scheme.
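A minimal Matlab sketch of the resampling in Step 2, assuming a hypothetical dataset matrix D with rows [x1 x2 y] and a chosen n′, is:

    % Draw n' < n rows from D uniformly at random with replacement.
    D = [3 3 1; -3 3 -1; 4 2 1; -4 2 -1; 2 4 1; -2 4 -1];  % hypothetical, n = 6
    nprime = 4;                          % chosen n' < n
    idx = randi(size(D,1), nprime, 1);   % indices may repeat (with replacement)
    Di  = D(idx, :);                     % bootstrap dataset D_i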
Appendix 2: Boosting (details can be found in the section "Boosting" in the lecture notes)
Dataset D with n patterns.
Training procedure:
Step 1: Randomly select a set of n1 ≤ n patterns (without replacement) from D to
create dataset D1. Train a weak classifier C1 using D1 (C1 should have at least 50%
classification accuracy).
Step 2: Create an "informative" dataset D2 (n2 ≤ n) from D, of which roughly half of the patterns are correctly classified by C1 and the rest are wrongly classified. Train a weak classifier C2 using D2.
Step 3: Create an "informative" dataset D3 from D whose patterns are not well classified by C1 and C2 (i.e., C1 and C2 disagree); see the sketch after this appendix. Train a weak classifier C3 using D3.
The final decision of classification is based on the votes of the weak classifiers, e.g., by the first two weak classifiers if they agree, and by the third weak classifier if the first two disagree.
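The selection of "informative" patterns in Steps 2 and 3 can be sketched as follows, assuming hypothetical trained weak classifiers C1 and C2 represented as Matlab function handles:

    % Hypothetical weak classifiers C1, C2 as function handles on rows of X.
    c1 = @(X) sign(X(:,1));  c2 = @(X) sign(X(:,2));       % placeholders
    X = [3 3; -3 3; 4 -2; -4 -2];  y = [1; -1; -1; 1];     % hypothetical data
    correct1 = (c1(X) == y);                 % patterns C1 classifies correctly
    % D2: roughly half correct and half wrong under C1 (one of each, as a sketch):
    D2idx = [find(correct1, 1); find(~correct1, 1)];
    % D3: patterns on which C1 and C2 disagree:
    D3idx = find(c1(X) ~= c2(X));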
Marking: The learning outcomes of this assignment are that the student understands the fundamental principles and concepts of ensemble methods (Bagging and Boosting), is able to design weak classifiers, knows how to form Bagging/Boosting classifiers, and knows how to determine the classification of test samples with the designed Bagging/Boosting classifiers. The assessment will look into knowledge and understanding of the topic. When answering the questions, show/explain/describe the steps/design/concepts clearly with reference to the equations/theory/algorithms stated in the lecture slides. When making comments, support your statements with the results obtained.