联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2023-03-15 08:07

EE2211: Spring 2023

Assignment 2 (15%)

Deadline: 17th Mar 2023 at 23:59 (Friday of Week 9)


Tasks: Write a single python file for multi-class classification:


Outputs: Please submit a single python file with filename “A2_StudentMatriculationNumber.py”. It should

contain a function A2_MatricNumber that returns the following numpy.array or scalar:

§ X_train: the training data matrix with dimensions (number_of_training_samples ? 2). (1%)

§ Y_train: the training targets (containing values 0, 1 and 2) of length number_of_training_samples. (1%)

§ X_test: the test data matrix with dimensions (number_of_test_samples ? 2). (1%)

§ Y_test: the test targets (containing values 0, 1 and 2) of length number_of_test_samples. (1%)

§ Y_train_onehot: one-hot target matrix (containing only values 0 and 1) with dimension

(number_of_training_samples ? 3). (1%)

§ Y_test_onehot: one-hot target matrix (containing only values 0 and 1) with dimension

(number_of_test_samples ? 3). (1%)

§ best_order: the best polynomial order (an integer scalar). (1%)

§ best_P_train: the training polynomial features obtained with the best order. (1%)

§ best_P_test: the test polynomial features obtained with the best order. (1%)

§ best_wp: the estimated coefficients of classifier obtained with the best order. (2%)

§ best_train_acc: the training accuracy (accuracy = #correct / #total) obtained with the best order. (2%)

§ best_test_acc: the testing accuracy (accuracy = #correct / #total) obtained with the best order. (2%)



Step-by-step instructions:

0. Python Library Installation: pip / conda install scikit-learn matplotlib numpy

1. Data Splitting: In the template file “A2_StudentMatriculationNumber.py”, we use the function

“make_dataset” to generate a 2-D dataset (X, Y) with N samples for classification. X is a [N ? 2] matrix and

Y is a label vector of length N. Please split the dataset to into two sets: 70% for training set (X_train, Y_train)

and 30% for testing set (X_test, Y_test). Visualization code is already implemented in the template, which

automatically saves two figures “./train_set.png” and “./test_set.png” to your current folder.

1) Data Splitting:The dataset is available in the template file “A2_StudentMatriculationNumber.py”,

which contains 3 classes of training samples for classification. Please split the dataset into two sets: 70%

of samples for training, and 30% of samples for testing.

NOTE 1: Please use “from sklearn.model_selection import train_test_split” to split the data into the

training and test sets.

2) One-hot Encoding: Generate one-hot encoded labels for both the training set and the test set.

3) Polynomial Features: Use the training and test samples to generate polynomial features (use “from

sklearn.preprocessing import PolynomialFeatures”) with different orders from 1 to 30.

NOTE 2: The offset/bias term will be automatically generated by PolynomialFeatures.

4) Classification: Use ridge regression with λ=0.0001 for classification (based on one-hot encoding) and

compute the classification accuracies on the training and test sets with different orders. Please find the

best classifier (polynomial order) according to the classification accuracy.

NOTE 3: If the number of rows in the training polynomial matrix (also known as the design matrix) is

less than or equal to the number of columns, then use the dual form of ridge regression (Lecture 6). If

not, use the primal form (Lecture 6).


2. One-hot Encoding: In this step, we transform the real-valued labels in Y_train and Y_test into one-hot

encoded vectors. To be precise, label 0 should be transformed to [1, 0, 0], label 1 should be transformed to

[0, 1, 0], and label 2 should be transformed to [0, 0, 1]. The encoded labels Y_train_onehot and Y_test_onehot

should be matrices with dimensions (number_of_samples ? 3).

3. Polynomial Features: The dataset in this assignment is not linearly separable. In this case, the use of

polynomials can help map the 2-D data into a higher-dimensional space, where the training samples may be

separable. In this step, let us transform our 2-D data (X_train, X_test) to polynomial features (P_train, P_test)

using “sklearn.preprocessing.PolynomialFeatures” with different orders. For example:


4. Classification: Classification is performed on the polynomial features (P_train, P_test) and the one-hot

encoded labels (Y_train_onehot, Y_test_onehot). With polynomial orders from 1 to 30, you will eventually

get 30 classifiers with coefficients “wp”. Please choose the best classifier according to your training or

testing accuracy and return the corresponding “best_order”, “best_P_train”, “best_P_test”, “best_wp”,

“best_train_acc”, “best_test_acc” as mentioned in the “Output” section. In this part, code for visualization is

provided to illustrate the non-linear decision boundaries of classifiers. All figures that are automatically

generated will be saved as “decision_boundary_order=#ORDER.png”.


Submission:

Please use the python template provided to you. Do not comment out any lines. Remember to rename both

“A2_StudentMatriculationNumber.py” and “A2_MatricNumber” using your student matriculation number. For

example, if your matriculation ID is A1234567R, then you should submit “A2_A1234567R.py” that contains

the function “A2_A1234567R”. Please do NOT zip/compress your file. Please test your code at least once.

Points will be deducted if instructions are not followed. The way we would run your code will be as follows:

>> import A2_A1234567R as grading

>> X_train, Y_train, X_test, Y_test, Y_train_onehot, Y_test_onehot,

best_order, best_P_train, best_P_test, best_wp, best_train_acc,

best_test_acc = grading.A2_A1234567R()


NOTE: The PNG files used for visualizing your results will not affect the final grading and are only intended for

you to verify your program. Visualization requires the following variables to be defined in your program: X_train,

Y_train, X_test, Y_test, Y_train_onehot, Y_test_onehot, order, P_train, P_test, wp, train_acc, and test_acc. If your

program does not generate any images, please check your variable names to ensure they are correct.


Gentle Reminders:

1. Please make sure you replace “StudentMatriculationNumber” and “MatricNumber” with your matriculation

number!

2. Date of release: 20th Feb 2023 (Monday of Recess Week)

3. Submission: Canvas/EE2211/Assignments/Assignment 2

4. The submission folder in Canvas will be closed on 17th Mar 2023 at 23:59 (Friday of Week 9) sharp! No

extensions will be entertained.


相关文章

版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp