Date: 2021-11-18 09:00

Machine Learning with Python (2021 Fall semester)

Programming Assignment: Classification of Titanic Data Set

1. Benchmark Dataset: The task is to predict survival from information about the passengers aboard the Titanic. Evaluate the performance of each machine learning model specified in this assignment. The dataset can be downloaded from the following website: https://www.kaggle.com/c/titanic

In this assignment, both model training and testing use the train.csv file.

When performing the task, be careful NOT to use the following features for model training:

PassengerId, Name, Ticket, Cabin

2. Preprocessing

1. Some rows in the train data contain missing values. Remove these rows.

2. Split the samples in train.csv into training data and test data at a ratio of 7 to 3.
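
The preprocessing steps above could be sketched as follows. This is only one possible reading of the assignment: the helper name prepare_titanic, the fixed random seed, and the one-hot encoding of Sex and Embarked are assumptions, as is the choice to drop the excluded columns before removing rows with missing values (Cabin is mostly empty, so the opposite order would discard most rows).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Features the assignment says must NOT be used for training.
EXCLUDED = ["PassengerId", "Name", "Ticket", "Cabin"]


def prepare_titanic(df, seed=0):
    """Drop excluded columns, remove incomplete rows, and split 7:3.

    `prepare_titanic` and `seed` are hypothetical names for this sketch.
    """
    # Assumption: drop the excluded columns first, then remove missing rows.
    df = df.drop(columns=EXCLUDED).dropna()

    # Assumption: scikit-learn models need numeric input, so the two
    # categorical columns are one-hot encoded.
    X = pd.get_dummies(
        df.drop(columns="Survived"), columns=["Sex", "Embarked"], dtype=float
    )
    y = df["Survived"]

    # test_size=0.3 gives the 7:3 split; returns X_train, X_test, y_train, y_test.
    return train_test_split(X, y, test_size=0.3, random_state=seed)
```

With train.csv in the working directory, it could be called as X_train, X_test, y_train, y_test = prepare_titanic(pd.read_csv("train.csv")).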

3. Machine Learning Models: Use scikit-learn to implement the following three machine learning models and evaluate their performance.

3-1 K-Nearest Neighbors (KNN) (sklearn.neighbors.KNeighborsClassifier): Analyze how the results on the test data change as K, the number of neighbors, is varied over the range [3-5].
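
A minimal sketch of the K sweep, assuming the data have already been split and encoded numerically. The function name evaluate_knn and its argument order are illustrative choices, not part of the assignment.

```python
from sklearn.metrics import accuracy_score, f1_score
from sklearn.neighbors import KNeighborsClassifier


def evaluate_knn(X_train, X_test, y_train, y_test, k_values=(3, 4, 5)):
    """Return {k: (accuracy, f1)} measured on the test data.

    `evaluate_knn` is a hypothetical helper for this sketch.
    """
    results = {}
    for k in k_values:
        # n_neighbors is the K of K-Nearest Neighbors.
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        pred = model.predict(X_test)
        results[k] = (accuracy_score(y_test, pred), f1_score(y_test, pred))
    return results
```

KNN is distance-based, so features on large scales (such as Fare) can dominate the result; whether to scale the features is left open by the assignment.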

3-2 Logistic Regression (sklearn.linear_model.LogisticRegression): Analyze how the results on the test data change as the number of iterations (max_iter) is varied in steps of 20 over the range [0-100]. Then fix the number of iterations at 100, vary the regularization term (C in scikit-learn) in steps of 1 over the range [1 to 5], and analyze how the results on the test data change.
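
The two sweeps might look like the sketch below. The helper names are illustrative. The iteration sweep here starts at 20 rather than 0, because whether scikit-learn accepts max_iter=0 may depend on the installed version; that restriction is an assumption of this sketch, not part of the assignment. Note also that C in scikit-learn is the inverse of the regularization strength, so a larger C means weaker regularization.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score


def _score(model, X_train, X_test, y_train, y_test):
    """Fit the model and return (accuracy, f1) on the test data."""
    pred = model.fit(X_train, y_train).predict(X_test)
    return accuracy_score(y_test, pred), f1_score(y_test, pred)


def sweep_logistic_regression(X_train, X_test, y_train, y_test):
    """Hypothetical helper: returns ({max_iter: scores}, {C: scores})."""
    data = (X_train, X_test, y_train, y_test)

    # Sweep 1: max_iter = 20, 40, ..., 100 (0 is skipped, see the note above).
    by_iter = {
        n: _score(LogisticRegression(max_iter=n), *data)
        for n in range(20, 101, 20)
    }

    # Sweep 2: max_iter fixed at 100, C = 1, 2, ..., 5.
    by_c = {
        c: _score(LogisticRegression(max_iter=100, C=float(c)), *data)
        for c in range(1, 6)
    }
    return by_iter, by_c
```

At small max_iter values scikit-learn may emit a ConvergenceWarning, which is expected in this experiment and is itself worth noting in the analysis.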

3-3 Decision Tree (sklearn.tree.DecisionTreeClassifier): Analyze the splitting criteria at the first and second depths of the decision tree using information gain. Also, with max_depth=None, use an appropriate tool to visualize the tree so that the split condition and gain value at each depth can be read off. Analyze how the results on the test data change when max_depth is set to 1, 2, 3, and None.
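
One way to obtain the split conditions and information gain is to read them from the fitted tree's tree_ attribute, as sketched below. The helper names are illustrative, criterion="entropy" is assumed so that the impurity decrease equals information gain, and the root split is counted as depth 1 (scikit-learn itself numbers the root as depth 0).

```python
from sklearn.metrics import accuracy_score, f1_score
from sklearn.tree import DecisionTreeClassifier


def split_gains(model, feature_names):
    """Return sorted (depth, feature, threshold, gain) for every split.

    Gain = parent entropy minus the sample-weighted entropy of the children.
    """
    t = model.tree_
    n = t.weighted_n_node_samples
    rows, stack = [], [(0, 1)]  # (node id, depth); root counted as depth 1
    while stack:
        node, depth = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # -1 marks a leaf, which has no split
            continue
        children = (n[left] * t.impurity[left] + n[right] * t.impurity[right]) / n[node]
        gain = t.impurity[node] - children
        name = feature_names[t.feature[node]]
        rows.append((depth, name, float(t.threshold[node]), float(gain)))
        stack += [(left, depth + 1), (right, depth + 1)]
    return sorted(rows)


def evaluate_tree_depths(X_train, X_test, y_train, y_test, depths=(1, 2, 3, None)):
    """Hypothetical helper: return {max_depth: (accuracy, f1)} on the test data."""
    results = {}
    for d in depths:
        model = DecisionTreeClassifier(criterion="entropy", max_depth=d, random_state=0)
        pred = model.fit(X_train, y_train).predict(X_test)
        results[d] = (accuracy_score(y_test, pred), f1_score(y_test, pred))
    return results
```

For the visualization itself, sklearn.tree.plot_tree (which needs matplotlib) or sklearn.tree.export_text are two candidate tools; the assignment does not prescribe one.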

4. Evaluation Methods: Report the performance of each model using Accuracy and F1-Score.
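
For reference, the two metrics can be computed by hand as in the sketch below, which assumes binary labels with 1 (survived) as the positive class. The function name is illustrative; in practice sklearn.metrics.accuracy_score and sklearn.metrics.f1_score compute the same quantities.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, f1) for binary labels, treating 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Accuracy: fraction of predictions that match the true label.
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)

    # F1: harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

Accuracy alone can be misleading when the classes are imbalanced, which is one reason to report F1-Score alongside it.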

5. Submission Form: Three files must be submitted: the csv file, the report, and the python file, packaged together in a single zip file. The file name must follow the format studentnumber_name.zip (e.g., 2020714950_Hong_Gil-dong.zip). When the python file is executed with the csv file in the same directory, its output should clearly show the results of each machine learning model. This is to check whether the performance stated in the report is similar to the performance in actual execution. If you wrote your code as an ipynb file, you may submit it instead of a python file.

