CISC520 Data Engineering and Mining
Homework 2
Task description:
The data set comes from the Kaggle Digit Recognizer competition. The goal is to recognize
digits 0 to 9 in handwriting images. Because the original data set is large, I have systematically
sampled 10% of the data by selecting every 10th example (the 10th, 20th, and so on). You will use the
sampled data to construct prediction models using the machine learning algorithms that we
have learned recently: naive Bayes, kNN, and SVM. Tune their parameters to get the
best model (measured by cross-validation) and compare which algorithm provides the best model
for this task.
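The sampling and cross-validation steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `systematic_sample` and `k_fold_indices` are hypothetical, the rows are stand-in integers rather than real images, and the interleaved fold layout is one of several reasonable choices.

```python
def systematic_sample(rows, step=10):
    """Return every `step`-th row: the 10th, 20th, ... when step is 10."""
    # Positions in the task description are 1-based, so the 10th row is index 9.
    return rows[step - 1::step]


def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    # Interleaved folds: fold i holds indices i, i + k, i + 2k, ...
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


rows = list(range(1, 101))        # stand-in for 100 examples, numbered 1..100
sample = systematic_sample(rows)  # keeps 10% of the rows
splits = list(k_fold_indices(len(sample), k=5))

print(sample)
for train_idx, val_idx in splits:
    print("validate on", val_idx, "train on", len(train_idx), "rows")
```

For parameter tuning, each candidate setting would be trained on `train_idx` and scored on `val_idx` for every fold, and the setting with the best average validation accuracy kept.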
Report structure:
Section 1: Introduction
Briefly describe the classification problem and general data preprocessing. Note that some data
preprocessing steps may be specific to a particular algorithm. Report those steps under each
algorithm section.
Section 2: Naive Bayes
Build a naive Bayes model. Tune the parameters, such as the discretization options, and compare
the results.
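To make the discretization option concrete, here is a minimal sketch of a categorical naive Bayes classifier over binned pixel intensities. Everything in it is an assumption made for illustration: the equal-width binning in `discretize`, the `CategoricalNB` class, the Laplace smoothing value `alpha`, and the four-pixel toy "images". It is not the implementation of any particular tool used in the course.

```python
import math


def discretize(pixels, n_bins):
    """Map 0..255 intensities to equal-width bin indices 0..n_bins-1."""
    return [min(p * n_bins // 256, n_bins - 1) for p in pixels]


class CategoricalNB:
    """Naive Bayes over discrete features, with Laplace smoothing (alpha)."""

    def __init__(self, n_bins, alpha=1.0):
        self.n_bins = n_bins
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.n = len(y)
        self.class_count = {c: 0 for c in self.classes}
        n_features = len(X[0])
        # counts[c][f][b] = number of class-c examples whose feature f is in bin b
        self.counts = {c: [[0] * self.n_bins for _ in range(n_features)]
                       for c in self.classes}
        for xs, c in zip(X, y):
            self.class_count[c] += 1
            for f, b in enumerate(xs):
                self.counts[c][f][b] += 1
        return self

    def predict(self, xs):
        best, best_score = None, -math.inf
        for c in self.classes:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_count[c] / self.n)
            denom = self.class_count[c] + self.alpha * self.n_bins
            for f, b in enumerate(xs):
                score += math.log((self.counts[c][f][b] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy 4-pixel "images": class 0 is dark-then-bright, class 1 is bright-then-dark.
train = [[0, 10, 250, 240], [5, 0, 255, 230], [250, 245, 0, 12], [255, 230, 8, 0]]
labels = [0, 0, 1, 1]
queries = [[20, 30, 200, 220], [220, 200, 30, 20]]

predictions = {}
for n_bins in (2, 4):  # two discretization settings to compare
    model = CategoricalNB(n_bins).fit([discretize(x, n_bins) for x in train], labels)
    predictions[n_bins] = [model.predict(discretize(q, n_bins)) for q in queries]
print(predictions)
```

On real data, the number of bins would be one of the parameters scored by cross-validation rather than checked on two hand-made queries.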
Section 3: K-Nearest Neighbor method
Section 4: Support Vector Machine (SVM)
Section 5: Algorithm performance comparison
Compare the results from the three algorithms. Which one reached the highest accuracy? Which one
runs fastest? Can you explain why?
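One way to collect accuracy and run time side by side is a small timing harness like the sketch below. The `evaluate` helper, the toy data, and the two stand-in predictors (a 1-nearest-neighbour rule and a majority-class baseline) are all hypothetical and only show the measurement pattern. The harness times prediction only; training time would need to be measured separately, since algorithms spend their effort at different stages.

```python
import time


def evaluate(predict, X_val, y_val):
    """Return (accuracy, seconds) for a prediction function on validation data."""
    start = time.perf_counter()
    predicted = [predict(x) for x in X_val]
    elapsed = time.perf_counter() - start
    correct = sum(p == t for p, t in zip(predicted, y_val))
    return correct / len(y_val), elapsed


# Toy data standing in for the digit images.
X_train = [[0, 0], [0, 1], [9, 9], [9, 8]]
y_train = [0, 0, 1, 1]
X_val = [[1, 0], [8, 9]]
y_val = [0, 1]


def one_nn(x):
    """1-nearest neighbour using squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in X_train]
    return y_train[dists.index(min(dists))]


def majority(x):
    """Baseline that always predicts a most frequent training label."""
    return max(set(y_train), key=y_train.count)


results = {name: evaluate(fn, X_val, y_val)
           for name, fn in [("1-NN", one_nn), ("majority baseline", majority)]}
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.2f}, time={seconds:.6f}s")
```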