CISC520 Data Engineering and Mining

Homework 2

Task description:

The data set comes from the Kaggle Digit Recognizer competition. The goal is to recognize the digits 0 to 9 in handwritten images. Because the original data set is large, I have systematically sampled 10% of the data by selecting every 10th example (the 10th, 20th, 30th, and so on). You are going to use the sampled data to construct prediction models with three machine learning algorithms that we have learned recently: naive Bayes, kNN, and SVM. Tune their parameters to get the best model (measured by cross-validation) and compare which algorithms provide better models for this task.
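
The systematic sampling described above amounts to a slice with step 10. A minimal sketch, assuming the full data set has already been read into a Python list in its original row order; `systematic_sample` is a hypothetical helper name:

```python
def systematic_sample(rows, step=10):
    """Keep every `step`-th row, counting from 1 (rows 10, 20, 30, ... for step=10)."""
    # The k-th row (1-based) sits at 0-based index k - 1, so the slice starts at step - 1.
    return rows[step - 1::step]


# Stand-in data: each "row" is just its own 1-based row number.
rows = list(range(1, 101))
sample = systematic_sample(rows)
print(sample[:3], len(sample))  # → [10, 20, 30] 10
```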

Report structure:

Section 1: Introduction

Briefly describe the classification problem and the general data preprocessing. Note that some preprocessing steps may be specific to a particular algorithm; report those steps under the corresponding algorithm's section.
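
One general preprocessing step that all three models can share is parsing the CSV file and, where useful, rescaling pixel intensities. A minimal sketch, assuming the Kaggle `train.csv` layout (a header row, the `label` column first, then 784 pixel columns holding integers 0 to 255); `load_digits` and its `scale` flag are hypothetical:

```python
import csv
import io


def load_digits(csv_file, scale=True):
    """Read (labels, pixel rows) from a Digit Recognizer style CSV file.

    With scale=True each 0..255 intensity is divided by 255 so that all
    features lie in [0, 1]; with scale=False the raw integers are kept.
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (label, pixel0, pixel1, ...)
    labels, pixels = [], []
    for row in reader:
        labels.append(int(row[0]))
        values = [int(v) for v in row[1:]]
        pixels.append([v / 255 for v in values] if scale else values)
    return labels, pixels


# Tiny in-memory stand-in with 3 pixels per image instead of 784.
demo = io.StringIO("label,pixel0,pixel1,pixel2\n7,0,255,51\n1,255,0,0\n")
y, X = load_digits(demo)
print(y, X[0])  # → [7, 1] [0.0, 1.0, 0.2]
```

Because every pixel already shares the same 0 to 255 range, this rescaling leaves kNN neighbor rankings unchanged; it matters mostly for SVMs, whose kernel and regularization parameters depend on feature scale.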

Section 2: Naive Bayes

Build a naive Bayes model. Tune its parameters, such as the discretization options, and compare the results.
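
One way to make the discretization option concrete is to cut the raw 0 to 255 intensities into `n_bins` equal-width bins and fit a categorical naive Bayes model with add-`alpha` (Laplace) smoothing. This is a sketch under those assumptions, not the prescribed method; equal-frequency bins, or a Gaussian model on unbinned values, are other options worth comparing. All function names are hypothetical:

```python
import math
from collections import Counter


def discretize(pixels, n_bins, max_value=255):
    """Map each integer pixel value 0..max_value to an equal-width bin 0..n_bins-1."""
    return [v * n_bins // (max_value + 1) for v in pixels]


def train_nb(X, y, n_bins, alpha=1.0):
    """Fit a categorical naive Bayes model on binned pixels with add-alpha smoothing."""
    classes = sorted(set(y))
    class_counts = Counter(y)
    n_features = len(X[0])
    # counts[c][j][b]: number of class-c images whose pixel j falls in bin b
    counts = {c: [[0] * n_bins for _ in range(n_features)] for c in classes}
    for row, label in zip(X, y):
        for j, b in enumerate(discretize(row, n_bins)):
            counts[label][j][b] += 1
    log_prior = {c: math.log(class_counts[c] / len(y)) for c in classes}
    log_lik = {
        c: [[math.log((counts[c][j][b] + alpha) / (class_counts[c] + alpha * n_bins))
             for b in range(n_bins)]
            for j in range(n_features)]
        for c in classes
    }
    return classes, log_prior, log_lik, n_bins


def predict_nb(model, row):
    """Return the class with the highest log posterior (up to a shared constant)."""
    classes, log_prior, log_lik, n_bins = model
    bins = discretize(row, n_bins)
    # Summing logs avoids the underflow that multiplying 784 probabilities could cause.
    return max(classes,
               key=lambda c: log_prior[c] + sum(log_lik[c][j][b] for j, b in enumerate(bins)))


# Toy usage: 3-pixel "images"; class 0 is bright on the right, class 1 on the left.
X = [[0, 0, 250], [10, 5, 240], [250, 240, 0], [245, 250, 10]]
y = [0, 0, 1, 1]
model = train_nb(X, y, n_bins=2)
print(predict_nb(model, [5, 0, 255]), predict_nb(model, [255, 255, 0]))  # → 0 1
```

Tuning the discretization then amounts to repeating cross-validation for several `n_bins` values (and, optionally, several `alpha` values) and keeping the best.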

Section 3: K-Nearest Neighbor method
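
The assignment leaves the details of this section open. As one possible starting point, here is a brute-force kNN classifier with Euclidean distance and majority voting; `k` and the choice of distance are the natural parameters to tune by cross-validation. The function names are hypothetical:

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance (the square root does not change the neighbor order)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_predict(X_train, y_train, query, k=3):
    """Predict the majority label among the k training rows nearest to query."""
    # Brute force: every prediction scans the whole training set.
    nearest = sorted(range(len(X_train)),
                     key=lambda i: squared_distance(X_train[i], query))[:k]
    votes = Counter(y_train[i] for i in nearest)
    # most_common keeps first-encountered order on ties, so a tied vote goes to
    # the label whose closest neighbor is nearest to the query.
    return votes.most_common(1)[0][0]


X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [0.5, 0.5]),
      knn_predict(X_train, y_train, [5.5, 5.5]))  # → 0 1
```

With ten digit classes a vote can tie even when `k` is odd, which is why the tie-breaking rule is spelled out.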

Section 4: Support Vector Machine (SVM)
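
The assignment does not fix an SVM formulation, and a library implementation with a kernel (where parameters such as C and the kernel width are tuned) is the usual choice. For illustration only, here is a self-contained linear SVM trained with Pegasos-style stochastic subgradient descent and extended to ten classes by one-vs-rest. `lam` is the regularization strength, corresponding to C = 1 / (lam * n) for n training rows in the usual C-parameterized objective; all names and defaults are assumptions:

```python
import random


def train_linear_svm(X, y, lam=0.1, n_iter=20000, seed=0):
    """Binary linear SVM (labels +1/-1) trained with Pegasos subgradient steps.

    A constant 1.0 is appended to each row so the last weight acts as a bias;
    that bias is regularized along with the other weights (a simplification).
    """
    rng = random.Random(seed)
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(rows))
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
        # Gradient step on the regularizer: shrink every weight.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Hinge-loss subgradient step, only if the point is inside the margin.
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def svm_score(w, x):
    """Signed decision value w . [x, 1]."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per class: that class is +1, all others are -1."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def predict_one_vs_rest(models, x):
    """Pick the class whose binary SVM gives the largest decision value."""
    return max(models, key=lambda c: svm_score(models[c], x))


# Toy usage: three well-separated 2-D clusters standing in for digit classes.
X = [[0, 5], [0.5, 5.5], [-0.5, 4.5],
     [5, 0], [5.5, 0.5], [4.5, -0.5],
     [-5, -5], [-5.5, -4.5], [-4.5, -5.5]]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
models = train_one_vs_rest(X, labels)
print([predict_one_vs_rest(models, c) for c in ([0, 5], [5, 0], [-5, -5])])
```

On the digit data, `lam` and `n_iter` would need tuning by cross-validation like any other parameter.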

Section 5: Algorithm performance comparison

Compare the results from the three algorithms. Which one reached the highest accuracy? Which one runs fastest? Can you explain why?
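
To answer these questions on equal terms, all three models can be passed through the same cross-validation loop, which records accuracy and wall-clock time together. A minimal sketch, assuming each model is wrapped as a hypothetical pair of functions `fit(X_train, y_train) -> model` and `predict(model, row) -> label`; `k=5` folds and the shuffling seed are arbitrary choices:

```python
import random
import time


def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Shuffled k-fold cross-validation returning (mean accuracy, elapsed seconds).

    fit(X_train, y_train) must return a model and predict(model, row) a label.
    The timer covers training and prediction over all k folds. Assumes len(X) >= k.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # same seed -> same folds for every algorithm
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    start = time.perf_counter()
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    elapsed = time.perf_counter() - start
    return sum(accuracies) / k, elapsed


# Usage with a trivial "always predict 1" baseline on 10 toy rows.
X = [[i] for i in range(10)]
y = [0, 1] * 5
acc, seconds = cross_validate(lambda X_tr, y_tr: None, lambda model, row: 1, X, y)
print(acc)  # → 0.5
```

Reusing the same seed, and hence the same folds, for every algorithm keeps the accuracy comparison fair, and the same function can drive the parameter search in each algorithm section. Timing `fit` and `predict` separately would show where each algorithm spends its time, which helps with the "why" question.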