联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2021-06-05 10:58

CSCE 474/874: Introduction to Data Mining

Spring2021

Homework 3 March 02, 2021

Assignment

Implement the k-means algorithm to perform clustering and compare your results with

the results from Weka.

• Assume that all the attributes are continuous variables.

• Your program must allow the number of clusters (k) to be specified as input.

• Your program must allow the epsilon (change in the sum of the distances from the

cluster centers) to be specified as input.

• Your program must allow the number of iterations to be specified as input.

Your program should stop if either the number of iterations is reached or if the change in

the total sum of the squares of the distances (SSD) falls below epsilon.

Plot the runtime of the algorithm as a function of number of clusters, number of

dimensions and size of the dataset (number of transactions).

Plot the goodness of clustering as a function of the number of clusters and determine the

optimal number of clusters.

Compare the performance of your algorithm with that of Weka and summarize your

results.

For this assignment you will work in teams. Use the dataset from the domain you will be

working on for the project. If the data is not suitable, you may use one from the Weka

dataset.

All code must be written by the members of your team. You may NOT use any code

from ANY OTHER source, including other students and the Internet.

Due Date

The assignment is due on March 16 is worth 100 points.

Handin

Hand in a report along with the listing of your program, the output generated from the run

of the test file on Canvas. Make sure that you have uploaded a signed copy of the

Contributions form. Prepare and submit two files as follows:

• Your report named as “Lastname1_Lastname2.pdf” in pdf format. The signed

contributions form should be used as the cover page of your report.

• A zip file named “Lastname1_Lastname2.zip” that includes everything else (your

program, the output generated from the run of the test file, etc.). You must include

a README file that describes the usage of your program. Make sure your

implementation can successfully execute on the CSE server.

Grading Guidelines

Implement the k-means algorithm to perform clustering in a dataset. (50 points)

• Your implementation will be tested on cse.unl.edu server using the command you

provided in the README file. (30 points)

• In the report, you should write a paragraph about your program design (10 points)

Plot the runtime of the algorithm as a function of number of clusters, number of

dimensions and size of the dataset (number of transactions). (20 points)

• In the report, you should write a paragraph to summarize the observation and

elaborate on it.

Plot the goodness of clustering as a function of the number of clusters and determine the

optimal number of clusters. (20 points)

• In the report, you should write a paragraph to summarize the observation and

elaborate on it.

Compare the performance of your algorithm with that of Weka and summarize your

results. (10 points)

• Summarize the differences (if there is any) and elaborate on it (why/how).


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp