Date: 2019-06-15 11:15

ASSIGNMENT 2 – APACHE SPARK

Introduction

In this assignment, you will apply MLlib/ML, the machine learning libraries of Apache Spark, to real-world datasets.

Before you start working on the assignment, you must have completed the in-class exercise (based on http://spark.apache.org/docs/latest/quick-start.html) and read the Machine Learning Library (MLlib) guide at http://spark.apache.org/docs/latest/mllib-guide.html

Datasets

1. US fatal road accident data for automobiles, 1998 to 2010.

2. Consumer Complaints

Download the datasets from: \FACULTY COURSE RESOURCES\Big Data and Largescale Computing\DataSetsforAssignment2M19. The datasets are easy to understand. Just study the header row for attribute information.

Task 1 (50 points) – Write a SPARK program for classification

Select any two classification learning algorithms available in Spark’s Machine Learning Library.

Select a target attribute from each of the datasets provided and learn a classification model to predict the target attribute.

Use 70% training and 30% test splits of data.

For both datasets, print the test error rates.

A useful Java example for decision tree learning can be found here:

https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/mllib/JavaDecisionTreeClassificationExample.java
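The 70%/30% split and the test error rate asked for in Task 1 can be illustrated without Spark. The following is a minimal sketch in plain Java, not the assignment solution: the class `SplitAndErrorSketch` and its methods `split` and `testError` are hypothetical names introduced here, and the labels are made-up toy values. It assumes the test error rate means the fraction of test rows whose predicted label differs from the actual label.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper class for illustration only; it does not use Spark.
final class SplitAndErrorSketch {

    /** Shuffles a copy of the rows with a fixed seed and returns [training, test]. */
    static <T> List<List<T>> split(List<T> rows, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    /** Test error rate: number of misclassified rows divided by the number of rows. */
    static double testError(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int wrong = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] != actual[i]) {
                wrong++;
            }
        }
        return (double) wrong / predicted.length;
    }

    public static void main(String[] args) {
        // Ten toy rows, split 70/30 with a fixed seed so the run is repeatable.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        List<List<Integer>> parts = split(rows, 0.7, 42L);
        System.out.println("training rows: " + parts.get(0).size()
                + ", test rows: " + parts.get(1).size());

        // Toy labels: one of the four predictions is wrong.
        double[] predicted = {1.0, 0.0, 1.0, 0.0};
        double[] actual = {1.0, 1.0, 1.0, 0.0};
        System.out.println("Test Error: " + testError(predicted, actual));
    }
}
```

In a Spark program the split would more likely be done with the library's own `randomSplit` using weights 0.7 and 0.3, as the linked decision tree example does. That call samples rows at random, so the resulting sizes are only approximately 70% and 30%, unlike the exact cut in this sketch.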

Task 2 (50 points) – Write a SPARK program to cluster data

Select K-means and Gaussian mixture clustering algorithms from Spark’s Machine Learning Library.

Select appropriate attributes to cluster the data in each of the two datasets.

Apply the clustering algorithms to the transformed datasets.

For Gaussian mixture clustering, your program should output the parameters of the mixture model; for K-means, it should output the “Within Set Sum of Squared Errors”.

A useful Java example for K-means can be found here:

https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/mllib/JavaKMeansExample.java
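To make the “Within Set Sum of Squared Errors” from Task 2 concrete, the following plain Java sketch computes it by hand. It assumes the usual definition: for every point, take the squared Euclidean distance to its nearest cluster center, then sum over all points. The class `WssseSketch`, its methods, and the toy points and centers are hypothetical and only serve as illustration. In a Spark program the fitted K-means model reports this value itself; the linked example obtains it from the model's `computeCost` method.

```java
// Hypothetical helper class for illustration only; it does not use Spark.
final class WssseSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Sums, over all points, the squared distance to the nearest center. */
    static double wssse(double[][] points, double[][] centers) {
        if (centers.length == 0) {
            throw new IllegalArgumentException("at least one center is required");
        }
        double total = 0.0;
        for (double[] point : points) {
            double nearest = Double.POSITIVE_INFINITY;
            for (double[] center : centers) {
                nearest = Math.min(nearest, squaredDistance(point, center));
            }
            total += nearest;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two tight toy clusters; the centers are chosen by hand, not learned.
        double[][] points = {{0.0, 0.0}, {0.0, 2.0}, {10.0, 10.0}, {10.0, 12.0}};
        double[][] centers = {{0.0, 1.0}, {10.0, 11.0}};
        System.out.println("Within Set Sum of Squared Errors = " + wssse(points, centers));
    }
}
```

A lower value means the points lie closer to their assigned centers, which is why this quantity is commonly compared across different choices of K.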

Submission requirements and grading

Upload the source code for your program in a zipped file to Canvas. Demonstrate both tasks to the TA during the Lab or consultation hours.

Remember that all work must be your own.

