Date: 2023-08-18 11:30

CSCI316 (SIM) 2023 Session 3 Individual Assignment 2

CSCI316 – Big Data Mining Techniques and Implementation

Individual Assignment 2

2023 Session 3 (SIM)

15 Marks

Deadline: Refer to the submission link of this assignment on Moodle

Three (3) tasks are included in this assignment. The specification of each task starts on a separate page.

You must implement and run all your Python code in Jupyter Notebook. The deliverables include one Jupyter Notebook source file (with .ipynb extension) and one PDF document for each task.

Note: To generate a PDF file for a notebook source file, you can either (i) use the Web browser’s PDF printing function, or (ii) click “File” on top of the notebook, choose “Download as” and then “PDF via LaTeX”.

All results of your implementation must be reproducible from your submitted Jupyter notebook source files. In addition, the submission must include all execution outputs as well as a clear explanation of your implementation algorithms (e.g., in the Markdown format or as comments in your Python code).

Submission must be done online by using the submission link associated with assignment 1 for this subject on MOODLE. The size limit for all submitted materials is 20MB. DO NOT submit a zip file.

This is an individual assignment. Plagiarism of any part of the assignment will result in a mark of 0 for the assignment for all students involved.

Task 1

(7.5 marks)

Data set: Customer Churn Dataset

https://www.kaggle.com/datasets/muhammadshahidazeem/customer-churn-dataset

Objective

The objective of this task is to implement a Random Forest classifier based on a Decision Tree model which you implemented in Task 2 of Individual Assignment 1. (Note: If you have implemented multiple DT models, you can choose any one of them as the base model. The DT model which you use must be from Task 2 of Individual Assignment 1.)

Task requirements

(1) Clearly state which method you use for this Random Forest classifier.

(2) Compare the performance of this Random Forest classifier and the performance of DT models which you have implemented.
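As an illustration of what requirement (1) asks you to state, the following is a minimal sketch of one possible method: bagging (bootstrap samples of the rows) combined with a random feature subset per tree, with predictions combined by majority vote. This is a simplification and not the only valid choice; the classic Random Forest formulation samples features at every split, which would require changes inside the tree itself. The names `Stump`, `BaggedForest`, `make_tree` and `max_features` are hypothetical, and `Stump` is only a stand-in for your own Decision Tree from Individual Assignment 1.

```python
from collections import Counter

import numpy as np


def majority(labels):
    """Return the most frequent label in a 1-D array."""
    return Counter(labels.tolist()).most_common(1)[0][0]


class Stump:
    """Hypothetical one-split tree; replace with your own Assignment 1 DT."""

    def fit(self, X, y):
        fallback = majority(y)
        # (errors, feature, threshold, left label, right label)
        best = (len(y) + 1, 0, np.inf, fallback, fallback)
        for j in range(X.shape[1]):
            # Skip the largest value: it would leave the right side empty.
            for t in np.unique(X[:, j])[:-1]:
                mask = X[:, j] <= t
                left, right = majority(y[mask]), majority(y[~mask])
                errors = np.sum(y[mask] != left) + np.sum(y[~mask] != right)
                if errors < best[0]:
                    best = (errors, j, t, left, right)
        _, self.feature, self.threshold, self.left, self.right = best
        return self

    def predict(self, X):
        go_left = X[:, self.feature] <= self.threshold
        return np.where(go_left, self.left, self.right)


class BaggedForest:
    """Bagging plus a random feature subset per tree (an assumed design)."""

    def __init__(self, make_tree, n_trees=25, max_features=None, seed=0):
        self.make_tree = make_tree      # callable returning an unfitted tree
        self.n_trees = n_trees
        self.max_features = max_features
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        n, d = X.shape
        k = self.max_features or max(1, int(np.sqrt(d)))
        self.members = []
        for _ in range(self.n_trees):
            rows = rng.integers(0, n, size=n)             # bootstrap sample
            cols = rng.choice(d, size=k, replace=False)   # feature subset
            tree = self.make_tree().fit(X[np.ix_(rows, cols)], y[rows])
            self.members.append((tree, cols))
        return self

    def predict(self, X):
        # One row of votes per tree, then a majority vote per sample.
        votes = np.stack([t.predict(X[:, cols]) for t, cols in self.members])
        return np.array([majority(column) for column in votes.T])
```

With NumPy arrays `X` and `y`, a call such as `BaggedForest(Stump, n_trees=25).fit(X_train, y_train).predict(X_test)` gives forest predictions that can be compared against a single tree fitted on the same training rows, which is the comparison requirement (2) asks for.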

Deliverables

• A Jupyter Notebook source file named <your_name>_task_x.ipynb which contains your implementation source code in Python

• A PDF document named <your_name>_task_x.pdf which is generated from your Jupyter Notebook source file, and presents a clear and accurate explanation of your implementation and results.

Task 2

(7.5 marks)

Data set: MAGIC Gamma Telescope Dataset

(Source: https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope)

The data are Monte-Carlo generated to simulate registration of high-energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope using the imaging technique. The dataset contains 19,020 records. Attribute information:

1. fLength: continuous # major axis of ellipse [mm]

2. fWidth: continuous # minor axis of ellipse [mm]

3. fSize: continuous # 10-log of sum of content of all pixels [in #phot]

4. fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]

5. fConc1: continuous # ratio of highest pixel over fSize [ratio]

6. fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]

7. fM3Long: continuous # 3rd root of third moment along major axis [mm]

8. fM3Trans: continuous # 3rd root of third moment along minor axis [mm]

9. fAlpha: continuous # angle of major axis with vector to origin [deg]

10. fDist: continuous # distance from origin to center of ellipse [mm]

11. class: g,h # gamma (signal), hadron (background)

g = gamma (signal): 12332

h = hadron (background): 6688
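The class counts above imply a simple reference point: always predicting the majority class already reaches a certain accuracy, and a trained model is only informative if it does better. The short sketch below computes that baseline from the stated counts. The constant names are hypothetical, and the commented loading line assumes a local, headerless CSV copy of the data whose file name is an assumption, not something this specification states.

```python
# Column names taken from the attribute list above.
COLUMNS = [
    "fLength", "fWidth", "fSize", "fConc", "fConc1",
    "fAsym", "fM3Long", "fM3Trans", "fAlpha", "fDist", "class",
]

# Class counts as stated in the dataset description.
CLASS_COUNTS = {"g": 12332, "h": 6688}

total = sum(CLASS_COUNTS.values())
majority_baseline = max(CLASS_COUNTS.values()) / total

print(f"records: {total}, majority-class accuracy: {majority_baseline:.4f}")

# Assumed usage with pandas (file name is hypothetical):
# import pandas as pd
# df = pd.read_csv("magic04.data", names=COLUMNS)
# y = (df["class"] == "g").astype(int)   # 1 = gamma, 0 = hadron
```

Because the classes are imbalanced (roughly 65% gamma), it may be worth checking that the class proportions are similar in the training and test portions.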

Objective

Develop an Artificial Neural Network (ANN) in TensorFlow/Keras to predict the class.

Requirements

(1) You may (but need not) use Scikit-Learn or other Python libraries to pre-process and visualise the data set. However, the ANN must be implemented with the Keras API in TensorFlow.

(2) You can use any ANN architecture (incl. feedforward, CNN, etc.) which has at least two hidden layers.

(3) The training process includes a hyperparameter fine-tuning step. Define a grid including at least three hyperparameters: (a) the number of hidden layers, (b) the number of neurons in each layer, and (c) the regularization parameter for L1 and L2. Each hyperparameter has at least two candidate values. All other hyperparameters (e.g., activation functions and learning rates) are up to you.

(4) Use 2/3 of the data for training and 1/3 for testing. Report the loss values for both the training and test sets.

(5) Present a clear and accurate explanation of your ANN architecture and results.
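Requirements (2) to (4) can be sketched as follows, under several assumptions: the candidate values in `GRID` are arbitrary placeholders, a single value is used for both the L1 and L2 penalties, the network is a plain feedforward model with ReLU hidden layers and a sigmoid output, and the helper names `grid_points`, `split_two_thirds` and `build_model` are hypothetical. TensorFlow is imported inside `build_model` so that the grid and split helpers can be inspected on their own.

```python
import itertools

import numpy as np

# Assumed candidate values; each hyperparameter has at least two.
GRID = {
    "n_hidden": [2, 3],       # (a) number of hidden layers
    "n_units": [16, 32],      # (b) neurons in each hidden layer
    "l1_l2": [1e-4, 1e-3],    # (c) strength used for both L1 and L2
}


def grid_points(grid):
    """Yield one dict per combination of candidate values."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def split_two_thirds(n_rows, seed=0):
    """Return shuffled (train, test) index arrays in a 2/3 : 1/3 ratio."""
    idx = np.random.default_rng(seed).permutation(n_rows)
    cut = (2 * n_rows) // 3
    return idx[:cut], idx[cut:]


def build_model(n_inputs, n_hidden, n_units, l1_l2):
    """Build a feedforward binary classifier with the Keras API."""
    from tensorflow import keras  # imported lazily, see the note above

    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for _ in range(n_hidden):
        model.add(keras.layers.Dense(
            n_units,
            activation="relu",
            kernel_regularizer=keras.regularizers.L1L2(l1=l1_l2, l2=l1_l2),
        ))
    model.add(keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

One way to use these pieces is to loop over `grid_points(GRID)`, call `build_model(10, **params)` for each combination, fit on the training indices with part of the training data held out for validation, and keep the combination with the lowest validation loss. The final training and test losses can then be obtained with `model.evaluate` on the respective portions, so that the test set is not used for choosing hyperparameters.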

Deliverables

• A Jupyter Notebook source file named <your_name>_task_x.ipynb which contains your implementation source code in Python

• A PDF document named <your_name>_task_x.pdf which is generated from your Jupyter Notebook source file.

