
Date: 2018-09-22 02:12

In Programming Assignment 1, you are asked to perform data analysis and data preprocessing on the following dataset. You may use the built-in functions in sklearn and matplotlib for these tasks.

Dataset

● Bank Marketing Dataset [1]

The dataset is related to the direct marketing campaigns of a Portuguese banking institution. The classification goal is to predict whether a client will subscribe to a term deposit.

● Each data record describes one client: it contains the client's basic information and whether the client subscribed to the term deposit. Please treat columns 1 to 20 as features and column 21 as the class.
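The column convention above can be sketched as follows. This is only an illustration, assuming pandas is available; the helper name `split_features_and_class`, the file name and the separator mentioned in the comment are assumptions, not part of the assignment, and a tiny synthetic frame stands in for the real data.

```python
import pandas as pd


def split_features_and_class(df):
    """Return (features, class): columns 1-20 and column 21, counted from 1."""
    if df.shape[1] != 21:
        raise ValueError(f"expected 21 columns, got {df.shape[1]}")
    features = df.iloc[:, :20]  # columns 1 to 20
    target = df.iloc[:, 20]     # column 21
    return features, target


# For the real data one would presumably start from something like
#   df = pd.read_csv("bank-additional-full.csv", sep=";")
# (file name and separator are assumptions, not stated in the assignment).
# A tiny synthetic frame stands in for it here so the sketch is runnable.
columns = [f"f{j}" for j in range(1, 21)] + ["y"]
rows = [[i + j for j in range(20)] + ["yes" if i % 3 == 0 else "no"]
        for i in range(6)]
demo = pd.DataFrame(rows, columns=columns)

X, y = split_features_and_class(demo)
print(X.shape, y.tolist())
```

Positional slicing with `iloc` keeps the sketch independent of the actual column names.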

Data Analysis

● Task 1. Plot the distribution of values in the class attribute of the dataset using a bar chart. Please describe what you observe, e.g., whether the data distribution is imbalanced.
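One possible way to approach Task 1 is sketched below. It assumes pandas and matplotlib; the function name `plot_class_distribution` and the demo labels are made up for illustration, and the majority-to-minority ratio is just one convenient way to quantify imbalance.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_class_distribution(labels, out_path=None):
    """Draw a bar chart of class frequencies and return the counts."""
    counts = pd.Series(labels).value_counts()
    fig, ax = plt.subplots()
    ax.bar(counts.index.astype(str), counts.values)
    # Write the count above each bar.
    for i, n in enumerate(counts.values):
        ax.text(i, n, str(n), ha="center", va="bottom")
    ax.set_xlabel("class")
    ax.set_ylabel("number of records")
    ax.set_title("Class distribution")
    if out_path is not None:
        fig.savefig(out_path)
    plt.close(fig)
    return counts


# Synthetic labels standing in for column 21 of the real dataset.
demo_labels = ["no"] * 8 + ["yes"] * 2
class_counts = plot_class_distribution(demo_labels)
imbalance_ratio = class_counts.max() / class_counts.min()
print(class_counts.to_dict(), imbalance_ratio)
```

A ratio far from 1 suggests that the class distribution is imbalanced.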

● Task 2. Read the linked references and answer the following questions.

a) Please summarize the characteristics of, and the differences between, the chi-square test (https://en.wikipedia.org/wiki/Chi-squared_test) and mutual information (https://en.wikipedia.org/wiki/Mutual_information).

b) Can we simply apply the chi-square and mutual information functions to the Bank Marketing Dataset for feature selection? Please explain. (Hint: consider the difference between categorical and numerical data.)

c) Employ chi-square or mutual information, as appropriate, to obtain a measure of association between the values of each feature and the class. Rank the features by their chi-square and mutual information measures.

Note: Please make two lists, one for chi-square and the other for mutual information. Each attribute belongs to only one list.
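A rough sketch of Task 2(c) follows, under one possible reading of the note: categorical features are scored with a chi-square test on their contingency table with the class, and numerical features with mutual information. That split is an interpretation, not something the assignment states. The sketch assumes NumPy, pandas, SciPy and scikit-learn; the function names and the synthetic columns are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.feature_selection import mutual_info_classif


def rank_categorical_by_chi2(df, cat_cols, y):
    """Rank categorical columns by the chi-square statistic against the class."""
    scores = {}
    for col in cat_cols:
        table = pd.crosstab(df[col], y)  # contingency table: value x class
        stat, _, _, _ = chi2_contingency(table, correction=False)
        scores[col] = float(stat)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def rank_numerical_by_mutual_info(df, num_cols, y, seed=0):
    """Rank numerical columns by estimated mutual information with the class."""
    mi = mutual_info_classif(df[num_cols].to_numpy(), y,
                             discrete_features=False, random_state=seed)
    scores = [(col, float(v)) for col, v in zip(num_cols, mi)]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


# Synthetic data: one informative and one noise column of each kind.
rng = np.random.default_rng(0)
n = 300
y = pd.Series(rng.integers(0, 2, size=n), name="y")
flip = rng.random(n) < 0.1  # 10% label noise in the informative category
demo = pd.DataFrame({
    "cat_informative": np.where((y.to_numpy() == 1) ^ flip, "a", "b"),
    "cat_noise": rng.choice(["x", "y", "z"], size=n),
    "num_informative": y.to_numpy() + rng.normal(0.0, 0.3, size=n),
    "num_noise": rng.normal(0.0, 1.0, size=n),
})

chi_ranking = rank_categorical_by_chi2(demo, ["cat_informative", "cat_noise"], y)
mi_ranking = rank_numerical_by_mutual_info(demo, ["num_informative", "num_noise"], y)
print(chi_ranking)
print(mi_ranking)
```

Because each column is passed to exactly one of the two functions, every attribute ends up in exactly one ranked list.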

● Task 3. Based on the two ranked lists obtained in Task 2, plot the value distribution of (i) the three highest-ranked categorical features, (ii) the three lowest-ranked categorical features, (iii) the three highest-ranked numerical features, and (iv) the three lowest-ranked numerical features. Describe what you observe from these value distributions.

Note: Please plot a bar chart for each categorical feature and a histogram for each numerical feature. See below for examples. For each histogram, please divide the overall value range evenly into 10 intervals. For each bar and interval, please color the portion of records/instances belonging to each class and show the overall count.

[Figure: example bar chart]

[Figure: example histogram]
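The plots described in the note for Task 3 could be produced along the following lines. This is a minimal sketch assuming NumPy, pandas and matplotlib; `stacked_histogram`, `stacked_bar` and the demo values are hypothetical, and a constant-valued feature (zero-width range) is not handled.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def stacked_histogram(values, labels, n_bins=10):
    """Histogram with n_bins equal-width intervals, stacked by class."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    fig, ax = plt.subplots()
    bottom = np.zeros(n_bins)
    per_class = {}
    for cls in np.unique(labels):
        counts, _ = np.histogram(values[labels == cls], bins=edges)
        ax.bar(centers, counts, width=edges[1] - edges[0], bottom=bottom,
               label=str(cls), edgecolor="black")
        per_class[str(cls)] = counts
        bottom += counts
    # Overall count on top of each interval.
    for x, total in zip(centers, bottom):
        ax.text(x, total, str(int(total)), ha="center", va="bottom")
    ax.legend(title="class")
    plt.close(fig)
    return edges, per_class


def stacked_bar(categories, labels):
    """Bar chart with one bar per category value, stacked by class."""
    table = pd.crosstab(pd.Series(categories, name="value"),
                        pd.Series(labels, name="class"))
    ax = table.plot(kind="bar", stacked=True)
    # Overall count on top of each bar.
    for i, total in enumerate(table.sum(axis=1)):
        ax.text(i, total, str(int(total)), ha="center", va="bottom")
    plt.close(ax.figure)
    return table


# Made-up values for one numerical and one categorical feature.
ages = [18, 25, 30, 35, 40, 45, 50, 60, 70, 98]
jobs = ["admin", "admin", "services", "admin", "retired",
        "services", "admin", "retired", "retired", "retired"]
subscribed = ["no", "no", "yes", "no", "no", "yes", "no", "no", "yes", "yes"]

edges, hist_counts = stacked_histogram(ages, subscribed)
bar_table = stacked_bar(jobs, subscribed)
print(edges)
print(bar_table)
```

Computing the bin edges once with `np.linspace` and reusing them for every class keeps the stacked segments aligned on the same 10 intervals.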

Data preprocessing

● Task 3. Normalize the range of values of the numerical features. If a feature's values are all positive or all negative, normalize them into [0, 1] or [-1, 0], respectively. Otherwise, normalize them into [-1, 1]. For each normalized numerical feature, submit the ranges of its original and normalized values.
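The normalization rule can be sketched with scikit-learn's `MinMaxScaler`, as below. This is an illustration under two assumptions: that min-max scaling is the intended method, and that zeros count toward the "all positive" or "all negative" case. The helper names `target_range` and `normalize_column` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def target_range(column):
    """Pick the output interval from the sign of the column's values."""
    lo, hi = column.min(), column.max()
    if lo >= 0:   # no negative values (zeros allowed: an assumption)
        return (0.0, 1.0)
    if hi <= 0:   # no positive values (zeros allowed: an assumption)
        return (-1.0, 0.0)
    return (-1.0, 1.0)


def normalize_column(values):
    """Min-max scale one column; return scaled values and both ranges."""
    column = np.asarray(values, dtype=float)
    scaler = MinMaxScaler(feature_range=target_range(column))
    scaled = scaler.fit_transform(column.reshape(-1, 1)).ravel()
    original = (float(column.min()), float(column.max()))
    normalized = (float(scaled.min()), float(scaled.max()))
    return scaled, original, normalized


pos_scaled, pos_orig, pos_norm = normalize_column([17, 40, 98])
neg_scaled, neg_orig, neg_norm = normalize_column([-3.4, -1.8, -0.2])
mix_scaled, mix_orig, mix_norm = normalize_column([-50.8, 0.0, 26.9])
print(pos_orig, pos_norm)
print(neg_orig, neg_norm)
print(mix_orig, mix_norm)
```

Note that min-max scaling into [-1, 1] does not keep an original 0 at 0; if that matters, `MaxAbsScaler` is an alternative worth considering.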

● Task 4. Encode the categorical features using the one-hot representation scheme. For example, assume that there is a ‘state’ feature with three categorical values, ‘PA’, ‘NY’ and ‘NJ’. Create three new binary features, namely ‘state_is_PA’, ‘state_is_NY’ and ‘state_is_NJ’, to replace ‘state’, where the feature values are either 0 or 1. For each new binary feature, count and report the number of records with value 1, e.g., “state_is_PA”: 15000, “state_is_NY”: 20000 and “state_is_NJ”: 10000.
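The ‘state’ example can be reproduced in miniature as follows. This sketch uses `pandas.get_dummies` for brevity, which is an assumption about tooling (scikit-learn's `OneHotEncoder` would be another route); the function name `one_hot_encode` and the six demo rows are invented for illustration.

```python
import pandas as pd


def one_hot_encode(df, column):
    """Replace `column` with binary '<column>_is_<value>' features."""
    dummies = pd.get_dummies(df[column], prefix=f"{column}_is", dtype=int)
    encoded = pd.concat([df.drop(columns=[column]), dummies], axis=1)
    # Number of records with value 1 in each new binary feature.
    counts = {name: int(dummies[name].sum()) for name in dummies.columns}
    return encoded, counts


demo = pd.DataFrame({
    "age": [30, 41, 25, 52, 38, 47],
    "state": ["PA", "NY", "NJ", "NY", "PA", "NY"],
})
encoded, counts = one_hot_encode(demo, "state")
print(list(encoded.columns))
print(counts)
```

The counts dictionary has the same shape as the report requested in the task, one entry per new binary feature.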

Packages

● sklearn (http://scikit-learn.org/). A machine learning framework in Python.

● matplotlib (https://matplotlib.org/). The website provides tutorials on how to plot bar charts and histograms in Python.

● NumPy (https://numpy.org/). A fundamental package for scientific computing with Python.

References

[1] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.

[2] Please refer to http://scikit-learn.org/stable/modules/preprocessing.html#.
