Date: 2019-02-22 09:38

M345S17 QUANTITATIVE METHODS IN RETAIL FINANCE 2019 COURSEWORK 2

Fraud scoring using Anomaly Detection

In this exercise you will use the data set of fraudulent credit card transactions that you used in CW1. This time you will use a multivariate kernel density estimator (KDE) for prediction. You will use the R statistical language for your work.

Tony Bellotti

TIMESCALE

Your coursework must be submitted by 4pm on Monday 25th February 2019.

INSTRUCTIONS

1. Use the training and test sets of credit card transactions that you used in CW1.

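2. Using the KDE with high-dimensional data causes computational problems. One is that R cannot handle the precision of the calculations. For example, in the estimate $\hat{f}(\mathbf{x}) = \frac{1}{nh^d}\sum_{i=1}^{n} K\big((\mathbf{x} - \mathbf{x}_i)/h\big)$, built from $n$ observations $\mathbf{x}_i \in \mathbb{R}^d$ with bandwidth $h$, the normalizing term $\frac{1}{nh^d}$ can become very large.

For this reason we will work with the log of the density. Following the material in Chapter 7 of the course notes, show the following. When the standard multivariate normal density

$$K(\mathbf{u}) = (2\pi)^{-d/2}\exp\left(-\tfrac{1}{2}\mathbf{u}^{\top}\mathbf{u}\right)$$

is used as the kernel function,

$$\log \hat{f}(\mathbf{x}) = C + \log \sum_{i=1}^{n} \exp\big(z_i(\mathbf{x})\big)$$

for some constant $C$ (constant relative to $\mathbf{x}$), where

$$z_i(\mathbf{x}) = -\frac{\lVert \mathbf{x} - \mathbf{x}_i \rVert^2}{2h^2}.$$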
3. Show that … ≤ log Σ … for any … and ….


4. Implement the formula from step 2 in R to compute the fraud score. See Appendix A for coding hints.

5. Use your R implementation to compute fraud scores for all observations in the test set. Base the scores only on the density estimate of the legitimate transactions in the training data set. Use a bandwidth of $h = 0.1$.

6. Apply these fraud scores to the test data set. Construct a precision-recall (PR) curve and compute the area under the precision-recall curve (AUPRC).

7. If an alarm rate of no more than 0.5% is required, what is the maximum recall that can be achieved using this model, based on the results on the test set?

8. How do the results using the ANN and the KDE compare? Which is the better approach, and why?

9. Your coursework must be submitted as:

a) a paper copy of your solutions to the student office; and

b) an electronic submission, emailed to a.bellotti@imperial.ac.uk with subject heading “M345S17 CW2”, along with an R script giving the commands you used to complete the coursework.

Your R script should include annotated comments describing what it is doing at each step.
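For step 2, here is a worked sketch of the derivation. It assumes the KDE has the standard form $\hat{f}(\mathbf{x}) = \frac{1}{nh^d}\sum_{i=1}^{n} K\big((\mathbf{x}-\mathbf{x}_i)/h\big)$, built from $n$ legitimate training observations $\mathbf{x}_i \in \mathbb{R}^d$ with bandwidth $h$. The Chapter 7 notation is not reproduced on this page, so these symbols are assumptions. Substituting the standard multivariate normal kernel gives:

$$
\begin{aligned}
\hat{f}(\mathbf{x}) &= \frac{1}{nh^d}\sum_{i=1}^{n}(2\pi)^{-d/2}\exp\!\left(-\frac{\lVert\mathbf{x}-\mathbf{x}_i\rVert^2}{2h^2}\right)
= \frac{1}{nh^d(2\pi)^{d/2}}\sum_{i=1}^{n}\exp\big(z_i(\mathbf{x})\big),\\
\log\hat{f}(\mathbf{x}) &= \underbrace{-\log n - d\log h - \tfrac{d}{2}\log(2\pi)}_{C} + \log\sum_{i=1}^{n}\exp\big(z_i(\mathbf{x})\big),
\end{aligned}
$$

Here $z_i(\mathbf{x}) = -\lVert\mathbf{x}-\mathbf{x}_i\rVert^2/(2h^2)$, and $C$ does not depend on $\mathbf{x}$.

The remaining sum can still underflow. With $h = 0.1$ we have $z_i = -50\lVert\mathbf{x}-\mathbf{x}_i\rVert^2$, so $\exp(z_i)$ is 0 in double precision once $\lVert\mathbf{x}-\mathbf{x}_i\rVert^2$ exceeds roughly 15. The usual remedy is the log-sum-exp identity $\log\sum_i e^{z_i} = m + \log\sum_i e^{z_i - m}$ with $m = \max_i z_i$. Every term $e^{z_i - m}$ lies in $(0, 1]$ and the largest equals 1. The inner sum therefore lies in $[1, n]$, and its log is always finite.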

Remember to include all results and the R code you used in your report.
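For steps 6 and 7, here is a base-R sketch for evaluating the scores. It assumes higher scores mean "more likely fraud" and labels are 1 for fraud and 0 for legitimate. It also assumes the alarm rate is the proportion of all test transactions flagged. `average_precision()` is a step-wise AUPRC estimate. It can differ slightly from the interpolated integral that the PRROC package reports. All function names are hypothetical.

```r
# PR points: flag the top-k scores for k = 1, ..., N.
# Ties in the scores are broken arbitrarily by order().
pr_points <- function(scores, labels) {
  y <- labels[order(scores, decreasing = TRUE)]
  tp <- cumsum(y)                     # frauds among the top-k scores
  k <- seq_along(y)
  data.frame(recall = tp / sum(y), precision = tp / k)
}

# Step-wise AUPRC (average precision): mean precision at the rank of each fraud.
average_precision <- function(scores, labels) {
  y <- labels[order(scores, decreasing = TRUE)]
  pr <- pr_points(scores, labels)
  mean(pr$precision[y == 1])
}

# Largest recall over thresholds t whose alarm rate mean(scores >= t)
# does not exceed max_alarm_rate.
max_recall_at_alarm_rate <- function(scores, labels, max_alarm_rate = 0.005) {
  ord <- order(scores, decreasing = TRUE)
  s <- scores[ord]
  y <- labels[ord]
  n <- length(s)
  tp <- cumsum(y)
  k <- seq_len(n)
  # A threshold flags every score >= s[k], so a cut-off is only achievable
  # at the last position of a run of tied scores.
  last_of_tie <- c(s[-1] != s[-n], TRUE)
  ok <- last_of_tie & (k / n <= max_alarm_rate)
  if (!any(ok)) return(0)
  max(tp[ok]) / sum(labels)
}
```

To draw the curve, use `pr <- pr_points(scores, labels); plot(pr$recall, pr$precision, type = "s", xlab = "Recall", ylab = "Precision")`. With PRROC, `pr.curve(scores.class0 = scores[labels == 1], scores.class1 = scores[labels == 0], curve = TRUE)` gives the curve and its `auc.integral`.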


APPENDIX A: HELP IN R

Although you can use the PRROC package again, do not use a package to implement the KDE. You need to code this yourself.

for loops

You can use for loops in R to implement the sum in the KDE and to calculate the fraud score for each test observation. The for loop has the following syntax:

```r
for (i in a:b) {
  # ... statements ...
}
```

It will cycle over values of i from a to b.

For example, this code will compute the factorial of x:

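```r
x <- 4                    # or any other non-negative integer
fac <- 1
for (i in seq_len(x)) {   # i = 1, 2, ..., x; no iterations when x = 0 (1:0 would give c(1, 0))
  fac <- fac * i
}
fac                       # prints [1] 24
```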
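The same for-loop pattern can be applied to the KDE. An inner loop over training points builds the sum, and an outer loop over test observations produces one fraud score each. The sketch below works on the log scale and uses the log-sum-exp trick so the sum never underflows to 0. That trick subtracts the largest exponent before calling `exp()`. Assumptions: the inputs are hypothetical numeric matrices, `test_x` and `train_legit`, with one transaction per row and the same columns. The fraud score is taken to be $-\log\hat{f}(\mathbf{x})$, so unusually low-density transactions score high.

```r
# Log of the Gaussian KDE at one point x (a numeric vector of length d).
# Assumed form: f(x) = 1/(n h^d) * sum_i K((x - x_i)/h), K = standard normal density.
log_kde_point <- function(x, train_legit, h) {
  n <- nrow(train_legit)
  d <- ncol(train_legit)
  z <- numeric(n)
  for (i in seq_len(n)) {                 # loop over training points: the KDE sum
    diff <- x - train_legit[i, ]
    z[i] <- -sum(diff^2) / (2 * h^2)      # log of exp(-||x - x_i||^2 / (2 h^2))
  }
  m <- max(z)                             # log-sum-exp: factor out the largest term
  C <- -log(n) - d * log(h) - (d / 2) * log(2 * pi)
  C + m + log(sum(exp(z - m)))
}

# Fraud scores (assumed to be -log density) for every row of test_x.
kde_fraud_scores_loop <- function(test_x, train_legit, h = 0.1) {
  scores <- numeric(nrow(test_x))
  for (j in seq_len(nrow(test_x))) {      # loop over test observations
    scores[j] <- -log_kde_point(test_x[j, ], train_legit, h)
  }
  scores
}
```

The direct formula `mean(exp(z)) / ...` would return `-Inf` for points far from every training transaction. The version above stays finite for such points.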

Warning! Implementing the KDE with for loops is slow. As a guide, on my five-year-old laptop it takes just over an hour to compute KDE fraud scores for the whole data set. Therefore, while you are writing and debugging your R code, I suggest you use a small subsample of your test data to check that it is working correctly. Only apply it to the whole test data set when you are confident it is working correctly.
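If the loops are too slow, one alternative is a matrix version of the same log-scale computation, written in base R. The KDE is still coded by hand, as this appendix requires. This is a sketch under assumptions. `test_x` and `train_legit` are hypothetical numeric matrices with matching columns. The fraud score is assumed to be $-\log\hat{f}(\mathbf{x})$. Test rows are processed in chunks, with a default of 50 chosen arbitrarily, so the full test-by-training distance matrix never has to be held in memory at once.

```r
# Matrix (vectorised) Gaussian KDE fraud scores, -log f(x), for each row of test_x.
kde_fraud_scores_vec <- function(test_x, train_legit, h = 0.1, chunk = 50) {
  n <- nrow(train_legit)
  d <- ncol(train_legit)
  C <- -log(n) - d * log(h) - (d / 2) * log(2 * pi)
  train_sq <- rowSums(train_legit^2)
  scores <- numeric(nrow(test_x))
  for (start in seq(1, nrow(test_x), by = chunk)) {
    idx <- start:min(start + chunk - 1, nrow(test_x))
    xs <- test_x[idx, , drop = FALSE]     # keep a matrix even for a single row
    # ||x - x_i||^2 = ||x||^2 + ||x_i||^2 - 2 x.x_i for every (test, train) pair
    sq_dist <- outer(rowSums(xs^2), train_sq, "+") - 2 * tcrossprod(xs, train_legit)
    sq_dist[sq_dist < 0] <- 0             # clip tiny negatives caused by round-off
    z <- -sq_dist / (2 * h^2)             # rows: test points, columns: training points
    m <- apply(z, 1, max)                 # row-wise maximum for log-sum-exp
    scores[idx] <- -(C + m + log(rowSums(exp(z - m))))
  }
  scores
}
```

Following the advice above, first try a small subsample, for example `sub <- sample(nrow(test_x), 200)` and then `kde_fraud_scores_vec(test_x[sub, , drop = FALSE], train_legit)`. Lower `chunk` if memory is tight. The distance expansion trades a little floating-point accuracy for speed. The results should agree with a direct loop to many decimal places.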

