
Date: 2019-05-15 10:43

STATS 4014

Advanced Data Science

Assignment 4

CHECKLIST

[ ] Have you shown all of your working, including probability notation where necessary?

[ ] Have you given all numbers to 3 decimal places unless otherwise stated?

[ ] Have you included all R output and plots to support your answers where necessary?

[ ] Have you included all of your R code?

[ ] Have you made sure that all plots and tables each have a caption?

[ ] If before the deadline, have you submitted your assignment via the online submission on MyUni?

[ ] Is your submission a single pdf file - correctly orientated, easy to read? If not, penalties apply.

[ ] Penalties for more than one document - 10% of final mark for each extra document. Note that you may resubmit and your final version is marked, but the final document should be a single file.

[ ] Penalties for late submission - within 24 hours 40% of final mark. After 24 hours, assignment is not marked and you get zero.

[ ] Assignments emailed instead of submitted by the online submission on MyUni will not be marked and will receive zero.

[ ] Have you checked that the assignment submitted is the correct one, as we cannot accept other submissions after the due date?

Due date: Friday 17th May 2019 (Week 9), 5pm.

Q1. Natural splines

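Consider the data

$(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$.

Suppose that $g(x)$ is a natural cubic spline with knots at $x_1, x_2, \dots, x_n$, where

$a < x_1 < x_2 < \dots < x_n < b$.

Let $\tilde{g}(x)$ be any other twice continuously differentiable function such that

$\tilde{g}(x_i) = g(x_i), \quad i = 1, 2, \dots, n$.

a. If $h(x) = \tilde{g}(x) - g(x)$ then use integration by parts to show that

$\int_a^b g''(x) \, h''(x) \, dx = 0$.

b. Hence show that

$\int_a^b \tilde{g}''(x)^2 \, dx \ge \int_a^b g''(x)^2 \, dx$,

with equality if $h(x) = 0$ for all $a < x < b$.

c. Show that the solution to the problem of finding a smoothing spline:

$\min_f \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \int_a^b f''(x)^2 \, dx$

must be a natural cubic spline with knots at

$x_1, x_2, \dots, x_n$.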
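
The minimum-roughness property that Q1 builds towards can be checked numerically. The sketch below is an illustration only, not a proof. It assumes base R's `splinefun()`, a small made-up data set, and the interval from the first to the last knot; the `"fmm"` spline is used purely as a stand-in for "another twice continuously differentiable interpolant". The helper name `roughness` is hypothetical.

```r
# Made-up data, for illustration only
x <- c(0, 1, 2.5, 3, 4.2, 5)
y <- c(1.0, 0.2, 1.5, 0.7, 2.0, 1.1)

# Natural cubic spline interpolant (second derivative zero at both ends)
g_nat <- splinefun(x, y, method = "natural")
# A different C2 cubic interpolant through the same points
g_alt <- splinefun(x, y, method = "fmm")

# Integrated squared second derivative, summed piece by piece between knots
roughness <- function(f, knots) {
  pieces <- sapply(seq_len(length(knots) - 1), function(i) {
    integrate(function(t) f(t, deriv = 2)^2, knots[i], knots[i + 1])$value
  })
  sum(pieces)
}

rough_nat <- roughness(g_nat, x)
rough_alt <- roughness(g_alt, x)

# Both curves pass through the data, but the natural spline is "smoother"
c(natural = rough_nat, other = rough_alt)
```

Both functions interpolate the same points, so any difference in the penalty term comes entirely from how the curve bends between the knots.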

Q2. ROC class

a. Create an S3 class that deals with ROC curves. For complete marks, you will need

i. a constructor,

ii. a print function,

iii. a plot function, and

iv. a generic confusion matrix function that takes a ROC object and a cutoff and returns the confusion matrix.

To give an example, code using my S3 class is given below.

library(dplyr)  # provides the starwars data, mutate() and %>%

data("starwars")
starwars <-
  starwars %>%
  mutate(human = ifelse(species == "Human", 1, 0)) %>%
  na.omit()
starwars_lr <- glm(human ~ height + mass, data = starwars, family = binomial())
starwars_roc <- ROC(
  pred = predict(starwars_lr),
  obs = starwars$human
)
starwars_roc

## The number of observations is 29.
## The number of positives is 18.
## The number of negatives is 11.
##
## First rows of data
## # A tibble: 6 x 2
##    pred   obs
##   <dbl> <dbl>
## 1 0.705     1
## 2 2.31      1
## 3 0.184     1
## 4 2.37      1
## 5 0.836     1
## 6 0.665     1
##
## First row of summary data frame:
##   TP FP FN TN     Score        FPR        TPR precision     recall
## 1  0  0 18 11 2.3652725 0.00000000 0.00000000       NaN 0.00000000
## 2  1  0 17 11 2.3093987 0.00000000 0.05555556 1.0000000 0.05555556
## 3  2  0 16 11 1.6933920 0.00000000 0.11111111 1.0000000 0.11111111
## 4  2  1 16 10 0.8576164 0.09090909 0.11111111 0.6666667 0.11111111
## 5  2  2 16  9 0.8357629 0.18181818 0.11111111 0.5000000 0.11111111
## 6  3  2 15  9 0.7668831 0.18181818 0.16666667 0.6000000 0.16666667

plot(starwars_roc, type = "PR")

conf_matrix(starwars_roc)

## # A tibble: 2 x 3
##      HC   `0`   `1`
##   <dbl> <int> <int>
## 1     0     7     6
## 2     1     4    12

conf_matrix(starwars_roc, cutoff = 0.9)

## # A tibble: 2 x 3
##      HC   `0`   `1`
##   <dbl> <int> <int>
## 1     0    10    16
## 2     1     1     2

conf_matrix(1:10, cutoff = 0.9)

## [1] "I do not know how to deal with the class default"
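
The four pieces listed in Q2a (constructor, print method, plot method, generic confusion matrix) could be laid out roughly as follows. This is a minimal base-R sketch under stated assumptions, not the class that produced the output above: the internal fields `data` and `summary`, the default `cutoff = 0.5`, the use of `table()` rather than a tibble, and the toy scores are all hypothetical. The rule "predict positive when `pred > cutoff`" is inferred from the first summary row shown above, where the highest score gives TP = 0.

```r
# Constructor: keep the raw scores and labels plus one summary row per score
ROC <- function(pred, obs) {
  stopifnot(length(pred) == length(obs), all(obs %in% c(0, 1)))
  pred <- unname(pred)
  cutoffs <- sort(unique(pred), decreasing = TRUE)
  tp <- sapply(cutoffs, function(k) sum(pred > k & obs == 1))
  fp <- sapply(cutoffs, function(k) sum(pred > k & obs == 0))
  fn <- sum(obs == 1) - tp
  tn <- sum(obs == 0) - fp
  summary_df <- data.frame(
    TP = tp, FP = fp, FN = fn, TN = tn, Score = cutoffs,
    FPR = fp / (fp + tn), TPR = tp / (tp + fn),
    precision = tp / (tp + fp), recall = tp / (tp + fn)
  )
  structure(
    list(data = data.frame(pred = pred, obs = obs), summary = summary_df),
    class = "ROC"
  )
}

# Print method: counts, then the head of both data frames
print.ROC <- function(x, ...) {
  cat("The number of observations is ", nrow(x$data), ".\n", sep = "")
  cat("The number of positives is ", sum(x$data$obs == 1), ".\n", sep = "")
  cat("The number of negatives is ", sum(x$data$obs == 0), ".\n\n", sep = "")
  cat("First rows of data\n")
  print(head(x$data))
  cat("\nFirst rows of summary data frame:\n")
  print(head(x$summary))
  invisible(x)
}

# Plot method: ROC curve by default, precision-recall curve for type = "PR"
plot.ROC <- function(x, type = c("ROC", "PR"), ...) {
  type <- match.arg(type)
  s <- x$summary
  if (type == "ROC") {
    plot(s$FPR, s$TPR, type = "s", xlab = "FPR", ylab = "TPR", ...)
    abline(0, 1, lty = 2)
  } else {
    plot(s$recall, s$precision, type = "l",
         xlab = "recall", ylab = "precision", ...)
  }
  invisible(x)
}

# Generic plus methods: S3 dispatch picks the method from class(x)
conf_matrix <- function(x, ...) UseMethod("conf_matrix")

conf_matrix.ROC <- function(x, cutoff = 0.5, ...) {
  predicted <- factor(as.integer(x$data$pred > cutoff), levels = c(0, 1))
  observed <- factor(x$data$obs, levels = c(0, 1))
  table(predicted = predicted, observed = observed)
}

# Fallback for anything that is not a ROC object; unlike the message shown
# above, this version names the actual class of the argument
conf_matrix.default <- function(x, ...) {
  paste("I do not know how to deal with the class", class(x)[1])
}

# Toy scores and labels, invented for illustration
toy_roc <- ROC(pred = c(2.1, 1.3, 0.4, -0.2, -1.0, 0.9),
               obs = c(1, 1, 0, 1, 0, 0))
cm <- conf_matrix(toy_roc, cutoff = 0.5)
```

Typing `toy_roc` at the console dispatches to `print.ROC()`, and `plot(toy_roc, type = "PR")` dispatches to `plot.ROC()`. Defining `conf_matrix.default()` is what lets a call such as `conf_matrix(1:10)` return a message instead of failing with a dispatch error.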

Q3. Titanic dataset

The data in titanic.csv contains the details for 712 passengers on the ship Titanic. The following variables are given:

Variable   Definition                                    Key
survival   Survival                                      0 = No, 1 = Yes
pclass     Ticket class                                  1 = 1st, 2 = 2nd, 3 = 3rd
sex        Sex
Age        Age in years
sibsp      # of siblings / spouses aboard the Titanic
parch      # of parents / children aboard the Titanic
ticket     Ticket number
fare       Passenger fare
cabin      Cabin number
embarked   Port of Embarkation                           C = Cherbourg, Q = Queenstown, S = Southampton

pclass: A proxy for socio-economic status (SES).

1st = Upper
2nd = Middle
3rd = Lower

age: Age is fractional if less than 1. If the age is estimated, it is in the form xx.5.

sibsp: The dataset defines family relations in this way:

Sibling = brother, sister, stepbrother, stepsister
Spouse = husband, wife (mistresses and fiancés were ignored)

parch: The dataset defines family relations in this way:

Parent = mother, father
Child = daughter, son, stepdaughter, stepson

Some children travelled only with a nanny, therefore parch = 0 for them.
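
Given the variable definitions above, a cleaning step might look something like the sketch below. It is one possible approach under stated assumptions rather than the expected answer: the three-row excerpt is invented so that the block runs without titanic.csv, the helper name `clean_titanic` and the derived column `alone` are hypothetical, and dropping `ticket` and `cabin` is a judgment call.

```r
# Invented three-row excerpt standing in for titanic.csv (values are made up)
raw_text <- "survival,pclass,sex,Age,sibsp,parch,ticket,fare,cabin,embarked
0,3,male,22,1,0,A1,7.25,,S
1,1,female,38,1,0,B2,71.28,C85,C
1,3,female,,0,0,C3,7.92,,S"

# With the real file this would be read.csv("titanic.csv", ...)
raw <- read.csv(text = raw_text, stringsAsFactors = FALSE,
                na.strings = c("", "NA"))

clean_titanic <- function(df) {
  names(df) <- tolower(names(df))  # "Age" becomes "age"
  # Coded variables become labelled factors, following the key in the table
  df$survival <- factor(df$survival, levels = c(0, 1), labels = c("No", "Yes"))
  df$pclass <- factor(df$pclass, levels = 1:3, labels = c("1st", "2nd", "3rd"))
  df$sex <- factor(df$sex)
  df$embarked <- factor(df$embarked, levels = c("C", "Q", "S"))
  # Travelling alone: no siblings/spouses and no parents/children aboard
  df$alone <- df$sibsp == 0 & df$parch == 0
  # Drop identifier-like columns, then any rows with missing values
  df <- df[, setdiff(names(df), c("ticket", "cabin"))]
  df[complete.cases(df), ]
}

titanic <- clean_titanic(raw)
str(titanic)
```

Treating `pclass` as a factor rather than a number avoids imposing equal spacing between the classes, which matters for how MARS and CART are allowed to split on it.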

a. Read in the dataset and clean it.

b. Fit a MARS model.

c. Fit a CART.

d. Using both models, predict which is more likely to survive: a first-class 24-year-old male travelling alone, or a first-class 24-year-old female travelling alone.

e. According to both models, which class and sex are least likely to survive?
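
The fit-then-predict workflow in parts c and d could be sketched as below for the CART case. Everything here is an assumption made for illustration: the data are simulated (not the Titanic passengers), the coefficients used to simulate survival are arbitrary, and the object names `sim`, `cart` and `newdata` are hypothetical. The sketch relies on the `rpart` package, so its output says nothing about the real data set.

```r
library(rpart)

# Simulated stand-in for the cleaned Titanic data (not real passengers)
set.seed(4014)
n <- 200
sim <- data.frame(
  pclass = factor(sample(1:3, n, replace = TRUE)),
  sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  age    = round(runif(n, 1, 70)),
  sibsp  = rpois(n, 0.5),
  parch  = rpois(n, 0.4)
)
# Arbitrary simulated survival probabilities, chosen only to give the tree signal
p <- plogis(1.5 - 2.5 * (sim$sex == "male") - 0.8 * (as.integer(sim$pclass) - 1))
sim$survival <- factor(rbinom(n, 1, p), levels = c(0, 1))

# Classification tree (CART)
cart <- rpart(survival ~ pclass + sex + age + sibsp + parch,
              data = sim, method = "class")

# The two passengers from part d: first class, aged 24, travelling alone
newdata <- data.frame(
  pclass = factor(c(1, 1), levels = levels(sim$pclass)),
  sex    = factor(c("male", "female"), levels = levels(sim$sex)),
  age    = c(24, 24),
  sibsp  = c(0, 0),
  parch  = c(0, 0)
)

# Predicted probability of survival (class "1") for each passenger
surv_prob <- predict(cart, newdata, type = "prob")[, "1"]
names(surv_prob) <- c("male", "female")
surv_prob
```

A MARS fit would slot into the same workflow, for example through `earth()` from the `earth` package with `glm = list(family = binomial)`, assuming that package is installed; `predict()` with `type = "response"` would then give comparable probabilities for the same `newdata`.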


Mark scheme

Part    Marks   Difficulty   Area         Type       Comments

Q1
1a      7       0.29         Splines      proof      7 for proof
1b      7       0.29         Splines      proof      7 for proof
1c      5       0.00         Splines      proof      5 for proof
Total   19

Q2
2ai     5       0.00         S3 OOP       coding     5 for code
2aii    5       0.00         S3 OOP       coding     5 for code
2aiii   6       0.50         S3 OOP       coding     6 for code
2aiv    6       0.50         S3 OOP       coding     6 for code
Total   22

Q3
3ab     4       0.00         MARS/CART    analysis   4 for analysis
3c      2       0.00         MARS/CART    analysis   2 for analysis
3d      4       0.00         MARS/CART    analysis   4 for analysis
3e      3       0.00         MARS/CART    analysis   3 for analysis
Total   13

Assignment total 54
