
Date: 2021-04-09 11:36

DATA3888 (2021): Assignment 1

Question 1: Brain-box

Build a classification rule for detecting {L, R} under streaming conditions, where the function takes a signal sequence as input. Note that this is slightly different from detecting {L, R} for a given sequence.

• (i) Estimate the accuracy of your classifier. Is your value reasonable?

• (ii) Does the length of the sequence impact the performance of your classifier?
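One way to make "performance" concrete for a streaming classifier, offered only as a sketch under our own assumptions: merge runs of identical window-level predictions, drop windows labelled with a hypothetical "none" class (no event detected), and compare the resulting event sequence with the true one using the Levenshtein edit distance, which base R computes with `adist()`. The helper names `collapse_events` and `sequence_accuracy`, the "none" label, and the choice of edit distance itself are illustrative, not taken from the assignment.

```r
# Hypothetical helper: turn window-level labels into a sequence of events.
# Runs of identical labels are merged first, then "none" runs are dropped,
# so two L events separated by a quiet gap stay as two separate events.
collapse_events <- function(labels, none_label = "none") {
  runs <- rle(as.character(labels))$values
  runs[runs != none_label]
}

# Hypothetical metric: 1 - (edit distance / number of true events), floored at 0.
# Assumes single-character labels such as "L" and "R".
sequence_accuracy <- function(predicted, truth) {
  pred_str <- paste(predicted, collapse = "")
  true_str <- paste(truth, collapse = "")
  d <- adist(pred_str, true_str)[1, 1]
  max(0, 1 - d / nchar(true_str))
}

windows <- c("none", "L", "L", "none", "R", "R", "none", "L", "none")
pred <- collapse_events(windows)            # c("L", "R", "L")
sequence_accuracy(pred, c("L", "R", "R"))   # one substitution: 1 - 1/3
```

Computing this score separately for short and long recordings is one possible way to look at question (ii).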

Hint:

(a) Consider which metric you will use to define “performance”. You will need to explain your choice and justify your answer.

(b) You can use data generated by either Louis (Spiker_box_Louis.zip) or Zoe (zoe_spiker.zip).

(c) The code below is only a guide; you do not need to follow its structure.

streaming_classifier = function(wave_file,
                                window_size = wave_file@samp.rate,
                                increment = window_size/10,
                                # hypothetical argument: a function mapping one window to a label
                                classify_window = function(interval) NA
                                # <other arguments you require>
                                )
{
  Y = wave_file@left
  n_samples = length(Y)                   # total number of samples in the recording
  window_size = round(window_size)        # sample indices must be whole numbers
  increment = max(1, round(increment))
  predicted_labels = c()
  lower_interval = 1
  while (lower_interval + window_size - 1 <= n_samples)
  {
    upper_interval = lower_interval + window_size - 1   # window holds exactly window_size samples
    interval = Y[lower_interval:upper_interval]
    predicted = classify_window(interval)   # <write your classifier>
    predicted_labels = c(predicted_labels, predicted)
    lower_interval = lower_interval + increment
  } ## end while
  predicted_labels
} ## end function
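To see the sliding-window idea end to end without a recording, the sketch below applies the same windowing logic to a simulated numeric vector standing in for `wave_file@left`. The window classifier is deliberately naive and entirely hypothetical: it reports "none" when the window's standard deviation is below a threshold, and otherwise labels the window "L" if its maximum occurs before its minimum, "R" otherwise. Which deflection order corresponds to a left or right eye movement is an assumption that would need checking against the Spiker box data; `classify_window`, `stream_labels` and `sd_threshold` are illustrative names.

```r
# Hypothetical window classifier (not from the assignment).
classify_window <- function(interval, sd_threshold) {
  if (sd(interval) < sd_threshold) return("none")   # quiet window: no event
  if (which.max(interval) < which.min(interval)) "L" else "R"
}

# Slide a window of `window_size` samples along `y`, stepping by `increment`.
stream_labels <- function(y, window_size, increment, sd_threshold) {
  if (length(y) < window_size) return(character(0))
  starts <- seq(1, length(y) - window_size + 1, by = increment)
  vapply(starts, function(s) {
    classify_window(y[s:(s + window_size - 1)], sd_threshold)
  }, character(1))
}

# Simulated signal: 10 s of noise at 100 samples/s with two 1 s events.
set.seed(1)
fs <- 100
y <- rnorm(10 * fs, sd = 0.1)
shape <- sin(pi * seq(-1, 1, length.out = fs))   # trough first, then peak
y[201:300] <- y[201:300] - 3 * shape              # peak then trough
y[601:700] <- y[601:700] + 3 * shape              # trough then peak

labels <- stream_labels(y, window_size = fs, increment = fs / 10, sd_threshold = 0.5)
table(labels)
```

Windows that only partly overlap an event can receive the wrong label under this rule, which is one reason to evaluate a streaming classifier on the sequence of detected events rather than window by window.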

Question 2: Biomedical COVID19 data

Consider the prevalidation principle, where a molecular signature (set of features) from a given omics data platform is used to obtain a single variable known as the prevalidated outcome. Next, we model this prevalidated outcome in combination with the other clinical variables to build a classifier of the outcome of interest. In this exercise, ignoring healthy individuals,


• (i) build a classifier to predict disease outcome (moderate vs severe), including a feature selection component on the proteomics data. Illustrate your comparison results using boxplots (similar to the sample code in #3.6); and

• (ii) generate a prevalidated outcome from the proteomics data and use it together with the clinical variables in a logistic regression to build a classifier.
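A minimal sketch of part (i), assuming simulated data in place of the proteomics matrix: proteins are ranked by the absolute Welch t statistic computed on the training folds only (so feature selection sits inside the cross-validation loop), a logistic regression is fitted on the top-ranked proteins, and repeated 5-fold cross-validation accuracies for two hypothetical signature sizes are compared in a boxplot. The t-statistic filter, the signature sizes and all object names are our own choices; this is not the sample code referred to as #3.6.

```r
set.seed(3888)
n <- 60; p <- 200
y <- rep(c(0, 1), each = n / 2)                # 0 = moderate, 1 = severe (simulated labels)
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("prot", 1:p)))
X[y == 1, 1:10] <- X[y == 1, 1:10] + 1         # only the first 10 proteins carry signal

# Welch two-sample t statistic for every column of X.
welch_t <- function(X, y) {
  a <- X[y == 1, , drop = FALSE]
  b <- X[y == 0, , drop = FALSE]
  (colMeans(a) - colMeans(b)) /
    sqrt(apply(a, 2, var) / nrow(a) + apply(b, 2, var) / nrow(b))
}

# One round of k-fold CV; feature selection is redone inside every fold.
cv_accuracy <- function(X, y, n_features, k = 5) {
  folds <- sample(rep(1:k, length.out = nrow(X)))
  correct <- logical(nrow(X))
  for (f in 1:k) {
    test <- folds == f
    tt <- welch_t(X[!test, , drop = FALSE], y[!test])
    keep <- order(abs(tt), decreasing = TRUE)[1:n_features]
    train_df <- data.frame(y = y[!test], X[!test, keep, drop = FALSE])
    # glm may warn about (quasi-)separation when many features are used.
    fit <- suppressWarnings(glm(y ~ ., data = train_df, family = binomial))
    prob <- predict(fit, newdata = data.frame(X[test, keep, drop = FALSE]),
                    type = "response")
    correct[test] <- (prob > 0.5) == (y[test] == 1)
  }
  mean(correct)
}

acc <- sapply(c(5, 20), function(m) replicate(20, cv_accuracy(X, y, m)))
colnames(acc) <- c("top 5 proteins", "top 20 proteins")
boxplot(acc, ylab = "5-fold CV accuracy")
```

Selecting features on the full data before splitting would leak test information into training and inflate the accuracies, which is why the ranking is recomputed in every fold.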

Describe your final model for classifying severe and non-severe individuals and your estimate of its accuracy.

Note: The prevalidation procedure, which is similar in concept to cross-validation, is detailed and graphically presented below. The five steps are:

• Step 1. Divide the samples into k equal parts.

• Step 2. Set aside one part as the test set component.

• Step 3. A protein signature (set of features) is obtained using the training set (k − 1 parts), and a classifier based on that protein signature is trained on the training set.

• Step 4. Use this classifier to predict the outcome class of the samples in the held-out part (from Step 2).

• Step 5. Repeat Steps 2-4 for all k parts, resulting in a prevalidated vector of estimates for the protein data. This prevalidated vector (denoted as APV) is a complete prediction vector with one prediction for each sample.
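The five steps above can be sketched in base R as follows, under several assumptions of our own: the data are simulated, the signature is the top 5 proteins by absolute Welch t statistic, the within-fold classifier is a logistic regression, and the APV is taken on the probability scale (log-odds via `type = "link"` would be an equally reasonable choice). The clinical variables `age` and `sex` are hypothetical stand-ins for the real ones.

```r
set.seed(2021)
n <- 80; p <- 300
severe <- rep(c(0, 1), each = n / 2)                       # simulated outcome
prot <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("prot", 1:p)))
prot[severe == 1, 1:8] <- prot[severe == 1, 1:8] + 0.8
clinical <- data.frame(age = rnorm(n, 60 + 5 * severe, 10),  # hypothetical clinical variables
                       sex = factor(sample(c("F", "M"), n, replace = TRUE)))

# Welch two-sample t statistic for every column of X.
welch_t <- function(X, y) {
  a <- X[y == 1, , drop = FALSE]
  b <- X[y == 0, , drop = FALSE]
  (colMeans(a) - colMeans(b)) /
    sqrt(apply(a, 2, var) / nrow(a) + apply(b, 2, var) / nrow(b))
}

# Returns the prevalidated vector (APV): one out-of-fold prediction per sample.
prevalidate <- function(X, y, k = 5, n_features = 5) {
  folds <- sample(rep(1:k, length.out = nrow(X)))   # Step 1: k parts (equal here: 80 / 5)
  apv <- numeric(nrow(X))
  for (f in 1:k) {                                  # Step 5: cycle through all k parts
    test <- folds == f                              # Step 2: hold one part out
    # Step 3: choose the signature and train the classifier on the other k - 1 parts.
    tt <- welch_t(X[!test, , drop = FALSE], y[!test])
    keep <- order(abs(tt), decreasing = TRUE)[1:n_features]
    train_df <- data.frame(y = y[!test], X[!test, keep, drop = FALSE])
    fit <- suppressWarnings(glm(y ~ ., data = train_df, family = binomial))
    # Step 4: predict the held-out part.
    apv[test] <- predict(fit, newdata = data.frame(X[test, keep, drop = FALSE]),
                         type = "response")
  }
  apv
}

apv <- prevalidate(prot, severe)
final_data <- data.frame(severe = severe, apv = apv, clinical)
final_model <- glm(severe ~ apv + age + sex, data = final_data, family = binomial)
summary(final_model)$coefficients
```

Because every APV entry comes from a classifier that never saw that sample, the APV can sit alongside the clinical variables in the second-stage regression on a fairer footing than an in-sample protein score. The printed coefficients describe the fitted model only; an accuracy estimate for it would still need its own resampling.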

Question 3: Lag time estimation

For the months of March to May 2020, estimate the lag time between the number of daily new cases (new_cases) and the number of hospital patients (hosp_patients) for all countries with available data, and display your results on a world map. Is this visualisation appropriate in this context? Please explain your response, and recommend a better choice if you don’t think it is appropriate (illustrations are welcome).
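For a single country, one way to put a number on the lag, sketched here as our own assumption rather than the assignment's prescribed method, is to shift `hosp_patients` by k days, compute the Pearson correlation with `new_cases` for each candidate k, and keep the k with the highest correlation. The 21-day search range, the restriction to non-negative lags (hospitalisations trailing cases), and the simulated single-wave series with a built-in 7-day delay are illustrative choices.

```r
# Lag (days) at which hosp_patients correlates best with earlier new_cases.
# Missing values are skipped pair by pair for each candidate lag.
estimate_lag <- function(new_cases, hosp_patients, max_lag = 21) {
  n <- length(new_cases)
  lags <- 0:min(max_lag, n - 3)
  r <- sapply(lags, function(k) {
    x <- hosp_patients[(1 + k):n]    # hospital patients k days later
    y <- new_cases[1:(n - k)]
    ok <- !is.na(x) & !is.na(y)
    if (sum(ok) < 3) return(NA_real_)
    cor(x[ok], y[ok])
  })
  if (all(is.na(r))) return(NA_real_)
  lags[which.max(r)]
}

# Simulated single wave over 92 days (1 March to 31 May) with a 7-day delay.
set.seed(42)
n_days <- 92
true_lag <- 7
d <- 1:(n_days + true_lag)
all_cases <- round(1000 * exp(-((d - 45) / 15)^2)) + rpois(length(d), 20)
cases <- all_cases[(true_lag + 1):(n_days + true_lag)]
hosp <- 0.3 * all_cases[1:n_days] + rnorm(n_days, sd = 2)  # hosp[t] ~ 0.3 * cases[t - 7]
estimate_lag(cases, hosp)
```

Computing a separate correlation for each lag, instead of calling `ccf()`, is a deliberate choice: as we understand its estimator, `ccf()` scales every lag by the full series length, which on a short, wave-shaped series can pull the maximum towards shorter lags.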

[Bonus question] For the months of August to November 2020, estimate the lag time between the number of daily new cases (new_cases) and the number of hospital patients (hosp_patients). Compare this estimate with the one for March to May 2020. Describe your observations: what did you learn from these data?
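For the bonus comparison, a per-lag correlation estimate can be applied per country and per period. The sketch assumes a long-format data frame with one row per country and day and columns `location`, `date` (class `Date`), `new_cases` and `hosp_patients`; only the last two names come from the assignment, and the `location`/`date` layout is our assumption about the data. The two simulated countries, their built-in lags and the 30-day minimum of non-missing `hosp_patients` values are hypothetical.

```r
# Lag (days) at which hosp_patients correlates best with earlier new_cases.
estimate_lag <- function(new_cases, hosp_patients, max_lag = 21) {
  n <- length(new_cases)
  lags <- 0:min(max_lag, n - 3)
  r <- sapply(lags, function(k) {
    x <- hosp_patients[(1 + k):n]
    y <- new_cases[1:(n - k)]
    ok <- !is.na(x) & !is.na(y)
    if (sum(ok) < 3) return(NA_real_)
    cor(x[ok], y[ok])
  })
  if (all(is.na(r))) return(NA_real_)
  lags[which.max(r)]
}

# One lag per country within [from, to]; countries with too little data are skipped.
lag_by_country <- function(df, from, to, min_days = 30) {
  df <- df[df$date >= as.Date(from) & df$date <= as.Date(to), ]
  df <- df[order(df$location, df$date), ]
  rows <- lapply(split(df, df$location), function(d) {
    if (sum(!is.na(d$hosp_patients)) < min_days) return(NULL)
    data.frame(location = d$location[1],
               lag = estimate_lag(d$new_cases, d$hosp_patients))
  })
  do.call(rbind, rows)
}

# Hypothetical data: two waves per country, with a different built-in lag
# before and after 1 July 2020.
simulate_country <- function(name, lag_spring, lag_autumn) {
  dates <- seq(as.Date("2020-02-01"), as.Date("2020-11-30"), by = "day")
  d <- seq_along(dates)
  cases <- round(800 * exp(-((d - 70) / 15)^2) + 1200 * exp(-((d - 250) / 20)^2)) +
    rpois(length(d), 20)
  lag <- ifelse(dates < as.Date("2020-07-01"), lag_spring, lag_autumn)
  hosp <- 0.3 * cases[pmax(d - lag, 1)] + rnorm(length(d), sd = 2)
  data.frame(location = name, date = dates, new_cases = cases, hosp_patients = hosp)
}

set.seed(2020)
covid <- rbind(simulate_country("Country A", 7, 4),
               simulate_country("Country B", 10, 6))
spring <- lag_by_country(covid, "2020-03-01", "2020-05-31")
autumn <- lag_by_country(covid, "2020-08-01", "2020-11-30")
comparison <- merge(spring, autumn, by = "location", suffixes = c("_spring", "_autumn"))
comparison$change <- comparison$lag_autumn - comparison$lag_spring
comparison
```

On real data, the `change` column would only be meaningful for countries with enough non-missing hospital data in both periods, which the `min_days` filter only partly guards against.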


