Date: 2020-05-01 06:09

SDGB-7847

Final Exam


The data we are working with is in longitudinal format. Each column represents a patient, and each row holds the expression readings for one gene (genes 1-5913). Each patient’s disease status is marked in the column header. The first 20 patients are marked with ‘meta’, meaning these patients have a form of metastatic cancer (disease=1). The last 20 patients do not have the disease (disease=0).


You will need to transform this data into a model-ready format, with one row per patient and one column per gene, in order to predict metastatic disease from each patient’s gene expression.
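The reshaping step can be sketched in base R as follows. This is a minimal illustration on a toy matrix, not the exam dataset: the header names (`meta_1`, `ctrl_1`) and the object names `expr` and `model_df` are assumptions, and the real column headers may be formatted differently.

```r
# Toy stand-in for the expression table: genes in rows, patients in columns.
# The header format ("meta_1", "ctrl_1") is assumed for illustration only.
n_genes <- 6
n_per_group <- 4
expr <- matrix(
  seq_len(n_genes * 2 * n_per_group),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    c(paste0("meta_", seq_len(n_per_group)),
      paste0("ctrl_", seq_len(n_per_group)))
  )
)

# Transpose so that each row is a patient and each column is a gene.
model_df <- as.data.frame(t(expr))

# Derive the label from the header: 1 = metastatic, 0 = no disease.
is_meta <- startsWith(rownames(model_df), "meta")
model_df$disease <- factor(as.integer(is_meta), levels = c(0, 1))

print(dim(model_df))          # patients x (genes + label)
print(table(model_df$disease))
```

Matching on the start of the header (rather than anywhere in it) avoids mislabeling a hypothetical header such as `nonmeta_1`.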


Set your R seed to 1234.


Once your data is ready to model, separate it into training and test sets.
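One way to split the 40 patients is a stratified random sample, sketched below in base R. The 70/30 proportion is an assumption (the exam does not specify one), and `train_frac`, `train_idx`, and `test_idx` are illustrative names.

```r
set.seed(1234)  # the seed the exam asks for

# Labels in the order described above: 20 metastatic, then 20 disease-free.
disease <- factor(rep(c(1, 0), each = 20), levels = c(0, 1))

train_frac <- 0.7  # assumed proportion; not specified in the exam

# Sample within each class so both sets keep the 50/50 class balance.
by_class <- split(seq_along(disease), disease)
train_idx <- unname(unlist(lapply(by_class, function(idx) {
  sample(idx, size = round(train_frac * length(idx)))
})))
test_idx <- setdiff(seq_along(disease), train_idx)

print(table(disease[train_idx]))
print(table(disease[test_idx]))
```

With only 40 patients, stratifying keeps either class from being underrepresented in the test set, which would make sensitivity or specificity unstable.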


Apply the following algorithms, training on your training data and testing on your test data, to predict disease based on gene expression. For each algorithm, report the accuracy, sensitivity, and specificity obtained on your test data.


Random forest (RF). Note that RF on the full dataset may take a long time to run because every gene is used as a predictor variable.


RF + PCA


KNN + PCA (Use iteration to find optimal value of K)
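The KNN + PCA item, together with the three test-set metrics, might look like the sketch below. Everything here is illustrative: the data are simulated, the split is fixed by hand, the number of components (`n_pc = 5`) is assumed, and choosing K by leave-one-out cross-validation on the training set (`knn.cv` from the `class` package) is one possible reading of "use iteration", not necessarily the intended one.

```r
library(class)  # knn() and knn.cv(); a recommended package bundled with R

set.seed(1234)

# Simulated stand-in: 40 patients x 200 genes, first 10 genes carry signal.
n_genes <- 200
disease <- factor(rep(c(1, 0), each = 20), levels = c(0, 1))
x <- matrix(rnorm(40 * n_genes), nrow = 40,
            dimnames = list(NULL, paste0("gene", seq_len(n_genes))))
x[disease == 1, 1:10] <- x[disease == 1, 1:10] + 2

train_idx <- c(1:14, 21:34)           # 14 per class; illustrative split
test_idx <- setdiff(1:40, train_idx)
train_y <- disease[train_idx]
test_y <- disease[test_idx]

# Fit PCA on the training rows only, then project the test rows.
pca <- prcomp(x[train_idx, ], center = TRUE, scale. = TRUE)
n_pc <- 5                             # assumed number of components
train_pc <- pca$x[, 1:n_pc]
test_pc <- predict(pca, newdata = x[test_idx, ])[, 1:n_pc]

# Iterate over odd K and score each with leave-one-out CV on the training set.
k_grid <- seq(1, 15, by = 2)
cv_acc <- sapply(k_grid, function(k) {
  cv_pred <- knn.cv(train_pc, cl = train_y, k = k)
  mean(as.character(cv_pred) == as.character(train_y))
})
best_k <- k_grid[which.max(cv_acc)]

# Refit with the chosen K and evaluate once on the held-out test set.
pred <- as.character(knn(train_pc, test_pc, cl = train_y, k = best_k))
truth <- as.character(test_y)
accuracy <- mean(pred == truth)
sensitivity <- mean(pred[truth == "1"] == "1")  # true positive rate
specificity <- mean(pred[truth == "0"] == "0")  # true negative rate

print(c(best_k = best_k, accuracy = accuracy,
        sensitivity = sensitivity, specificity = specificity))
```

Tuning K on the training set and touching the test set only once keeps the reported metrics from being optimistically biased. The same metric calculations apply unchanged to RF and RF + PCA predictions.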

In an external document, write a discussion on which algorithm you would choose and why. Discuss what the variable importance plot showed for RF and RF + PCA, the number of principal components you chose and what you chose as your optimal value of K.
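Two of the quantities to discuss, the number of principal components and the RF variable importance, can be inspected roughly as follows. This is a sketch on simulated data: the 80% variance threshold is an assumed rule of thumb, and the RF part assumes the `randomForest` package is installed (it is skipped otherwise).

```r
set.seed(1234)

# Simulated training data: 40 patients x 50 genes, first 5 genes carry signal.
disease <- factor(rep(c(1, 0), each = 20), levels = c(0, 1))
train_x <- matrix(rnorm(40 * 50), nrow = 40,
                  dimnames = list(NULL, paste0("gene", 1:50)))
train_x[disease == 1, 1:5] <- train_x[disease == 1, 1:5] + 2

# Number of components: smallest count reaching an assumed 80% of variance.
pca <- prcomp(train_x, center = TRUE, scale. = TRUE)
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_pc <- which(cum_var >= 0.80)[1]
print(n_pc)

# Variable importance from a random forest (requires the randomForest package).
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_fit <- randomForest::randomForest(x = train_x, y = disease,
                                       ntree = 500, importance = TRUE)
  imp <- randomForest::importance(rf_fit)
  ranked <- imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), , drop = FALSE]
  top_genes <- head(ranked, 10)
  print(top_genes)
  # In an interactive session, the plot itself can be drawn with:
  # randomForest::varImpPlot(rf_fit, n.var = 10)
}
```

For RF + PCA the same calls apply with the component scores (`pca$x[, 1:n_pc]`) as predictors, in which case the importance ranking refers to components rather than individual genes.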


Upload your code and your external explanation document by Thursday, April 30th at 8pm.


Thank you for a wonderful class and have a great summer! Stay in touch!

