Date: 2024-08-27 04:13

Assignment 2

BS6202

Please find attached with this assignment, data pertaining to gene expression profiles of lymphoblastoid cells.

Dataset Description

1. “data.csv” – Gene expression profiles with rows representing the genes and columns the samples.

2. “meta_data.csv” – Metadata corresponding to the gene expression profiles, with rows representing the samples and columns the various clinical attributes such as age, treatment status, etc.

Task 1: Cluster the samples using the gene expression profile and evaluate the goodness of your clustering. Also, describe the rationale behind choosing a specific clustering algorithm.

We first apply PCA and then cluster the samples with K-means. The data.csv file contains the gene expression profile of each sample. PCA is principal component analysis. It reduces a large data set to a few components while maintaining its patterns and trends, which is needed here because the expression data have very many dimensions. K-means is an algorithm that groups unlabeled data into clusters. Plotting the results in Python gives two diagrams, each with PC1 on the x-axis and PC2 on the y-axis. The samples are colored according to the cluster they are assigned to. The k-means value is around 0.05. PC1 ranges from about -40 to 80 and PC2 ranges from about -40 to 100.
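The PCA and K-means steps described above, together with one possible measure of clustering quality, can be sketched as follows. This is a minimal illustration and not the analysis that produced the figures quoted in the answer: it uses synthetic data in place of data.csv, only NumPy, a deterministic farthest-point seeding for K-means, and the mean silhouette coefficient as the goodness measure. All of these are assumptions made for the example, as are the function names `pca_scores`, `kmeans` and `silhouette`.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the top principal components."""
    Xc = X - X.mean(axis=0)                          # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)              # variance ratio per component
    return U[:, :n_components] * S[:n_components], explained[:n_components]

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with farthest-point seeding (an assumption of this sketch)."""
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])              # point farthest from current seeds
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                # assign each sample to nearest centroid
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):          # centroids stopped moving
            break
        centroids = updated
    return labels, centroids

def silhouette(X, labels):
    """Mean silhouette coefficient, between -1 and 1; higher means tighter clusters."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    values = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                              # exclude the sample itself
        if not same.any():
            values.append(0.0)                       # convention for singleton clusters
            continue
        a = D[i, same].mean()                        # mean distance within own cluster
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        values.append((b - a) / max(a, b))
    return float(np.mean(values))

# Synthetic stand-in for data.csv: 40 samples x 50 genes, two shifted groups.
# The real file has genes as rows, so it would need transposing first.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 50)),
               rng.normal(5.0, 1.0, size=(20, 50))])

scores, explained = pca_scores(X)
labels, _ = kmeans(scores, k=2)
sil = silhouette(scores, labels)
print("explained variance ratio:", explained)
print("mean silhouette:", sil)
```

With real data, the `scores` array would be what is plotted with PC1 on the x-axis and PC2 on the y-axis, colored by `labels`. Comparing the silhouette value across several choices of `k` is one common way to justify the number of clusters.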

Task 2: Create a predictive model to predict “sex” using the given gene expression profile and evaluate your predictive model. Also, describe the rationale behind choosing a specific predictive algorithm.

For Task 2, meta_data.csv provides additional information about each sample, such as sex. As in Task 1, we first apply PCA and then K-means. The data.csv file contains the gene expression profile of each sample. PCA is principal component analysis. It reduces a large data set to a few components while maintaining its patterns and trends, which is needed here because the expression data have very many dimensions. K-means is an algorithm that groups unlabeled data into clusters. I used Python to read the gene expression data and calculated the average value of each gene for each sex. A sample can then be compared with these averages and assigned the sex whose average profile it is closer to. In my diagram, PC1 on the x-axis is around 80% and PC2 on the y-axis is around 8%.
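The idea of comparing each sample with the average profile of each sex corresponds to a nearest-centroid classifier. A minimal sketch is given below, assuming synthetic data in place of data.csv and meta_data.csv, Euclidean distance, and a single hold-out split for evaluation. The labels "F" and "M", the number of informative genes, and the function names `fit_centroids` and `predict` are hypothetical choices for the example.

```python
import numpy as np

def fit_centroids(X, y):
    """Average expression profile per class (one centroid per sex)."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Assign each sample to the class whose centroid is closest."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Synthetic stand-in: 60 samples x 40 genes, where the first 5 genes
# differ between the two groups (a hypothetical set of sex-linked genes).
rng = np.random.default_rng(0)
X_f = rng.normal(0.0, 1.0, size=(30, 40))
X_m = rng.normal(0.0, 1.0, size=(30, 40))
X_m[:, :5] += 6.0
X = np.vstack([X_f, X_m])
y = np.array(["F"] * 30 + ["M"] * 30)

# Hold out a quarter of the samples so the model is evaluated on unseen data.
idx = rng.permutation(len(y))
test_idx, train_idx = idx[:15], idx[15:]

classes, centroids = fit_centroids(X[train_idx], y[train_idx])
y_pred = predict(X[test_idx], classes, centroids)
accuracy = float(np.mean(y_pred == y[test_idx]))
print("hold-out accuracy:", accuracy)
```

Computing the centroids on the training samples only and scoring on the held-out samples avoids the optimistic accuracy that results from evaluating on the same data used to compute the averages. Cross-validation would give a more stable estimate on a small cohort.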

You may perform the above tasks in your groups using a variety of methods and strategies. However, each person is to take this preliminary analysis and further develop and refine it. Write it up as a short 2-4 page report and submit individually.
