Date: 2025-02-10 11:16

INF6027 – Introduction to Data Science

Coursework Brief (100% of the module credits)

Please note: This coursework involves using the same dataset as your INF4000 assignment.

You must use the same dataset so that you can put more effort into understanding your data thoroughly. There are, therefore, two primary areas in the INF4000 and INF6027 assignments where you can benefit from using the same dataset.

The primary purpose of the INF4000 module is to evaluate your ability to effectively present interesting findings using visualisations, your understanding of theories for constructing them, your rationale behind their design, your skill in critiquing them with real-world examples, and how they enhance topic comprehension.

The primary purpose of the INF6027 module is to assess how you identify a problem based on a given dataset, and then conceptualise, design, and implement a data science project. We expect visualisations to be created in this module, particularly to help you in exploring the selected dataset(s) and presenting results of your analysis/models.

You can therefore use the same or similar visualisations for the two modules, but they need to be differently contextualised, positioned and discussed.

Introduction

This assessment for INF6027 Introduction to Data Science comprises a piece of individual coursework to assess your ability to analyse data using R/RStudio and to then communicate your findings. Given a specific topic and dataset (see Section 2), you should identify a specific problem or topic you would like to investigate. You will then need to pre-process and analyse the dataset to identify patterns and relationships that address your selected problem/topic. This should involve using techniques learned throughout the practical sessions that will help you to demonstrate your R skills in conducting data science.

This coursework aims to follow the stages involved in a ‘typical’ data science process:

1) define the question(s) to address (note, sometimes this does not come at the start of the process, but after initial exploration of the data);

2) gather data;

3) transform, clean and structure the data;

4) explore and analyse the data; and

5) communicate the findings of the data analysis.

This often occurs in an iterative manner, centred on one or more questions you are seeking to address. For example, Figure 1 presents the stages involved in data discovery as an iterative process, and you can find more details in Section 3. This is also similar to the data science process from the “Doing Data Science” book (O’Neil & Schutt, 2013).

Fig.1 Example data discovery process (Johnes, 2014:p2)

You should write a 2,800-word structured report (see Section 4) that describes the approach you have taken to explore and analyse the data for the selected problem/topic. Your report should clearly communicate the results of your data analysis and be written in a way that helps the reader interpret your findings. Note: charts, tables, and appendices are not included in the word count.

This assessment is worth 100% of the overall module mark for INF6027. A pass mark of 50 is required to pass the module as a whole. Submission deadline: 2pm Monday 16th January 2025 via Turnitin. See Section 5 for more general information about Coursework Submission Requirements within the Information School.

Dataset options

There are a number of datasets you can choose from for this coursework. You must:

● Choose one primary dataset to analyse in your coursework, although some datasets may contain multiple files that you need to link and integrate.

● You can, if you choose to, combine multiple datasets (strictly within this list) and perform some data analysis. However, the focus of your study should be the primary dataset.

● There are multiple datasets on each topic, spanning different time periods, and each dataset has different characteristics that could be studied. You will likely need to join multiple files from one of the datasets or join multiple datasets.

You must not use datasets outside the ones provided by us. This is because the use of a dataset will need ethics approval, which requires more time than we have for this assessment.

The list of datasets is in Appendix A.

What you need to do

The following sections describe what you need to do in order to carry out the coursework. This roughly follows the steps shown in Fig. 1, but you don’t have to be constrained by this or follow them in this particular order; it is just a suggestion. Also, all the R we have done in the practical sessions should be enough to conduct the coursework, although you may need to investigate certain areas further that relate specifically to the problem you tackle in your investigation.

a) Review the literature and identify research question(s)

As mentioned previously, you should select a specific problem/topic related to the data (the ‘question’ stage in Fig. 1). To decide what area to focus on you could start by undertaking a brief review of the relevant literature around the broad domain. As examples:

Football: clustering of similar players, analysis of player and team statistics, predictive modelling between player statistics and match outcome, etc.

Maneiro, R. et al. (2019). Offensive Transitions in High-Performance Football: Differences Between UEFA Euro 2008 and UEFA Euro 2016. Front. Psychol. 10:1230

Sarmento, H. et al. (2014). Match analysis in football: a systematic review, Journal of Sports Sciences, 32:20, 1831-1843

Deprivation: comparison of different areas, correlation or predictive modelling between different indicators (e.g., are there any associations between certain indicators?), clustering of local areas based on different deprivation indices, relation between deprivation and other population or socio-economic phenomena (you may have to search for and join other datasets).

Aungkulanon, S. et al. (2017). Area-level socioeconomic deprivation and mortality differentials in Thailand: results from principal component analysis and cluster analysis. Int J Equity Health 16, 117

Salvatore, M. et al. (2021). Area deprivation, perceived neighbourhood cohesion and mental health at older ages: A cross lagged analysis of UK longitudinal data, Health & Place, Volume 67, 102470

News dataset: analysis of news articles, sentiment analysis of news over time, studying news topics and correlations between topics, comparing manually generated categories with topic modelling.

Rameshbhai, C. J., & Paulose, J. (2019). Opinion mining on newspaper headlines using SVM and NLP. International journal of electrical and computer engineering (IJECE), 9(3), 2152-2163.

Liang, H., Ganeshbabu, U., & Thorne, T. (2020). A dynamic Bayesian network approach for analysing topic-sentiment evolution. IEEE Access, 8, 54164-54174.

Stock/Share: clustering of similar stocks, sector analysis, temporal analysis of individual stocks or sectors, correlation between indicators (e.g., a particular kind of expense and income/profits).

Liu, H., Huang, S., Wang, P., Li, Z. (2021). A review of data mining methods in financial markets. Data Science in Finance and Economics, 1(4): 362-392.

Ng, K. et al. (2017). StockProF: a stock profiling framework using data mining approaches. Inf Syst E-Bus Manage 15, 139–158.

Reviewing past literature will help you understand what kinds of analyses are undertaken in your chosen domain and provide a possible source of ideas for what you could do with the datasets mentioned in Appendix A.

You are strongly advised to discuss your ideas with the tutors in class, as they may give you feedback on the feasibility of the idea and/or difficulties in finding related literature. Do not leave this too late, as your tutors will receive an increased number of queries towards the coursework deadline and this is better discussed through a chat than by email. All submissions must have research questions.

b) Download, pre-process and explore the data

As well as reviewing relevant academic literature, you should also download some data as clarified above and perform an exploratory analysis (i.e. ‘play’ with the data), to better understand the dataset and to help you identify a particular problem or topic you might want to focus on. This part of your investigation will include steps to pre-process and transform the data, such as cleaning up the data, dealing with missing values, standardising numeric values, etc. This may also include combining or joining the data with another dataset from the list of options (should you choose to do so). This reflects the ‘gather’ and ‘structure’ stages in Fig. 1. (Note: this part of the analysis could take a lot of time, so don’t underestimate how much time you will need to spend on this part of the coursework.)
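The pre-processing steps named above (counting and handling missing values, standardising numeric values, joining files) could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the `players` and `teams` data frames, their column names, and the choice of median imputation are all hypothetical and are not taken from any of the coursework datasets.

```r
# Hypothetical toy data standing in for two linked files from one dataset
players <- data.frame(
  player_id = c(1, 2, 3, 4),
  team_id   = c(10, 10, 20, 20),
  goals     = c(5, NA, 2, 7),
  minutes   = c(900, 450, 1200, 1100)
)
teams <- data.frame(
  team_id = c(10, 20),
  team    = c("Team A", "Team B")
)

# 1) Count missing values in each column before deciding how to treat them
missing_counts <- colSums(is.na(players))

# 2) Deal with missing values: median imputation is an assumption made here,
#    one option among several (dropping rows is another)
players$goals[is.na(players$goals)] <- median(players$goals, na.rm = TRUE)

# 3) Standardise numeric columns to mean 0 and standard deviation 1,
#    storing the results in new columns with a "_z" suffix
num_cols <- c("goals", "minutes")
players[paste0(num_cols, "_z")] <- lapply(
  players[num_cols],
  function(x) (x - mean(x)) / sd(x)
)

# 4) Join the two tables on their shared key, keeping every player row
combined <- merge(players, teams, by = "team_id", all.x = TRUE)

print(missing_counts)
print(combined)
```

Whether to impute, drop, or flag missing values depends on the dataset and the research question, so the choice made should be justified in the report rather than copied from a sketch like this one.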

c) Analyse and explore the data

Once you have identified a topic of interest, you should identify the most appropriate techniques (using R and associated packages) for carrying out your analysis and exploring the data. E.g. for football, you might want to predict match performance for a player based on their statistics. Where possible, relate your analysis to the relevant literature. This relates to the ‘exploring data’ stage in Fig. 1.

Note that this is often an iterative process: as you explore the data you may end up re-designing your research questions, gathering (or removing) data, or performing further cleaning as more data quality issues arise. Again, this is all a part of the data discovery process.
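As a rough illustration of one type of analysis with accompanying visualisations, the sketch below runs k-means clustering in base R. It makes several assumptions: the built-in `USArrests` data is only a stand-in for a coursework dataset, `k = 3` is an arbitrary choice, and the two plots are examples rather than recommended designs.

```r
set.seed(42)  # k-means starts from random centres, so fix the seed

# USArrests ships with R; it is a stand-in, not one of the coursework datasets
features <- scale(USArrests)  # put all variables on a comparable scale

# k = 3 is an arbitrary illustrative choice; nstart repeats the random start
fit <- kmeans(features, centers = 3, nstart = 25)

# Attach cluster labels and summarise each cluster on the original scale
clustered <- data.frame(USArrests, cluster = factor(fit$cluster))
cluster_means <- aggregate(. ~ cluster, data = clustered, FUN = mean)
print(cluster_means)

# Visualisation 1: clusters in the space of two of the variables
plot(clustered$UrbanPop, clustered$Murder,
     col = as.integer(clustered$cluster), pch = 19,
     xlab = "Urban population (%)",
     ylab = "Murder arrests per 100,000",
     main = "k-means clusters (k = 3)")

# Visualisation 2: number of observations in each cluster
barplot(table(clustered$cluster),
        xlab = "Cluster", ylab = "Number of states",
        main = "Cluster sizes")
```

In a real project the number of clusters would need to be justified (for example by comparing the within-cluster sum of squares across several values of k), and the findings related back to the research questions.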

d) Write up your findings

Once you have analysed the data and have some results, you need to write up your investigation (this is the ‘communicate’ stage of Fig. 1). The report should be structured as outlined in Section 4. The write-up is for two sets of readers: (i) a report that presents your findings as would be expected from a research paper; (ii) a set of GitHub pages where you present yourself and your project to a prospective employer/client.

A research audience: You will be evaluated on your ability to plan and undertake data analysis and exploration of the problem based on your chosen dataset, your ability to engage with the relevant literature, your use of R (and appropriate packages) and RStudio to process and analyse the data, and the way in which you communicate your findings within the report for your given problem/topic.

A prospective client/employer: You must also provide your R code, together with a summary of your key findings, on GitHub. The code must be commented, appropriately indented, and use appropriate variable names. You should provide sufficient information on the code so that someone else can follow what you have done. The code should also be consistent (i.e. same standards across all code files).

The GitHub pages should be organised as follows:

- Your own profile page, with your interests and professional skills

- The INF6027 project page (either within your profile page or linked from your profile), where you present:

  - a brief introduction (3-4 lines),

  - your research questions,

  - key findings,

  - the R code,

  - instructions for downloading and running the code

- The INF4000 project page (this will be detailed in the INF4000 coursework brief)

Please note: The GitHub pages, including the code, must not be changed after the submission deadline. Changes made after the deadline will be checked against the GitHub history, and lateness penalties will be applied to the mark as usual if such changes have been made. The penalties are detailed in the section ‘Information School Coursework Submission Requirements’ below.

The minimum requirement to pass is to perform at least one type of data analysis (e.g., clustering, prediction, time-series analysis, etc.) and include at least two visualisations (e.g., charts, maps, etc.) that communicate the findings of your data science activity in the report. To obtain a higher mark and communicate your findings more effectively, you may decide to use more than one dataset, present more than one type of data analysis, use multiple visualisations, and/or use multiple strategies for analysis. Again, you should also engage as much as possible with the appropriate literature.
