Date: 2023-01-27 06:14

1  Data Analytics Task - Climate Data Analysis using Python


1.1 General Overview

The assignment comprises code writing and data analysis. You are allowed to discuss ideas with peers, but your code, experiments and report must be entirely your own work.

The assignment builds on material covered in class. You will work with two meteorological datasets and will be required to process the data, clean the datasets and present correlations. Specifically, you will be asked to solve three tasks.

The goals of the assignment are the following:

•   To further develop your programming skills

•   To further develop your skills in, and understanding of, the principles of data analytics and machine learning

•   To acquire experience in dealing with real-world data


1.2 Assignment Description

You will find two pickle files, weather-denmark-resampled.pkl and df_perth.pkl, and you will be asked to solve three different tasks. For TASK 1 and TASK 2 (covering the main aspects of preliminary data analysis, missing data and outlier detection), you will use the first dataset. For TASK 3 (covering correlation and pattern inference), you will use the second, smaller dataset to find correlations and infer patterns.

Read the descriptions of the three tasks carefully and address them using the pre-compiled Jupyter notebook Coursework_weather_data.ipynb.


TASK 1 - PRELIMINARY ANALYSIS

In this first task, you will explore the dataset. Follow the instructions below:


a.    Import the weather-denmark-resampled.pkl dataset provided in the folder and explore the dataset by answering the following questions.

i.    How many cities are there in the dataset?

ii.    How many observations and features are there in this dataset?

iii.    What are the names of the different features?
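A minimal sketch of how questions i to iii could be answered with pandas. The real file is not reproduced here, so the code builds a small hypothetical stand-in and assumes, reading the hint in Task 2, that the columns form a two-level (city, feature) MultiIndex. The city "Aarhus" and the feature names Temp and Pressure are placeholders, not the dataset's actual contents; with the real file, the stand-in would be replaced by pd.read_pickle("weather-denmark-resampled.pkl").

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data, which would be loaded with
# df = pd.read_pickle("weather-denmark-resampled.pkl").
# Assumption: columns are a two-level MultiIndex (city, feature).
cities = ["Aarhus", "Odense"]      # placeholder city list
features = ["Temp", "Pressure"]    # placeholder feature names
index = pd.date_range("2006-05-01", periods=4, freq="D")
columns = pd.MultiIndex.from_product([cities, features], names=["city", "feature"])
df = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4), index=index, columns=columns)

# i. Number of distinct cities in the first column level.
n_cities = df.columns.get_level_values("city").nunique()

# ii. Rows are observations (time stamps); columns are city/feature pairs.
n_observations, n_columns = df.shape
n_features = df.columns.get_level_values("feature").nunique()

# iii. Names of the distinct features, in order of first appearance.
feature_names = list(df.columns.get_level_values("feature").unique())

print(n_cities, n_observations, n_features, feature_names)
```

Whether an "observation" is a row or a (time, city) pair depends on how the layout is interpreted; df.shape gives the raw counts either way, and df.columns shows the exact structure of the real file.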

b.    Now that you are familiar with the dataset, check whether it contains any missing values. If it does, remove them using the built-in pandas function.
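One possible way to carry out this check, sketched on a small hypothetical frame: isna().sum() counts missing values per column, and dropna() is a built-in pandas function that drops every row containing at least one missing value. The column names are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two gaps; in the assignment this would be the
# Denmark DataFrame loaded from the pickle file.
df = pd.DataFrame({
    "Temp": [10.0, np.nan, 12.0, 11.0],
    "Pressure": [1012.0, 1011.0, np.nan, 1013.0],
})

# Count missing values per column and in total.
missing_per_column = df.isna().sum()
total_missing = int(missing_per_column.sum())

# Drop every row that contains at least one missing value.
cleaned = df.dropna()

print(missing_per_column.to_dict(), total_missing, len(df), len(cleaned))
```

The same calls work unchanged on MultiIndex columns. Note that dropna() removes a whole time stamp if any city/feature value is missing, so the number of dropped rows is worth reporting.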


c.    Extract the general statistical properties, summarising the minimum, maximum, median, mean and standard deviation of every feature in the dataset. Spot any anomalies in these properties and clearly explain why you classify them as anomalies.
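A sketch of the statistics step on hypothetical values. DataFrame.agg with a list of function names returns one row per statistic. The plausibility bounds used to flag anomalies (for example, relative humidity above 100 %) are illustrative assumptions, not values given in the assignment, and RelHumidity is a placeholder feature name.

```python
import pandas as pd

# Hypothetical values; 45 degrees C and 101 % humidity are planted anomalies.
df = pd.DataFrame({
    "Temp": [10.0, 12.0, 11.0, 45.0],          # degrees C
    "RelHumidity": [80.0, 95.0, 101.0, 70.0],  # percent
})

# One row per statistic, one column per feature.
stats = df.agg(["min", "max", "median", "mean", "std"])

# Illustrative physical-plausibility bounds (assumed, not from the assignment).
bounds = {"Temp": (-40.0, 40.0), "RelHumidity": (0.0, 100.0)}
anomalous = [
    col for col, (low, high) in bounds.items()
    if stats.loc["min", col] < low or stats.loc["max", col] > high
]

print(stats)
print("Features with implausible extremes:", anomalous)
```

df.describe() reports count, mean, std, min, quartiles and max in one call; its 50% row is the median.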



TASK 2 – OUTLIERS

The second task focuses on detecting and handling outliers. Follow the instructions below:

d.   Store the temperature measurements in May 2006 for the city of Odense. Then produce a simple plot of the temperature versus time.

HINT: In this dataset, the cities are vertically stacked. Therefore, we have a multi-column dataset, which works much like a nested dictionary.
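A sketch of the selection step, reading the hint as describing (city, feature) MultiIndex columns, so that df["Odense"]["Temp"] behaves like a nested dictionary lookup, and assuming the row index is a DatetimeIndex, so that the partial date string "2006-05" selects the whole month. The column name Temp and the daily sampling are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data spanning late April to early June 2006.
index = pd.date_range("2006-04-25", "2006-06-05", freq="D")
columns = pd.MultiIndex.from_product([["Odense"], ["Temp", "Pressure"]])
df = pd.DataFrame(
    np.column_stack([np.linspace(8.0, 18.0, len(index)), np.full(len(index), 1012.0)]),
    index=index,
    columns=columns,
)

# Nested-dictionary style access: city first, then feature.
odense_temp = df["Odense"]["Temp"]

# Partial string indexing on the DatetimeIndex keeps only May 2006.
odense_may = odense_temp.loc["2006-05"]

print(odense_may.index.min(), odense_may.index.max(), len(odense_may))
```

Calling odense_may.plot() then draws temperature against time (this requires matplotlib to be installed).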

e.   Find the outliers in this set of measurements (if any) and replace them using an interpolation method of your choice.
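One possible approach, sketched on hypothetical values: flag points outside the Tukey fences (more than 1.5 times the interquartile range beyond the quartiles), blank them with mask, and fill the gaps with time-weighted linear interpolation. The IQR rule and the 1.5 factor are a common convention chosen here for illustration; z-scores or a rolling-window rule are equally valid choices.

```python
import pandas as pd

# Hypothetical daily temperatures (degrees C) with two planted spikes.
index = pd.date_range("2006-05-01", periods=8, freq="D")
temp = pd.Series([11.0, 12.0, 11.5, 40.0, 12.5, 13.0, -30.0, 12.0], index=index)

# Tukey fences: values beyond 1.5 * IQR from the quartiles count as outliers.
q1, q3 = temp.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (temp < q1 - 1.5 * iqr) | (temp > q3 + 1.5 * iqr)

# Replace outliers with NaN, then fill them by interpolating over time.
cleaned = temp.mask(is_outlier).interpolate(method="time")

print("Outliers found:", int(is_outlier.sum()))
print(cleaned)
```

For hourly data with a strong daily cycle, fences computed over the whole month may miss spikes that are extreme only relative to their neighbours; comparing each value with a rolling median is one way to address that.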


TASK 3 – CORRELATION

In this last task, you will look for correlations between features of the data, using a smaller dataset. Follow the instructions below:


CORRELATION


f.   We now turn to a new dataset (df_perth.pkl), which contains climate data for a city in Australia. It covers only one year of measurements, but has more features.


g.   Find any significant correlations between features.

HINT: you might find it useful to look for trends and recurrent patterns within the data.
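A sketch of one way to screen for strong pairwise correlations: compute the Pearson correlation matrix with DataFrame.corr() and keep the pairs whose absolute coefficient exceeds a threshold. The feature names and the 0.7 threshold are assumptions for illustration; the real df_perth.pkl columns may differ.

```python
from itertools import combinations

import pandas as pd

# Hypothetical features; Humidity is an exact linear function of Temp.
df = pd.DataFrame({
    "Temp": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Humidity": [95.0, 90.0, 85.0, 80.0, 75.0, 70.0],
    "WindSpeed": [3.0, 1.0, 3.0, 1.0, 3.0, 1.0],
})

corr = df.corr()  # Pearson correlation by default

# Visit each unordered pair once and keep the strong ones.
threshold = 0.7  # illustrative cut-off
strong = {
    (a, b): corr.loc[a, b]
    for a, b in combinations(corr.columns, 2)
    if abs(corr.loc[a, b]) > threshold
}

print(corr.round(2))
print(strong)
```

In time-indexed data, correlations are often driven by shared daily or seasonal cycles; grouping by df.index.hour or resampling to daily means with df.resample("D").mean() can help separate recurring patterns from direct relationships.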

h.   We now focus on the correlation between precipitation and cloud cover. We want to infer the probability of having moderate to heavy rain (> 1 mm/h) as a function of the cloud cover index.

HINT: You might find it useful to create a new column containing 0 if precipitation < 1 mm/h and 1 otherwise.
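A minimal sketch of the empirical estimate the hint points towards: encode each record as 1 if precipitation is at least 1 mm/h (following the hint, so exactly 1 mm/h counts as rain) and 0 otherwise, then average that indicator within each cloud cover value. The mean of a 0/1 column is the fraction of rainy records, i.e. an estimate of P(rain | cloud cover) = (records with that cloud cover and rain) / (records with that cloud cover). The column names CloudCover, Precip and Rain and the 0-100 cloud cover scale are hypothetical.

```python
import pandas as pd

# Hypothetical records: cloud cover index (here 0-100) and precipitation in mm/h.
df = pd.DataFrame({
    "CloudCover": [0, 0, 25, 25, 50, 50, 75, 75, 100, 100],
    "Precip": [0.0, 0.0, 0.0, 0.2, 0.0, 1.5, 1.0, 0.4, 2.0, 3.1],
})

# Indicator from the hint: 0 if precipitation < 1 mm/h, 1 otherwise.
df["Rain"] = (df["Precip"] >= 1.0).astype(int)

# Mean of a 0/1 column per group = empirical probability of rain at that cloud cover.
p_rain = df.groupby("CloudCover")["Rain"].mean()

print(p_rain)
```

If the cloud cover index takes many distinct values, binning it first with pd.cut before grouping gives better-populated groups; p_rain.plot(marker="o") would then show the probability curve.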


1.3 Deliverable [Data Analysis Report]

The report should be written in the form of an academic paper using the ICML format (https://icml.cc/Conferences/2020/StyleAuthorInstructions). It should be at most 8 pages long, excluding references and appendices, and must include the following sections:

●   Abstract. This section should be a short paragraph (4-5 sentences) that provides a brief overview of the methodology and results presented in the report.

●    Preliminary Analysis. This section describes the study you carried out in Task 1 and should be organized into the following subsections:

•   Data Understanding. This subsection should detail the data used for this study, clearly describing the content, size and format of the data, how many cities the dataset describes, how many observations it contains and how many (and which) features are considered. Further information may be provided.

•   Data Cleaning. This subsection should describe how missing data were processed. It is important to describe the methodology you used to search for missing data and how you best addressed them (for example, how you ensured that the dataset preserves the same statistics and properties). Clearly motivate your answers.

•   Data Statistics. This subsection should describe the general statistical properties of the dataset using numerical or graphical visualization. Reflect on any anomalies, with clear motivation and supporting evidence.

●    Outliers. This section should describe all the steps applied to the data to detect and handle outliers, with a justification for each step. If little or no pre-processing was done, this section should clearly justify why.

●    Data Correlation. This section should describe the feature correlations you investigated in the dataset. Even if you discover few patterns, it is important to clearly explain and justify the methodology you adopted. Clearly show results that support your statements.

●    Conclusion. This last section summarises the findings, highlights any challenges or limitations that were encountered during the study and provides directions for potential improvements.
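For the Data Cleaning subsection, one simple way to support the claim that removing missing data preserves the dataset's statistics is to compare summary statistics before and after cleaning. The sketch below uses hypothetical values, and the 1 % tolerance on the means is an illustrative choice, not a requirement of the assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing temperature.
df = pd.DataFrame({
    "Temp": [10.0, np.nan, 12.0, 11.0, 13.0],
    "Pressure": [1012.0, 1011.0, 1010.0, 1013.0, 1012.0],
})
cleaned = df.dropna()

# pandas skips NaN when aggregating, so "before" uses all available values.
before = df.agg(["mean", "std"])
after = cleaned.agg(["mean", "std"])
relative_change = ((after - before) / before).abs()

# Illustrative criterion: every feature mean moves by less than 1 %.
means_preserved = bool((relative_change.loc["mean"] < 0.01).all())

print(relative_change.round(4))
print("Means preserved:", means_preserved)
```

Spread statistics can shift more than means when rows are dropped (here the Pressure standard deviation changes by roughly 10 %), so reporting both gives a fairer picture.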


Please make sure you support the discussion in each section with relevant equations, diagrams or figures, as you see fit. Most importantly, make sure that all your answers and solutions are well motivated.

Marking Criteria

The marking criteria and their weights are listed below.


Criterion (mark weight)

●   Abstract / Conclusions (10%). The purpose of the executive summary is to outline the data analytics project, its inputs and envisioned outputs, as well as the key findings.

●    Task 1 – Preliminary Analysis

•   Dataset Understanding (10%). Provide a clear description of the dataset, answering the following questions: i) How many cities are there in the dataset? ii) How many observations and features are there in this dataset? iii) What are the names of the different features?

•   Data Cleaning – Missing data (10%). Provide a clear description of the results of your missing-data analysis and the key outcomes.

•   Data Statistics (10%). Describe the general statistical properties of the dataset using numerical or graphical visualization. Reflect on any anomalies, with clear motivation and supporting evidence.

●    Task 2 – Outliers (25%). Show the visualization of the temperature measurements, together with comments on the behaviour depicted in the plots. Provide summaries of the outliers, in terms of the number of outliers detected and the techniques adopted to replace them (motivate your answers).

●    Task 3 – Inference: Data Correlation (25%). Comment on the significant correlations you found between features and assess the rain probability as a function of the cloud cover index. Support the text with visualizations of the results and key insights into the approach considered.

●    Report Style (10%). The report needs a clean and clear structure and layout. The quality of images, tables, citations and references will also be taken into account.

