Date: 2018-11-13 10:18

STA 483/583 Semester Project

Part 2 – Performing a historical analysis of a time series

Due: November 19, 2018

In this project we will be exploring some atmospheric environmental data from Madrid, Spain.

The file madrid_01_17.zip (a compressed zip file on the Canvas site) contains the raw hourly air quality measurements recorded at 24 sites in and around Madrid, Spain, from 2001 through 2017. The site locations are available in the file stations.csv. A data dictionary is available in the file dataDescription.pdf.

General Goal (part 2):

Perform a historical assessment to determine whether the amounts of carbon monoxide and nitrogen dioxide have changed over time, with the caveat that you should base your assessment only on stations with a reasonably complete record (more on that below).

Notes & nuances in the data:

- The data is fairly big: 75.6MB (zipped), 3,729,128 rows after properly combining the 17 years of data

- Some of the files have different numbers of columns (e.g., in some years there is no NOx measure). It is probably best to trim each year's file to the relevant variables before combining.

- Because of equipment differences, some measurements are completely missing from some stations.

- The list of stations (with names) is in a separate file from the raw data; you will need to link the two.

- This is a real dataset and, like all real data, occasionally experiences some real data problems.
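The trimming and linking described in the notes above can be sketched as follows. This is a minimal illustration in Python using only the standard library (the handout itself suggests R), not a prescribed approach. The column names (`date`, `CO`, `NO_2`, `NOx`, `station`, `id`, `name`), the station ids, and the tiny in-memory "files" are all assumptions made for the example; check dataDescription.pdf and the real files for the actual layout.

```python
import csv
import io

# Hypothetical miniature stand-ins for two yearly files and stations.csv.
# All column names and ids here are assumptions, not taken from the real data.
YEAR_2001 = """date,CO,NO_2,station
2001-01-01 01:00:00,0.5,40.0,28079001
2001-01-01 02:00:00,,38.0,28079001
"""
YEAR_2002 = """date,CO,NO_2,NOx,station
2002-01-01 01:00:00,0.4,35.0,80.0,28079002
"""
STATIONS = """id,name
28079001,Station A
28079002,Station B
"""

# Only the variables needed for this part of the project.
KEEP = ("date", "CO", "NO_2", "station")


def read_trimmed(text):
    """Read one yearly file and keep only the columns listed in KEEP."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # A column absent from a given year is stored as "" (missing).
        rows.append({col: row.get(col) or "" for col in KEEP})
    return rows


def combine(year_texts, stations_text):
    """Stack the trimmed yearly rows and attach each station's name by id."""
    names = {r["id"]: r["name"] for r in csv.DictReader(io.StringIO(stations_text))}
    combined = []
    for text in year_texts:
        for row in read_trimmed(text):
            row["station_name"] = names.get(row["station"], "")
            combined.append(row)
    return combined


combined = combine([YEAR_2001, YEAR_2002], STATIONS)
```

Trimming before stacking means every row ends up with the same keys, so the extra NOx column in some years never causes a mismatch. With the real data, the strings would be replaced by the files extracted from the zip archive.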

Specifics:

To successfully complete this part of the project, you will need to:

1. Read in the data successfully

2. Determine which stations have a reasonably complete record of measurements for carbon monoxide and nitrogen dioxide. To do this:

a) Aggregate the carbon monoxide and nitrogen dioxide measurements into year-month averages for each site in the dataset.

b) Construct plots to explore the historical record for each site and determine which sites you feel are reasonably complete (you make that determination).

c) Justify your decision on which sites have a reasonably complete record. For the remainder of the analysis, use only data from these sites.

3. Using only the sites you feel are reasonably complete, aggregate the carbon monoxide and nitrogen dioxide measures into daily averages, year-month averages, and yearly averages.

4. Graphically explore each of the three aggregated measures of carbon monoxide and nitrogen dioxide to determine whether you feel the measurements have changed over time.

5. Using the year-month averages of carbon monoxide and nitrogen dioxide, build statistical models that capture all systematic components of the time series (possible trends, seasonality, autocorrelation), and use the models to address the question of whether the measurements of carbon monoxide or nitrogen dioxide have changed over time. Your chosen model should be as parsimonious as possible while adequately modeling all aspects of the time series.

6. Draw some overall conclusions regarding the levels of carbon monoxide and nitrogen dioxide in Madrid. Your overall findings must be supported by graphical and/or numeric summaries that help tell the story.
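One possible numeric aid for the completeness judgment in step 2 is the share of calendar months in which a station reports at least one value. The sketch below is in Python with the standard library only; the function name `month_coverage`, the record layout, and the timestamp format `YYYY-MM-DD HH:MM:SS` are assumptions for illustration. It complements the plots the step asks for rather than replacing them, and the cutoff for "reasonably complete" remains your decision.

```python
from collections import defaultdict


def month_coverage(records, first=(2001, 1), last=(2017, 12)):
    """Fraction of calendar months in [first, last] with at least one
    non-missing value, per station.

    records: iterable of (station, 'YYYY-MM-DD HH:MM:SS', value or None).
    """
    expected = (last[0] - first[0]) * 12 + (last[1] - first[1]) + 1
    seen = defaultdict(set)
    for station, stamp, value in records:
        months = seen[station]      # registers the station even if all values are missing
        if value is None:
            continue
        ym = (int(stamp[:4]), int(stamp[5:7]))
        if first <= ym <= last:
            months.add(ym)
    return {station: len(months) / expected for station, months in seen.items()}


# Hypothetical records: station "A" reports in 2 of 4 months, "B" never does.
example = [
    ("A", "2001-01-03 10:00:00", 0.5),
    ("A", "2001-01-04 11:00:00", 0.6),
    ("A", "2001-02-01 01:00:00", 0.4),
    ("A", "2001-03-01 01:00:00", None),
    ("B", "2001-01-01 01:00:00", None),
]
coverage = month_coverage(example, first=(2001, 1), last=(2001, 4))
```

A coverage value near 1 only says that every month has some data; a month with a handful of hourly readings still counts, so the hourly counts per month are worth inspecting as well.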

Some hints:

There are several ways to take monthly aggregates in R; the functions group_by() and summarize() in the package dplyr will be handy.

The package lubridate will be useful for working with timestamps.

Feel free to use other software languages (SAS) if you find them helpful, but your results must be completely reproducible! You will need to replicate the results in future parts of the project.

The option na.rm=TRUE will be handy in your aggregation if using R.

This analysis does involve model building but also includes graphical and numerical summaries. Make sure to properly label plots and tables.

Several DataCamp modules will also be posted that may be helpful!
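For anyone working outside R, the grouping idea behind group_by() and summarize() with na.rm=TRUE can be sketched in Python with the standard library alone. The function name `year_month_means`, the record layout, and the timestamp format are assumptions for illustration, not part of the assignment.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def year_month_means(records):
    """Average values per (station, year, month), skipping missing values.

    records: iterable of (station, 'YYYY-MM-DD HH:MM:SS', value or None).
    """
    groups = defaultdict(list)
    for station, stamp, value in records:
        if value is None:           # plays the role of na.rm=TRUE
            continue
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        groups[(station, t.year, t.month)].append(value)
    return {key: fmean(vals) for key, vals in groups.items()}


# Hypothetical hourly records for one station.
example = [
    ("A", "2001-01-01 01:00:00", 1.0),
    ("A", "2001-01-01 02:00:00", None),
    ("A", "2001-01-15 09:00:00", 3.0),
    ("A", "2001-02-01 01:00:00", 4.0),
]
means = year_month_means(example)
```

One difference from R is worth noting: a station-month whose values are all missing simply does not appear in the result here, whereas mean() with na.rm=TRUE on an all-missing group in R returns NaN. Swapping the grouping key for the date or the year gives the daily and yearly averages.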

Encouragement:

The underlying idea of this assignment is largely the same as in the lab days in October, where we fit deterministic models with autocorrelation. Here, however, you are combining several aspects of what we covered as well as dealing with date and time stamps. Part of the modern practice of statistics is dealing with difficult data and getting it into a usable format.
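As a reminder of what "a deterministic model with autocorrelation" involves, here is a deliberately simplified sketch in Python: a least-squares linear trend followed by the lag-1 autocorrelation of its residuals. It is not the model from the labs and it ignores seasonality entirely; the function names and the toy series are assumptions for illustration only.

```python
def linear_trend(y):
    """Least-squares intercept and slope of y against t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of x around its mean."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[1:], x[:-1]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


# Toy monthly averages (made-up numbers) with a downward trend.
y = [10.0, 9.5, 8.0, 7.5, 6.0]
intercept, slope = linear_trend(y)
residuals = [v - (intercept + slope * t) for t, v in enumerate(y)]
r1 = lag1_autocorrelation(residuals)
```

If the residual autocorrelation is far from zero, the usual standard error of the slope is unreliable, which is why the trend and the autocorrelation structure need to be modeled together before concluding that a level has changed.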

What to turn in:

A short, well-written report outlining your exploratory analysis (I suggest using Rmarkdown). There is no page limit, but I would expect at least a 3-5 page report (keep in mind you are including several plots and tables in your writeup). Make sure your report addresses all questions and is well-formatted (information on formatting Rmarkdown documents is also available on the Canvas site). This report is due on Monday, November 19 on Canvas. Your report should include all necessary work but be as brief as possible. Large chunks of source code should (largely) be relegated to an appendix.

Grades:

This is part 2 (of 3) of a semester-long project looking at this environmental data. This part will count as 30% of your total semester project grade, which corresponds to 6% of your total grade in the course (the project is worth 20% total). Undergraduate students are allowed to work in pairs, while graduate students are expected to work on their own!

If you are an undergraduate and working with another student, you must tell me who your partner is before Friday, November 9, 2018.

In part 3 you will construct a predictive model, and undergraduates will once again be allowed to pair up; however, you will NOT be allowed to work with the same student.

