
Programming in R - Week 2 Assignment

IPAL - The University of Chicago

Due: Sunday, July 12, 2020 at 11:59pm on Canvas

Structure

This assignment will focus on gathering, analyzing, and plotting real data. The answers here are much more open-ended than those in Problem Set 1, and there may not be an obvious “right” way to do things. Try your best and record any assumptions or major choices you make as comments. As before, this problem set is broken into three sections, each worth 16 points.

The goal of this assignment is to explore the relationship between temperature and homicides in Chicago. We will use temperature data from the National Oceanic and Atmospheric Administration (NOAA) and crime data from the City of Chicago Data Portal. The temperature data is pre-collected, but you will need to retrieve the crime data yourself via an API.

Start by creating a new project/folder for this assignment. Create a new R script to save your code. For each chunk of code you create, please preface it with a comment describing what your code is doing.

For example, your answers might look like this:

# Loading the tidyverse (provides read_csv() and the %>% pipe)
library(tidyverse)

# Loading in saved homicide data
homicides <- read_csv("homicides.csv")

# Finding the number of homicides for each police district
homicides %>%
  group_by(district) %>%
  summarize(count = n())

Section 1: Visualizing Weather Data

The provided weather data (ohare_temps.csv) comes from the NOAA weather station at O’Hare Airport. It includes a timestamp and corresponding temperature (in Fahrenheit) for each hour since January 1st, 2001.

Using functions from the tidyverse and lubridate packages, start by reading the provided CSV and converting the timestamp column to a datetime format. Next, extract the year, month, day, week, and hour from your datetime-formatted column and make them into separate columns, called year, month, day, week, and hour.
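
The parsing step above might look roughly like the sketch below. It uses a tiny in-memory stand-in instead of ohare_temps.csv, and the column names `timestamp` and `temp`, as well as the timestamp format, are assumptions; check the real file with names() before copying anything.

```r
library(tidyverse)
library(lubridate)

# Stand-in for read_csv("ohare_temps.csv"). Column names and the
# "YYYY-MM-DD HH:MM:SS" format are assumptions, not taken from the file.
temps_raw <- tibble(
  timestamp = c("2001-01-01 00:00:00", "2001-07-04 15:00:00", "2002-12-31 23:00:00"),
  temp      = c(21.0, 88.0, 30.0)
)

temps <- temps_raw %>%
  mutate(
    timestamp = ymd_hms(timestamp),  # character -> POSIXct datetime
    year  = year(timestamp),
    month = month(timestamp),
    day   = day(timestamp),
    week  = week(timestamp),         # week of the year, 1 to 53
    hour  = hour(timestamp)
  )

print(temps)
```

If the real timestamps use a different layout, another lubridate parser such as mdy_hm() may be the better fit.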

Calculate the average temperature for each year and save your results to a new dataframe. Your results should look similar to the ones below:

head(temps_avg, n = 4)
## # A tibble: 4 x 2
##    year mean_temp
##   <dbl>     <dbl>
## 1  2001      51.3
## 2  2002      51.2
## 3  2003      49.3
## 4  2004      50.4
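
One way to produce a summary of this shape is a group_by() followed by summarize(). The sketch below runs on a small made-up tibble, so its numbers will not match the output above; the column name `temp` is an assumption.

```r
library(tidyverse)

# Made-up stand-in for the parsed hourly data (the `temp` column name is assumed)
temps <- tibble(
  year = c(2001, 2001, 2002, 2002),
  temp = c(40, 60, 30, 50)
)

# One row per year holding the mean of all hourly readings in that year
temps_avg <- temps %>%
  group_by(year) %>%
  summarize(mean_temp = mean(temp, na.rm = TRUE))

head(temps_avg, n = 4)
```

Passing na.rm = TRUE keeps a single missing reading from turning a whole year's mean into NA.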

Answer the following questions using code and comments:

1. What are the top two coldest years in your summarized dataset? Exclude 2020 since there is no summer and fall data.

2. On average, across all months and years, what hour of the day is the hottest?

3. What day has the largest swing in temperature from 3 AM to 3 PM? (Hint: There are many ways to calculate this. I suggest filter() and spread() or the lag() function)

4. What week in the dataset had the largest year-to-year absolute change in average temperature? In other words, comparing the average temperature of weeks across all years, what week had the greatest change from the same week in the previous year?
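
The two techniques named in the hint can be sketched on toy data as follows. This is an illustration of the mechanics only, not a reference solution: all values and column names are made up, and pivot_wider() is used as the current tidyr replacement for spread().

```r
library(tidyverse)

# --- Reshaping two hours side by side (filter + pivot_wider) ---
hourly <- tibble(
  date = as.Date(c("2001-01-01", "2001-01-01", "2001-01-02", "2001-01-02")),
  hour = c(3, 15, 3, 15),
  temp = c(20, 35, 25, 30)
)

swings <- hourly %>%
  filter(hour %in% c(3, 15)) %>%
  pivot_wider(names_from = hour, values_from = temp, names_prefix = "h") %>%
  mutate(swing = h15 - h3) %>%      # 3 PM reading minus 3 AM reading
  arrange(desc(abs(swing)))

# --- Comparing each week with the same week one year earlier (lag) ---
weekly <- tibble(
  year      = c(2001, 2002, 2003, 2001, 2002, 2003),
  week      = c(1, 1, 1, 2, 2, 2),
  mean_temp = c(20, 30, 25, 22, 23, 40)
)

changes <- weekly %>%
  group_by(week) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(change = abs(mean_temp - lag(mean_temp))) %>%  # NA for the first year
  ungroup() %>%
  arrange(desc(change))

print(swings)
print(changes)
```

Grouping by week before calling lag() is what makes the comparison "same week, previous year" rather than "previous row".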

Finally, using your summarized dataset, do your best to replicate the following plot. Note: 2020 is excluded.

[Figure: "Average Temperature by Year in Chicago (2001–2019)". Y-axis: Temperature °F. Annotation: "Chicago's polar vortex year". Caption: Source: NOAA Weather Station, O'Hare Airport]
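
A starting point for a plot with these labels could look like the sketch below. The original image is not reproduced here, so the geometry (a line with points) and the placement of the annotation on the coldest year in the data are assumptions. The four data points are the ones shown in the example output earlier, not the full 2001–2019 series.

```r
library(tidyverse)

# First four yearly means from the example output above
temps_avg <- tibble(
  year      = 2001:2004,
  mean_temp = c(51.3, 51.2, 49.3, 50.4)
)

# Assumption: the annotation belongs on the coldest year in the data
coldest <- slice_min(temps_avg, mean_temp, n = 1)

p <- ggplot(temps_avg, aes(x = year, y = mean_temp)) +
  geom_line() +
  geom_point() +
  annotate(
    "text",
    x = coldest$year, y = coldest$mean_temp,
    label = "Chicago's polar vortex year", vjust = -1
  ) +
  labs(
    title   = "Average Temperature by Year in Chicago (2001–2019)",
    x       = NULL,
    y       = "Temperature °F",
    caption = "Source: NOAA Weather Station, O'Hare Airport"
  )

print(p)
```

With the real data, filter(year != 2020) before plotting would apply the exclusion mentioned in the note.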


Section 2: Downloading and Summarizing Homicide Data

The City of Chicago keeps a fairly comprehensive database of crimes, which can be found here: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2. Within this database there are records of all the homicides committed in Chicago since 2001.

We want to extract only these records. However, our previous method of downloading a CSV and using filter() to keep only the records we want is unlikely to work because the crimes dataset CSV is multiple GB in size. Instead, we can use the Data Portal’s API to grab only the records we’re interested in. This can be accomplished in two ways:

1. Use the RSocrata library and connect to the crimes API. There is example documentation on the city website.

2. Use the raw API and the read_json() function from the jsonlite library to directly query the API and read JSON data into R.
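
As a rough sketch of the second option, the query can be assembled as a URL against the dataset's JSON endpoint and handed to read_json(). The endpoint below is derived from the dataset ID in the link above; the field name `primary_type`, the limit of 50000, and the helper build_query() are assumptions for illustration, so confirm them against the API documentation on the Data Portal. The network call is guarded so the script can be sourced offline.

```r
library(jsonlite)

# JSON endpoint for the Crimes dataset (ID ijzp-q8t2 from the link above)
base_url <- "https://data.cityofchicago.org/resource/ijzp-q8t2.json"

# Hypothetical helper: builds a filtered query URL.
# Assumption: the crime category is stored in a field called `primary_type`.
build_query <- function(where, limit = 50000L) {
  paste0(
    base_url,
    "?$where=", URLencode(where, reserved = TRUE),
    "&$limit=", format(limit, scientific = FALSE)
  )
}

query_url <- build_query("primary_type='HOMICIDE'")
print(query_url)

# Only hit the network when run interactively
if (interactive()) {
  homicides <- read_json(query_url, simplifyVector = TRUE)
  # RSocrata alternative (option 1):
  # homicides <- RSocrata::read.socrata(query_url)
}
```

The explicit $limit matters because the raw API returns only a limited page of rows by default, which would silently truncate the result.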

Your dataset of all homicides should contain 10,000 rows. Once you’ve successfully read the data into R, use lubridate functions similar to those you used on the temperature data to extract the year, month, day, week, and hour columns.

Next, answer the following questions using code and comments:

1. What year had the highest number of homicides in Chicago?

2. What hour, on average, has the most homicides?

3. What community areas had the lowest number of homicides? Use a join and community area data from the Data Portal to determine the names of each community area.
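
The join in question 3 can be sketched on toy data as below. The homicide rows are made up, and the lookup table's column names (`area_number`, `area_name`) are hypothetical; the community area file from the Data Portal will use its own names, and its key may need converting to the same type as the crime data's key before joining.

```r
library(tidyverse)

# Made-up homicide records; only the community area number matters here
homicides <- tibble(community_area = c(25, 25, 25, 9, 12, 12))

# Hypothetical lookup table standing in for the Data Portal's community area data
areas <- tibble(
  area_number = c(9, 12, 25),
  area_name   = c("Edison Park", "Forest Glen", "Austin")
)

area_counts <- homicides %>%
  count(community_area, name = "n_homicides") %>%
  left_join(areas, by = c("community_area" = "area_number")) %>%
  arrange(n_homicides)

print(area_counts)
```

Note that counting first and joining second drops any area with zero homicides; starting from the lookup table and joining the counts onto it would keep those areas.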

Finally, replicate the plot below to the best of your ability:

[Figure: "Homicides Over Time in Chicago". Caption: Source: City of Chicago Data Portal]


Section 3: Combining Both Datasets

Finally, we want to combine aggregated data from both datasets into a single plot. First, find the mean temperature and mean number of homicides by week across all years. Then, merge your results and replicate the plot below to the best of your ability.

[Figure: "Homicides vs Temperature in Chicago (2001–2019)". X-axis: Week. Y-axes: Average # of Homicides, Average Temp. Legend (Type): Homicides, Temp]

What is potentially wrong with this plot? Is there a way we could improve it? What might explain the phenomenon that it shows? Answer in a comment.
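
The aggregate-then-merge step for this section could be sketched as follows. The data is made up and the column names are assumptions. The sketch reads "mean number of homicides by week" as: count homicides per year and week, then average those counts across years. Other readings are possible and worth noting in a comment.

```r
library(tidyverse)

# Made-up hourly temperatures and homicide records with year/week columns
temps <- tibble(
  year = c(2001, 2001, 2002, 2002),
  week = c(1, 2, 1, 2),
  temp = c(20, 30, 24, 34)
)
homicides <- tibble(
  year = c(2001, 2001, 2001, 2002, 2002, 2002, 2002),
  week = c(1, 2, 2, 1, 1, 2, 2)
)

temp_by_week <- temps %>%
  group_by(week) %>%
  summarize(avg_temp = mean(temp, na.rm = TRUE))

hom_by_week <- homicides %>%
  count(year, week, name = "n") %>%        # homicides per year-week
  group_by(week) %>%
  summarize(avg_homicides = mean(n))       # averaged across years

# Merge, then go long so both series can share one legend ("type")
combined <- inner_join(temp_by_week, hom_by_week, by = "week") %>%
  pivot_longer(c(avg_temp, avg_homicides), names_to = "type", values_to = "value")

p <- ggplot(combined, aes(x = week, y = value, color = type)) +
  geom_line()

print(combined)
```

Putting both series on one shared axis is the simplest option, but the two measures have different units and scales, which is relevant to the question about what might be wrong with the plot.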
