Question 1
A health company is interested in the relationship between people’s health and their movements. This company has conducted a study called “HealthTracker”, in which the participants wore
smartwatches that collected sensor data to find their location over several days. The company has asked you to analyse their data, and your task is to write Python functions to
process the data. Firstly, the data requires some pre-processing to remove unreliable
measurements. Select the file called program.py; in this file you need to write a function
called preprocessing(raw_filename, output_filename) that has two file parameters:
- the CSV input file raw_filename, which holds the participant data;
- the CSV output file output_filename, which contains the processed data, with the unreliable measurements removed.

The raw_filename CSV file contains data values in the form of Python strings. The data is in the
following columns:
- row_id: a unique ID string for each row of data;
- timestamp: the date and time when the data was captured from the smartwatch. You may assume the timestamps are all in the form Day/Month/Year Hour:Minute, using a 24-hour clock;
- participant_ID: a unique ID string for each participant;
- double_latitude: the latitude location of the data, in degrees;
- double_longitude: the longitude location, in degrees;
- double_altitude: the altitude above sea level, in meters;
- provider: the information provider, which can be either GPS or network;
- accuracy: a numerical measure of how close the measured location is to the participant's actual location, with smaller values indicating more precise location data.

You can look at the file example_raw.csv to see some examples of these data values. Each row contains one set of measurements for one participant. The data will contain several rows of
measurements for each participant, and you can assume that all data for each participant appears in consecutive rows, and that the timestamps for these rows are in chronological order. Finally, you may assume the data does not contain any duplicate measurements, i.e. the pairs (participant_id, timestamp) are different for all rows.

Your program preprocessing(raw_filename, output_filename) should do all of the following. The output_filename CSV file should have the same header columns, in the same order. This CSV file should contain data values taken from raw_filename, but with rows of data removed if they are not valid. Valid rows must satisfy all of the following requirements:
- the timestamp data must be from 2023;
- the accuracy measure must be 30 or less;
- the altitude measurement must be in the range zero to 1000;
- the provider must be either gps or network.

Additionally, if after removing all these rows of data, there are fewer than three rows of data for
any participant, then all the data from that participant should be removed. If there are no valid
rows for any of the participants, then the output CSV file should contain just the column headers.

For this question, you may choose for yourself whether you wish to read the CSV data into Python as a 2D-list or as a list of dictionaries. See Chapter 9 of the lecture notes for standard code to do both of these options.

Example
To test your code, uncomment the line # preprocessing("example_raw.csv", "my_output.csv") and press Run. Alternatively, you may call the
function preprocessing("example_raw.csv", "my_output.csv") at the Terminal. After running your code, your output data will be visible in the file "my_output.csv". Also, the file
"example_output.csv" will show the correct output data (so if these two files have all the same
rows and columns, then your code has done its job!)
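
The filtering rules in Question 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference solution: it reads the data as a list of dictionaries, assumes the header names match the column list above (in particular participant_ID), assumes four-digit years, treats the range limits as inclusive, compares provider names case-insensitively, and treats a missing or non-numeric value as invalid. The helper names is_valid_row and keep_rows are hypothetical.

```python
import csv


def is_valid_row(row):
    """Return True if a row (a dict of strings) meets all four requirements."""
    try:
        # Timestamp is assumed to look like "14/03/2023 09:15".
        date_part = row["timestamp"].split()[0]
        year = int(date_part.split("/")[2])
        accuracy = float(row["accuracy"])
        altitude = float(row["double_altitude"])
        # Assumption: "GPS" and "gps" are treated as the same provider.
        provider = row["provider"].strip().lower()
    except (KeyError, IndexError, ValueError, AttributeError):
        # Assumption: a missing or non-numeric value makes the row invalid.
        return False
    return (
        year == 2023
        and accuracy <= 30
        and 0 <= altitude <= 1000
        and provider in ("gps", "network")
    )


def keep_rows(rows, id_key="participant_ID", min_rows=3):
    """Drop invalid rows, then drop participants left with fewer than min_rows rows."""
    valid = [row for row in rows if is_valid_row(row)]
    counts = {}
    for row in valid:
        counts[row[id_key]] = counts.get(row[id_key], 0) + 1
    return [row for row in valid if counts[row[id_key]] >= min_rows]


def preprocessing(raw_filename, output_filename):
    """Copy raw_filename to output_filename, keeping only the rows that pass keep_rows."""
    with open(raw_filename, newline="") as infile:
        reader = csv.DictReader(infile)
        fieldnames = reader.fieldnames or []
        rows = list(reader)
    with open(output_filename, "w", newline="") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()  # the header is written even when no rows survive
        writer.writerows(keep_rows(rows))
```

Counting rows per participant only after the validity filter matches the order given in the question: the three-row minimum applies to what remains once invalid rows are gone.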
Question 2
The Health Product company would like to know the distances travelled by each of the
participants during the study period. This is estimated by computing the distances
between the location measurements for each participant. Select the file called program.py; in this file you need to write a function
called calculate_distances(clean_file, output_file) which has two file parameters:
- the CSV input file clean_file that has been cleaned according to the requirements of Question 1. Hence you may assume this CSV file has the same column headings as the input files of Question 1, and all of its data conforms to the cleaning requirements described there. In particular, there will be at least three rows of data for each participant.
- the CSV output file output_file that contains the analysed data, including the distance measurements.

For this question, you MUST read the CSV data in clean_file into Python as a list of
dictionaries. All your data processing operations must access the data through the
dictionary keys.

Your program calculate_distances(clean_file, output_file) should do all of the following. The output_file CSV file should contain all the same headings and rows as clean_file, except that:
- the accuracy column should be removed;
- the row ID should be a positive integer corresponding to the row number of the CSV file, with row_ID = 1 corresponding to the first row of data (which is in the second row of the CSV file, since the first row contains the header names);
- there should be an extra column added, after the provider column, called travelled_distance. The values for this column should be calculated as follows:
  o For the first row of data for each participant, the value should be zero.
  o For each row after the participant's first, the value should be the haversine distance between the participant's location in the previous row and their location in the current row. This value should be rounded to three decimal places.

To calculate the haversine distance between two locations using latitude and longitude data, you
can call the function haversine_distance(lat1, lon1, lat2, lon2). This function takes four
arguments: lat1 and lon1 are the latitude and longitude measurements in the previous row, while lat2 and lon2 are the latitude and longitude of the current row.

If clean_file contains only the header row (and no data rows), output_file should just contain the column headers (excluding accuracy and including travelled_distance).

Example
To test your code, uncomment the line # calculate_distances("example_clean_data.csv", "my_output.csv") and press Run. Alternatively, you may call the
function calculate_distances("example_clean_data.csv", "my_output.csv") at the Terminal. Compare
your output with the file example_final_output.csv.
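
The steps in Question 2 can be sketched roughly as follows; treat this as an illustration under stated assumptions rather than the expected solution. The haversine_distance function here is only a stand-in for the one the question says you can call: it assumes a spherical Earth of radius 6371 km and returns kilometres, which may differ from the provided version. The sketch also assumes the header names row_id and participant_ID, and writes a plain 0 for each participant's first row. The helper names output_headers and add_distances are hypothetical.

```python
import csv
import math


def haversine_distance(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Stand-in helper: great-circle distance in kilometres (assumed units)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def output_headers(headers):
    """Drop accuracy and place travelled_distance directly after provider."""
    kept = [h for h in headers if h != "accuracy"]
    kept.insert(kept.index("provider") + 1, "travelled_distance")
    return kept


def add_distances(rows, id_key="participant_ID", row_key="row_id"):
    """Renumber rows from 1 and add the distance from each participant's previous row."""
    result = []
    previous = None
    for number, row in enumerate(rows, start=1):
        new_row = {key: value for key, value in row.items() if key != "accuracy"}
        new_row[row_key] = number
        if previous is None or previous[id_key] != row[id_key]:
            distance = 0  # first row of data for this participant
        else:
            distance = round(
                haversine_distance(
                    float(previous["double_latitude"]),
                    float(previous["double_longitude"]),
                    float(row["double_latitude"]),
                    float(row["double_longitude"]),
                ),
                3,
            )
        new_row["travelled_distance"] = distance
        result.append(new_row)
        previous = row
    return result


def calculate_distances(clean_file, output_file):
    """Read clean_file as a list of dictionaries and write the analysed data."""
    with open(clean_file, newline="") as infile:
        reader = csv.DictReader(infile)
        headers = output_headers(reader.fieldnames)
        rows = list(reader)
    with open(output_file, "w", newline="") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=headers)
        writer.writeheader()  # still written when there are no data rows
        writer.writerows(add_distances(rows))
```

Because each participant's rows are consecutive, comparing the participant ID with the previous row is enough to detect where a new participant starts.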