STATS 3DA3
Homework Assignment 2
Pratheepa Jeganathan
02/05/2024
Instructions
• Due before 10:00 PM on Tuesday, February 13, 2024.
• Submit a PDF copy of your solution to Avenue to Learn.
• Late penalty for assignments: 15% will be deducted from assignments each day
after the due date (rounding up).
• Assignments will not be accepted more than 48 hours after the due date.
Assignment Standards
Your assignment must conform to the Assignment Standards listed below.
• Write your name and student number on the title page. We will not grade assignments
without the title page.
• You may discuss homework problems with other students, but you have to prepare the written
assignments yourself.
• LaTeX is strongly recommended but not strictly required.
• Eleven-point font (Times or similar) must be used with 1.5 line spacing and margins of at
least 1 inch all around.
• Use \newpage to start the solution to each question (1, 2, 3) on a new page.
• No screenshots are accepted for any reason.
• The writing and referencing should be appropriate to the undergraduate level.
• Various tools, including publicly available internet tools, may be used by the instructor to
check the originality of submitted work.
• Assignment policy on the use of generative AI:
– Students are not permitted to use generative AI in this assignment. In alignment
with the McMaster academic integrity policy, it “shall be an offence knowingly to … submit academic work for assessment that was purchased or acquired from another source”.
This includes work created by generative AI tools. The policy also states: “Contract Cheating is the act of ‘outsourcing of student work to third parties’
(Lancaster & Clarke, 2016, p. 639) with or without payment.” Using generative AI tools
is a form of contract cheating. Charges of academic dishonesty will be brought forward
to the Office of Academic Integrity.
Question 1
Download the paper Data Science at the Singularity by David Donoho (2024) at paper. Follow the steps to find the most frequently used words and create a word cloud.
• (1) Reference where you obtained the original PDF document.
• (2) Read all PDF document pages and separate each line by \n.
• (3) Split the lines by \n.
• (4) Remove the lines before Abstract. ...... You can print the first few lines and find
the number of lines to remove.
• (5) Create a data frame with lines.
• (6) Tokenize each line and convert each word to a row.
• (7) Convert each word to lowercase.
• (8) Remove stopwords.
• (9) Remove any other words that are not suitable for the word cloud. For example,
single-letter words, symbols such as [ . , ), abbreviations, etc.
• (10) Create a term-frequency data frame.
• (11) Produce a word cloud. You can decide how many of the most frequently used words to
include in the word cloud; for example, a word cloud of the ten most frequently used words.
• (12) Write a summary paragraph (at least two statements) about your word cloud. The
summary should be cast in the context of your chosen text document.
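Steps (3) to (10) can be sketched on a toy string using only the Python standard library. This is a minimal illustration under stated assumptions, not the expected solution: `raw_text` is a hypothetical stand-in for the text extracted from the PDF, `STOPWORDS` is a deliberately tiny illustrative list, and `term_frequencies` is a made-up helper name. A real solution would likely use a fuller stopword list and a data frame.

```python
import re
from collections import Counter

# Hypothetical stand-in for the text read from the PDF in step (2).
raw_text = (
    "Title line\n"
    "Author line\n"
    "Abstract\n"
    "Data science is data driven.\n"
    "The data (x) drive science."
)

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "is", "a", "an", "of", "and", "to", "in"}


def term_frequencies(text, start_marker="Abstract"):
    """Count words that appear from the `start_marker` line onward."""
    lines = text.split("\n")  # step (3): split into lines
    # Step (4): drop everything before the marker line.
    # Raises StopIteration if the marker line is not present.
    start = next(i for i, line in enumerate(lines) if line.strip() == start_marker)
    tokens = []
    for line in lines[start:]:
        # Step (6): keep alphabetic runs only, which also drops symbols.
        # Note that hyphenated words are split into separate tokens.
        tokens.extend(re.findall(r"[A-Za-z]+", line))
    words = [t.lower() for t in tokens]  # step (7): lowercase
    # Steps (8) and (9): remove stopwords and single-letter words.
    words = [w for w in words if w not in STOPWORDS and len(w) > 1]
    return Counter(words)  # step (10): term frequencies


freq = term_frequencies(raw_text)
print(freq.most_common(2))  # [('data', 3), ('science', 2)]
```

The resulting counts are the kind of input a word cloud is drawn from; restricting to the top-k words with `most_common(k)` corresponds to the choice described in step (11).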
Question 2
Question 2 uses the Johns Hopkins GitHub data on COVID-19 vaccines administered globally to
develop a Shiny app.
Visit the website https://github.com/govex/COVID-19/tree/master/data_tables/vaccine_data/global_data and read the description (readme.md).
This question will lead to developing a Shiny app so that users can choose the date range to
investigate the COVID-19 vaccine doses administered and the number of people who have received
at least one dose.
• (1) Read the CSV file at https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/vaccine_data/global_data/time_series_covid19_vaccine_global.csv into Python. Read the data dictionary at https://github.com/govex/COVID-19/blob/master/data_tables/vaccine_data/global_data/data_dictionary.csv.
• (2) Each row is uniquely defined by country and date in the data frame. What is the
dimension of the data?
• (3) Look at the data dictionary. Describe the Doses_admin and
People_at_least_one_dose variables.
• (4) Identify the data frame column representing the countries. Then, select the rows in the
data frame for Canada.
• (5) Use only the Canada vaccine data to answer the rest of the questions. Plot the time series
data of Doses_admin and People_at_least_one_dose in the same graph. Label the time
series lines as Doses Administered and People at least one dose administered,
respectively. Convert the y-axis to the log scale. Rotate the x-axis ticks by 45 degrees.
Hint:
1. Convert ‘Date’ column to datetime format.
2. Use matplotlib.pyplot.plot.
• (6) Describe the plot in the context of data.
• (7) Create the Shiny app as follows. In the Shiny app, the user input is any starting and
ending dates. The range of dates may be 2020-12-29 to 2023-03-09. The output is the
time series plot for the logarithm of the doses administered and of the people with at least
one dose administered in Canada for the range of dates the user chooses. You can
use the following template to create the Shiny app.
• (8) Deploy your Shiny app at https://www.shinyapps.io/. Then, provide the link to the
app—for example, https://pratheepaj.shinyapps.io/my_app/.
from shiny import App, render, ui
# import required libraries

app_ui = ui.page_fluid(
    ui.input_date_range(
        "daterange",
        "Date range",
        start="2020-12-29",
        end='2023-03-09'
    ),
    ui.output_plot('myplot'),
)

def server(input, output, session):
    @output
    @render.plot
    def myplot():
        # Read the data
        # Select the data for Canada
        # If you call the data frame `df`, then the
        # following code selects the rows in the
        # user-selected date range
        df = df[df['Date'] > pd.Timestamp(input.daterange()[0])]
        df = df[df['Date'] < pd.Timestamp(input.daterange()[1])]
        # Create the plot using `df`

app = App(app_ui, server)
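The row selection and log transform behind the plot can be illustrated without pandas or Shiny. The sketch below is only an illustration under stated assumptions: the rows in `SAMPLE_CSV` are made-up numbers, not real vaccine data; the country column name `Country_Region` is an assumption that should be checked against the data dictionary; and `select_rows` is a hypothetical helper. Unlike the strict `>` and `<` comparisons in the template, this sketch keeps both endpoints of the chosen range.

```python
import csv
import io
import math
from datetime import date

# Made-up rows for illustration only; these are NOT real vaccine counts.
# The column name Country_Region is an assumption to verify in the data dictionary.
SAMPLE_CSV = """Country_Region,Date,Doses_admin,People_at_least_one_dose
Canada,2021-01-01,100,90
Canada,2021-01-02,1000,800
Canada,2021-01-03,10000,7000
Brazil,2021-01-02,500,400
"""


def select_rows(rows, country, start, end):
    """Keep rows for `country` dated within [start, end], endpoints included."""
    selected = []
    for row in rows:
        day = date.fromisoformat(row["Date"])  # parse the ISO date string
        if row["Country_Region"] == country and start <= day <= end:
            selected.append({
                "Date": day,
                # log10 is undefined for zero counts; real data may need a check.
                "log_doses": math.log10(float(row["Doses_admin"])),
            })
    return selected


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
subset = select_rows(rows, "Canada", date(2021, 1, 1), date(2021, 1, 2))
for r in subset:
    print(r["Date"], r["log_doses"])
```

Whether the endpoints should be included is a design choice. With the strict inequalities in the template, a user who selects 2020-12-29 as the start date would not see that day in the plot. When plotting with matplotlib, an alternative to transforming the values is to leave them as they are and set the axis scale with `matplotlib.pyplot.yscale("log")`.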
3. Helper’s name.
After attempting homework problems individually, students may discuss a homework assignment
with their classmates. However, students must write up their solutions individually and explicitly
indicate from whom (if anyone) or from which resources they received help. Write your helper’s
name (only one helper’s name is accepted).
Grading scheme
1. 1. Link to the document [1]
   2. Codes to read all the pages [1]
   3. Codes [1]
   4. Codes [1]
   5. Codes [1]
   6. Codes [2]
   7. Codes [1]
   8. Codes [1]
   9. Codes [1]
   10. Codes [1]
   11. Codes, word cloud for the most frequently used words [2]
   12. Two statements [2]
2. 1. Codes [1]
   2. Codes and answer [1]
   3. Description [2]
   4. Identify the column and code [2]
   5. Plot variable 1, plot variable 2 in the same plot, label both time series, y-axis scale, x-axis ticks [5]
   6. At least one statement [1]
   7. Importing libraries, complete the codes for creating the plot, app works locally [3]
   8. Deploying the app, link to the app [2]
The maximum score for this assignment is 32 points. We will convert this to a percentage out of 100.