
Date: 2024-02-10 10:22

STATS 3DA3

Homework Assignment 2

Pratheepa Jeganathan

02/05/2024

Instruction

• Due before 10:00 PM on Tuesday, February 13, 2024.

• Submit a PDF copy of your solution to Avenue to Learn.

• Late penalty for assignments: 15% will be deducted from assignments each day

after the due date (rounding up).

• Assignments won’t be accepted more than 48 hours after the due date.
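
The late-penalty rule above can be read as simple arithmetic. The sketch below is one possible reading, not an official calculator: it assumes that "rounding up" means a partial day counts as a full day, and that the deduction is in percentage points of the assignment total. The function name `late_penalty_percent` is hypothetical.

```python
import math


def late_penalty_percent(hours_late):
    """Return the percentage deducted, or None if the work is no longer accepted.

    Assumptions (not stated explicitly in the assignment): a partial day
    counts as a full day, and a submission exactly 48 hours late is accepted.
    """
    if hours_late <= 0:
        return 0  # on time, no deduction
    if hours_late > 48:
        return None  # past the 48-hour cutoff
    days_late = math.ceil(hours_late / 24)  # round partial days up
    return 15 * days_late


print(late_penalty_percent(1))   # one hour late counts as one day
print(late_penalty_percent(25))  # just over one day counts as two days
```

Under this reading, the largest deduction that can apply to an accepted submission is 30%.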

Assignment Standards

Your assignment must conform to the Assignment Standards listed below.

• Write your name and student number on the title page. We will not grade assignments

without the title page.

• You may discuss homework problems with other students, but you have to prepare the written

assignments yourself.

• LaTeX is strongly recommended but not strictly required.

• Eleven-point font (Times or similar) must be used with 1.5 line spacing and margins of at least 1 inch all around.

• Use \newpage to start the solution to each question (1, 2, 3) on a new page.

• No screenshots are accepted for any reason.

• The writing and referencing should be appropriate to the undergraduate level.


• Various tools, including publicly available internet tools, may be used by the instructor to

check the originality of submitted work.

• Assignment policy on the use of generative AI:

– Students are not permitted to use generative AI in this assignment. In alignment with McMaster academic integrity policy, it “shall be an offence knowingly to … submit academic work for assessment that was purchased or acquired from another source”. This includes work created by generative AI tools. Also stated in the policy is the following: “Contract Cheating is the act of ‘outsourcing of student work to third parties’ (Lancaster & Clarke, 2016, p. 639) with or without payment.” Using generative AI tools is a form of contract cheating. Charges of academic dishonesty will be brought forward to the Office of Academic Integrity.


Question 1

Download the paper Data Science at the Singularity by David Donoho (2024) at paper. Follow the steps to find the most frequently used words and create a word cloud.

• (1) Reference where you obtained the original PDF document.

• (2) Read all PDF document pages and separate each line by \n.

• (3) Split the lines by \n.

• (4) Remove the lines before Abstract. ...... You can print the first few lines and find

the number of lines to remove.

• (5) Create a data frame with lines.

• (6) Tokenize each line and convert each word to a row.

• (7) Convert each word to lowercase.

• (8) Remove stopwords.

• (9) Remove any other words that are not suitable for the word cloud, for example single-letter words, symbols ([ . , )), abbreviations, etc.

• (10) Create a term-frequency data frame.

• (11) Produce a word cloud. You can decide how many of the most frequently used words to show in the word cloud; for example, a word cloud of the ten most frequently used words.

• (12) Write a summary paragraph (at least two statements) about your word cloud. The

summary should be cast in the context of your chosen text document.
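
Steps (3) and (6) to (10) above can be sketched with the standard library alone. This is a minimal illustration, not the expected solution: the assignment asks for a data frame, whereas this sketch uses a plain `Counter`. The stop-word set is a deliberately tiny, hypothetical list, and the sample text is invented rather than quoted from the paper. The function name `term_frequencies` is also hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "at"}


def term_frequencies(text, stopwords=STOPWORDS):
    """Count how often each remaining word occurs in `text`."""
    lines = text.split("\n")  # step 3: split into lines
    tokens = []
    for line in lines:
        # step 6: tokenize; keeping only letter runs also drops symbols and digits
        tokens.extend(re.findall(r"[A-Za-z]+", line))
    tokens = [t.lower() for t in tokens]                # step 7: lowercase
    tokens = [t for t in tokens if t not in stopwords]  # step 8: remove stop words
    tokens = [t for t in tokens if len(t) > 1]          # step 9: drop single letters
    return Counter(tokens)                              # step 10: term frequencies


# Invented sample text, for illustration only.
sample = "Data science is at the singularity.\nThe data (a lot of data) is open."
freq = term_frequencies(sample)
print(freq.most_common(3))
```

The resulting counts could then be loaded into a data frame or passed to a word-cloud library. Note that the letter-only pattern splits contractions such as "don't" into two tokens, which a real solution may want to handle differently.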

Question 2

Question 2 uses Johns Hopkins GitHub data on COVID-19 vaccines administered globally to develop a Shiny App.

Visit the website https://github.com/govex/COVID-19/tree/master/data_tables/vaccine_data/global_data and read the description (readme.md).


This question leads to a Shiny app in which users choose a date range to investigate the number of COVID-19 vaccine doses administered and the number of people for whom at least one dose has been administered.

• (1) Read the CSV file at https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/vaccine_data/global_data/time_series_covid19_vaccine_global.csv into Python. Read the data dictionary at https://github.com/govex/COVID-19/blob/master/data_tables/vaccine_data/global_data/data_dictionary.csv.

• (2) Each row is uniquely defined by country and date in the data frame. What is the

dimension of the data?

• (3) Look at the data dictionary. Describe the Doses_admin and People at least one

dose administered variables.

• (4) Identify the data frame column representing the countries. Then, select the rows in the

data frame for Canada.

• (5) Use only the Canada vaccine data to answer the rest of the questions. Plot the time series

data of Doses_admin and People_at_least_one_dose in the same graph. Label the time

series lines by Doses Administered and People at least one dose administered,

respectively. Convert the y-axis to the log scale. Rotate the x-axis ticks by 45 degrees.

Hint:

1. Convert ‘Date’ column to datetime format.

2. Use matplotlib.pyplot.plot.

• (6) Describe the plot in the context of data.

• (7) Create the Shiny app as follows. In the Shiny app, the user inputs a start date and an end date, which may range from 2020-12-29 to 2023-03-09. The output is the time series plot of the logarithm of the doses administered and of the people with at least one dose administered in Canada, for the date range the user chooses. You can use the following template to create the Shiny app.


• (8) Deploy your Shiny app at https://www.shinyapps.io/. Then, provide the link to the

app—for example, https://pratheepaj.shinyapps.io/my_app/.

from shiny import App, render, ui

# import required libraries

app_ui = ui.page_fluid(
    ui.input_date_range(
        "daterange",
        "Date range",
        start="2020-12-29",
        end='2023-03-09'
    ),
    ui.output_plot('myplot'),
)


def server(input, output, session):
    @output
    @render.plot
    def myplot():
        # Read the data
        # select the data for Canada
        # If you call the data frame as `df`, then the
        # following codes select the rows in the user
        # selected date range
        df = df[df['Date'] > pd.Timestamp(input.daterange()[0])]
        df = df[df['Date'] < pd.Timestamp(input.daterange()[1])]
        # Create the plot using `df`


app = App(app_ui, server)
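
The row selection, date filtering, and plotting that the template describes can be tried outside Shiny first. The sketch below is only an illustration under stated assumptions: the rows are made up and are not real vaccination counts, the country column is assumed to be named `Country_Region`, and the helper names `select_country`, `filter_dates`, and `plot_series` are hypothetical. It requires pandas and matplotlib.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only; these are NOT real vaccination counts.
toy = pd.DataFrame({
    "Country_Region": ["Canada", "Canada", "Canada", "Canada", "Other"],  # assumed column name
    "Date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-02"],
    "Doses_admin": [100, 250, 400, 900, 50],
    "People_at_least_one_dose": [90, 200, 310, 600, 40],
})


def select_country(df, country, column="Country_Region"):
    """Keep one country's rows and convert 'Date' to datetime."""
    out = df[df[column] == country].copy()
    out["Date"] = pd.to_datetime(out["Date"])
    return out.sort_values("Date")


def filter_dates(df, start, end):
    """Keep rows strictly between start and end, as the template does."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    return df[(df["Date"] > start) & (df["Date"] < end)]


def plot_series(df):
    """Plot both series on one log-scale axis with rotated x ticks."""
    fig, ax = plt.subplots()
    ax.plot(df["Date"], df["Doses_admin"], label="Doses Administered")
    ax.plot(df["Date"], df["People_at_least_one_dose"],
            label="People at least one dose administered")
    ax.set_yscale("log")                        # log-scale y-axis
    ax.tick_params(axis="x", labelrotation=45)  # rotate x ticks by 45 degrees
    ax.legend()
    return fig


canada = select_country(toy, "Canada")
window = filter_dates(canada, "2021-01-01", "2021-01-04")
fig = plot_series(window)
```

The strict comparisons mirror the template, so the two endpoint dates themselves are excluded; using `>=` and `<=` instead would include them.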

3. Helper’s name.

After attempting homework problems individually, students may discuss a homework assignment with their classmates. However, students must write up their solutions individually and explicitly indicate from whom (if anyone), or from which resources, they received help. Write your helper’s name (only one helper’s name is accepted).


Grading scheme

1. 1. Link to the document [1]
   2. Codes to read all the pages [1]
   3. Codes [1]
   4. Codes [1]
   5. Codes [1]
   6. Codes [2]
   7. Codes [1]
   8. Codes [1]
   9. Codes [1]
   10. Codes [1]
   11. Codes, word cloud for the most frequently used words [2]
   12. Two statements [2]

2. 1. Codes [1]
   2. Codes and answer [1]
   3. Description [2]
   4. Identify the column and code [2]
   5. Plot variable 1, plot variable 2 in the same plot, label both time series, y-axis scale, x-axis ticks [5]
   6. At least one statement [1]
   7. Importing libraries, complete the codes for creating the plot, app works locally [3]
   8. Deploying the app, link to the app [2]

The maximum number of points for this assignment is 32. We will convert this to 100%.


