
###### Date: 2019-12-01 09:21

Assignment 4 – Module 3

1. Instructions

This assignment is worth a total of 9 points toward your final grade. It will consist of two sections. In

Section 1, you will work with trade input and output data and learn how to manipulate them. In

Section 2, you will learn cluster analysis and work with some health data.

1) Course Materials – Jim has made an R textbook available on Canvas. Go to “Library Online

Course Reserves” and you will find an e-book “R: predictive analysis: master the art of

predictive modeling” made available from 2019-10-01 to 2019-12-23. Before beginning this

assignment please spend some time reading the relevant sections of the textbook, especially

Chapters 1.3 (visualization methods) and 2.3 (cluster analysis). Students taking the data

analytics module next semester may want to study the book more during their holiday break.

2) Submission of assignment – you will be given two ways to submit your assignment:

a. RMarkdown format: you can submit your assignment as an RMarkdown file (.RMD). Make sure the file clearly describes the steps in your code, including an explanation of why you used a particular function or piece of code.

i. The advantage that RMarkdown has over Word is you do not need to worry

script file and the package will help you knit everything into an HTML document. Outputs and graphs will also be generated automatically under your code boxes when you run them. However, you will still need to know the syntax for creating code boxes. All code must be enclosed by the following symbols:

```{r}

```

ii. The following YouTube video teaches some basics of using RMarkdown. Take

time to watch the video and decide if you want to use RMarkdown after.

b. PDF format: you can also submit your assignment in PDF format. Copy and paste

following format.

Text explanations should be in black against a white background.

Code / scripts should be shown in black letters in a grey box like this.

This will enable us to differentiate more easily between code and explanations. Always provide explanations for your code.

For visual outputs (graphs, screenshots, etc.), you can try using UBC’s free Snagit

screen capture program. In the leftmost column of your Canvas account, click on “Help”

>>> “Software Distribution”. Choose the “Snagit” application, add it to your cart and

3) Assignment due date – this assignment will be due at 11.59am on December 2nd

2019.

4) If you have any questions regarding the assignment, you can contact either Hamzeh or Wei Siang. Their emails and office hours are as follows:

a. Hamzeh – seh793@mail.usask.ca, Mondays & Wednesdays 10.30am-12 Noon at

MCML154.

b. Wei Siang – weisiang.chan@gmail.com, Tuesdays 10.30am-12 Noon at MCML154.

5) If you face problems with your code, send an email to Hamzeh. Include in your email:

b. The error message shown in your console; and

c. The line at which the problem appeared (if possible).

a. On the FRE 501 home page, click on “Canvas module” under “Module 3 (Hamzeh)”

b. Scroll down to “Data” and you’ll see the files you’ll need to download for this

assignment.

c. Find the file titled “wiot_stats_sep12.zip”

may take some time.

e. Unzip the file. Double-click on the zip file and click “extract all”. A new folder will be created with the unzipped files.

f. Open the file in R (DO NOT use Excel as this will hang the program). Open RStudio and select “File”, “Import Dataset”, and “From Stata…”.

g. A new window will pop up. Browse the unzipped folder and select the Stata file titled “wiot_full”.

h. Cancel the data preview (or your computer will take a very long time to load the data).
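The menu steps above can also be done from a script, which avoids the slow preview altogether. The following is a minimal sketch, assuming the haven package is installed and that the unzipped Stata file has a `.dta` extension; the folder path `data_dir` is hypothetical and must be changed to wherever you extracted the zip file.

```r
# Hypothetical location of the unzipped folder; adjust to your own machine.
data_dir <- "~/Downloads/wiot_stats_sep12"

# List the Stata files in that folder (the ".dta" extension is an assumption).
dta_files <- list.files(data_dir, pattern = "\\.dta$", full.names = TRUE)
print(basename(dta_files))

if (length(dta_files) > 0) {
  # Replace dta_files[1] with the file named in step g if several are listed.
  wiot <- haven::read_dta(dta_files[1])
  print(dim(wiot))    # number of rows and columns
  print(names(wiot))  # variable names, e.g. row_item and col_item
}
```

Because the table is large, inspecting `dim()` and `names()` is much faster than opening the data viewer.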

2. Section 1 – Working with Trade Input / Output Data

With the United States–China Relations Act of 2000, China was allowed to join the WTO in 2001. Bill Clinton, the U.S. president in 2000, put a great deal of effort into convincing the U.S. Congress to approve the trade agreement between the U.S. and China. Clinton believed that higher levels of trade with China were in the interest of the U.S. economy. However, American authorities generally argue that China hinders open trade and does not open its market to the U.S. to the extent that the U.S. opens its own.

Food Industry in both countries. Please use the “tidyverse” package to conduct your analyses.

• POINTS:
• Question 1-1: 3/100
• Question 1-2: 2/100
• Question 1-3: 5/100
• Question 1-4: 10/100
• Question 1-5: 30/100

1-1. Use the WIOT dataset to create two subsamples. The first subsample should capture the contribution of the U.S. agricultural sector (row_item=1 and 64) to the value added of China’s food industry (col_item=3). The second subsample should capture the contribution of China’s agricultural sector (row_item=1 and 64) to the value added of the U.S. food industry (col_item=3). (Consult slides 18 to 23 of the GVC_RCA lecture notes.)

1-2. Calculate the share of the agricultural industry in the value added of the food industry for each subsample you made (consult slide 24 of the GVC_RCA lecture notes).

1-3. Make two graphs showing the changes in the share of the agricultural industry in the value added of the food industry from 1996 to 2010, one for each subsample (consult slides 26 to 33 of the GVC_RCA lecture notes). Use the gridExtra package to combine the graphs.

1-4. In a short paragraph, explain whether the U.S. authorities’ claim seems to be true, so that the WTO needs to conduct an investigation, or whether it is a wrong statement. Specifically, focus on the trends in both graphs before and after 2001, when China joined the WTO.

1-5. Find the share of China’s agricultural industry in the total output value of the agricultural and food industries of all countries from 1995 to 2011. (HINT 1: use the group_by and summarise functions. HINT 2: group by several variables.) Plot your findings, where the Y axis is the % share of China’s agricultural industry in the total output value of the agricultural and food industries of all countries and the X axis is the year.
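The filter, group_by and summarise pattern that these questions rely on can be sketched on a tiny made-up table. This is an illustration only, not the expected answer: `row_item` and `col_item` come from the assignment, while `year`, `row_country`, `col_country`, `value`, the country codes "USA" and "CHN", and all the numbers are assumptions chosen so the arithmetic is easy to check. Total output is approximated here as the sum of an industry's flows, which may differ from the definition used in the lecture notes.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the WIOT table (all values invented).
toy <- data.frame(
  year        = rep(c(2000, 2001), each = 8),
  row_country = rep(rep(c("CHN", "USA"), each = 4), times = 2),
  row_item    = rep(rep(c(1, 3), each = 2), times = 4),
  col_country = rep(c("CHN", "USA"), times = 8),
  col_item    = 3,
  value       = c(20, 10, 15, 5, 5, 20, 5, 20,   # flows in 2000
                  30, 10, 15, 5, 5, 15, 5, 15),  # flows in 2001
  stringsAsFactors = FALSE
)

# Question 1-1 style subsample: U.S. agriculture (row_item 1) flowing into
# China's food industry (col_item 3).
us_to_chn <- toy %>%
  filter(row_country == "USA", row_item == 1,
         col_country == "CHN", col_item == 3)

# Question 1-5 style share: first group by several variables to get output per
# year, country and industry, then group by year to form the share.
output_by_industry <- toy %>%
  filter(row_item %in% c(1, 3)) %>%
  group_by(year, row_country, row_item) %>%
  summarise(output = sum(value), .groups = "drop")

china_share <- output_by_industry %>%
  group_by(year) %>%
  summarise(
    share_pct = 100 * sum(output[row_country == "CHN" & row_item == 1]) /
      sum(output),
    .groups = "drop"
  )

# Year on the X axis, percentage share on the Y axis.
share_plot <- ggplot(china_share, aes(x = year, y = share_pct)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "China's agriculture share (%)")
```

Two plots built this way can be placed side by side with `gridExtra::grid.arrange(p1, p2, ncol = 2)`, where `p1` and `p2` are the two ggplot objects.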

3. Section 2 – Cluster Analysis of Health Data

There is a variable in the cluster_data dataset called inc_hh. It is a categorical variable ranging from 1 to 8 that shows the household income level of each individual. If inc_hh=1, the annual household income of the individual is between \$0 and \$19,999; similarly, inc_hh=7 means the annual household income is between \$120,000 and \$139,999. The final income level (inc_hh=8) covers those Canadians whose annual household income is equal to or greater than \$140,000.
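To make the coding of inc_hh easier to read in tables and plots, the numeric levels can be turned into a labelled factor. The sketch below assumes that levels 2 to 6 follow the same \$20,000 step as the three levels described above; the text only states the ranges for levels 1, 7 and 8, so check the codebook before relying on the intermediate labels. The example values of `inc_hh` are made up.

```r
# Bracket labels; levels 2 to 6 are an assumption based on a $20,000 step.
inc_labels <- c(
  "$0-$19,999", "$20,000-$39,999", "$40,000-$59,999", "$60,000-$79,999",
  "$80,000-$99,999", "$100,000-$119,999", "$120,000-$139,999", "$140,000+"
)

# Hypothetical example values of inc_hh.
inc_hh <- c(1, 7, 8, 3)

# Ordered factor so that comparisons such as inc_factor >= "$120,000-$139,999" work.
inc_factor <- factor(inc_hh, levels = 1:8, labels = inc_labels, ordered = TRUE)
print(as.character(inc_factor))
```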

In the class, we found the dietary patterns of all Canadian adults in the dataset. The questions

• Points:
• Question 2-1: 15/100
• Question 2-2: 10/100
• Question 2-3: 10/100
• Question 2-4: 15/100

1- Please use k-means cluster analysis to identify the dietary patterns of individuals with the lowest income level (i.e. inc_hh=1) and of those with an income level between \$120,000 and \$139,999 (i.e. inc_hh=7). Report the average intakes of the 9 food groups (using the food dataset we used in class) across these two income groups (1 and 7). (Use the dplyr package for data management, fviz_nbclust and NbClust to find the optimal number of clusters, and the kmeans function to conduct the k-means cluster analysis. Please consult slides 58 to 66 of the cluster_analysis lecture notes.)

2- In the main dataset we have two variables called bmi_total and nrf. The first variable indicates the body mass index (BMI) of each individual and the second indicates the diet quality score of each individual based on the Nutrient Rich Food index. Please find the average BMI and NRF across the clusters identified for each income group separately, using the group_by and summarise functions. (Please consult slide 77 of the cluster_analysis lecture notes.)

3- Compare the frequencies of Canadians who have a High Quality diet across the two income groups using the freq function. Also use the “descr” package to report the prevalence of males with a high quality diet in each income group (please consult slides 67 to 76 of the cluster_analysis lecture notes).

4- People in the lowest income groups tend to be more obese than those in the highest income groups. Adam believes that because healthier food options are more expensive, poor people tend to eat more unhealthy foods and are therefore more likely to be obese. However, Bill argues that because of technological advancements in the agricultural sector, foods are available to most people in developed countries at relatively low prices, so we cannot blame the lower prices of unhealthy foods for the higher prevalence of obesity among poor people. Using your answers to questions 1 and 3, discuss in a short paragraph whether you support Adam or Bill.
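The workflow behind questions 1 and 2 (subset one income group, cluster on the food variables, then summarise by cluster) can be sketched on simulated data. This is an illustration under stated assumptions, not the class solution: `inc_hh`, `bmi_total` and `nrf` are named in the assignment, but the three food-group columns are hypothetical placeholders for the nine food groups, all numbers are random, and standardising the variables and using k = 2 are arbitrary choices made for the sketch.

```r
library(dplyr)

set.seed(42)  # kmeans uses random starts, so fix the seed for reproducibility

# Simulated stand-in for cluster_data (all values invented).
n_obs <- 200
cluster_data <- data.frame(
  inc_hh    = sample(c(1, 7), n_obs, replace = TRUE),
  fruit     = rnorm(n_obs, mean = 2, sd = 1),
  vegetable = rnorm(n_obs, mean = 3, sd = 1),
  grain     = rnorm(n_obs, mean = 5, sd = 1.5),
  bmi_total = rnorm(n_obs, mean = 27, sd = 4),
  nrf       = rnorm(n_obs, mean = 50, sd = 10)
)
food_vars <- c("fruit", "vegetable", "grain")

# Cluster one income group at a time; shown here for inc_hh == 1.
low_income  <- cluster_data %>% filter(inc_hh == 1)
food_scaled <- scale(low_income[, food_vars])

# Optional elbow plot for choosing k, only if factoextra is installed.
if (requireNamespace("factoextra", quietly = TRUE)) {
  elbow_plot <- factoextra::fviz_nbclust(food_scaled, kmeans, method = "wss")
}

# k = 2 is a placeholder; pick k from the diagnostics on the real data.
km <- kmeans(food_scaled, centers = 2, nstart = 25)
low_income$cluster <- km$cluster

# Average food intakes, BMI and NRF by cluster.
cluster_summary <- low_income %>%
  group_by(cluster) %>%
  summarise(
    size = n(),
    across(all_of(c(food_vars, "bmi_total", "nrf")), mean),
    .groups = "drop"
  )
print(cluster_summary)
```

Repeating the same steps with `filter(inc_hh == 7)` gives the corresponding table for the second income group.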