
Date: 2019-05-11 11:12

ECON4016 - FINAL EXAM

The final exam consists of 4 small projects. Choose 2 of them to complete and send me your report. For each project you choose, perform data analysis using the data I provide and the techniques we discussed in class. In your report, explain for each project what questions you were trying to answer with your analysis, what you did, and what you found. Please send your report and R script to my email (colonct@gmail.com) by 9AM on 18 May 2019. No late submissions will be considered.

1. Please download the subsample data of the Hong Kong census (2001-2016) from the following link:

https://drive.google.com/file/d/1Md6c5J0VcV0_g_veL48i9upJNOKoLc-V/view?usp=sharing

The zipped file contains four data files, hkcensus2001025.dta, hkcensus2006025.dta, hkcensus2011025.dta and hkcensus2016025.dta, which are the subsamples of the HK census in 2001, 2006, 2011 and 2016 respectively. You can use the following code to read these .dta files into R:

library(foreign)  # provides read.dta() for Stata .dta files

mydata <- read.dta("c:/mydata.dta")  # replace with the path to your file
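A minimal sketch for reading all four census years into one data frame, assuming the files are unzipped into a single folder and share the same variable names (check this with names() before stacking). The helper read_census() is hypothetical. Note that read.dta() from foreign only supports Stata file versions 5 to 12; if it rejects a file, read_dta() from the haven package is an alternative. The demonstration at the end writes two tiny made-up files to a temporary folder so the sketch runs without the real data.

```r
library(foreign)

# Hypothetical helper: read each hkcensus<year>025.dta file in `dir`,
# tag its rows with the census year and stack everything into one data frame.
# Assumes every year has the same variable names.
read_census <- function(dir, years = c(2001, 2006, 2011, 2016)) {
  parts <- lapply(years, function(y) {
    path <- file.path(dir, sprintf("hkcensus%d025.dta", y))
    d <- read.dta(path)
    d$year <- y
    d
  })
  do.call(rbind, parts)
}

# Toy demonstration: two made-up files written to a temporary folder
tmp <- tempdir()
write.dta(data.frame(sex = c(1, 2), age = c(30, 45)),
          file.path(tmp, "hkcensus2001025.dta"))
write.dta(data.frame(sex = c(2, 1, 1), age = c(28, 52, 61)),
          file.path(tmp, "hkcensus2006025.dta"))
census <- read_census(tmp, years = c(2001, 2006))
table(census$year)
```

With the real data, `read_census("c:/census")` (folder name hypothetical) would stack all four years.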

Select two or three variables that interest you from these datasets and try to demonstrate the relationship between them using visualisation. You can also extend your analysis by showing the temporal changes and/or spatial distribution of these variables.
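As an illustration of relating two or three variables and tracking them over time, the sketch below computes, for each census year and education level, the gap between male and female employment rates. The column names (year, educ, sex coded "M"/"F", employed as TRUE/FALSE) and the function emp_gap() are hypothetical; the census codes these variables differently, so recode them first. The toy data are made up.

```r
# Hypothetical columns: year, educ, sex ("M"/"F"), employed (TRUE/FALSE)
emp_gap <- function(d) {
  # Employment rate in every year x education x sex cell
  rates <- aggregate(employed ~ year + educ + sex, data = d, FUN = mean)
  # Put male and female rates side by side, then take the difference
  wide <- reshape(rates, idvar = c("year", "educ"), timevar = "sex",
                  direction = "wide")
  wide$gap <- wide$employed.M - wide$employed.F
  wide[order(wide$year, wide$educ), c("year", "educ", "gap")]
}

# Toy data, made up for illustration
toy <- data.frame(
  year = rep(c(2001, 2016), each = 8),
  educ = rep(rep(c("secondary", "tertiary"), each = 4), 2),
  sex = rep(c("M", "M", "F", "F"), 4),
  employed = c(TRUE, TRUE, TRUE, FALSE,  TRUE, TRUE, TRUE, TRUE,
               TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE))

gaps <- emp_gap(toy)
gaps
# One line per education level, showing how the gap moves across census years
with(gaps, interaction.plot(year, educ, gap,
                            ylab = "Male minus female employment rate"))
```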

Examples of research questions:

1) How does gender inequality in employment change as women's education level rises?
2) How are poor households distributed across the districts of Hong Kong, and how does this spatial pattern change over time?

These are only examples; feel free to choose other research questions that interest you.
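For the second example question, a minimal sketch that flags poor households with a simplified relative poverty line (half of each census year's median household income, an assumption that ignores household size) and tabulates the share of poor households by district and year. The column names year, district and hh_income, the function poverty_by_district() and the toy incomes are hypothetical.

```r
# Hypothetical columns: year, district, hh_income (monthly household income)
poverty_by_district <- function(d) {
  # Simplified relative poverty line: half of each year's median income
  line <- ave(d$hh_income, d$year, FUN = median) / 2
  d$poor <- d$hh_income < line
  # Share of poor households: districts in rows, census years in columns
  tapply(d$poor, list(d$district, d$year), mean)
}

# Toy data, made up for illustration
toy <- data.frame(
  year = rep(c(2006, 2016), each = 4),
  district = rep(c("Kwun Tong", "Kwun Tong",
                   "Central and Western", "Central and Western"), 2),
  hh_income = c(8000, 30000, 40000, 50000,
                9000, 12000, 60000, 70000))

share <- poverty_by_district(toy)
share
# Grouped bars: one group per census year, one bar per district
barplot(share, beside = TRUE, legend.text = TRUE,
        ylab = "Share of poor households")
```

Drawing these shares on a map would additionally need a district boundary file, which is not part of the exam data.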

2. Please download the data for textual data analysis from the following link:

https://drive.google.com/file/d/1vmqN5wsUYvAq0yzdpHud32jbB4EKndTD/view?usp=sharing

It contains two datasets, both in .csv format:

1) Historical news headlines from the Reddit WorldNews channel, collecting the top 25 headlines for each date based on Reddit users' votes. RedditNews.csv has two columns: the first is "date" and the second is "news headlines". On each date, the headlines are ranked from top to bottom by how hot they are.


2) The Dow Jones Industrial Average (DJIA), covering roughly 2009 to 2016.

Use the first dataset to construct useful indices or variables that summarise the information in the headlines, and test whether these indices or variables have any predictive power for the stock price in the second dataset. (Hint: you can use either simple regression or more complex machine learning methods to test for this relationship.)
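One simple way to turn headlines into a daily variable is a lexicon-based sentiment score: count positive minus negative words in each headline, average over each date, then regress each day's DJIA log return on the previous trading day's score. The sketch below assumes this approach; the tiny word lists, the DJIA column names Date and Close, the helper functions and the toy data are all hypothetical, and a published sentiment lexicon (or a machine learning model) would be a more serious choice.

```r
# Hypothetical mini-lexicon, for illustration only
pos_words <- c("gain", "growth", "peace", "deal", "rise")
neg_words <- c("war", "crisis", "attack", "fall", "killed")

# Net sentiment of each headline: positive minus negative word count
headline_score <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")
  vapply(words, function(w) as.numeric(sum(w %in% pos_words) - sum(w %in% neg_words)),
         numeric(1))
}

# Average headline score per date; assumes column 1 is the date and
# column 2 the headline, as described for RedditNews.csv
daily_index <- function(news) {
  agg <- aggregate(headline_score(news[[2]]),
                   by = list(date = as.Date(news[[1]])), FUN = mean)
  names(agg)[2] <- "sentiment"
  agg
}

# Regress each day's DJIA log return on the previous trading day's index.
# Assumes djia has columns Date (class Date) and Close (hypothetical names).
test_predictive_power <- function(index, djia) {
  djia <- djia[order(djia$Date), ]
  djia$ret <- c(NA, diff(log(djia$Close)))
  m <- merge(djia, index, by.x = "Date", by.y = "date")  # trading days only
  m$sentiment_lag <- c(NA, head(m$sentiment, -1))
  lm(ret ~ sentiment_lag, data = m)
}

# Toy data, made up for illustration
news <- data.frame(
  date = rep(c("2016-06-27", "2016-06-28", "2016-06-29", "2016-06-30", "2016-07-01"),
             each = 2),
  headline = c("Peace deal signed", "Markets rise on growth",
               "War fears grow", "Oil prices fall",
               "Crisis talks end in deal", "Attack kills dozens",
               "Growth beats forecasts", "Peace talks resume",
               "Stocks fall", "Crisis deepens"),
  stringsAsFactors = FALSE)
djia <- data.frame(Date = as.Date(c("2016-06-27", "2016-06-28", "2016-06-29",
                                    "2016-06-30", "2016-07-01")),
                   Close = c(17900, 17950, 17850, 17800, 17950))

idx <- daily_index(news)
fit <- test_predictive_power(idx, djia)
summary(fit)
```

With the real files, `news <- read.csv("RedditNews.csv", stringsAsFactors = FALSE)` loads the headlines; the exam does not name the DJIA file, so its path and column names must be checked. Headlines from non-trading days are dropped by the merge in this sketch.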

3. Please download the U.S. patent dataset for network analysis from the following link:

https://drive.google.com/file/d/1qytpbWCkyZNYG4GGHdo-P7OxZjtTYYBV/view?usp=sharing

It contains two datasets, both in .txt format:

1) acite75_99.txt: all US patent citations for utility patents granted between 1975 and 1999 (the edge file)
2) apat63_99.txt: information on all utility patents (the node file)

You can find detailed descriptions of all variables in the documentation files Cite75_99.txt and pat63_99.txt.
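A minimal sketch for ranking patents by how often they are cited, i.e. their in-degree in the citation network. It assumes the edge file is comma-separated with a header naming the columns CITING and CITED, and that the node file identifies patents by a PATENT column with a grant-year column GYEAR; confirm the actual names in Cite75_99.txt and pat63_99.txt. The function top_cited() and the toy data are hypothetical.

```r
# Hypothetical helper: rank cited patents by number of citations received
top_cited <- function(edges, n = 10) {
  counts <- table(edges$CITED)                     # in-degree of each cited patent
  top <- head(sort(counts, decreasing = TRUE), n)
  data.frame(patent = as.numeric(names(top)), times_cited = as.vector(top))
}

# Toy edge list (citing -> cited) and node attributes, made up for illustration
edges <- data.frame(CITING = c(5, 5, 6, 6, 7, 7, 7),
                    CITED  = c(1, 2, 1, 3, 1, 2, 5))
nodes <- data.frame(PATENT = c(1, 2, 3, 4, 5, 6, 7),
                    GYEAR  = c(1975, 1976, 1976, 1977, 1980, 1985, 1990))

top <- top_cited(edges, n = 2)
# Attach node attributes to the most-cited patents
top_info <- merge(top, nodes, by.x = "patent", by.y = "PATENT")
top_info
```

On the real data, `edges <- read.csv("acite75_99.txt")` would load millions of rows; `data.table::fread()` is a faster alternative.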

Use these datasets to build a citation network of U.S. patents. Visualise and describe the characteristics of this network, and try to extract useful information from your analysis (e.g. which were the key innovations in this patent dataset).
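Citation counts only credit direct citations. A common complement, swapped in here rather than required by the exam, is PageRank, which also rewards patents cited by other influential patents. Below is a minimal base-R power-iteration sketch on the edge list; the function pagerank_edges() is hypothetical and the damping factor 0.85 is the conventional choice, not one taken from the exam. The igraph package offers page_rank() and plotting tools if you prefer a library; since the full network is far too large to draw, visualising a subgraph (for example, the top-ranked patents and their direct neighbours) is more practical.

```r
# PageRank by power iteration on an edge list where node `from` cites node `to`.
# Nodes must be coded 1..n. Patents with no outgoing citations (dangling
# nodes) spread their score uniformly over all nodes, the usual convention.
pagerank_edges <- function(from, to, n, d = 0.85, tol = 1e-10, max_iter = 200) {
  from <- as.integer(from)
  to <- as.integer(to)
  outdeg <- tabulate(from, nbins = n)          # citations made by each node
  pr <- rep(1 / n, n)
  for (iter in seq_len(max_iter)) {
    flow <- pr[from] / outdeg[from]            # score sent along each edge
    agg <- rowsum(flow, to)                    # score received per cited node
    inflow <- numeric(n)
    inflow[as.integer(rownames(agg))] <- agg[, 1]
    dangling <- sum(pr[outdeg == 0])           # score held by non-citing nodes
    new_pr <- (1 - d) / n + d * (inflow + dangling / n)
    converged <- sum(abs(new_pr - pr)) < tol
    pr <- new_pr
    if (converged) break
  }
  pr
}

# Toy citations between raw patent numbers (made up for illustration)
citing <- c(5, 5, 6, 6, 7, 7, 7)
cited  <- c(1, 2, 1, 3, 1, 2, 5)
ids <- sort(unique(c(citing, cited)))          # map patent numbers to 1..n
pr <- pagerank_edges(match(citing, ids), match(cited, ids), length(ids))
data.frame(patent = ids, pagerank = round(pr, 4))[order(-pr), ]
```

The mapping through `match()` keeps the vectors compact even though real patent numbers run into the millions.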

4. Please download the real estate transaction data for building a predictive model from the following link:

https://drive.google.com/file/d/1T6e6-iy15A9OQZyjsbzWDNkiTiOlHHrW/view?usp=sharing

The link points to a file, guangzhou2017.dta, which contains all real estate transactions in Guangzhou in 2017. You can use the same code as in the first project to read this file into R. Please use the apartment characteristics in this dataset to build a model that predicts house prices using a tree-based or neural network method.
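A minimal sketch of the tree-based option using the rpart package (included with standard R installations): grow a regression tree on a training split, prune it with the cross-validated complexity parameter, and report the test RMSE. The column names (price, area, rooms, floor_level, age), the helper fit_price_tree() and the simulated toy data are hypothetical; map them to the real apartment characteristics in guangzhou2017.dta. For the neural network option, nnet() from the nnet package (with linout = TRUE for regression, and scaled inputs) is one alternative.

```r
library(rpart)

# Hypothetical columns: price, area (m^2), rooms, floor_level, age (years);
# replace them with the real apartment characteristics in guangzhou2017.dta
fit_price_tree <- function(d, train_share = 0.8, seed = 1) {
  set.seed(seed)
  train_rows <- sample(nrow(d), floor(train_share * nrow(d)))
  train <- d[train_rows, ]
  test <- d[-train_rows, ]
  # Grow a deliberately large regression tree ...
  tree <- rpart(price ~ area + rooms + floor_level + age, data = train,
                method = "anova", control = rpart.control(cp = 0.001))
  # ... then prune back to the complexity with the lowest cross-validated error
  best_cp <- tree$cptable[which.min(tree$cptable[, "xerror"]), "CP"]
  tree <- prune(tree, cp = best_cp)
  pred <- predict(tree, newdata = test)
  list(tree = tree, rmse = sqrt(mean((test$price - pred)^2)))
}

# Simulated toy data with a made-up price rule, for illustration only
set.seed(42)
n <- 500
toy <- data.frame(area = runif(n, 30, 150),
                  rooms = sample(1:4, n, replace = TRUE),
                  floor_level = sample(1:30, n, replace = TRUE),
                  age = runif(n, 0, 30))
toy$price <- 3 * toy$area - 2 * toy$age + rnorm(n, sd = 10)

res <- fit_price_tree(toy)
res$rmse
```

Comparing this RMSE against a simple linear regression on the same split is a quick way to judge whether the tree adds predictive value.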
