
Date: 2022-11-28 11:05

TAT 7008 - Assignment 3


Note: A3 is 20% of the overall assessment. The 100 points in A3 will be rescaled to 20% in the final score.


Web Scraping

1. (25 points) Crawl information from https://www.sciencedirect.com

(1) (13 points) Crawl key information about all articles published in 2022 from https://www.sciencedirect.com/journal/journal-of-econometrics/issues, including year, volume, article content, title, authors and pages. Crawl volumes 226 to 230 only.

(2) (6 points) Remove "\xa0" from volume_name and store the crawled data in a pandas DataFrame.

(3) (6 points) Filter out the records whose author is null, then find the top 10 authors who published the most articles.

Hint:

i. Click the button of the targeted item.

ii. Pass the HTML to BeautifulSoup and get all links.

iii. Use requests to get the article content, title, authors and pages for each block.

For this example:

article content: Research article
title: Identification in nonparametric models for dynamic treatment effects
authors: Sukjin Han
pages: Pages 132-147
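
Parts (2) and (3) can be sketched as follows once the crawled rows are available as a list of dictionaries. This is a minimal sketch, not a reference solution. The rows below are hypothetical placeholders rather than crawled data, the helper names are invented for illustration, and it assumes that multiple authors arrive as one comma-separated string and that "\xa0" should become an ordinary space.

```python
import pandas as pd


def clean_volume_names(df):
    """Return a copy with non-breaking spaces in volume_name replaced by plain spaces."""
    out = df.copy()
    out["volume_name"] = out["volume_name"].str.replace("\xa0", " ", regex=False)
    return out


def top_authors(df, n=10):
    """Drop null/empty author fields, split co-authors, and count articles per author."""
    authors = df["authors"].dropna()
    authors = authors[authors.str.strip() != ""]
    # Assumption: co-authors are separated by commas in the crawled string.
    per_author = authors.str.split(",").explode().str.strip()
    return per_author.value_counts().head(n)


# Hypothetical placeholder rows (NOT real crawled data).
rows = [
    {"volume_name": "Volume\xa0226", "title": "Paper 1", "authors": "Author A, Author B"},
    {"volume_name": "Volume\xa0226", "title": "Paper 2", "authors": "Author A"},
    {"volume_name": "Volume\xa0227", "title": "Editorial Board", "authors": None},
]

df = clean_volume_names(pd.DataFrame(rows))
top = top_authors(df)
print(top)
```

Items such as editorial-board pages typically carry no author, which is why the null filter comes before the count.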


Scikit-learn

2. (10 points) Handwritten digits dataset loading and preprocessing

(1) (2 points) Load the digits data by load_digits.

(2) (4 points) Use MinMaxScaler to normalize the covariates X.

(3) (4 points) Split the data into training and test sets with test_size=0.2 and random_state=2020.
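
The three steps above can be sketched as below. The sketch follows the order in which the question lists them (scale first, then split), which is an interpretation. Fitting the scaler on the training portion only would be a common alternative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# (1) Load the handwritten digits data: 8x8 images flattened to 64 features.
X, y = load_digits(return_X_y=True)

# (2) Rescale every feature to the [0, 1] range.
X_scaled = MinMaxScaler().fit_transform(X)

# (3) Hold out 20% of the samples as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=2020
)

print(X_train.shape, X_test.shape)
```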


3. (15 points) Following question 2, fit the model specified below with different hyperparameters, and report the performance.

(1) (7 points) Fit the naive Bayes model MultinomialNB on the digits training set with different values of the parameter alpha, α∈{1,2,…,20}.

(2) (4 points) Record the accuracy scores on the test set for each α.

(3) (4 points) Draw the line plot of the accuracy scores versus different α.
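
One possible shape for this loop is sketched below. It repeats the preprocessing from question 2 so that it runs on its own. The plotting helper `plot_scores` is a hypothetical name, and matplotlib is imported inside it only so that the scoring part runs without a display.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler

# Preprocessing as in question 2.
X, y = load_digits(return_X_y=True)
X = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2020
)

# (1) and (2): fit one model per alpha and record its test accuracy.
alphas = list(range(1, 21))
scores = []
for alpha in alphas:
    model = MultinomialNB(alpha=alpha).fit(X_train, y_train)
    scores.append(accuracy_score(y_test, model.predict(X_test)))


def plot_scores(alphas, scores):
    """(3): line plot of test accuracy versus alpha."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    plt.plot(alphas, scores, marker="o")
    plt.xlabel("alpha")
    plt.ylabel("test accuracy")
    plt.title("MultinomialNB on digits")
    plt.show()
```

Calling `plot_scores(alphas, scores)` draws the figure.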

4. (15 points) Following question 2, apply dimensionality reduction methods to the digits dataset.

(1) (3 points) Fit a Principal Component Analysis (PCA, n_components=2) model to the digits training set for dimensionality reduction.

(2) (3 points) Apply the model from (1) to the training and test sets to compute the 2-dimensional embedded training and test sets.

(3) (3 points) Fit a nearest neighbor classifier (KNN, n_neighbors=3) on the embedded training set. Compute the nearest neighbor accuracy on the embedded test set, plot the projected test set points and show the evaluation score.

(4) (6 points) Use Neighborhood Components Analysis (NCA, n_components=2) for dimensionality reduction, and repeat (1), (2) and (3).

Note: Output the results in the following image format; no outputs are needed for (1) and (2).
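
A minimal sketch of steps (1) to (4) follows, with the same fit, transform and score routine reused for PCA and NCA. The helper names `embed_and_score` and `plot_embeddings` are hypothetical, and `random_state=2020` for NCA is an assumption, since the question does not specify one. The required image format is not reproduced here, so the plot layout is only a guess.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.preprocessing import MinMaxScaler

# Preprocessing as in question 2.
X, y = load_digits(return_X_y=True)
X = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2020
)


def embed_and_score(reducer):
    """Fit the reducer on the training set, embed both sets, score 3-NN on the test set."""
    reducer.fit(X_train, y_train)  # PCA ignores y; NCA is supervised and uses it
    Z_train = reducer.transform(X_train)
    Z_test = reducer.transform(X_test)
    knn = KNeighborsClassifier(n_neighbors=3).fit(Z_train, y_train)
    return Z_test, knn.score(Z_test, y_test)


results = {
    "PCA": embed_and_score(PCA(n_components=2)),
    # random_state is an assumption made for reproducibility.
    "NCA": embed_and_score(
        NeighborhoodComponentsAnalysis(n_components=2, random_state=2020)
    ),
}


def plot_embeddings(results):
    """Scatter the projected test points, one panel per method, titled with the score."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, axes = plt.subplots(1, len(results), figsize=(10, 4))
    for ax, (name, (Z, acc)) in zip(axes, results.items()):
        ax.scatter(Z[:, 0], Z[:, 1], c=y_test, cmap="tab10", s=10)
        ax.set_title(f"{name}, KNN (k=3) test accuracy = {acc:.2f}")
    plt.show()
```

Calling `plot_embeddings(results)` draws both panels side by side.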


Computer vision

5. (18 points) Face and Eye Detection

(1) (12 points) Write the code to detect the faces and the eyes in face.jpg. Draw a red rectangle around each face and a green rectangle around each eye.

(2) (6 points) How can the above code be modified to open the front camera for video capture and perform face and eye detection on the video?

Hint: You may use the auxiliary .xml files and the detection algorithm based on Haar-like features, both provided by OpenCV.


Natural language processing

6. (17 points) Word embedding (Skip-gram)

See the attached Jupyter notebook with partially finished code: wb_partial_code.ipynb
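
The notebook itself is not reproduced here, so the following is only a generic illustration of the data-preparation step behind Skip-gram: turning a token sequence into (center, context) training pairs. The function name `skipgram_pairs` and the `window` parameter are hypothetical and may not match the notebook's code.

```python
def skipgram_pairs(tokens, window=2):
    """Pair every token with each neighbor at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs("the quick brown fox".split(), window=1)
print(pairs)
```

A Skip-gram model is then trained to predict the context word from the center word in each pair.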

