Date: 2020-11-08 08:05

CSCI 4146 - The Process of Data Science - Fall 2020

Assignment 1

The submission must be done through Brightspace.

Due date and time as shown on Brightspace under Assignments.

● To prepare your assignment solution, use the assignment template notebook available on Brightspace.

● The detailed requirements for your writing and code can be found in the evaluation rubric document on Brightspace.

● Questions will be marked individually with a letter grade. Their weights are shown in parentheses after the question.

● Assignments can be done by a pair of students, or individually. If the submission is by a pair of students, only one of the students should submit the assignment on Brightspace.

● We will use plagiarism tools to detect any type of cheating and copying (your code and PDF).

● Your submission is a single Jupyter notebook and a PDF (with the compiled results generated by your Jupyter notebook). File names should be:

○ A1-<your_name1>-<your_name2>.ipynb

○ A1-<your_name1>-<your_name2>.pdf

● Failing to submit both files results in a mark of 0 for both students.

In this assignment, you will need to build a model to predict the price of an Airbnb listing.

The dataset is available at https://www.kaggle.com/airbnb/boston

1. Data understanding and preprocessing (0.1)

a. Build the data quality report

b. Identify data quality issues and build the data quality plan

c. Preprocess your data according to the data quality plan

d. Answer the following questions:

i. What is the neighbourhood with the highest average rating?

ii. What are the major characteristics of this neighbourhood (e.g., type of listing, host rating, etc.)?
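One possible starting point for items (a) and (d.i) is sketched below with pandas. It is only an illustration: the helper `data_quality_report`, the tiny synthetic frame, and the column names `neighbourhood`, `rating`, and `price` are assumptions made for the example and are not taken from the Kaggle dataset or from the assignment template.

```python
import pandas as pd


def data_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise every column: dtype, non-null count, missing %, cardinality."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "count": df.count(),                    # non-null values per column
        "missing_pct": df.isna().mean() * 100,  # share of missing values
        "cardinality": df.nunique(),            # distinct non-null values
    })


# Synthetic stand-in for the listings table (hypothetical column names).
listings = pd.DataFrame({
    "neighbourhood": ["A", "A", "B", "B", None],
    "rating": [90.0, 100.0, 80.0, None, 70.0],
    "price": [100.0, 150.0, 80.0, 120.0, 60.0],
})

report = data_quality_report(listings)
print(report)

# Average rating per neighbourhood; rows with a missing neighbourhood or
# a missing rating are ignored by groupby/mean by default.
avg_rating = (
    listings.groupby("neighbourhood")["rating"]
    .mean()
    .sort_values(ascending=False)
)
best = avg_rating.index[0]
print(best, avg_rating.iloc[0])
```

A fuller report would usually split continuous and categorical features and add statistics such as quartiles or mode frequencies; the table above only shows the general shape.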

2. Spatial data (0.2)

a. Plot listings on the city map with different colours corresponding to the listing’s neighbourhood.

b. Mark the “State station” (lat, long = 42.3570174, -71.071191) subway station on the city map.

c. Plot the distance between the closest and most distant listings to State station.
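The distance to State station can be computed in several ways. The sketch below uses the haversine (great-circle) formula with only the standard library. The station coordinates come from item (b); the function name `haversine_km`, the Earth radius of 6371 km, and the two example listings are assumptions made for illustration.

```python
import math

# Coordinates of State station as given in the assignment text.
STATE_STATION = (42.3570174, -71.071191)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical listings: name -> (latitude, longitude).
listings = {
    "listing_a": (42.36, -71.06),
    "listing_b": (42.30, -71.12),
}

distances = {
    name: haversine_km(*STATE_STATION, lat, lon)
    for name, (lat, lon) in listings.items()
}
closest = min(distances, key=distances.get)
farthest = max(distances, key=distances.get)
print(closest, farthest)
```

The same function can be applied row by row to the latitude and longitude columns of the listings table to produce the distance feature requested in question 3(b).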

3. Build a model to forecast the price of a listing (0.7)

a. Explain what task you are solving (e.g., supervised vs. unsupervised; classification vs. regression vs. clustering vs. similarity matching, etc.).

b. Use a feature selection method to select the features to build a model. Include in the resulting dataset the distance from State station and exclude the free-text (such as descriptions, reviews) and rating features.

c. Select the evaluation metric. Justify your choice.

d. Build a baseline model

i. Perform hyperparameter tuning if applicable.

ii. Train and evaluate your model

iii. How do you make sure not to overfit?

iv. Plot learning curve

v. Analyze the results

e. Build a candidate final model (can be repeated for multiple models but only include the final selection)

i. Perform hyperparameter tuning if applicable.

ii. Train and evaluate your model

iii. How do you make sure not to overfit?

iv. Plot learning curve

v. Analyze the results

f. Compare the two models with a statistical significance test. Use a box plot to visualize your comparison.

g. The questions above explicitly exclude the rating attributes. This corresponds to modelling the task of a host putting a new listing on Airbnb that does not have any ratings yet. A related task is that of a traveller who wants to check whether a listing asks a fair price, taking the ratings of that listing into account. Include the rating(s) in the dataset and see whether the new final model performs better than the one without the rating attributes.
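As one way to approach the comparison in item (f), the sketch below scores a baseline and a candidate model on the same cross-validation folds and applies a paired Wilcoxon signed-rank test to the per-fold errors. Everything specific here is an assumption for illustration: the synthetic data from `make_regression`, the choice of `DummyRegressor` and `Ridge`, mean absolute error as the metric, and 10 folds. Fold scores are not fully independent, so the resulting p-value should be read with some caution.

```python
from scipy.stats import wilcoxon
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data standing in for the prepared listings features.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# A fixed, shuffled split so both models are scored on identical folds.
cv = KFold(n_splits=10, shuffle=True, random_state=0)


def fold_mae(model):
    """Mean absolute error on each validation fold (lower is better)."""
    scores = cross_val_score(
        model, X, y, cv=cv, scoring="neg_mean_absolute_error"
    )
    return -scores  # scikit-learn returns negated errors


baseline_mae = fold_mae(DummyRegressor(strategy="median"))
candidate_mae = fold_mae(Ridge(alpha=1.0))

# Paired, non-parametric test on the fold-wise differences.
statistic, p_value = wilcoxon(baseline_mae, candidate_mae)
print(baseline_mae.mean(), candidate_mae.mean(), p_value)
```

The two arrays `baseline_mae` and `candidate_mae` are already in the shape a box plot expects, with one box per model. The same procedure could be reused for item (g) by scoring the final model once with and once without the rating features on identical folds.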

