Date: 2023-03-28 11:18

ISE 535 Data Mining, Homework 4

Submit on March 29 by 12:30 pm

The file FlightDelays.csv consists of all flights from the Washington, DC area into the New York City area during January 2004. Each record is a particular flight. The data were obtained from the Bureau of Transportation Statistics (available at www.transtats.bts.gov).

The goal is to accurately predict whether or not a new flight (not in this dataset) will be delayed, using as predictors the variables DAY_WEEK, CRS_DEP_TIME, ORIGIN, DEST, and CARRIER. The target (response) variable Flight.Status indicates whether the flight was delayed. It has two classes (delayed and ontime).

Read the file into a dataframe df, selecting only the columns DAY_WEEK, CRS_DEP_TIME, ORIGIN, DEST, CARRIER, and Flight.Status. The variable CRS_DEP_TIME is the airline's scheduled departure time.
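As a minimal sketch of the column selection, the R snippet below reads a tiny inline CSV instead of the real file. The rows and the extra DISTANCE column are made up for illustration; only the six selected column names come from the assignment. If the header in the real file contains a space, read.csv's default check.names = TRUE would turn it into a dot.

```r
# Toy stand-in for FlightDelays.csv: rows and the DISTANCE column are made up.
csv_text <- "CRS_DEP_TIME,CARRIER,DEST,DISTANCE,ORIGIN,DAY_WEEK,Flight.Status
1455,OH,JFK,184,BWI,4,ontime
1640,DH,JFK,213,DCA,4,ontime
1030,DL,LGA,214,DCA,7,delayed"

# The six columns named in the assignment, in the order they are listed there.
keep <- c("DAY_WEEK", "CRS_DEP_TIME", "ORIGIN", "DEST", "CARRIER", "Flight.Status")

# With the real file this would be read.csv("FlightDelays.csv") instead of text = ...
df <- read.csv(text = csv_text)[, keep]
str(df)
```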

Transform all variables to factors. For CRS_DEP_TIME, use df$CRS_DEP_TIME = factor(floor(df$CRS_DEP_TIME/100)) while the column is still numeric, to create hourly intervals for the departure times.
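The ordering matters because R does not do arithmetic on factors: dividing a factor by 100 yields NA with a warning. The sketch below, on made-up rows, bins CRS_DEP_TIME first and converts the other columns afterwards. The lapply idiom is one possible way to convert every column, not something the assignment prescribes.

```r
# Made-up rows; only the column names come from the assignment.
df <- data.frame(
  DAY_WEEK = c(1, 7, 3),
  CRS_DEP_TIME = c(645, 1030, 2120),
  ORIGIN = c("DCA", "DCA", "BWI"),
  stringsAsFactors = FALSE
)

# Bin first, while CRS_DEP_TIME is still numeric: 645 -> 6, 1030 -> 10, 2120 -> 21.
df$CRS_DEP_TIME <- factor(floor(df$CRS_DEP_TIME / 100))

# Then convert every column to a factor (already-factor columns keep their levels).
df[] <- lapply(df, factor)

str(df)
```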

a) (10 pts.) Find the number of delayed flights from each ORIGIN airport to each DEST airport. Show the results in a 3-by-3 table.
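One way to build such a cross-tabulation is table() on the delayed rows only. The sketch below uses made-up flights; the airport codes other than DCA and LGA are illustrative assumptions. Because factor levels survive subsetting, the table keeps all three origins and destinations, with zeros where no delayed flight exists.

```r
# Made-up flights; airport codes here are illustrative values only.
df <- data.frame(
  ORIGIN = factor(c("DCA", "DCA", "BWI", "IAD", "DCA", "BWI")),
  DEST = factor(c("LGA", "JFK", "EWR", "LGA", "LGA", "EWR")),
  Flight.Status = factor(c("delayed", "ontime", "delayed", "ontime", "delayed", "ontime"))
)

# Keep the delayed flights, then count them by origin (rows) and destination (columns).
delayed <- df[df$Flight.Status == "delayed", ]
delay_counts <- table(ORIGIN = delayed$ORIGIN, DEST = delayed$DEST)
print(delay_counts)
```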

b) (20 pts.) Use set.seed(1) to split the data into a train set (80%) and a test set (20%). Report the number of flights for each day of the week in the train set.
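The assignment fixes the seed but not the splitting call, so the sketch below assumes sample() over row indices. A different call with the same seed would generally produce a different split. The 50-row data frame is made up.

```r
set.seed(1)

# Made-up data frame with 50 rows; DAY_WEEK cycles through 1..7.
df <- data.frame(DAY_WEEK = factor(rep(1:7, length.out = 50)))

n <- nrow(df)
train_idx <- sample(n, size = round(0.8 * n))  # 80% of the row indices, without replacement
train <- df[train_idx, , drop = FALSE]
test <- df[-train_idx, , drop = FALSE]

# Number of train-set flights for each day of the week.
table(train$DAY_WEEK)
```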

c) (10 pts.) Use the train set to construct a Naive Bayes model. Report the output provided by this model (A-priori probabilities and all Conditional Probabilities).
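To illustrate what those two kinds of output contain, the sketch below computes them by hand in base R for a single predictor on made-up rows. The wording "A-priori probabilities" and "Conditional probabilities" resembles the printed output of naiveBayes() in the e1071 package, but the source does not name a package, so that link is an assumption.

```r
# Made-up training rows with a single predictor, to keep the tables small.
train <- data.frame(
  CARRIER = factor(c("DL", "DL", "US", "US", "DL", "US", "DL", "US")),
  Flight.Status = factor(c("delayed", "ontime", "ontime", "ontime",
                           "delayed", "ontime", "ontime", "delayed"))
)

# A-priori probabilities: P(class), the share of each class in the train set.
prior <- prop.table(table(train$Flight.Status))

# Conditional probabilities: P(CARRIER = value | class); each row sums to 1.
cond_carrier <- prop.table(table(train$Flight.Status, train$CARRIER), margin = 1)

print(prior)
print(cond_carrier)
```

With all five predictors, the model would hold one such conditional table per predictor.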

d) (20 pts.) For the test set, show the confusion matrix, then find the overall test accuracy rate.
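A confusion matrix cross-tabulates predicted against actual classes, and the overall accuracy is the share of counts on its diagonal. The sketch below uses made-up labels for ten test flights. In practice the predicted vector would come from the fitted model.

```r
# Made-up actual and predicted labels for ten test flights.
lv <- c("delayed", "ontime")
actual <- factor(c("ontime", "ontime", "delayed", "ontime", "delayed",
                   "ontime", "ontime", "delayed", "ontime", "ontime"), levels = lv)
predicted <- factor(c("ontime", "ontime", "ontime", "ontime", "delayed",
                      "ontime", "delayed", "ontime", "ontime", "ontime"), levels = lv)

# Rows are predictions, columns are true classes.
conf_mat <- table(Predicted = predicted, Actual = actual)

# Overall accuracy: correctly classified flights divided by all flights.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

print(conf_mat)
print(accuracy)
```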

e) (20 pts.) It is of interest to know whether a new Delta flight from DCA to LGA, scheduled to depart between 10 a.m. and 11 a.m. on a Sunday (DAY_WEEK = "7"), will be ontime or delayed. Use CRS_DEP_TIME = "10". What is your prediction?

f) (20 pts.) What is the posterior probability that this flight will be ontime?
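For reference, the standard Naive Bayes posterior for this flight can be written as follows. Here x_j runs over the five predictor values of the new flight, and the carrier code DL for Delta is an assumption about how the dataset encodes it. The predicted class in part e) is the one with the larger posterior.

```latex
% x = (DAY_WEEK = 7, CRS_DEP_TIME = 10, ORIGIN = DCA, DEST = LGA, CARRIER = DL)
% The carrier code DL is an assumption about the dataset's encoding.
P(\text{ontime} \mid x)
  = \frac{P(\text{ontime}) \prod_{j=1}^{5} P(x_j \mid \text{ontime})}
         {\sum_{c \in \{\text{ontime},\, \text{delayed}\}} P(c) \prod_{j=1}^{5} P(x_j \mid c)}
```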

Submit your report (code and output) as a PDF file on Blackboard (no screen captures). The report must include the student's name and USC ID.
