Points available: 170

This assignment uses glm() (logistic regression) and ctree(), each for two-category
classification. The data file “churn_mod.txt” is available through
UTDbox>BUAN6356_2020Summer>data.

You will need the “partykit” package for this assignment. The “data.table”
package is suggested but not required. Do not use any other “require()” or
“library()” statement in your code. Use of the install.packages() command in the
code you submit will result in a score of zero.

If multiple submissions from a single student are waiting to be graded, only the
last (most recent) will be run and graded.

The first commands of your submitted code MUST be:

setwd("c:/data/BUAN6356/HW_5"); source("prep.txt", echo=T)

and the last command of your submitted code MUST be:

source("validate.txt", echo=T)

Be careful with the quote characters: they must ALL be the same at the
beginning and end of a string. (Use the single or double quote character from the
key next to “Enter”.) These lines must be included BEFORE your code will
be tested.
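
The quote warning can be checked in R itself before submitting. The following is a
minimal sketch (the helper name parses() is hypothetical, not part of the assignment)
that uses parse() to test whether a line is syntactically valid without running it,
so no files need to exist:

```r
# TRUE if the text parses as R code, FALSE otherwise (hypothetical helper).
parses <- function(txt) {
  !inherits(try(parse(text = txt), silent = TRUE), "try-error")
}

straight <- 'source("validate.txt", echo=T)'              # plain ASCII double quotes
curly    <- "source(\u201cvalidate.txt\u201d, echo=T)"    # typographic quotes from a word processor

parses(straight)   # TRUE
parses(curly)      # FALSE
```

Typographic quotes usually creep in when code is pasted from a word processor or a
PDF, so retyping the quote characters in a plain-text editor is the safest fix.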

Submit the code to eLearning as an ASCII file which can be copied directly into R.

You may submit this assignment as many times as needed until you get full credit.
At that point you should stop, since only the last score counts.

Background: Your task in this assignment is to explore three strategies for
two-category classification: logistic regression via glm() using AIC minimization,
logistic regression via glm() using predictor elimination through coefficient-estimate
t-values, and ctree() from “partykit”. Assess the final model from each strategy
with a single 10% testing sample drawn from the original “churn_mod” dataset,
using 379546790 as the RNG seed. Your objective is to classify customers by “churn”
(canceled subscription, labeled 1, or continued subscription, labeled 0) and to be
ready to assess the three strategies through the Expected Bayes Risk associated with
each, together with their individual overall Accuracy. You should be ready to perform
these assessments for both the training and testing data. The variable “ID” should
be excluded from the analysis.
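
The two assessment measures named above can be sketched as follows. This is only an
illustration under stated assumptions: the helper names accuracy() and expected_risk()
are hypothetical, the risk is computed as the empirical average loss over a set of
classified observations, and the cost values are invented because the assignment does
not state any.

```r
# Overall accuracy: share of observations classified correctly.
accuracy <- function(actual, predicted) mean(actual == predicted)

# Empirical expected risk: sum over cells of cost[true, predicted] * P(true, predicted).
expected_risk <- function(actual, predicted, cost) {
  lev <- c("0", "1")
  tab <- table(factor(actual, levels = lev), factor(predicted, levels = lev))
  sum(cost * (tab / sum(tab)))
}

# ASSUMED costs (not from the assignment): rows = true class, columns = predicted class.
# A missed churner (true 1, predicted 0) costs 5; a false alarm costs 1.
cost <- matrix(c(0, 5, 1, 0), nrow = 2,
               dimnames = list(true = c("0", "1"), pred = c("0", "1")))

# Toy labels, made up for illustration only.
actual    <- c(0, 0, 0, 1, 1, 0, 1, 0, 0, 1)
predicted <- c(0, 1, 0, 1, 0, 0, 1, 0, 0, 0)

accuracy(actual, predicted)             # 0.7
expected_risk(actual, predicted, cost)  # 1.1
```

With a 0/1 cost matrix the expected risk reduces to one minus the accuracy, so the two
measures only diverge when the two kinds of error are priced differently.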

Deliverables (all names are case-sensitive, as shown):

1. seed (vector) Random Number Seed value
2. tstPc (vector) Proportion of sample in testing set (single value)
3. raw (data.frame) data as read
4. wk (data.frame) data prepared for analysis
5. nTst (vector) Number of obs in the testing set (single value)
6. tstIdx (vector) Index values for testing set obs
7. m0 NULL model (logistic regression) for training set (“em-zero”)
8. m1 Baseline logistic regression model for training set (“em-one”)
9. m1s AIC minimization strategy final model
10. m1t t-statistic strategy final model
11. pred1s Predicted values from test set using final AIC model
12. pred1t Predicted values from test set using t-statistic final model
13. class1s Classification results from test set using final AIC model
14. class1t Classification results from test set using t-statistic final model
15. tree ctree() model for training data
16. predCt Predicted values for test set using ctree() model
17. classCt Classification results from test set using ctree() model
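
One way the deliverables could fit together is sketched below. It is not a reference
solution. The data are simulated as a stand-in for “churn_mod.txt”, so every column
name other than ID and churn is made up. The 0.5 classification cutoff, the 1.96
elimination cutoff, the backward search direction, and the helper objects trn, tst,
zCut and pCut are likewise assumptions. Note that summary() of a binomial glm() labels
the statistic “z value” rather than a t-value. The ctree() step is guarded with
requireNamespace() only so the sketch runs where “partykit” is not installed.

```r
seed  <- 379546790                      # RNG seed given in the assignment
tstPc <- 0.10                           # proportion held out for testing

# Simulated stand-in for churn_mod.txt (tenure, charges, noise are invented columns).
set.seed(seed)
n   <- 600
raw <- data.frame(ID      = seq_len(n),
                  tenure  = rexp(n, rate = 1 / 24),
                  charges = rnorm(n, mean = 60, sd = 15),
                  noise   = rnorm(n))
raw$churn <- rbinom(n, 1, plogis(1 - 0.08 * raw$tenure + 0.03 * (raw$charges - 60)))

wk <- raw[, setdiff(names(raw), "ID")]  # drop ID before modelling

# One 10% testing sample, drawn once and reused by every strategy.
set.seed(seed)
nTst   <- round(tstPc * nrow(wk))
tstIdx <- sample(nrow(wk), nTst)
trn    <- wk[-tstIdx, ]
tst    <- wk[tstIdx, ]

m0 <- glm(churn ~ 1, family = binomial, data = trn)   # null model
m1 <- glm(churn ~ ., family = binomial, data = trn)   # all predictors

# Strategy 1: AIC minimization (a backward search from m1 is one option).
m1s <- step(m1, direction = "backward", trace = 0)

# Strategy 2: drop the predictor with the smallest |z| until all remaining pass zCut.
# Assumes numeric predictors, so coefficient names match term names.
zCut <- 1.96                            # ASSUMED cutoff, not stated in the assignment
m1t  <- m1
repeat {
  ctab <- coef(summary(m1t))
  ctab <- ctab[rownames(ctab) != "(Intercept)", , drop = FALSE]
  if (nrow(ctab) == 0) break
  weakest <- which.min(abs(ctab[, "z value"]))
  if (abs(ctab[weakest, "z value"]) >= zCut) break
  m1t <- update(m1t, as.formula(paste(". ~ . -", rownames(ctab)[weakest])))
}

# Predicted probabilities and 0/1 classes on the testing set.
pCut    <- 0.5                          # ASSUMED classification cutoff
pred1s  <- predict(m1s, newdata = tst, type = "response")
pred1t  <- predict(m1t, newdata = tst, type = "response")
class1s <- as.integer(pred1s > pCut)
class1t <- as.integer(pred1t > pCut)

# Strategy 3: conditional inference tree on a factor response.
if (requireNamespace("partykit", quietly = TRUE)) {
  trnF    <- transform(trn, churn = factor(churn, levels = c(0, 1)))
  tstF    <- transform(tst, churn = factor(churn, levels = c(0, 1)))
  tree    <- partykit::ctree(churn ~ ., data = trnF)
  predCt  <- predict(tree, newdata = tstF, type = "prob")[, "1"]
  classCt <- predict(tree, newdata = tstF, type = "response")
}
```

Drawing tstIdx once and reusing it keeps the three strategies comparable, since each
final model is then scored on exactly the same held-out customers.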
