Big Data Methods

PC session 3

Part I: Empirical Part

Use the dataset “Oilfinance” for exercises 1-21.

Ridge and lasso regression for prediction of a continuous outcome

1) Define the first variable (i.e. the first column) of the data matrix to be the outcome y (the price change of the RTS index in % compared to one week ago) and the remaining variables to be the predictors x (lagged levels and price changes of oil supply, stocks, and indices). Show the distribution of y by means of a histogram (see the R sketch after exercise 11).

2) Define a training sample containing 188 observations. Apply k-fold cross-validation (k=10) in the training data to find the optimal lambda for ridge regression (alpha=0). Report the optimal lambda for the ridge regression.

3) Next, run a ridge regression (alpha=0) in the training data with the optimal lambda and show the coefficients.

4) Predict the outcome in the test data using the optimal lambda.

5) Compute the mean squared error and the mean absolute error, and also compute the average of the absolute value of y in the test data (to compare it to the errors).

6) Run a ridge regression (alpha=0) in the training data with a user-provided penalty of lambda=10 and show the coefficients.

7) Predict the outcome in the test data and compute the mean squared error.

8) Run a lasso regression (alpha=1) with the same data. Apply k-fold cross-validation (k=10) in the training data to find the optimal lambda. Report the coefficients that are different from zero.

9) Predict the outcome in the test data using the optimal lambda.

10) Compute the mean squared error and the mean absolute error.

11) Predict the expected price change in % at the mean values of x in the data.
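
A minimal R sketch for exercises 1-11, assuming the Oilfinance data is already loaded as a numeric matrix named oilfinance (a hypothetical name) with the outcome in the first column, that the glmnet package is used, and that the training sample is taken to be the first 188 observations:

```r
library(glmnet)

y <- oilfinance[, 1]                        # price change of RTS index in %
x <- as.matrix(oilfinance[, -1])            # lagged predictors

hist(y, main = "Distribution of y",         # exercise 1
     xlab = "RTS index price change in %")

train <- 1:188                              # exercise 2: training sample
test  <- setdiff(seq_len(nrow(x)), train)

set.seed(1)
cv.ridge <- cv.glmnet(x[train, ], y[train], alpha = 0, nfolds = 10)
cv.ridge$lambda.min                         # optimal lambda for ridge

# exercise 3: ridge with the optimal lambda
ridge.opt <- glmnet(x[train, ], y[train], alpha = 0, lambda = cv.ridge$lambda.min)
coef(ridge.opt)

# exercises 4-5: test-data prediction and error measures
pred.ridge <- predict(ridge.opt, newx = x[test, ])
mean((y[test] - pred.ridge)^2)              # mean squared error
mean(abs(y[test] - pred.ridge))             # mean absolute error
mean(abs(y[test]))                          # average absolute y for comparison

# exercises 6-7: ridge with a user-provided penalty lambda = 10
ridge.10 <- glmnet(x[train, ], y[train], alpha = 0, lambda = 10)
coef(ridge.10)
pred.10 <- predict(ridge.10, newx = x[test, ])
mean((y[test] - pred.10)^2)

# exercises 8-10: lasso with 10-fold cross-validation
cv.lasso <- cv.glmnet(x[train, ], y[train], alpha = 1, nfolds = 10)
coef(cv.lasso, s = "lambda.min")            # coefficients different from zero
pred.lasso <- predict(cv.lasso, newx = x[test, ], s = "lambda.min")
mean((y[test] - pred.lasso)^2)
mean(abs(y[test] - pred.lasso))

# exercise 11: predicted price change at the mean values of x
predict(cv.lasso, newx = matrix(colMeans(x), nrow = 1), s = "lambda.min")
```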


Ridge and lasso regression for prediction of a binary outcome

12) Create a binary variable for y>0, meaning that the RTS index price change is larger than zero (see the R sketch after exercise 16).

13) Run a lasso logit regression for the binary outcome (setting the family to binomial). Apply k-fold cross-validation (k=10) in the training data to find the optimal lambda. Report the optimal lambda.

14) Run a lasso regression (alpha=1) in the training data with the optimal lambda. Report the coefficients that are different from zero.

15) Predict the outcome in the test data using the optimal lambda.

16) Recode the predicted outcome to be one if the predicted probability is larger than 50% (=0.5). Compare this variable to the true outcomes in the test data in order to calculate the classification error rate and the share of correct classifications.
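
A hedged sketch for exercises 12-16, reusing the objects x, y, train, and test from the sketch above and assuming glmnet's binomial family for the lasso logit:

```r
# exercise 12: binary outcome
yb <- as.numeric(y > 0)

# exercise 13: 10-fold CV for the lasso logit (alpha = 1, family = "binomial")
set.seed(1)
cv.logit <- cv.glmnet(x[train, ], yb[train], family = "binomial",
                      alpha = 1, nfolds = 10)
cv.logit$lambda.min                          # optimal lambda

# exercise 14: lasso logit with the optimal lambda, non-zero coefficients
logit.opt <- glmnet(x[train, ], yb[train], family = "binomial",
                    alpha = 1, lambda = cv.logit$lambda.min)
coef(logit.opt)

# exercise 15: predicted probabilities in the test data
p.hat <- predict(logit.opt, newx = x[test, ], type = "response")

# exercise 16: classify at the 0.5 threshold and compare with the truth
class.hat <- as.numeric(p.hat > 0.5)
mean(class.hat != yb[test])                  # classification error rate
mean(class.hat == yb[test])                  # share of correct classifications
```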

Causal inference for one regressor based on double lasso without sample splitting

17) Define "brentyl1" (the price per barrel of crude Brent oil in the last period, i.e. one week ago) to be the regressor d whose causal effect on y is of interest. Define the remaining regressors to be the potential controls x used for the causal analysis when estimating the effect of d on y (see the R sketch after exercise 19).

18) Run a LASSO with double selection of x in the treatment and outcome equations to estimate the causal effect of d on y. The effect of d is assumed to be homogeneous (does not depend on values of x or d). Report the output.

19) Re-run the command with "partialling out" rather than "double selection". Report the output.
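
A hedged sketch for exercises 17-19 using the hdm package; it assumes the predictor matrix x from the earlier sketch has a column named "brentyl1":

```r
library(hdm)

d  <- x[, "brentyl1"]                        # exercise 17: regressor of interest
xc <- x[, colnames(x) != "brentyl1"]         # remaining potential controls

# exercise 18: double selection
ds <- rlassoEffect(x = xc, y = y, d = d, method = "double selection")
summary(ds)

# exercise 19: partialling out
po <- rlassoEffect(x = xc, y = y, d = d, method = "partialling out")
summary(po)
```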

Causal inference for one regressor based on double lasso with sample splitting

20) Apply the partialling-out method with sample splitting. Use the training sample to estimate a lasso-based model for y as a function of x and for d as a function of x, based on cross-validation. Then estimate the effect of d on y in the test data. Swap the roles of the test and training data and estimate the effect of d on y as the average of the effects in the two subsamples. Furthermore, compute the standard error of the estimated effect (see the R sketch below).
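
One possible sketch for exercise 20, reusing y, d, xc, train, and test from above. The two lasso steps are done here with cv.glmnet (an assumption; another lasso implementation could be used instead), and the standard error of the averaged effect is combined under the simplifying assumption that the two half-sample estimates are approximately independent:

```r
library(glmnet)

# fit lasso models for y~x and d~x in the 'fit' rows, then estimate the effect
# of d on y from the partialled-out residuals in the 'est' rows
po.split <- function(fit, est, y, d, xc) {
  cv.y <- cv.glmnet(xc[fit, ], y[fit], alpha = 1, nfolds = 10)
  cv.d <- cv.glmnet(xc[fit, ], d[fit], alpha = 1, nfolds = 10)
  ry <- as.vector(y[est] - predict(cv.y, newx = xc[est, ], s = "lambda.min"))
  rd <- as.vector(d[est] - predict(cv.d, newx = xc[est, ], s = "lambda.min"))
  ols <- lm(ry ~ rd)
  c(effect = unname(coef(ols)["rd"]),
    se     = summary(ols)$coefficients["rd", "Std. Error"])
}

set.seed(1)
half1 <- po.split(fit = train, est = test,  y = y, d = d, xc = xc)
half2 <- po.split(fit = test,  est = train, y = y, d = d, xc = xc)

theta <- (half1["effect"] + half2["effect"]) / 2   # averaged effect of d on y
se    <- sqrt(half1["se"]^2 + half2["se"]^2) / 2   # combined standard error
theta; se
```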

Causal inference for several regressors based on double lasso without sample splitting

21) Use the command “rlassoEffects” to perform causal inference for several regressors based on double lasso without sample splitting (see the R sketch below).
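
A hedged sketch for exercise 21; which regressors to include is left open in the exercise, so the index over the first three columns of x below is purely illustrative:

```r
# joint double-lasso inference on several regressors (here columns 1-3 of x)
effects <- rlassoEffects(x = x, y = y, index = c(1, 2, 3))
summary(effects)
```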


Lasso-based causal inference with instruments without sample splitting

Use the data “EminentDomain” from the “hdm” package. This is a dataset on judicial eminent domain decisions and contains four sub-data sets, which differ mainly in the dependent variables. Use the data for the non-metro (NM) area (in logs).

Outcome variable (y): log house price in the non-metro area of a circuit (= district)
Causal variable (d): number of pro-plaintiff appellate takings decisions overturning the government's seizure of property in favor of the private owner (an indicator for the protection of individual property rights)
Instruments (z): characteristics of randomly assigned judges, including gender, race, religion, political affiliation, ...
Control variables (x):

Define the outcome variable (y), the causal variable (d), the instruments (z), and the control variables (x).
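
A hedged sketch for loading the EminentDomain data and defining the four variable groups; the component name logNM for the non-metro sub-data set is an assumption based on the hdm package's naming of the sub-data sets:

```r
library(hdm)
data(EminentDomain)

y <- EminentDomain$logNM$y   # log house price in the non-metro area of a circuit
d <- EminentDomain$logNM$d   # pro-plaintiff appellate takings decisions
z <- EminentDomain$logNM$z   # characteristics of randomly assigned judges
x <- EminentDomain$logNM$x   # control variables
```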

22) Run LASSO IV estimation for the selection of controls x and instruments z (see the R sketch after exercise 24).

23) Run LASSO IV estimation for the selection of z, but take all x variables as controls.

24) Run LASSO IV estimation for the selection of x, while using all of the first 20 elements in z as instruments.
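
A hedged sketch for exercises 22-24 using hdm::rlassoIV with its select.X and select.Z switches:

```r
# exercise 22: lasso selection of both controls x and instruments z
iv.xz <- rlassoIV(x = x, d = d, y = y, z = z, select.X = TRUE, select.Z = TRUE)
summary(iv.xz)

# exercise 23: selection of z only; all x enter as controls
iv.z <- rlassoIV(x = x, d = d, y = y, z = z, select.X = FALSE, select.Z = TRUE)
summary(iv.z)

# exercise 24: selection of x only; the first 20 columns of z serve as instruments
iv.x <- rlassoIV(x = x, d = d, y = y, z = z[, 1:20], select.X = TRUE, select.Z = FALSE)
summary(iv.x)
```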


Part II: Conceptual questions

We probably won't have time to discuss this part in the PC session; in any case, you may use it as a mock exam.

25) Compare Lasso estimation and standard OLS and comment on similarities and differences.

26) Compare ridge regression and Lasso estimation and comment on similarities and differences.

27) Explain the concept of k-fold cross-validation for picking the shrinkage factor in Lasso.

28) Explain the concept of post-Lasso double selection in OLS for performing causal inference.

29) Explain the idea of the adaptive Lasso. For which reason might it be preferred over the “conventional” Lasso?

30) Explain the concept of a “sparse” model.

31) What is the advantage of shrinkage methods compared to “classical” variable selection methods like forward selection or backward elimination?

