Homework Week 3: Linear Regression and Neural Network
1.1 (5pts)
Provide the Python code to do the following (pseudocode):
Set the seed to 6996
Epsilon_vector <- random_number_from_normal_distribution(500,0,1)
X_matrix <- random_number_from_normal_distribution(500*500,0,2)
X_matrix <- ReshapeMatrix(X_matrix,500,500) # you need to have a matrix 500 x 500
slopesSet_vector <- random_number_from_uniform_distribution(500,1,5)
Y <- sapply(2:500, function(z) 1 + X_matrix[,1:z] %*% slopesSet_vector[1:z] + Epsilon_vector)
If you check the dimensions of Y, you will find 500 x 499.
By construction all the predictors are expected to be significant and uncorrelated.
1.2 Analysis of accuracy of inference as a function of number of predictors (5pts)
Plot the p-values for the 490 predictors. You should obtain a chart similar to the one below. Do not
try to replicate this chart exactly; your p-values will differ from it.
1.3 (5pts)
Plot R-squared for all the models with 2 to 500 predictors, adding one predictor at a time.
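A minimal sketch of this computation follows, using only NumPy least squares. The helper r_squared and the stride of 6 in ks are assumptions made to keep the run short; replacing ks with range(2, 501) covers every model.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt

# Simulation copied from section 2.1
np.random.seed(6996)
epsilon_vec = np.random.normal(0, 1, 500).reshape(500, 1)
X_mat = np.random.normal(0, 2, size=(500, 500))
slope_vec = np.random.uniform(1, 5, 500)
Y_mat = 1 + np.cumsum(X_mat * slope_vec, axis=1)[:, 1:] + epsilon_vec

def r_squared(k):
    """R^2 of the least-squares fit of the k-predictor response on its k predictors."""
    y = Y_mat[:, k - 2]
    A = np.column_stack([np.ones(len(y)), X_mat[:, :k]])  # intercept + predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

ks = np.arange(2, 501, 6)  # strided for speed (assumption); ends at 500
r2 = np.array([r_squared(k) for k in ks])

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(ks, r2)
ax.set_xlabel("number of predictors")
ax.set_ylabel("R-squared")
```

With 500 predictors plus an intercept there are more coefficients than observations, so the last model fits the sample exactly and its R-squared is 1 up to rounding.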
1.4 (5pts)
Plot the lower and upper bounds of the confidence interval for the coefficient beta_1 for all the
models with 2 to 500 predictors.
Conclusions:
1. The more predictors, the higher R squared.
2. But inference for a fixed predictor becomes less and less accurate, which is shown by the
widening confidence interval.
3. This means that if there is, for example, one significant predictor X_{i,1}, by increasing the
total number of predictors (even though they all or many of them may be significant) we can
damage accuracy of estimation of the slope for X_{i,1}.
4. This example shows one problem that DM has to face, which is not emphasized in traditional
courses on statistical analysis where only low numbers of predictors are considered.
2.1 Simulation of the data (2pts)
Generate the following data:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
# from sklearn import linear_model
# from sklearn.metrics import mean_squared_error, r2_score
np.random.seed(6996)
# error term
epsilon_vec = np.random.normal(0,1,500).reshape(500,1)
# X_matrix: regressors (predictors)
X_mat = np.random.normal(0,2,size = (500,500))
# Slope
slope_vec = np.random.uniform(1,5,500)
# Simulate Ys
Y_mat = 1 + np.cumsum(X_mat * slope_vec,axis=1)[:,1:] + epsilon_vec
# each column of Y_mat is one simulated response vector: the first uses 2
# regressors, the last uses 500
print(Y_mat.shape)
#You should have (500, 499)
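The vectorized line that builds Y_mat replaces the sapply loop of section 1.1 with a cumulative sum. The short check below is a sketch (the variable z and the name y_explicit are illustrative). It confirms that column z-2 of Y_mat equals the explicit formula 1 + X[:, 1:z] * slopes[1:z] + epsilon for a chosen z.

```python
import numpy as np

# Simulation copied from section 2.1
np.random.seed(6996)
epsilon_vec = np.random.normal(0, 1, 500).reshape(500, 1)
X_mat = np.random.normal(0, 2, size=(500, 500))
slope_vec = np.random.uniform(1, 5, 500)
Y_mat = 1 + np.cumsum(X_mat * slope_vec, axis=1)[:, 1:] + epsilon_vec

z = 10  # any number of predictors between 2 and 500
# Explicit version of one sapply step: intercept + first z predictors + noise
y_explicit = 1 + X_mat[:, :z] @ slope_vec[:z] + epsilon_vec[:, 0]
assert np.allclose(Y_mat[:, z - 2], y_explicit)
```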
2.2 Fitting linear models (5pts)
1) Fit a linear model with the first 10 predictors. Store the result in the variable m10.
2) Fit a linear model with the first 490 predictors. Store the result in the variable v490.
2.3 Ridge regression (5pts)
1) Apply ridge regression to the data with 10 predictors.
2) Separate the sample into train and test sets.
3) Select the best parameter λ using cross-validation on the train set.
4) Calculate the mean squared prediction error for the best selected λ.
5) Compare it with the mean squared prediction error of the linear model.
What you should observe:
Ridge regression did not select predictors. It is expected because we simulated all predictors
to be significant.
Ridge regression made a small improvement to mean squared prediction error. This is
consistent with expectation because it has one additional parameter.
Regularization is expected to reduce number of predictors when there are collinear
(highly correlated) predictors.
Predictors in this example are not collinear.
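The steps above can be sketched with scikit-learn as follows. The grid of candidate penalties, the 70/30 split, the 5 folds and random_state=0 are all assumptions; scikit-learn calls the penalty alpha, which plays the role of λ here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Simulation copied from section 2.1
np.random.seed(6996)
epsilon_vec = np.random.normal(0, 1, 500).reshape(500, 1)
X_mat = np.random.normal(0, 2, size=(500, 500))
slope_vec = np.random.uniform(1, 5, 500)
Y_mat = 1 + np.cumsum(X_mat * slope_vec, axis=1)[:, 1:] + epsilon_vec

X10, y10 = X_mat[:, :10], Y_mat[:, 10 - 2]
X_train, X_test, y_train, y_test = train_test_split(
    X10, y10, test_size=0.3, random_state=0)  # split settings are assumptions

# Cross-validate the penalty on the train set only
alphas = np.logspace(-4, 2, 50)  # candidate values of lambda (assumed grid)
ridge = RidgeCV(alphas=alphas, cv=5).fit(X_train, y_train)
ols = LinearRegression().fit(X_train, y_train)

mse_ridge = mean_squared_error(y_test, ridge.predict(X_test))
mse_ols = mean_squared_error(y_test, ols.predict(X_test))
print("best lambda:", ridge.alpha_)
print("test MSE ridge:", mse_ridge, " test MSE linear model:", mse_ols)
print("non-zero ridge coefficients:", np.count_nonzero(ridge.coef_))
```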
2.4 Lasso regression (5pts)
1) Fit lasso regression to the first 10 predictors.
2) Fit the model to the entire data.
Lasso regression marginally improved the mean squared error relative to the linear model,
but did worse than ridge regression.
It kept all 10 predictors and produced similar estimates of parameters.
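A sketch of the lasso counterpart is given below, again with scikit-learn. LassoCV picks its own penalty grid by default; the split, the number of folds and the random_state values are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Simulation copied from section 2.1
np.random.seed(6996)
epsilon_vec = np.random.normal(0, 1, 500).reshape(500, 1)
X_mat = np.random.normal(0, 2, size=(500, 500))
slope_vec = np.random.uniform(1, 5, 500)
Y_mat = 1 + np.cumsum(X_mat * slope_vec, axis=1)[:, 1:] + epsilon_vec

X10, y10 = X_mat[:, :10], Y_mat[:, 10 - 2]
X_train, X_test, y_train, y_test = train_test_split(
    X10, y10, test_size=0.3, random_state=0)  # split settings are assumptions

# 1) Select lambda by cross-validation on the train set and score on the test set
lasso_cv = LassoCV(cv=5, random_state=0, max_iter=10000).fit(X_train, y_train)
mse_lasso = mean_squared_error(y_test, lasso_cv.predict(X_test))

# 2) Refit on the entire data with the selected lambda
lasso_full = Lasso(alpha=lasso_cv.alpha_, max_iter=10000).fit(X10, y10)

print("best lambda:", lasso_cv.alpha_, " test MSE:", mse_lasso)
print("predictors kept:", np.count_nonzero(lasso_full.coef_))
print("estimated slopes:", lasso_full.coef_)
```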
2.5 Large number of significant predictors (5pts)
1) Apply lasso regression analysis to the data with 490 predictors.
2) Note that there are no actual slopes close to zero, but lasso regression still pushes some of
them to zero as λ increases from 0.
3) Calculate the mean squared prediction error for the best λ.
4) Fit the lasso regression model to the entire data.
Plot the set of true slopes used in simulation and mark slopes removed by lasso.
Lasso removed predictors seemingly randomly regardless of the value of slope.
Given the way the sample was simulated (independent predictors with slopes between 1 and 5), it
would be more reasonable to remove none of the predictors or to remove those with the smallest slopes.
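The sketch below shows one way to run this analysis and mark the removed slopes. The penalty grid, the 3 folds, the split and random_state=0 are assumptions chosen to keep the run short. Which predictors end up removed, if any, depends on those choices.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt

# Simulation copied from section 2.1
np.random.seed(6996)
epsilon_vec = np.random.normal(0, 1, 500).reshape(500, 1)
X_mat = np.random.normal(0, 2, size=(500, 500))
slope_vec = np.random.uniform(1, 5, 500)
Y_mat = 1 + np.cumsum(X_mat * slope_vec, axis=1)[:, 1:] + epsilon_vec

X490, y490 = X_mat[:, :490], Y_mat[:, 490 - 2]
X_train, X_test, y_train, y_test = train_test_split(
    X490, y490, test_size=0.3, random_state=0)  # split settings are assumptions

alphas = np.logspace(-0.5, 1.5, 10)  # candidate values of lambda (assumed grid)
lasso_cv = LassoCV(alphas=alphas, cv=3, max_iter=5000).fit(X_train, y_train)
mse_lasso = mean_squared_error(y_test, lasso_cv.predict(X_test))

# Refit on the entire data with the selected lambda
lasso_full = Lasso(alpha=lasso_cv.alpha_, max_iter=5000).fit(X490, y490)
removed = lasso_full.coef_ == 0  # True where lasso set the slope to exactly zero
true_slopes = slope_vec[:490]

idx = np.arange(1, 491)
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(idx[~removed], true_slopes[~removed], "o", markersize=3, label="kept")
ax.plot(idx[removed], true_slopes[removed], "rx", label="removed by lasso")
ax.set_xlabel("predictor index")
ax.set_ylabel("true slope")
ax.legend()
print("best lambda:", lasso_cv.alpha_, " test MSE:", mse_lasso)
print("predictors removed:", int(removed.sum()))
```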
3. Neural Network
You can download the data file session3_homework.csv from Piazza.
You will use the following libraries:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier as MLP
from sklearn.metrics import confusion_matrix
data = pd.read_csv('session3_homework.csv')
data.head()
Credit scoring is the practice of analysing a person's background and credit application in order to assess the
creditworthiness of the person.
We are trying to find which parameters impact the creditworthiness:
creditworthiness=f(income, age, gender, …)
The dataset contains information on different clients who received a loan at least 10 years ago.
The variables:
income (yearly),
age,
loan (size in euros),
LTI (the loan to yearly income ratio)
are available.
The goal is to predict, based on the input variables LTI and age, whether or not a default will occur within 10 years.
Step 1: Separate the data into train and test.
X = np.array(data[['LTI','age']])
Y = np.array(data['default10yr'])
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)
Step 2: Train neural net with one hidden layer including 4 neurons. Plot the configuration if possible
XOR_MLP = MLP(activation='tanh',alpha=0.,batch_size='auto',beta_1=0.9,beta_2=0.999,\
early_stopping=False,epsilon=1e-08,hidden_layer_sizes= (4,),\
learning_rate='constant',learning_rate_init = 0.1,max_iter=5000,momentum=0.5,\
nesterovs_momentum=True,power_t=0.5,random_state=0,shuffle=True,solver='sgd',\
tol=0.0001, validation_fraction=0.1,verbose=False,warm_start=False)
XOR_MLP.fit(X_train,Y_train)
def draw_neural_net(ax, left, right, bottom, top, layer_sizes, coefs_, intercepts_, input_list,
                    out_put, np, plt):
    n_layers = len(layer_sizes)
    v_spacing = (top - bottom)/float(max(layer_sizes))
    h_spacing = (right - left)/float(len(layer_sizes) - 1)
    # Input arrows
    layer_top_0 = v_spacing*(layer_sizes[0] - 1)/2. + (top + bottom)/2.
    for m in range(layer_sizes[0]):
        plt.arrow(left-0.18, layer_top_0 - m*v_spacing, 0.12, 0, lw=1, head_width=0.01,
                  head_length=0.02)
    # Nodes
    for n, layer_size in enumerate(layer_sizes):
        layer_top = v_spacing*(layer_size - 1)/2. + (top + bottom)/2.
        for m in range(layer_size):
            circle = plt.Circle((n*h_spacing + left, layer_top - m*v_spacing), v_spacing/8.,
                                color='w', ec='k', zorder=4)
            if n == 0:
                plt.text(left-0.125, layer_top - m*v_spacing, input_list[m], fontsize=15)
            elif n == n_layers - 1:
                plt.text(n*h_spacing + left+0.05, layer_top - m*v_spacing, out_put, fontsize=15)
            ax.add_artist(circle)
    # Bias nodes
    for n, layer_size in enumerate(layer_sizes):
        if n < n_layers - 1:
            x_bias = (n+0.5)*h_spacing + left
            y_bias = top + 0.005
            circle = plt.Circle((x_bias, y_bias), v_spacing/8., color='w', ec='b', zorder=4)
            plt.text(x_bias, y_bias, str(1), color='k', fontsize=15)
            ax.add_artist(circle)
    # Edges between nodes
    for n, (layer_size_a, layer_size_b) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        layer_top_a = v_spacing*(layer_size_a - 1)/2. + (top + bottom)/2.
        layer_top_b = v_spacing*(layer_size_b - 1)/2. + (top + bottom)/2.
        for m in range(layer_size_a):
            for o in range(layer_size_b):
                line = plt.Line2D([n*h_spacing + left, (n + 1)*h_spacing + left],
                                  [layer_top_a - m*v_spacing, layer_top_b - o*v_spacing], c='k')
                ax.add_artist(line)
                xm = (n*h_spacing + left)
                xo = ((n + 1)*h_spacing + left)
                ym = (layer_top_a - m*v_spacing)
                yo = (layer_top_b - o*v_spacing)
                rot_mo_rad = np.arctan((yo-ym)/(xo-xm))
                rot_mo_deg = rot_mo_rad*180./np.pi
                xm1 = xm + (v_spacing/8.+0.05)*np.cos(rot_mo_rad)
                if n == 0:
                    if yo > ym:
                        ym1 = ym + (v_spacing/8.+0.12)*np.sin(rot_mo_rad)
                    else:
                        ym1 = ym + (v_spacing/8.+0.05)*np.sin(rot_mo_rad)
                else:
                    if yo > ym:
                        ym1 = ym + (v_spacing/8.+0.12)*np.sin(rot_mo_rad)
                    else:
                        ym1 = ym + (v_spacing/8.+0.04)*np.sin(rot_mo_rad)
                # Label the edge with its weight
                plt.text(xm1, ym1, str(round(coefs_[n][m, o], 4)), rotation=rot_mo_deg,
                         fontsize=10)
    # Edges between bias and nodes
    for n, (layer_size_a, layer_size_b) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        if n < n_layers-1:
            layer_top_a = v_spacing*(layer_size_a - 1)/2. + (top + bottom)/2.
            layer_top_b = v_spacing*(layer_size_b - 1)/2. + (top + bottom)/2.
            x_bias = (n+0.5)*h_spacing + left
            y_bias = top + 0.005
            for o in range(layer_size_b):
                line = plt.Line2D([x_bias, (n + 1)*h_spacing + left],
                                  [y_bias, layer_top_b - o*v_spacing], c='b')
                ax.add_artist(line)
                xo = ((n + 1)*h_spacing + left)
                yo = (layer_top_b - o*v_spacing)
                rot_bo_rad = np.arctan((yo-y_bias)/(xo-x_bias))
                rot_bo_deg = rot_bo_rad*180./np.pi
                xo2 = xo - (v_spacing/8.+0.01)*np.cos(rot_bo_rad)
                yo2 = yo - (v_spacing/8.+0.01)*np.sin(rot_bo_rad)
                xo1 = xo2 - 0.05*np.cos(rot_bo_rad)
                yo1 = yo2 - 0.05*np.sin(rot_bo_rad)
                # Label the edge with its intercept
                plt.text(xo1, yo1, str(round(intercepts_[n][o], 4)), rotation=rot_bo_deg,
                         fontsize=10)
    # Output arrows
    layer_top_0 = v_spacing*(layer_sizes[-1] - 1)/2. + (top + bottom)/2.
    for m in range(layer_sizes[-1]):
        plt.arrow(right+0.015, layer_top_0 - m*v_spacing, 0.16*h_spacing, 0, lw=1,
                  head_width=0.01, head_length=0.02)
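input_list = ['LTI','age']
out_put = 'default10yr'
fig = plt.figure(figsize=(12, 12))
ax = fig.gca()
ax.axis('off')
# Draw the fitted network: 2 inputs, 4 hidden neurons, 1 output.
# The layout bounds (0.1, 0.9, 0.1, 0.9) are assumed values, not given in the assignment.
draw_neural_net(ax, .1, .9, .1, .9, [2, 4, 1], XOR_MLP.coefs_, XOR_MLP.intercepts_,
                input_list, out_put, np, plt)
plt.show()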
Step 3: Predict the output and measure its accuracy.
(You do not have to find the same results, this is just an example)
Accuracy = 0.996 (i.e. 99.6%)
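Step 3 could look like the sketch below. Because session3_homework.csv is not reproduced here, the example builds a small synthetic stand-in with the same column names. That data set, the rule that generates default10yr and random_state=0 in the split are hypothetical, so the printed accuracy says nothing about the real data. The network settings are the non-default ones from Step 2.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier as MLP
from sklearn.metrics import confusion_matrix

# Hypothetical stand-in for session3_homework.csv (same column names, made-up values)
rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(18, 65, n)
LTI = rng.uniform(0, 0.2, n)
default10yr = (LTI > 0.1 + 0.001 * (age - 18)).astype(int)  # made-up labelling rule
data = pd.DataFrame({"LTI": LTI, "age": age, "default10yr": default10yr})

# Step 1: train/test split
X = np.array(data[["LTI", "age"]])
Y = np.array(data["default10yr"])
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

# Step 2: one hidden layer with 4 neurons, settings as in the assignment
XOR_MLP = MLP(activation="tanh", alpha=0., hidden_layer_sizes=(4,),
              learning_rate_init=0.1, max_iter=5000, momentum=0.5,
              random_state=0, solver="sgd")
XOR_MLP.fit(X_train, Y_train)

# Step 3: predict on the test set and measure accuracy
Y_pred = XOR_MLP.predict(X_test)
cm = confusion_matrix(Y_test, Y_pred, labels=[0, 1])  # rows: actual, columns: predicted
accuracy = np.trace(cm) / cm.sum()  # correct predictions over all predictions
print(cm)
print("Accuracy =", accuracy)
```

With the real file, replace the synthetic block with data = pd.read_csv('session3_homework.csv') and keep the rest unchanged.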