Linear Regression
Posted on Jun 14, 2018 in Notes • 11 min read
A supervised learning technique, linear regression doesn't force you to fine-tune a bunch of parameters before it works: you can just dump your data into it and let it produce a result.
__init__(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)
parameters
fit_intercept : (bool) Whether to calculate the intercept for this model
normalize : (bool) Whether to normalize the input before regression
copy_X : (bool) Whether X will be copied; else, it may be overwritten
n_jobs : (int) Number of jobs to use for the computation. If -1, all CPUs are used.
attributes
coef_ : estimated coefficients for the linear regression
intercept_ : the scalar constant offset value
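As a quick illustration of those attributes (a minimal sketch on made-up data, not part of the original notebook): fitting against a known line should recover its slope in coef_ and its offset in intercept_.
In [ ]:
import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)
X_demo = rng.rand(50, 1) * 10                          # one feature
y_demo = 3.0 * X_demo.ravel() + 2.0 + rng.randn(50)    # y = 3x + 2 + noise

demo = linear_model.LinearRegression().fit(X_demo, y_demo)
print(demo.coef_)        # ~[3.0], one coefficient per feature
print(demo.intercept_)   # ~2.0, the scalar offset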
GOTCHAS
.score() returns the R^2 coefficient. R^2:
- is <= 1, with 1 meaning the regression exactly matches the data
- tends to increase with the number of features (overfitting; see the sketch after this list)
- is normalised, so unlike the sum of squared distances it is not affected by the number of observations
Linear regression itself:
- assumes your observations are independent, i.e. the feature values of one sample have nothing to do with the values of another sample
- gets less reliable the further you extrapolate from the range of your training data
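A sketch of the overfitting gotcha (synthetic data, names are illustrative): appending a pure-noise column to the design matrix never lowers the training R^2, which is why a rising score alone doesn't mean the extra feature helps.
In [ ]:
import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)
X1 = rng.rand(50, 1) * 10                       # one informative feature
y = 3.0 * X1.ravel() + 2.0 + rng.randn(50)
X2 = np.hstack([X1, rng.rand(50, 1)])           # tack on a pure-noise feature

r2_one = linear_model.LinearRegression().fit(X1, y).score(X1, y)
r2_two = linear_model.LinearRegression().fit(X2, y).score(X2, y)
print(r2_one, r2_two)   # training R^2 never decreases when a feature is added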
In [ ]:
from sklearn import linear_model

# Ordinary least-squares fit; assumes X_train / y_train come from an
# earlier train/test split
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
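Once fitted, the attributes and .score() are available; this cell assumes the X_test / y_test half of the same split:
In [ ]:
print(model.coef_, model.intercept_)
print(model.score(X_test, y_test))   # R^2 on the held-out test set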
In [ ]:
# Plots your test observations, compares them to the regression line,
# and displays the R2 coefficient
import matplotlib.pyplot as plt

def drawLine(model, X_test, y_test, title, R2):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(X_test, y_test, c='g', marker='o')
    ax.plot(X_test, model.predict(X_test), color='orange', linewidth=1, alpha=0.7)
    title += " R2: " + str(R2)
    ax.set_title(title)
    print(title)
    print("Intercept(s): ", model.intercept_)
    plt.show()
In [ ]:
# Slice single columns out of the DataFrame and reshape each Series
# into an (n, 1) numpy 2D array
roomBoard = X.loc[:, 'Room.Board'].values.reshape((len(X),1))
accStudent = X.loc[:, 'Accept'].values.reshape((len(X),1))
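From here a single-feature model can be fitted on one slice and handed to drawLine. This is a sketch only: the *_train / *_test names assume a train/test split (e.g. via sklearn.model_selection.train_test_split) that isn't shown above.
In [ ]:
# Hypothetical split names; adjust to however the data was actually split
model = linear_model.LinearRegression()
model.fit(roomBoard_train, y_train)
drawLine(model, roomBoard_test, y_test, "Room.Board",
         model.score(roomBoard_test, y_test))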
Regression Plane
In [ ]:
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

def drawPlane(model, X_test, y_test, title, R2):
    # This convenience method will take care of plotting your
    # test observations, comparing them to the regression plane,
    # and displaying the R2 coefficient
    fig = plt.figure()
    ax = Axes3D(fig)
    ax.set_zlabel('prediction')

    # You might have passed in a DataFrame, a Series (slice),
    # an NDArray, or a Python List... so let's keep it simple:
    X_test = np.array(X_test)
    col1 = X_test[:, 0]
    col2 = X_test[:, 1]

    # Set up a grid. We could have predicted on the actual
    # col1, col2 values directly, but that would have generated
    # a mesh with WAY too fine a grid, which would have detracted
    # from the visualization
    x_min, x_max = col1.min(), col1.max()
    y_min, y_max = col2.min(), col2.max()
    x = np.arange(x_min, x_max, (x_max - x_min) / 10)
    y = np.arange(y_min, y_max, (y_max - y_min) / 10)
    x, y = np.meshgrid(x, y)

    # Predict based on possible input values that span the domain
    # of the x and y inputs:
    z = model.predict(np.c_[x.ravel(), y.ravel()])
    z = z.reshape(x.shape)

    ax.scatter(col1, col2, y_test, c='g', marker='o')
    ax.plot_wireframe(x, y, z, color='orange', alpha=0.7)
    title += " R2: " + str(R2)
    ax.set_title(title)
    print(title)
    print("Intercept(s): ", model.intercept_)
    plt.show()
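And the two-feature counterpart for drawPlane, again a sketch assuming a split of a two-column slice (e.g. Room.Board and Accept stacked side by side):
In [ ]:
# Hypothetical names: XY_train / XY_test hold the two feature columns
model = linear_model.LinearRegression()
model.fit(XY_train, y_train)
drawPlane(model, XY_test, y_test, "Room.Board + Accept",
          model.score(XY_test, y_test))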