Linear Regression
Posted on Jun 14, 2018 in Notes • 11 min read
A supervised learning technique, linear regression doesn't force you to fine-tune a bunch of parameters before it works: you can just dump your data into it and let it produce a result.
__init__(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)
parameters
fit_intercept : (bool) Whether to calculate the intercept for this model
normalize : (bool) Whether to normalize the input before regression
copy_X : (bool) Whether X will be copied; else, it may be overwritten
n_jobs : (int) Number of jobs to use for the computation. If -1, all CPUs are used.
attributes
coef_ : estimated coefficients for the linear regression
intercept_ : the scalar constant offset value
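As a quick illustration of those attributes (a minimal sketch on made-up data, not part of the original notebook): fitting against a known line should recover its slope in coef_ and its offset in intercept_.
In [ ]:
import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)
X_demo = rng.rand(50, 1) * 10                          # one feature
y_demo = 3.0 * X_demo.ravel() + 2.0 + rng.randn(50)    # y = 3x + 2 + noise

demo = linear_model.LinearRegression().fit(X_demo, y_demo)
print(demo.coef_)        # ~[3.0], one coefficient per feature
print(demo.intercept_)   # ~2.0, the scalar offset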
GOTCHAS
.score() returns the R^2 coefficient. R^2:
- is <= 1, with 1 meaning the regression exactly matches the data
- tends to increase with the number of features (overfitting; see the sketch after this list)
- is normalised, so unlike the sum of squared distances it is not affected by the number of observations
Linear regression itself:
- assumes your observations are independent, i.e. the feature values of one sample have nothing to do with the values of another sample
- gets less reliable the further you extrapolate from the range of your training data
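A sketch of the overfitting gotcha (synthetic data, names are illustrative): appending a pure-noise column to the design matrix never lowers the training R^2, which is why a rising score alone doesn't mean the extra feature helps.
In [ ]:
import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)
X1 = rng.rand(50, 1) * 10                       # one informative feature
y = 3.0 * X1.ravel() + 2.0 + rng.randn(50)
X2 = np.hstack([X1, rng.rand(50, 1)])           # tack on a pure-noise feature

r2_one = linear_model.LinearRegression().fit(X1, y).score(X1, y)
r2_two = linear_model.LinearRegression().fit(X2, y).score(X2, y)
print(r2_one, r2_two)   # training R^2 never decreases when a feature is added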
In [ ]:
from sklearn import linear_model

# Ordinary least-squares fit; assumes X_train / y_train come from an
# earlier train/test split
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
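Once fitted, the attributes and .score() are available; this cell assumes the X_test / y_test half of the same split:
In [ ]:
print(model.coef_, model.intercept_)
print(model.score(X_test, y_test))   # R^2 on the held-out test set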
In [ ]:
# Plots your test observations, compares them to the regression line,
# and displays the R2 coefficient
import matplotlib.pyplot as plt

def drawLine(model, X_test, y_test, title, R2):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(X_test, y_test, c='g', marker='o')
    ax.plot(X_test, model.predict(X_test), color='orange', linewidth=1, alpha=0.7)
    title += " R2: " + str(R2)
    ax.set_title(title)
    print(title)
    print("Intercept(s): ", model.intercept_)
    plt.show()
In [ ]:
# Slice single columns out of the DataFrame and reshape each Series
# into an (n, 1) numpy 2D array
roomBoard = X.loc[:, 'Room.Board'].values.reshape((len(X),1))
accStudent = X.loc[:, 'Accept'].values.reshape((len(X),1))
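From here a single-feature model can be fitted on one slice and handed to drawLine. This is a sketch only: the *_train / *_test names assume a train/test split (e.g. via sklearn.model_selection.train_test_split) that isn't shown above.
In [ ]:
# Hypothetical split names; adjust to however the data was actually split
model = linear_model.LinearRegression()
model.fit(roomBoard_train, y_train)
drawLine(model, roomBoard_test, y_test, "Room.Board",
         model.score(roomBoard_test, y_test))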
Regression Plane
In [ ]:
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

def drawPlane(model, X_test, y_test, title, R2):
    # This convenience method will take care of plotting your
    # test observations, comparing them to the regression plane,
    # and displaying the R2 coefficient
    fig = plt.figure()
    ax = Axes3D(fig)
    ax.set_zlabel('prediction')

    # You might have passed in a DataFrame, a Series (slice),
    # an NDArray, or a Python List... so let's keep it simple:
    X_test = np.array(X_test)
    col1 = X_test[:, 0]
    col2 = X_test[:, 1]

    # Set up a grid. We could have predicted on the actual
    # col1, col2 values directly, but that would have generated
    # a mesh with WAY too fine a grid, which would have detracted
    # from the visualization
    x_min, x_max = col1.min(), col1.max()
    y_min, y_max = col2.min(), col2.max()
    x = np.arange(x_min, x_max, (x_max - x_min) / 10)
    y = np.arange(y_min, y_max, (y_max - y_min) / 10)
    x, y = np.meshgrid(x, y)

    # Predict based on possible input values that span the domain
    # of the x and y inputs:
    z = model.predict(np.c_[x.ravel(), y.ravel()])
    z = z.reshape(x.shape)

    ax.scatter(col1, col2, y_test, c='g', marker='o')
    ax.plot_wireframe(x, y, z, color='orange', alpha=0.7)
    title += " R2: " + str(R2)
    ax.set_title(title)
    print(title)
    print("Intercept(s): ", model.intercept_)
    plt.show()
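And the two-feature counterpart for drawPlane, again a sketch assuming a split of a two-column slice (e.g. Room.Board and Accept stacked side by side):
In [ ]:
# Hypothetical names: XY_train / XY_test hold the two feature columns
model = linear_model.LinearRegression()
model.fit(XY_train, y_train)
drawPlane(model, XY_test, y_test, "Room.Board + Accept",
          model.score(XY_test, y_test))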