Linear Regression

Posted on Jun 14, 2018 in Notes • 11 min read


A supervised learning technique, it doesn't force you to fine-tune a bunch of hyperparameters to get it working. You can literally just dump your data into it and let it produce results.

__init__(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)

parameters

  • fit_intercept: (bool) Whether to calculate the intercept for this model
  • normalize: (bool) Whether to normalize input before regression
  • copy_X: (bool) Whether X will be copied; else, it may be overwritten
  • n_jobs: (int) Number of jobs to use for the computation. If -1, all CPUs are used.
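
For illustration, here is the constructor called with explicit values; the settings themselves are arbitrary, just to show the signature above in use:

In [ ]:
from sklearn import linear_model

# Arbitrary settings, just to show the signature in use
model = linear_model.LinearRegression(fit_intercept=True, normalize=False,
                                      copy_X=True, n_jobs=-1)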

attributes

  • coef_: estimated coefficients for the linear regression
  • intercept_: the constant offset (bias) term of the fitted line
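
Once fitted, both attributes can be read straight off the model; a minimal sketch, with X_train and y_train as placeholders for your training data:

In [ ]:
model.fit(X_train, y_train)

# One entry in coef_ per input feature, plus the constant offset
print(model.coef_)
print(model.intercept_)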

GOTCHAS

  • .score() returns the R^2 coefficient (see the sketch after this list)
  • R^2 coefficient
    • at most 1; R^2 = 1 means the regression exactly matches the data
    • tends to increase with the number of features (risk of overfitting)
    • normalised, so unlike the sum of squared distances it is not affected by the number of observations
  • assumes your samples are independent, i.e. the feature values of one sample have nothing to do with those of another sample
  • the further you extrapolate beyond the range of your training data, the less reliable the regression's results become
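
A minimal sketch of the .score() point above, assuming a fitted model and a held-out test set (X_test and y_test are placeholders):

In [ ]:
# R^2 on unseen data; at most 1.0
print(model.score(X_test, y_test))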

DOCUMENTATION

In [ ]:
from sklearn import linear_model
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
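
Once fitted, predictions come from .predict(); X_test is again a placeholder:

In [ ]:
y_pred = model.predict(X_test)  # one predicted value per test row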

Visualisation

Regression Line

note: fit() and predict() expect X as a 2D array, so a single-feature Series slice has to be reshaped into an (n, 1) numpy array first (see the slicing cell below)

In [ ]:
# Plots your test observations, compares them to the regression line, and displays the R2 coefficient
import matplotlib.pyplot as plt

def drawLine(model, X_test, y_test, title, R2):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(X_test, y_test, c='g', marker='o')
    ax.plot(X_test, model.predict(X_test), color='orange', linewidth=1, alpha=0.7)

    title += " R2: " + str(R2)
    ax.set_title(title)
    print(title)
    print("Intercept(s): ", model.intercept_)

    plt.show()
In [ ]:
# Slicing a Series and converting it into a numpy 2DArray
roomBoard = X.loc[:, 'Room.Board'].values.reshape((len(X),1))
accStudent = X.loc[:, 'Accept'].values.reshape((len(X),1))
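
Tying it together for the single-feature case; a hedged sketch where y and the train/test split are illustrative:

In [ ]:
from sklearn.model_selection import train_test_split

# Illustrative: regress the target onto Room.Board alone
rb_train, rb_test, y_train, y_test = train_test_split(roomBoard, y, test_size=0.3)
model = linear_model.LinearRegression()
model.fit(rb_train, y_train)
drawLine(model, rb_test, y_test, "Room.Board", model.score(rb_test, y_test))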

Regression Plane

In [ ]:
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

def drawPlane(model, X_test, y_test, title, R2):
    # This convenience method will take care of plotting your
    # test observations, comparing them to the regression plane,
    # and displaying the R2 coefficient
    fig = plt.figure()
    ax = Axes3D(fig)
    ax.set_zlabel('prediction')

    
    # You might have passed in a DataFrame, a Series (slice),
    # an NDArray, or a Python List... so let's keep it simple:
    X_test = np.array(X_test)
    col1 = X_test[:,0]
    col2 = X_test[:,1]

    
    # Set up a Grid. We could have predicted on the actual
    # col1, col2 values directly; but that would have generated
    # a mesh with WAY too fine a grid, which would have detracted
    # from the visualization
    x_min, x_max = col1.min(), col1.max()
    y_min, y_max = col2.min(), col2.max()
    x = np.arange(x_min, x_max, (x_max-x_min) / 10)
    y = np.arange(y_min, y_max, (y_max-y_min) / 10)
    x, y = np.meshgrid(x, y)

    
    # Predict based on possible input values that span the domain
    # of the x and y inputs:
    z = model.predict(  np.c_[x.ravel(), y.ravel()]  )
    z = z.reshape(x.shape)

    
    ax.scatter(col1, col2, y_test, c='g', marker='o')
    ax.plot_wireframe(x, y, z, color='orange', alpha=0.7)

    title += " R2: " + str(R2)
    ax.set_title(title)
    print(title)
    print("Intercept(s): ", model.intercept_)

    plt.show()
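
And the two-feature counterpart for drawPlane; again a sketch, with y and the split illustrative:

In [ ]:
# Illustrative: two (n, 1) columns stacked side by side give a regression plane
X2 = np.hstack([roomBoard, accStudent])
X2_train, X2_test, y_train, y_test = train_test_split(X2, y, test_size=0.3)
model = linear_model.LinearRegression()
model.fit(X2_train, y_train)
drawPlane(model, X2_test, y_test, "Room.Board & Accept", model.score(X2_test, y_test))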