SVC

Posted on Jul 16, 2018 in Notes • 12 min read

Support Vector Classification

support vector: the samples from either class that are closest to one another.

SVC is a class of Support Vector Machine (SVM), a set of supervised learning algorithms that one can use for classification, regression and outlier detection.

SVC solves the classification problem by finding the equation of the hyperplane (a linear surface) that results in the most separation between two classes of samples. SVC doesn't care about the samples that are clearly cats or clearly dogs. Rather, it focuses on the samples closest to the decision boundary, the support vectors, so it can work out precisely the smallest change in features that differentiates the two, and hopefully classify all of them properly.

How? SVC first finds a way to separate your data. It does this by looking at the samples from either class that are closest to one another, or in machine learning lingo, the support vectors.
SVC then ensures it separates your data in the best way possible by orienting the boundary so that the gap, or margin, between it and your support vector samples is maximised.
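A minimal sketch of the idea (make_blobs is just stand-in data, not tied to any particular dataset): fit a linear SVC and notice that only a handful of samples end up as support vectors.

In [ ]:
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# two well-separated blobs standing in for the two classes
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

model = SVC(kernel='linear')
model.fit(X, y)

# only the samples nearest the boundary become support vectors
print(model.support_vectors_.shape)
print(model.n_support_)  # number of support vectors per class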

The Kernel Trick

Example: Separating Billiard Balls into Odds and Evens

When a straight surface is not enough to separate the balls, imagine conducting an extremely calculated table-flip that sends the odd balls higher up than the even ones.

The kernel is a similarity function which tells you, given your current ball positions, how far apart the balls would have been had you flipped the table, so no actual table-flipping is needed.
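A rough illustration of the trick, with make_circles standing in for the billiard table (an assumed toy dataset, not the real thing): a linear boundary can't separate one ring of balls from another, but the RBF kernel, which does the "table-flip" implicitly, can.

In [ ]:
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# one class ringed by the other -- no straight surface can separate them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel='linear').fit(X, y)
rbf = SVC(kernel='rbf', gamma='auto').fit(X, y)

print(linear.score(X, y))  # struggles: roughly coin-flip accuracy
print(rbf.score(X, y))     # the "table-flip" kernel separates the rings cleanly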

__init__(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', random_state=None)

parameters

  • kernel: (str) Which kernel to use for the table-flipping
  • C: (float) The penalty for erroneous classifications. Determines how general the classification is. The higher it is, the more the model accommodates individual samples; the lower it is, the more it models the overall shape of the data with a smoother decision boundary.
  • gamma: (float) Determines how pronounced the decision boundary is by changing the influence of the support vector samples. Large gamma values mean each training sample has only a localized effect; smaller values let each sample affect a larger area (see the sketch after this list for both C and gamma).
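A quick sketch of how C and gamma change the fit, again on assumed toy data (overlapping blobs from make_blobs):

In [ ]:
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# overlapping blobs so the penalty C actually has something to trade off
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

loose = SVC(kernel='rbf', C=0.1, gamma='auto').fit(X, y)     # smoother, more general boundary
strict = SVC(kernel='rbf', C=100.0, gamma='auto').fit(X, y)  # bends to fit individual samples
spiky = SVC(kernel='rbf', C=1.0, gamma=10.0).fit(X, y)       # large gamma: very localized influence

# a looser fit typically leans on more support vectors
print(len(loose.support_vectors_), len(strict.support_vectors_), len(spiky.support_vectors_))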

attributes

  • support_vectors_: returns the original support vector samples in array form
  • intercept_: returns the constants of the decision function
  • dual_coef_: returns each support vector's contribution to the decision function, on a per-classification basis. This is loosely analogous to the weights of a linear regression (the sketch after this list prints all three attributes)
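For example, on assumed toy data, the three attributes look like this:

In [ ]:
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
model = SVC(kernel='linear').fit(X, y)

print(model.support_vectors_)  # the boundary-defining samples themselves
print(model.intercept_)        # constant term(s) of the decision function
print(model.dual_coef_)        # each support vector's weight in the decision function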

GOTCHAS

  • Classification speed >> training speed (opposite of K-Neighbors)
  • Extremely effective, even in high dimensional spaces.
  • Sometimes works even when your dataset has more features than samples. So long as the samples you do have lie close to the decision boundary, there's a good chance your support vector classifier will do just fine.
  • Non-probabilistic: classification is based on the geometry of your dataset, as opposed to probabilities of occurrences.
  • Does not directly give probability estimates for what class a sample belongs to (see the sketch after this list).
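A small sketch of that last point, on assumed toy data: decision_function reports geometry (signed distance to the boundary), while probability=True bolts a calibration step onto training to get probability estimates.

In [ ]:
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# geometry: signed distance of each sample from the decision boundary
geometric = SVC(kernel='linear').fit(X, y)
print(geometric.decision_function(X[:3]))

# probabilities: requires probability=True, which adds a calibration step at training time
calibrated = SVC(kernel='linear', probability=True).fit(X, y)
print(calibrated.predict_proba(X[:3]))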

DOCUMENTATION

In [ ]:
from sklearn.datasets import make_blobs  # stand-in data so the example runs end-to-end
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

model = SVC(kernel='linear')
model.fit(X, y)
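A possible follow-up once the model is fitted (using the same X and y from above):

In [ ]:
print(model.predict(X[:5]))  # predicted classes for the first five samples
print(model.score(X, y))     # mean accuracy on the training data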