K-Neighbors Classification

Posted on Jun 14, 2018 in Notes • 11 min read

K-Neighbors Classification¶

A supervised classification algorithm = the process of assigning samples into groups

It is a parameter free approach to classification. So for example, you don't have to worry about things like your data being linearly separable or not.

__init__(n_neighbors=5, weights=’uniform’, algorithm=’auto’, leaf_size=30, p=2, metric=’minkowski’, metric_params=None, n_jobs=1)

parameters

n_neighbors: (int) The number of neighbors to consider. Keep it odd when doing binary classification, particularly when you use uniform weighting.
weights: (str) How to count the votes from the neighbors; does everyone get an equal vote, a weighted vote, or something else?
algorithm: {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’} Select an optimization method for searching through your training data to find the nearest neighbors.

GOTCHAS

use an odd number
Sensitive to feature scaling
Sensitive to perturbations and the local structure of your dataset, particularly at lower "K" values.
Data needs to be measurable. Metric for discerning distance between your features needed.
With large "K" values, you have to be more cautious of the overall class distribution of your samples. Unjust preference might be given to classes taking up especially high percentages.

DOCUMENTATION

Separating out the 'labels' of the data¶

In [ ]:

# Process:
# Load a dataset into a dataframe
X = pd.read_csv('data.set', index_col=0)

# Do basic wrangling, but no transformations
# ...

# Immediately copy out the classification / label / class / answer column
y = X['classification'].copy()
X.drop(labels=['classification'], inplace=True, axis=1)

# Feature scaling as necessary
# ...

# Machine Learning
# ...

# Evaluation
# ...

In [ ]:

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y) 

# You can pass in a dframe or an ndarray
knn.predict([[1.1]])

knn.predict_proba([[0.9]])

# Returns the mean accuracy on the given test data and labels.
knn.score(X, y)

K-Neighbors Classification¶

Separating out the 'labels' of the data¶

You might enjoy