K-Nearest Neighbors

Ecole Nationale Supérieure de Cognitique

Baptiste Pesquet


Summary

  • K-NN in a nutshell
  • K-NN for classification
  • K-NN for regression

K-NN in a nutshell

  • Instance-based method (no explicit model is built).
  • No training phase: all computation happens at prediction time.
  • Not to be confused with the K-Means clustering algorithm.
  • Very simple yet often effective.
  • Can be used for classification or regression.

K-NN for classification

A new point is assigned to the class which has the most representatives within the k nearest neighbors of the point.

K-NN example

The k parameter

  • Positive integer chosen by the user. Typically small.
  • A larger k suppresses the effects of noise, but makes the classification boundaries less distinct.
  • $k=5$ is a frequent choice.
  • $k=1$ reduces to the nearest neighbor algorithm.
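
As a minimal illustration of this noise/boundary trade-off, the following sketch (toy 1D data, hypothetical values) shows a case where $k=1$ is fooled by a single noisy sample while $k=5$ is not:

```python
from collections import Counter

def knn_classify(samples, labels, x, k):
    """Classify x by majority vote among its k nearest training samples (1D)."""
    # Sort training samples by distance to x, keep the labels of the k closest.
    nearest = sorted(zip(samples, labels), key=lambda s: abs(s[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy dataset: class "A" clustered near 0, class "B" near 1,
# plus one noisy "B" sample at 0.25.
samples = [0.0, 0.1, 0.2, 0.25, 1.0, 1.1]
labels  = ["A", "A", "A", "B",  "B", "B"]

print(knn_classify(samples, labels, 0.3, k=1))  # "B": fooled by the noisy sample
print(knn_classify(samples, labels, 0.3, k=5))  # "A": the vote suppresses the noise
```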

K parameter importance

Distance metrics
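
The slide names no specific metric; Euclidean distance is the usual default, with Manhattan and the more general Minkowski distance as common alternatives. A minimal sketch:

```python
import math

def euclidean(a, b):
    """L2 distance: straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    """Generalizes both: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```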

Neighbors selection

  • Brute force: compute the distance between the new point and every training sample.
    • Efficient only with few samples.
  • Other methods (e.g. tree-based structures) try to reduce the required number of distance computations.
    • Basic idea: if A is very distant from B, and B is very close to C, then A must also be distant from C, so the distance between A and C need not be computed explicitly.
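
This pruning idea rests on the triangle inequality: for any metric, $d(A, C) \geq |d(A, B) - d(B, C)|$, so two already-known distances give a lower bound on a third without computing it. A small numeric check (points are illustrative):

```python
import math

def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

A, B, C = (0.0, 0.0), (10.0, 0.0), (10.5, 0.0)

# Lower bound on d(A, C) obtained from two known distances.
lower_bound = abs(dist(A, B) - dist(B, C))

# If this bound already exceeds the current k-th best distance,
# C can be discarded without ever computing d(A, C).
print(lower_bound <= dist(A, C))  # True
```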

Neighborhood class computation

  • Basic form: uniform weights (simple majority vote). Suboptimal when the dataset is skewed (class-imbalanced).
  • Distance weights: closer neighbors weigh more than farther ones.
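
A sketch of the distance-weighted vote, using the common $1/d$ weighting (the weighting choice and names are illustrative, not prescribed by the slide):

```python
from collections import defaultdict

def weighted_vote(neighbors):
    """neighbors: list of (distance, label) pairs for the k nearest samples.
    Each neighbor votes with weight 1/distance, so closer neighbors count more."""
    scores = defaultdict(float)
    for d, label in neighbors:
        scores[label] += 1.0 / d  # assumes d > 0; in practice, guard against d == 0
    return max(scores, key=scores.get)

# Two far "A" neighbors vs one very close "B" neighbor.
neighbors = [(0.1, "B"), (1.0, "A"), (1.0, "A")]
print(weighted_vote(neighbors))  # "B": weight 10 beats 1 + 1, although a uniform vote would pick "A"
```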

Weights importance in K-NN

Pseudocode (brute force, uniform weights)

Set k
For each training sample
  Compute distance between sample and the new point
Keep the k samples with the smallest distances
Find the most frequent class among the k samples
Assign this class to the new point
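
The pseudocode above translates directly into a vectorized NumPy sketch (2D points and Euclidean distance assumed):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k):
    """Brute-force K-NN with uniform weights, following the pseudocode above."""
    # Compute the distance between each training sample and the new point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Keep the k samples with the smallest distances.
    nearest = np.argsort(distances)[:k]
    # Find the most frequent class among the k samples and assign it.
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
y_train = ["A", "A", "B", "B"]
print(knn_predict(X_train, y_train, np.array([0.1, 0.1]), k=3))  # "A"
```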

K-NN for regression

The target is predicted by local interpolation of the targets associated with the k nearest neighbors.
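
In the simplest form, this "local interpolation" is the mean of the k nearest targets. A minimal NumPy sketch on toy 1D data:

```python
import numpy as np

def knn_regress(X_train, y_train, x_new, k):
    """Predict a continuous target as the average of the k nearest targets (1D inputs)."""
    distances = np.abs(X_train - x_new)
    nearest = np.argsort(distances)[:k]
    return y_train[nearest].mean()

# Toy data following y = 2x.
X_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
print(knn_regress(X_train, y_train, 1.6, k=2))  # 3.0: mean of the targets for x=1 and x=2
```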

K-NN for regression

K-NN shortcomings

  • Good results for small datasets with few features.
  • Distance computation can become intractable with many samples, or with samples that have many features.
  • Categorical features make distance computation difficult: they must first be encoded numerically.
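
One common workaround for categorical features is one-hot encoding, which makes all distinct categories equidistant. A minimal sketch (the encoding helper is illustrative):

```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector, one slot per category."""
    return [1.0 if value == c else 0.0 for c in categories]

colors = ["red", "green", "blue"]
print(one_hot("green", colors))  # [0.0, 1.0, 0.0]

# With this encoding, any two distinct colors are at the same Euclidean
# distance (sqrt(2)), instead of an arbitrary ordering like red=0, green=1, blue=2.
```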

Demo time!

Classify Planar Data with K-NN

Classify Fruits with K-NN