# K-Nearest Neighbors

Ecole Nationale Supérieure de Cognitique

Baptiste Pesquet

## Summary

• K-NN in a nutshell
• K-NN for classification
• K-NN for regression

## K-NN in a nutshell

• Instance-based (no model construction).
• No training phase, computations happen during predictions.
• $\neq$ K-Means algorithm.
• Very simple yet often efficient.
• Can be used for classification or regression.

## K-NN for classification

A new point is assigned to the class which has the most representatives within the k nearest neighbors of the point.

## The k parameter

• Positive integer, chosen by the user. Typically small.
• a larger k suppresses the effects of noise, but makes the classification boundaries less distinct.
• $k=5$ is a frequent choice.
• $k=1$ => nearest neighbor algorithm.

## Neighbors selection

• Brute force: compute distance between all training samples.
• Efficient with few samples.
• Other methods try to reduce the required number of distance calculations.
• Basic idea: if A is very distant from B and B is very close to C, we can infer that A and C are very distant.

## Neighborhood class computation

• Basic form: uniform weights (simple majority vote). Suboptimal when dataset is skewed (imbalanced).
• Distance weights: closer neighbors weight more than farther ones.

## Pseudocode (brute force, uniform weights)

Set k
For each training sample
Compute distance between sample and new point
Keep the k samples with smallest distance
Find the most frequent class between the k samples
Assign this class to the new point


## K-NN for regression

The target is predicted by local interpolation of the targets associated of the k nearest neighbors.

## K-NN shortcomings

• Good results for small datasets with few features.
• Distances computation can become intractable with many samples, or samples with many features.
• Categorical features make the distances computation difficult.

## Demo time!

Classify Planar Data with K-NN

Classify Fruits with K-NN