30 KNN
Basic Idea: If it walks like a duck, quacks like a duck, then it is probably a duck!
3 Main Requirements
- The set of stored records
- Distance Metric to compute distance between records
- The value of k, the number of nearest neighbors to retrieve
To classify an unknown record:
- Compute its distance to all training records
- Identify the k nearest neighbors
- Assign the class label held by the majority of those k neighbors
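The three steps above can be sketched in Python. This is a minimal illustration, not code from the source; the function and variable names are made up, and `math.dist` (Euclidean distance) stands in for the generic distance metric:

```python
from collections import Counter
import math

def knn_classify(records, labels, query, k):
    # Step 1: compute the distance from the query to every stored record
    # (math.dist is the Euclidean distance)
    order = sorted(range(len(records)), key=lambda i: math.dist(records[i], query))
    # Step 2: identify the k nearest neighbors
    nearest = order[:k]
    # Step 3: take a majority vote over the neighbors' labels
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Tiny made-up dataset: two well-separated clusters
records = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["duck", "duck", "duck", "goose", "goose", "goose"]
print(knn_classify(records, labels, (1, 1), k=3))  # → duck
```

Note that all work happens at query time: nothing is precomputed from the stored records, which is exactly the "lazy learner" behavior discussed below.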
Types of Distances
Euclidean distance
d(p, q) = sqrt(sum_i (p_i - q_i)^2)
Manhattan distance
The Manhattan distance function computes the distance that would be traveled to get from one data point to the other if a grid-like path is followed. The Manhattan distance between two items is the sum of the absolute differences of their corresponding components.
The formula for this distance between a point X = (X1, X2, ...) and a point Y = (Y1, Y2, ...) is:
d(X, Y) = |X1 - Y1| + |X2 - Y2| + ... = sum_i |X_i - Y_i|
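A quick sketch comparing the two metrics on the same pair of points (the points are made up for illustration):

```python
import math

def euclidean(p, q):
    # square root of the summed squared component differences
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    # sum of absolute component differences (grid-like path)
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

p, q = (1, 2), (4, 6)
print(euclidean(p, q))  # → 5.0  (straight-line distance)
print(manhattan(p, q))  # → 7    (3 blocks across + 4 blocks up)
```

The Manhattan distance is never smaller than the Euclidean distance between the same two points, since a grid path can only be as short as the straight line.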
How to choose the value of k:
- If k is too small, the classifier is sensitive to noise points
- If k is too large, the neighborhood may include points from other classes
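Both failure modes can be seen on a small made-up dataset with one mislabeled "noise" point. This sketch checks a few values of k against a held-out validation point; the data and function names are illustrative only:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k):
    # rank stored records by Euclidean distance, vote among the top k
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Two clusters, plus one noise point (0.5, 0.5) mislabeled as class "b"
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

query = (0.4, 0.6)  # clearly inside the "a" cluster
for k in (1, 3, 9):
    print(k, knn_predict(train_X, train_y, query, k))
# k=1 → "b": the single nearest neighbor is the noise point
# k=3 → "a": the majority vote outvotes the noise
# k=9 → "b": the neighborhood now spans the whole dataset, including the other class
```

In practice k is usually chosen by evaluating accuracy on held-out data in exactly this way.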
K-NN Classifiers are lazy learners
They do not build a model explicitly, unlike eager learners such as decision tree induction and rule-based systems