30 KNN

Basic Idea: If it walks like a duck and quacks like a duck, then it is probably a duck!

3 Main Requirements

  • The set of stored records
  • A distance metric to compute the distance between records
  • The value of k, the number of nearest neighbors to retrieve

To classify an unknown record:

  • Compute its distance to the stored training records
  • Identify the k nearest neighbors
  • Use the class labels of the k nearest neighbors to determine the label of the unknown record, typically by majority vote (see the sketch after this list)
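
As a minimal illustration of these steps (a sketch, assuming scikit-learn is installed; the Iris dataset, the 70/30 split, and k = 5 are illustrative choices, not ones prescribed by this chapter):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# The set of stored records: the training split of the Iris dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Distance metric (Euclidean) and the value of k (5 neighbors)
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)  # "training" just stores the records

# Classify unknown records: distances are computed, the 5 nearest
# neighbors are identified, and the majority label is returned
print(knn.predict(X_test[:3]))
print("test accuracy:", knn.score(X_test, y_test))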

Types of Distances


Euclidean distance

The Euclidean distance is the straight-line distance between two points p = (p1, p2, …, pn) and q = (q1, q2, …, qn):

d(p, q) = sqrt( (p1 − q1)^2 + (p2 − q2)^2 + … + (pn − qn)^2 )
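
As a quick sketch (assuming NumPy is available), the formula maps directly to code:

import numpy as np

def euclidean_distance(p, q):
    # Square root of the sum of squared component-wise differences
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((p - q) ** 2))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0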

Manhattan distance

The Manhattan distance function computes the distance that would be traveled to get from one data point to the other if a grid-like path is followed. The Manhattan distance between two items is the sum of the absolute differences of their corresponding components.

The formula for this distance between a point X = (X1, X2, …, Xn) and a point Y = (Y1, Y2, …, Yn) is:

d(X, Y) = |X1 − Y1| + |X2 − Y2| + … + |Xn − Yn|
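
A matching sketch for the Manhattan distance (again assuming NumPy):

import numpy as np

def manhattan_distance(x, y):
    # Sum of the absolute component-wise differences
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y))

print(manhattan_distance([0, 0], [3, 4]))  # 7.0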

How to choose the value of k (see the comparison sketch after this list):

  • If k is too small, the classifier is sensitive to noise points
  • If k is too large, the neighborhood may include points from other classes
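
One common approach (a sketch using scikit-learn's cross_val_score; the Iris dataset and the candidate k values are illustrative assumptions) is to compare a few values of k by cross-validated accuracy:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Compare candidate values of k by 5-fold cross-validated accuracy;
# very small k tends to overfit to noise, very large k blurs class boundaries
for k in (1, 3, 5, 11, 21, 51):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k:2d}  mean accuracy={scores.mean():.3f}")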

K-NN classifiers are lazy learners: they do not build a model explicitly, unlike eager learners such as decision tree induction and rule-based systems. All of the work is deferred until an unknown record has to be classified.
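
To make the "lazy" idea concrete, here is a minimal from-scratch sketch in which fit only stores the records and all of the work happens in predict; Euclidean distance and majority voting are the assumed choices:

from collections import Counter
import numpy as np

class LazyKNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the records; no explicit model is built
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict(self, X_new):
        labels = []
        for x in np.asarray(X_new, dtype=float):
            # All computation is deferred to prediction time:
            # distances to every stored record, then a majority vote
            dists = np.sqrt(np.sum((self.X - x) ** 2, axis=1))
            nearest = np.argsort(dists)[:self.k]
            labels.append(Counter(self.y[nearest]).most_common(1)[0][0])
        return np.array(labels)

clf = LazyKNN(k=3).fit([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]],
                       ["a", "a", "a", "b", "b", "b"])
print(clf.predict([[0.2, 0.1], [5.5, 5.2]]))  # expected: ['a' 'b']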

 



License

Building Skills for Data Science Copyright © by Dr. Nouhad Rizk. All Rights Reserved.
