http://burakkanber.com/blog/machine-learning-in-js-k-nearest-neighbor-part-1/

http://www.cut-the-knot.org/Probability/Proofreading.shtml

Given the number of rooms and area (in square feet) of a type of dwelling, figure out if it’s an apartment, house, or flat.

As always, we’re starting with the most contrived possible problem in order to learn the basics. The description of this problem has *given* us the features we need to look at: number of rooms and area in square feet. We can also assume that, since this is a supervised learning problem, we’ll be given a handful of example apartments, houses, and flats.

I think the best way to teach the kNN algorithm is to simply define what the phrase “k-nearest-neighbor” actually means.

Here’s a table of the example data we’re given for this problem:

| Rooms | Area (sq ft) | Type |
|---|---|---|
| 1 | 350 | apartment |
| 2 | 300 | apartment |
| 3 | 300 | apartment |
| 4 | 250 | apartment |
| 4 | 500 | apartment |
| 4 | 400 | apartment |
| 5 | 450 | apartment |
| 7 | 850 | house |
| 7 | 900 | house |
| 7 | 1200 | house |
| 8 | 1500 | house |
| 9 | 1300 | house |
| 8 | 1240 | house |
| 10 | 1700 | house |
| 9 | 1000 | house |
| 1 | 800 | flat |
| 3 | 900 | flat |
| 2 | 700 | flat |
| 1 | 900 | flat |
| 2 | 1150 | flat |
| 1 | 1000 | flat |
| 2 | 1200 | flat |
| 1 | 1300 | flat |

We’re going to plot the above as points on a graph in two dimensions, using number of rooms as the x-axis and the area as the y-axis.
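To make the plotting step concrete, here’s a minimal sketch of how the table rows could be turned into 2D points, with rooms on the x-axis and area on the y-axis. The names `rows`, `toPoint`, and `points` are made up for this illustration; they aren’t taken from the fiddle.

```javascript
// Rows copied from the example table: [rooms, area in square feet, type].
const rows = [
  [1, 350, "apartment"], [2, 300, "apartment"], [3, 300, "apartment"],
  [4, 250, "apartment"], [4, 500, "apartment"], [4, 400, "apartment"],
  [5, 450, "apartment"],
  [7, 850, "house"], [7, 900, "house"], [7, 1200, "house"], [8, 1500, "house"],
  [9, 1300, "house"], [8, 1240, "house"], [10, 1700, "house"], [9, 1000, "house"],
  [1, 800, "flat"], [3, 900, "flat"], [2, 700, "flat"], [1, 900, "flat"],
  [2, 1150, "flat"], [1, 1000, "flat"], [2, 1200, "flat"], [1, 1300, "flat"],
];

// Hypothetical helper: rooms become the x coordinate, area the y coordinate.
function toPoint([rooms, area, type]) {
  return { x: rooms, y: area, type };
}

const points = rows.map(toPoint);

console.log(points.length); // 23
console.log(points[0]);     // the first apartment: x = 1, y = 350
```

Keeping the type on each point means the label travels with the coordinates, which is all we need later when counting the neighbors’ types.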

When we inevitably run into a new, unlabeled data point (“mystery point”), we’ll put that on the graph too. Then we’ll pick a number (called “k”) and just find the “k” closest points on the graph to our mystery point. If the majority of the points close to the new point are “flats”, then we’ll guess that our mystery point is a flat.

That’s what k-nearest-neighbor means. “If the 3 (or 5 or 10, or ‘k’) nearest neighbors to the mystery point are two apartments and one house, then the mystery point is an apartment.”
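The “two apartments and one house” vote can be sketched in a few lines. This is just an illustration: `majorityType` is a hypothetical helper, and breaking ties in favor of the label that shows up first is an assumption of this sketch, not a rule of the algorithm.

```javascript
// Hypothetical helper: return the label that occurs most often in `labels`.
// Ties go to whichever tied label appears first in the list (an assumption).
function majorityType(labels) {
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }

  let best = null;
  let bestCount = 0;
  // A Map iterates in insertion order, so the strict ">" keeps the earliest label on a tie.
  for (const [label, count] of counts) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}

console.log(majorityType(["apartment", "house", "apartment"])); // apartment
```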

Here’s the (simplified) procedure:

- Put all the data you have (including the mystery point) on a graph.
- Measure the distances between the mystery point and every other point.
- Pick a number, k. Three is usually good for small data sets.
- Figure out which k points (three, in this case) are closest to the mystery point.
- The most common type among those closest points is the answer.
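The steps above can be sketched roughly like this. It’s a minimal sketch under a few assumptions, not the code from the fiddle: `examples` holds only a subset of the table rows, `distance` and `classify` are names invented here, and the distance is plain straight-line (Euclidean) distance on the raw values, so the area (in the hundreds) counts for far more than the number of rooms.

```javascript
// A subset of the example table: rooms, area (square feet), type.
const examples = [
  { rooms: 1, area: 350, type: "apartment" },
  { rooms: 4, area: 400, type: "apartment" },
  { rooms: 5, area: 450, type: "apartment" },
  { rooms: 7, area: 850, type: "house" },
  { rooms: 8, area: 1500, type: "house" },
  { rooms: 9, area: 1000, type: "house" },
  { rooms: 1, area: 800, type: "flat" },
  { rooms: 2, area: 700, type: "flat" },
  { rooms: 3, area: 900, type: "flat" },
];

// Measure the straight-line distance between two points (raw, unscaled values).
function distance(a, b) {
  return Math.hypot(a.rooms - b.rooms, a.area - b.area);
}

// Find the k closest labeled examples and return the most common type among them.
function classify(mystery, labeled, k = 3) {
  const nearest = labeled
    .map((example) => ({ type: example.type, dist: distance(mystery, example) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);

  // Count one vote per neighbor.
  const votes = {};
  for (const { type } of nearest) {
    votes[type] = (votes[type] || 0) + 1;
  }

  // Pick the type with the most votes; on a tie, the closer neighbor's type wins.
  let winner = null;
  for (const type of Object.keys(votes)) {
    if (winner === null || votes[type] > votes[winner]) winner = type;
  }
  return winner;
}

console.log(classify({ rooms: 2, area: 750 }, examples)); // flat
```

For the mystery point with 2 rooms and 750 square feet, the three closest examples are two flats (700 and 800 square feet) and one house (850 square feet), so the guess is “flat”.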

If you’re having trouble visualizing this, please take a quick break to scroll down to the bottom of the page and run the JSFiddle. That should illustrate the concept. Then come back up here and continue reading!