Given the number of rooms and the area (in square feet) of a dwelling, figure out if it’s an apartment, house, or flat.
As always, we’re starting with the most contrived possible problem in order to learn the basics. The description of this problem has given us the features we need to look at: number of rooms and area in square feet. Since this is a supervised learning problem, we can also assume that we’ll be given a handful of labeled example apartments, houses, and flats.
I think the best way to teach the kNN algorithm is to simply define what the phrase “k-nearest-neighbor” actually means.
Here’s a table of the example data we’re given for this problem:
We’re going to plot the above as points on a graph in two dimensions, using number of rooms as the x-axis and the area as the y-axis.
When we inevitably run into a new, unlabeled data point (“mystery point”), we’ll put that on the graph too. Then we’ll pick a number (called “k”) and just find the “k” closest points on the graph to our mystery point. If the majority of the points close to the new point are “flats”, then we’ll guess that our mystery point is a flat.
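To make the "find the k closest points" step concrete, here is a minimal sketch. It assumes plain straight-line (Euclidean) distance on the graph, and the `examples` values are made up for illustration; they are not the article's table. The names `distance` and `kNearest` are hypothetical helpers, not part of any library.

```javascript
// Hypothetical example data, invented for this sketch only.
const examples = [
  { rooms: 1, area: 350, type: "apartment" },
  { rooms: 2, area: 300, type: "apartment" },
  { rooms: 3, area: 300, type: "apartment" },
  { rooms: 7, area: 850, type: "house" },
  { rooms: 7, area: 900, type: "house" },
  { rooms: 8, area: 1100, type: "house" },
  { rooms: 1, area: 800, type: "flat" },
  { rooms: 2, area: 750, type: "flat" },
  { rooms: 1, area: 900, type: "flat" }
];

// Straight-line distance between two points on the rooms/area graph.
// Raw values are used, so area (hundreds) outweighs rooms (single digits).
function distance(a, b) {
  return Math.hypot(a.rooms - b.rooms, a.area - b.area);
}

// Return the k labeled examples closest to the mystery point.
function kNearest(points, mystery, k) {
  return points
    .map(point => ({ point, dist: distance(point, mystery) }))
    .sort((a, b) => a.dist - b.dist) // closest first
    .slice(0, k)
    .map(entry => entry.point);
}

const mystery = { rooms: 2, area: 320 };
const neighbors = kNearest(examples, mystery, 3);
console.log(neighbors.map(n => n.type));
```

With this made-up data, all three of the closest points are apartments, so we would guess that the mystery point is an apartment too.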
That’s what k-nearest-neighbor means. “If the 3 (or 5 or 10, or ‘k’) nearest neighbors to the mystery point are two apartments and one house, then the mystery point is an apartment.”
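The vote in that example can be sketched in a few lines. This is only an illustration: `majorityLabel` is a hypothetical helper name, and breaking ties in favor of the label seen first is an assumption of this sketch, not something the problem dictates.

```javascript
// Count how often each label appears among the nearest neighbors
// and return the most common one (null for an empty list).
function majorityLabel(labels) {
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }

  let best = null;
  let bestCount = 0;
  // Map iterates in insertion order, so on a tie the label seen first wins.
  for (const [label, count] of counts) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}

// Two apartments and one house among the 3 nearest neighbors.
console.log(majorityLabel(["apartment", "house", "apartment"])); // "apartment"
```

Picking an odd “k” makes a tie less likely, though with three possible labels it can still happen (one apartment, one house, one flat).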
Here’s the (simplified) procedure:
If you’re having trouble visualizing this, take a quick break, scroll down to the bottom of the page, and run the JSFiddle. That should illustrate the concept. Then come back up here and continue reading!