http://burakkanber.com/blog/machine-learning-in-js-k-nearest-neighbor-part-1/

The Problem

Given the number of rooms and the area (in square feet) of a dwelling, figure out whether it’s an apartment, a house, or a flat.

As always, we’re starting with the most contrived possible problem in order to learn the basics. The problem description gives us the two features we need to look at: the number of rooms and the area in square feet. Since this is a supervised learning problem, we can also assume we’ll be given a handful of labeled examples of apartments, houses, and flats.

What “k-nearest-neighbor” Means

I think the best way to teach the kNN algorithm is to simply define what the phrase “k-nearest-neighbor” actually means.

Here’s a table of the example data we’re given for this problem:

Rooms	Area (sq ft)	Type
1	350	apartment
2	300	apartment
3	300	apartment
4	250	apartment
4	500	apartment
4	400	apartment
5	450	apartment
7	850	house
7	900	house
7	1200	house
8	1500	house
9	1300	house
8	1240	house
10	1700	house
9	1000	house
1	800	flat
3	900	flat
2	700	flat
1	900	flat
2	1150	flat
1	1000	flat
2	1200	flat
1	1300	flat
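
To make the table concrete, here is one way it could be written down in JavaScript. This is only a sketch: the variable name `examples` and the field names `rooms`, `area`, and `type` are my own choices for illustration, not something the problem dictates.

```javascript
// Training examples copied from the table above.
// The names `examples`, `rooms`, `area`, and `type` are illustrative assumptions.
const examples = [
  { rooms: 1, area: 350, type: "apartment" },
  { rooms: 2, area: 300, type: "apartment" },
  { rooms: 3, area: 300, type: "apartment" },
  { rooms: 4, area: 250, type: "apartment" },
  { rooms: 4, area: 500, type: "apartment" },
  { rooms: 4, area: 400, type: "apartment" },
  { rooms: 5, area: 450, type: "apartment" },
  { rooms: 7, area: 850, type: "house" },
  { rooms: 7, area: 900, type: "house" },
  { rooms: 7, area: 1200, type: "house" },
  { rooms: 8, area: 1500, type: "house" },
  { rooms: 9, area: 1300, type: "house" },
  { rooms: 8, area: 1240, type: "house" },
  { rooms: 10, area: 1700, type: "house" },
  { rooms: 9, area: 1000, type: "house" },
  { rooms: 1, area: 800, type: "flat" },
  { rooms: 3, area: 900, type: "flat" },
  { rooms: 2, area: 700, type: "flat" },
  { rooms: 1, area: 900, type: "flat" },
  { rooms: 2, area: 1150, type: "flat" },
  { rooms: 1, area: 1000, type: "flat" },
  { rooms: 2, area: 1200, type: "flat" },
  { rooms: 1, area: 1300, type: "flat" },
];

// Count how many examples carry each label.
const counts = {};
for (const { type } of examples) {
  counts[type] = (counts[type] || 0) + 1;
}
console.log(counts); // { apartment: 7, house: 8, flat: 8 }
```

Each row of the table becomes one labeled point, which is all a supervised learner like kNN needs to start with.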

We’re going to plot the above as points on a graph in two dimensions, using number of rooms as the x-axis and the area as the y-axis.

When we inevitably run into a new, unlabeled data point (“mystery point”), we’ll put that on the graph too. Then we’ll pick a number (called “k”) and just find the “k” closest points on the graph to our mystery point. If the majority of the points close to the new point are “flats”, then we’ll guess that our mystery point is a flat.

That’s what k-nearest-neighbor means. “If the 3 (or 5 or 10, or ‘k’) nearest neighbors to the mystery point are two apartments and one house, then the mystery point is an apartment.”
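
The “majority” part of that sentence can be sketched in a few lines of JavaScript. The helper name `majorityLabel` is hypothetical, and breaking ties in favor of the label that appears first is an assumption of this sketch, not a rule of kNN.

```javascript
// Hypothetical helper: return the most frequent label among the neighbors.
// Ties go to the label that was seen first (an assumption of this sketch).
function majorityLabel(labels) {
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  for (const [label, count] of counts) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}

// Two apartments and one house among the 3 nearest neighbors:
console.log(majorityLabel(["apartment", "house", "apartment"])); // "apartment"
```

Picking an odd “k” makes a two-way tie impossible, though with three classes a three-way split can still happen.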

Here’s the (simplified) procedure:

1. Put all the data you have (including the mystery point) on a graph.
2. Measure the distance between the mystery point and every other point.
3. Pick a number “k”. Three is usually good for small data sets.
4. Figure out which three points are closest to the mystery point.
5. The type that the majority of those three points share is the answer.
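
The procedure above can be sketched end to end in JavaScript. Treat this as a minimal illustration under a few assumptions: the function name `classify` and the `[rooms, area, type]` row layout are hypothetical, distance is plain straight-line (Euclidean) distance on the raw numbers, and ties are not handled specially. Because areas are in the hundreds while room counts are single digits, the area dominates the raw distance here, and this sketch does nothing about that.

```javascript
// Training data from the table, as [rooms, area, type] rows.
const trainingData = [
  [1, 350, "apartment"], [2, 300, "apartment"], [3, 300, "apartment"],
  [4, 250, "apartment"], [4, 500, "apartment"], [4, 400, "apartment"],
  [5, 450, "apartment"],
  [7, 850, "house"], [7, 900, "house"], [7, 1200, "house"], [8, 1500, "house"],
  [9, 1300, "house"], [8, 1240, "house"], [10, 1700, "house"], [9, 1000, "house"],
  [1, 800, "flat"], [3, 900, "flat"], [2, 700, "flat"], [1, 900, "flat"],
  [2, 1150, "flat"], [1, 1000, "flat"], [2, 1200, "flat"], [1, 1300, "flat"],
];

// Hypothetical helper: classify a mystery point by a vote of its k nearest neighbors.
function classify(rooms, area, k = 3) {
  // Measure the straight-line distance from the mystery point to every labeled point,
  // then sort so the closest points come first.
  const byDistance = trainingData
    .map(([r, a, type]) => ({ type, distance: Math.hypot(r - rooms, a - area) }))
    .sort((p, q) => p.distance - q.distance);

  // Keep only the k closest points.
  const nearest = byDistance.slice(0, k);

  // Tally the labels of those neighbors and return the most common one.
  const votes = {};
  for (const { type } of nearest) {
    votes[type] = (votes[type] || 0) + 1;
  }
  return Object.keys(votes).reduce((best, t) => (votes[t] > votes[best] ? t : best));
}

console.log(classify(2, 320)); // "apartment"
```

For the mystery point with 2 rooms and 320 square feet, the three closest examples are the 2-room/300, 3-room/300, and 1-room/350 apartments, so the vote is unanimous.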

If you’re having trouble visualizing this, please take a quick break to scroll down to the bottom of the page and run the JS fiddle. That should illustrate the concept. Then come back up here and continue reading!