Understanding K Means Clustering Graphically

Changsin Lee
4 min read · Aug 21, 2021

Photo by Rajendra Biswal on Unsplash

Clustering is a popular unsupervised machine learning technique, used to group or preprocess data. K Means clustering is an iterative algorithm: it keeps computing new centroids and re-clustering the data around them until nothing changes. In this article, I will show graphically how this is done from scratch in Python. The entire source code is shared here.

The Algorithm

For the sake of illustration, let’s just use two clusters (K = 2). The algorithm is as follows:

  1. Pick two random points as the initial centroids (a centroid is the center point of a cluster).
  2. Put each data point into a cluster based on its distance to the centroids. In other words, a data point will be put in the cluster whose centroid is closer.
  3. After clustering, calculate the real centroid of each cluster, that is, the mean of the points assigned to it.
  4. Recluster the data with the new centroids.
  5. Check if there is any change in clusters or centroids. If there is no change, we are done. Otherwise, repeat steps 3–5.
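
The five steps above can be sketched in plain Python. This is a minimal illustration and not necessarily the code from the shared source. The function names (`assign_clusters`, `compute_centroids`, `k_means`) are hypothetical, and it makes three assumptions: points are 2D tuples, the initial centroids are sampled from the data itself, and an empty cluster keeps its previous centroid.

```python
import math
import random


def assign_clusters(points, centroids):
    """Steps 2 and 4: label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def compute_centroids(points, labels, old_centroids):
    """Step 3: the centroid of a cluster is the mean of its member points."""
    new_centroids = []
    for k, old in enumerate(old_centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        xs, ys = zip(*members)
        new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return new_centroids


def k_means(points, k=2, max_iter=100, seed=0):
    """Run K Means on a list of (x, y) tuples; return (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1 (assumption: the random initial centroids are drawn from the data).
    centroids = rng.sample(points, k)
    # Step 2: initial clustering.
    labels = assign_clusters(points, centroids)
    for _ in range(max_iter):
        new_centroids = compute_centroids(points, labels, centroids)  # step 3
        new_labels = assign_clusters(points, new_centroids)           # step 4
        # Step 5: stop when neither the clusters nor the centroids changed.
        if new_labels == labels and new_centroids == centroids:
            break
        centroids, labels = new_centroids, new_labels
    return centroids, labels
```

Calling `k_means(points, k=2)` on a list of `(x, y)` tuples returns the final centroids and one cluster index per point. The `max_iter` cap is only a safety net; on small, well-separated data the loop typically stops after a few passes.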

Implementation

Let’s implement the algorithm in Python with graphs.

0. Generate random points
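
The code for this step is not reproduced here, so the following is only a sketch of one way to do it, using the standard library. The function name `generate_points`, its parameters, and the choice of uniformly distributed 2D points are all assumptions made for illustration.

```python
import random


def generate_points(n=100, low=0.0, high=10.0, seed=42):
    """Return n random 2D points as (x, y) tuples.

    Assumption: coordinates are drawn uniformly between low and high.
    A fixed seed makes the result reproducible between runs.
    """
    rng = random.Random(seed)
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n)]


points = generate_points()
```

A fixed seed is handy when following along graphically, because every run then plots the same points and goes through the same sequence of centroids.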
