ML Interview Question: Optimal number of clusters in a clustering problem?
Answers
-
According to the gap statistic method, k=12 is also the optimal number of clusters (Figure 13). Figure 14 visually compares the k-means clusters for k=9 (optimal according to the elbow method) and k=12 (optimal according to the silhouette and gap statistic methods).
-
Choosing the number of clusters k is a common problem in unsupervised learning, especially for algorithms such as k-means that need k as an input. Common methods for choosing it include:
Elbow method: The elbow method is a popular way to choose the number of clusters. Run the clustering for a range of k values and plot the within-cluster sum of squares (WCSS), or a similar metric, against k. WCSS keeps shrinking as k grows (it reaches zero when every point is its own cluster), so the optimal number of clusters is typically the k at the "elbow" of the curve: the point after which adding more clusters reduces WCSS only slightly.
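Silhouette score: The silhouette of a sample i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other points in its own cluster and b(i) is the mean distance from i to the points of the nearest other cluster. Values range from -1 to 1; higher values mean the sample is well matched to its own cluster and well separated from the others. The optimal number of clusters is the one that maximizes the average silhouette score over all samples.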
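Gap statistic: The gap statistic compares the logarithm of the total WCSS for a given k with its expected value under a reference distribution that has no cluster structure, typically a uniform distribution over the range of the data; the expected value is estimated by clustering several data sets sampled from that reference. A common rule is to choose the k that maximizes the gap statistic; the original proposal instead chooses the smallest k such that Gap(k) >= Gap(k+1) - s(k+1), where s(k+1) is a simulation-error term derived from the standard deviation of the reference results.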
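A minimal sketch of the elbow method in standard-library Python, assuming a hypothetical toy data set of three well-separated 2-D Gaussian blobs (`toy_blobs`) and a small hand-written k-means (`kmeans`: Lloyd's algorithm with k-means++ seeding), not any particular library. Because reading the elbow off a plot is a visual judgment, `elbow_k` substitutes one simple heuristic among several: it scales k and WCSS to [0, 1] and picks the k whose point lies farthest from the straight line joining the first and last points of the curve.

```python
import math
import random


def toy_blobs(centers, n_per_center, std, rng):
    """Hypothetical toy data: isotropic 2-D Gaussian blobs around the given centers."""
    return [
        (rng.gauss(cx, std), rng.gauss(cy, std))
        for cx, cy in centers
        for _ in range(n_per_center)
    ]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_init=10, max_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns (labels, wcss) of the best restart."""
    best = None
    for _ in range(n_init):
        # k-means++ seeding: each new centroid is drawn with probability
        # proportional to its squared distance from the nearest chosen centroid.
        centroids = [rng.choice(points)]
        while len(centroids) < k:
            d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
            centroids.append(rng.choices(points, weights=d2)[0])
        for _ in range(max_iter):
            # Assignment step: each point joins its nearest centroid.
            labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
            # Update step: move each centroid to the mean of its members.
            new_centroids = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
                else:
                    new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
            if new_centroids == centroids:
                break  # assignments are stable
            centroids = new_centroids
        wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or wcss < best[1]:
            best = (labels, wcss)
    return best


def elbow_k(wcss_by_k):
    """Heuristic elbow: after scaling k and WCSS to [0, 1], return the k whose
    point lies farthest from the chord joining the first and last points."""
    ks = sorted(wcss_by_k)
    k_first, k_last = ks[0], ks[-1]
    w_first, w_last = wcss_by_k[k_first], wcss_by_k[k_last]

    def distance_to_chord(k):
        x = (k - k_first) / (k_last - k_first)
        y = (wcss_by_k[k] - w_last) / (w_first - w_last)
        return abs(x + y - 1) / math.sqrt(2)  # the chord runs from (0, 1) to (1, 0)

    return max(ks, key=distance_to_chord)


rng = random.Random(0)
points = toy_blobs([(0, 0), (10, 0), (0, 10)], n_per_center=30, std=0.5, rng=rng)
wcss_by_k = {k: kmeans(points, k, rng)[1] for k in range(1, 9)}
best_k = elbow_k(wcss_by_k)
for k, w in wcss_by_k.items():
    print(f"k={k}: WCSS={w:.1f}")
print("elbow at k =", best_k)
```

In practice, a fitted scikit-learn `KMeans` model reports the same quantity as `inertia_`, which could replace the hand-written `kmeans` here; the chord-distance rule is only one way to automate the elbow and can disagree with a visual reading on less clean data.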
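A minimal sketch of choosing k by the average silhouette, in standard-library Python with a hypothetical toy data set of three 2-D Gaussian blobs and a small hand-written k-means (all helper names here are illustrative). `mean_silhouette` computes s(i) straight from its definition with Euclidean distances and assigns s(i) = 0 to a point that is alone in its cluster; the search starts at k = 2 because the silhouette is undefined for a single cluster.

```python
import math
import random


def toy_blobs(centers, n_per_center, std, rng):
    """Hypothetical toy data: isotropic 2-D Gaussian blobs around the given centers."""
    return [
        (rng.gauss(cx, std), rng.gauss(cy, std))
        for cx, cy in centers
        for _ in range(n_per_center)
    ]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_init=10, max_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns (labels, wcss) of the best restart."""
    best = None
    for _ in range(n_init):
        # k-means++ seeding: new centroids favour points far from existing ones.
        centroids = [rng.choice(points)]
        while len(centroids) < k:
            d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
            centroids.append(rng.choices(points, weights=d2)[0])
        for _ in range(max_iter):
            labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
            new_centroids = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
                else:
                    new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
            if new_centroids == centroids:
                break  # assignments are stable
            centroids = new_centroids
        wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or wcss < best[1]:
            best = (labels, wcss)
    return best


def mean_silhouette(points, labels):
    """Average of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all samples."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two non-empty clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 for a point alone in its cluster
        # a(i): mean distance to the other members of i's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


rng = random.Random(0)
points = toy_blobs([(0, 0), (10, 0), (0, 10)], n_per_center=30, std=0.5, rng=rng)
scores = {k: mean_silhouette(points, kmeans(points, k, rng)[0]) for k in range(2, 9)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: mean silhouette={s:.3f}")
print("best k =", best_k)
```

The pairwise-distance loop is O(n^2), which is fine for a toy example; for real data, scikit-learn's `sklearn.metrics.silhouette_score(X, labels)` computes the same average.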
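A minimal sketch of the gap statistic under stated assumptions: a hypothetical toy data set of three 2-D Gaussian blobs, a small hand-written k-means whose best-of-a-few-restarts WCSS stands in for W_k, and B reference data sets drawn uniformly over the bounding box of the data (one common choice of reference). `gap_statistic` returns Gap(k) = mean(log W*_k) - log W_k together with s_k = sd_k * sqrt(1 + 1/B), and `choose_k` applies the "smallest k with Gap(k) >= Gap(k+1) - s_{k+1}" rule; replacing it with `max(gaps, key=lambda k: gaps[k][0])` gives the simpler maximum-gap rule.

```python
import math
import random


def toy_blobs(centers, n_per_center, std, rng):
    """Hypothetical toy data: isotropic 2-D Gaussian blobs around the given centers."""
    return [
        (rng.gauss(cx, std), rng.gauss(cy, std))
        for cx, cy in centers
        for _ in range(n_per_center)
    ]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_init=10, max_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns (labels, wcss) of the best restart."""
    best = None
    for _ in range(n_init):
        # k-means++ seeding: new centroids favour points far from existing ones.
        centroids = [rng.choice(points)]
        while len(centroids) < k:
            d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
            centroids.append(rng.choices(points, weights=d2)[0])
        for _ in range(max_iter):
            labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
            new_centroids = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
                else:
                    new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
            if new_centroids == centroids:
                break  # assignments are stable
            centroids = new_centroids
        wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or wcss < best[1]:
            best = (labels, wcss)
    return best


def uniform_reference(points, rng):
    """One reference data set: same size, uniform over the data's bounding box."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]


def gap_statistic(points, k_max, rng, n_refs=10, n_init=3):
    """Return {k: (Gap(k), s_k)} for k = 1..k_max."""
    refs = [uniform_reference(points, rng) for _ in range(n_refs)]
    result = {}
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans(points, k, rng, n_init)[1])
        ref_logs = [math.log(kmeans(ref, k, rng, n_init)[1]) for ref in refs]
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        # Gap(k) = E*[log W_k] - log W_k; s_k inflates sd to account for simulation error.
        result[k] = (mean_ref - log_w, sd * math.sqrt(1 + 1 / n_refs))
    return result


def choose_k(gaps):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; falls back to the largest k tried."""
    ks = sorted(gaps)
    for k in ks[:-1]:
        if gaps[k][0] >= gaps[k + 1][0] - gaps[k + 1][1]:
            return k
    return ks[-1]


rng = random.Random(0)
points = toy_blobs([(0, 0), (10, 0), (0, 10)], n_per_center=30, std=0.5, rng=rng)
gaps = gap_statistic(points, k_max=6, rng=rng)
for k, (gap, s) in gaps.items():
    print(f"k={k}: Gap={gap:.3f}, s={s:.3f}")
best_k = choose_k(gaps)
print("chosen k =", best_k)
```

The number of reference sets and k-means restarts trades run time against the stability of the estimate; the small values here only keep the pure-Python example fast.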