KMeans(self, n_clusters = 3, tolerance = 0.01, max_iter = 100, runs = 1, init_method="forgy")
n_clusters
: Number of clusters
tolerance
: Tolerance value. Algorithm stops if distance between previous centroids and current centroids is less than tolerance.
max_iter
: Number of iterations in every run.
runs
: Determines how many times the algorithm will run. Makes sense only if random initialization method is used. Therefore disregarded when a non-random initialization method is chosen.
init_method
: Initialization method. Only four methods implemented: Forgy, Macqueen, Maximin, Var-Part
Macqueen simply selects first K obversation and assigns them as centroids.
Forgy takes K random data points as initial centroids
Maximin and Var-Part are more sophisticated initialization methods. Var-Part is usually more efficient. Link to their related papers are in the section below.
KMeans.fit(X)
: Runs the K Means algorithm.
A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm: https://arxiv.org/abs/1209.1960
In Search of Deterministic Methods for Initializing K-means and Gaussian mixture Clustering: https://www.researchgate.net/publication/220571343_In_search_of_deterministic_methods_for_initializing_K-means_and_Gaussian_mixture_clustering
A New Initialization Technique for Generalized Lloyd Iteration: https://ieeexplore.ieee.org/document/329844