Implementation of distributed k-means clustering in Python. It uses Single-Shot Decentralized Lloyd.
Clustering is parametrized by the environment variable MODEL_PARAM_n_clusters, but the final number of clusters also depends on the number of nodes: the total number of output clusters is floor(n_clusters * n_nodes / 2).
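As a quick check of the formula above, here is a minimal sketch that reads MODEL_PARAM_n_clusters from the environment and computes the expected number of output clusters. The function name expected_output_clusters and the fallback value of 3 are hypothetical and are not part of this repository.

```python
import os


def expected_output_clusters(n_clusters: int, n_nodes: int) -> int:
    """Return floor(n_clusters * n_nodes / 2), the formula stated above."""
    if n_clusters < 1 or n_nodes < 1:
        raise ValueError("n_clusters and n_nodes must be positive")
    # Integer floor division gives the floor for positive integers.
    return (n_clusters * n_nodes) // 2


# MODEL_PARAM_n_clusters is the variable named in this README;
# the default "3" is an assumption used only for illustration.
n_clusters = int(os.environ.get("MODEL_PARAM_n_clusters", "3"))

if __name__ == "__main__":
    for n_nodes in (1, 2, 3):
        print(n_nodes, expected_output_clusters(n_clusters, n_nodes))
```

Note that with an odd product the result is rounded down, so 3 clusters on 3 nodes yield 4 output clusters.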
It has two modes:
compute --mode intermediate
compute --mode aggregate --job-ids 1 2 3
Intermediate mode computes clusters on a single node, while aggregate mode merges those clusters by least merging error (e.g. smallest distance between centroids).
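To illustrate the idea behind the aggregate step, the following is a simplified sketch and not the code used in this repository. It assumes that each node reports its centroids together with the number of points per cluster, and it uses the Euclidean distance between centroids as the merging error. The function merge_closest_centroids and its arguments are hypothetical.

```python
import math
from itertools import combinations


def merge_closest_centroids(centroids, counts, n_target):
    """Greedily merge the two closest centroids until n_target remain.

    centroids: list of equal-length coordinate sequences
    counts:    number of points assigned to each centroid
    """
    if n_target < 1:
        raise ValueError("n_target must be at least 1")
    centroids = [tuple(float(x) for x in c) for c in centroids]
    counts = list(counts)

    while len(centroids) > n_target:
        # Pick the pair (i, j), i < j, with the smallest centroid distance.
        i, j = min(
            combinations(range(len(centroids)), 2),
            key=lambda p: math.dist(centroids[p[0]], centroids[p[1]]),
        )
        total = counts[i] + counts[j]
        # The merged centroid is the count-weighted mean of the pair.
        centroids[i] = tuple(
            (counts[i] * a + counts[j] * b) / total
            for a, b in zip(centroids[i], centroids[j])
        )
        counts[i] = total
        # j > i, so deleting index j leaves index i valid.
        del centroids[j], counts[j]

    return centroids, counts


if __name__ == "__main__":
    merged, sizes = merge_closest_centroids(
        [(0, 0), (0, 2), (10, 10), (10, 12)], [1, 3, 2, 2], n_target=2
    )
    print(merged, sizes)
```

Weighting by cluster size keeps the merged centroid at the mean of all points in both clusters, which is why the sketch carries the counts alongside the centroids.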
Run: ./build.sh
Run: captain test
Run: ./publish.sh
WARNING: unit tests can fail nondeterministically with "AttributeError: can't set attribute" because of an error in the Titus port to Python 3.
Create a symlink from python-distributed-kmeans to the mip_helper module from python-mip:
ln -s ~/projects/python-base-docker-images/python-mip/mip_helper/mip_helper mip_helper
Run the unit tests:
find . -name \*.pyc -delete
(cd tests; docker-compose run test_suite -x --ff --capture=no)