K-Means clustering

Luca Todaro and Rasha Zieni, University of Pavia

This repository contains the code for our Advanced Computer Architecture project work: serial and parallel implementations of the k-means clustering algorithm, written in C++. Oh, and by the way, we got an A+. :D
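For readers new to the algorithm, serial k-means is usually implemented as Lloyd's iteration. Each point is assigned to its nearest centroid, then each centroid is moved to the mean of its points, and this repeats until no assignment changes. The sketch below is a minimal illustration only. It assumes 2-D points with double coordinates. The names Point, squared_distance and kmeans_serial are hypothetical and are not the code in serial.cpp, point.h or cluster.h.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2-D point type for this sketch; the repository's point.h
// may define points differently.
struct Point {
    double x;
    double y;
};

// Squared Euclidean distance: the square root is not needed to compare distances.
double squared_distance(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Lloyd's iteration: alternately assign every point to its nearest centroid
// and move each centroid to the mean of its assigned points, until no
// assignment changes or max_iters is reached. Returns the cluster index of
// each point; `centroids` holds the initial centroids and is updated in place.
std::vector<std::size_t> kmeans_serial(const std::vector<Point>& points,
                                       std::vector<Point>& centroids,
                                       int max_iters) {
    assert(!centroids.empty());
    std::vector<std::size_t> assignment(points.size(), 0);
    for (int iter = 0; iter < max_iters; ++iter) {
        bool changed = false;
        // Assignment step: nearest centroid for every point.
        for (std::size_t i = 0; i < points.size(); ++i) {
            std::size_t best = 0;
            double best_d = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < centroids.size(); ++c) {
                const double d = squared_distance(points[i], centroids[c]);
                if (d < best_d) { best_d = d; best = c; }
            }
            if (assignment[i] != best) { assignment[i] = best; changed = true; }
        }
        // Update step: each centroid becomes the mean of its points.
        std::vector<Point> sums(centroids.size(), Point{0.0, 0.0});
        std::vector<std::size_t> counts(centroids.size(), 0);
        for (std::size_t i = 0; i < points.size(); ++i) {
            sums[assignment[i]].x += points[i].x;
            sums[assignment[i]].y += points[i].y;
            ++counts[assignment[i]];
        }
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            if (counts[c] > 0) {  // an empty cluster keeps its old centroid
                centroids[c] = Point{sums[c].x / counts[c], sums[c].y / counts[c]};
            }
        }
        // Converged: assignments are stable, so the centroids are too.
        if (!changed && iter > 0) break;
    }
    return assignment;
}
```

Stopping when no point changes cluster is one common convergence test. A tolerance on how far the centroids move is another.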

Project description

We run the k-means algorithm both in C++ (our own serial and parallel implementations) and in Python, using the serial implementation from Scikit-Learn. The aim of the project is to demonstrate the performance gain obtained by switching from serial to parallel programming. To spice things up, we also compare the speed of our serial C++ implementation against the Python one from a well-known library (which we assume to be a state-of-the-art implementation!).
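In k-means the assignment step is a natural candidate for parallelism, because each point's nearest centroid can be computed independently of every other point. The sketch below is a minimal example and assumes OpenMP. The repository's parallel.cpp may use a different threading approach. The helpers assign_parallel and time_ms are hypothetical names, and time_ms simply wraps std::chrono::steady_clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2-D point type for this sketch.
struct Point {
    double x;
    double y;
};

// Assignment step parallelised over points with OpenMP. Each iteration writes
// only its own assignment element, so threads never write the same element
// and no locking is needed. Compiled without -fopenmp, the pragma is ignored
// and the loop runs serially with the same result.
std::vector<std::size_t> assign_parallel(const std::vector<Point>& points,
                                         const std::vector<Point>& centroids) {
    assert(!centroids.empty());
    std::vector<std::size_t> assignment(points.size(), 0);
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(points.size());
#pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        const Point& p = points[static_cast<std::size_t>(i)];
        std::size_t best = 0;
        double best_d = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            const double dx = p.x - centroids[c].x;
            const double dy = p.y - centroids[c].y;
            const double d = dx * dx + dy * dy;
            if (d < best_d) { best_d = d; best = c; }
        }
        assignment[static_cast<std::size_t>(i)] = best;
    }
    return assignment;
}

// Wall-clock time of a callable in milliseconds, for serial vs parallel comparisons.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Building with g++ -O2 -fopenmp enables the threads. Timing the same call in a build with and without -fopenmp gives a rough serial vs parallel comparison.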

Because the algorithm is intrinsically non-deterministic (its result depends on the initial choice of centroids), we decided to start it from pre-defined datasets to reduce the variance between experiments. Please check the input folder for more information about the datasets we used.
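A related way to make runs repeatable is to draw the initial centroids from a pseudo-random generator with a fixed seed. This is not necessarily what this repository does. The sketch below is minimal and uses a hypothetical init_centroids helper, which picks k distinct input points by shuffling their indices with std::mt19937.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical 2-D point type for this sketch.
struct Point {
    double x;
    double y;
};

// Picks k distinct input points as initial centroids. Using the same seed
// yields the same choice on every run with the same standard library.
std::vector<Point> init_centroids(const std::vector<Point>& points,
                                  std::size_t k, std::uint32_t seed) {
    assert(k > 0 && k <= points.size());
    std::vector<std::size_t> idx(points.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., n-1
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);           // seeded permutation
    std::vector<Point> centroids;
    centroids.reserve(k);
    for (std::size_t i = 0; i < k; ++i) centroids.push_back(points[idx[i]]);
    return centroids;
}
```

The exact permutation that std::shuffle produces is implementation-defined. The same seed may therefore select different points under another standard library, but it is repeatable within one toolchain.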

Run the code on Google Cloud Platform

We included a simple script, prepare-project.sh, that helps configure the environment on your virtual machine. All you need to do is install git with

sudo apt-get install git

and then clone this repo:

git clone https://github.com/A7F/aca-kmeans.git

Move to the newly created folder with cd aca-kmeans and make the script prepare-project.sh executable with

chmod +x ./prepare-project.sh

Please remember to set the correct working directory inside the script you want to run first! For example, if you want to run the serial script, paste your path into the variable base_dir. After that, running the bash script we provided automatically builds the project and places the executable in the same directory, so all you need to do is run ./aca_kmeans.
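The following hypothetical loader illustrates how a directory variable such as base_dir is typically combined with a dataset file name. It assumes that base_dir is a std::string and that each dataset line holds one comma-separated x,y pair. The real file format in the input folder, and the way the repository uses base_dir, may differ. The function load_points is an illustrative name only.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical 2-D point type for this sketch.
struct Point {
    double x;
    double y;
};

// Reads "x,y" lines from base_dir/file_name. Blank or malformed lines are
// skipped; a missing file throws, which is the usual symptom of a wrong base_dir.
std::vector<Point> load_points(const std::string& base_dir,
                               const std::string& file_name) {
    assert(!file_name.empty());
    std::string path = base_dir;
    if (!path.empty() && path.back() != '/') path += '/';
    path += file_name;
    std::ifstream in(path);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::vector<Point> points;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::istringstream fields(line);
        Point p{};
        char comma = 0;
        if (fields >> p.x >> comma >> p.y && comma == ',') points.push_back(p);
    }
    return points;
}
```

Failing loudly on a missing file makes a wrong base_dir show up immediately, instead of surfacing later as an empty dataset.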

Some useful resources...

What follows is a small collection of useful links and resources to keep at hand, you know... just in case...

Project presentation, project report

Course links

ACA project work page and course index

