Parallelised Kmeans_rcpp with OpenMP #19

Closed
alazarolop opened this issue Oct 31, 2019 · 4 comments

alazarolop commented Oct 31, 2019

Hi,

After following your advice, I was able to build the package with OpenMP support. So far, Clara_Medoids and all the other functions that expose the threads parameter parallelize their computation, which is awesome!

Reading your vignette, I found this information for Kmeans_rcpp and parallelization.

> It allows for multiple initializations (which can be parallelized if Openmp is available)

But I can't get it to work. If I also pass the threads argument, I get an error saying it is unused.

How can I get Kmeans_rcpp and MiniBatchKmeans to run their initializations in parallel?

Thank you in advance and congratulations for a great package.

alazarolop changed the title "Parallelise Kmeans_rcpp with OpenMP" to "Parallelised Kmeans_rcpp with OpenMP" on Oct 31, 2019

mlampros (Owner)

Hi @alazarolop,

I'm glad you find ClusterR helpful for your tasks.

When I removed the threads parameter from the Kmeans_rcpp function in version 1.0.8, I forgot to remove this line from the documentation.
Explicit parallelization is available only for functions that expose a threads parameter.
You should also know that the Armadillo library, which ClusterR uses through the RcppArmadillo package, uses OpenMP internally as well. Unlike the threads parameter, this needs no action from the user: Armadillo automatically parallelizes operations where possible, provided the user's system supports OpenMP and it is enabled (see the Armadillo documentation for more information).

Thanks for making me aware of the mistake in the documentation. I'll upload an updated version of the package to GitHub and fix it in the next CRAN release.
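To illustrate what a threads parameter typically controls in an OpenMP-enabled package, here is a minimal C++ sketch: several independent k-means initializations are spread across threads and the best result is kept. This is not ClusterR's implementation. The function names `kmeans_1d_sse` and `best_of_inits` are hypothetical, and the data is one-dimensional for brevity. When compiled without OpenMP (no `-fopenmp`), the pragma is dropped and the loop runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not ClusterR's code: one run of Lloyd's k-means on 1-D
// data, seeded with k points drawn using `seed`. Returns the sum of squared
// distances (SSE) from each point to its nearest final centroid.
double kmeans_1d_sse(const std::vector<double>& x, int k, unsigned seed, int iters = 20) {
    assert(!x.empty() && k >= 1);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, x.size() - 1);
    std::vector<double> c(static_cast<std::size_t>(k));
    for (double& cj : c) cj = x[pick(rng)];

    // Index of the centroid closest to v (ties go to the lower index).
    auto nearest = [&c](double v) {
        std::size_t best = 0;
        for (std::size_t j = 1; j < c.size(); ++j)
            if ((v - c[j]) * (v - c[j]) < (v - c[best]) * (v - c[best])) best = j;
        return best;
    };

    for (int it = 0; it < iters; ++it) {
        // Assignment step uses the current centroids; the update step then moves
        // each centroid to the mean of its points (empty clusters stay put).
        std::vector<double> sum(c.size(), 0.0);
        std::vector<int> cnt(c.size(), 0);
        for (double v : x) {
            std::size_t j = nearest(v);
            sum[j] += v;
            ++cnt[j];
        }
        for (std::size_t j = 0; j < c.size(); ++j)
            if (cnt[j] > 0) c[j] = sum[j] / cnt[j];
    }

    double sse = 0.0;
    for (double v : x) {
        double d = v - c[nearest(v)];
        sse += d * d;
    }
    return sse;
}

// Hypothetical: runs `n_init` independent initializations and returns the lowest
// SSE. With OpenMP enabled, `threads` sets the team size for the loop.
double best_of_inits(const std::vector<double>& x, int k, int n_init, int threads) {
    assert(n_init >= 1 && threads >= 1);
    std::vector<double> sse(static_cast<std::size_t>(n_init));
#ifdef _OPENMP
    #pragma omp parallel for num_threads(threads) schedule(static)
#endif
    for (int r = 0; r < n_init; ++r) {
        // Each run has its own seed and its own output slot: no data race, and the
        // result does not depend on how runs are split across threads.
        sse[static_cast<std::size_t>(r)] = kmeans_1d_sse(x, k, 1000u + static_cast<unsigned>(r));
    }
    (void)threads;  // avoids an unused-parameter warning when built without OpenMP
    return *std::min_element(sse.begin(), sse.end());
}
```

Because every initialization is independent, `best_of_inits(x, 2, 5, 1)` and `best_of_inits(x, 2, 5, 4)` should return the same value; only the wall-clock time changes. Removing such a parameter from a function's signature, as described for Kmeans_rcpp in version 1.0.8, removes the user's ability to request this kind of parallelism.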

alazarolop (Author)

Hi @mlampros, no worries, you're welcome. Thank you for the quick fix.

That's what I thought. I even tried passing the threads parameter, which just raised an error.

To be honest, I don't know much about Armadillo, but I understand what you mean. Is it possible to force the function to run in parallel? I have the impression that my installation of Armadillo (from Homebrew) is not using OpenMP, because even with a high-dimensional matrix it runs on a single core.

mlampros (Owner) commented Nov 4, 2019

Hi @alazarolop,

The Kmeans_rcpp function runs on a single core, and there is no way to force its initializations to run in parallel. However, the Armadillo functions that Kmeans_rcpp calls internally will be parallelized if OpenMP is enabled for them.
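Whether Armadillo's internal OpenMP code paths are active depends on the compiler flags used to build the code, not on an R-level argument. A minimal C++ sketch (the helper names are hypothetical) of a compile-time check: the `_OPENMP` macro is defined only when compiling with OpenMP support (for example `-fopenmp` with GCC), and `omp_get_max_threads()` reports the default team size, which honours the `OMP_NUM_THREADS` environment variable. In an R package these flags usually come from `$(SHLIB_OPENMP_CXXFLAGS)` in `src/Makevars`. On macOS, Apple's default clang does not ship OpenMP support, which is one common reason code ends up running on a single core.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: true only when this file was compiled with OpenMP enabled.
bool compiled_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: default number of threads an OpenMP parallel region would
// use (honours OMP_NUM_THREADS). Falls back to 1 when OpenMP is not enabled.
int openmp_max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

If `compiled_with_openmp()` returns false for code built with the same compiler and flags as the package, Armadillo's OpenMP-based operations will typically not run in parallel either.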

alazarolop (Author)

Ah, OK, that's exactly what I thought.

Thanks a lot for the explanation, and thank you again for your work on the package.
