[R-package] allow use of MPI for distributed training #3364

Closed
esvhd opened this issue Sep 7, 2020 · 3 comments

esvhd commented Sep 7, 2020

Summary

As @jameslamb suggested in his very helpful replies to #3354, adding a switch to the build_r.R script to enable GPU and MPI would be very helpful.

Motivation

Enable MPI support for the R package. A switch is already available for Python builds.

I'll let more capable minds comment on the pros and cons of MPI vs. sockets, particularly when it comes to training.

Description

The current documentation for the R package build might be missing some small details on how to enable MPI. See the discussion in #3354.
@jameslamb went through this and successfully managed to build the package with both GPU and MPI switched on.

@jameslamb
Collaborator

Thanks for reporting this! I'll close this for now and add it to #2302, where we keep all feature requests. Anyone reading this, you're welcome to add this feature!

I also want to add some additional information. The goal of this feature is to provide an experience like:

Rscript build_r.R --use-mpi

That should add -DUSE_MPI=ON to cmake_flags in R-package/src/install.libs.R.
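The mapping from a command-line switch to a CMake definition can be sketched as follows. This is only an illustration in Python; the real change would be made in R, in build_r.R and R-package/src/install.libs.R. The function name `cmake_flags_from_args` is hypothetical, and the `--use-gpu` switch is an assumption based on the request in this issue. `-DUSE_MPI=ON` and `-DUSE_GPU=ON` are the CMake options LightGBM already uses.

```python
# Hypothetical table of build switches and the CMake definition each one adds.
KNOWN_SWITCHES = {
    "--use-mpi": "-DUSE_MPI=ON",
    "--use-gpu": "-DUSE_GPU=ON",  # assumed switch name, not confirmed by the thread
}


def cmake_flags_from_args(args):
    """Translate build-script switches into a list of CMake flags.

    Raises ValueError on an unrecognized switch so that typos do not
    silently produce a build without the requested feature.
    """
    flags = []
    for arg in args:
        if arg not in KNOWN_SWITCHES:
            raise ValueError(f"unrecognized switch: {arg}")
        flags.append(KNOWN_SWITCHES[arg])
    return flags


if __name__ == "__main__":
    print(cmake_flags_from_args(["--use-mpi"]))
```

Failing loudly on an unknown switch is a design choice made for this sketch; the actual script may prefer to warn instead.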

Someone picking this up might also want to explore how this could be used for the CRAN package. A configure argument could be added so people can do something like this:

sh build-cran-package.sh
R CMD INSTALL lightgbm_3.0.0.tar.gz --configure-args='--use-mpi'

You can see more information on how to update the CRAN package at https://github.com/microsoft/LightGBM/blob/master/R-package/README.md#changing-the-cran-package. But getting this working for the CRAN package is not required.

@StrikerRUS
Collaborator

I might be wrong, but I cannot find any references to the LGBM_NetworkInit() C++ function in the R codebase. If that's true, exposing a --use-mpi flag will not be enough, because the R package actually lacks support for parallel learning (even the pure socket-based kind). To enable it, something similar to

    _safe_call(_LIB.LGBM_NetworkInit(c_str(machines),
                                     ctypes.c_int(local_listen_port),
                                     ctypes.c_int(listen_time_out),
                                     ctypes.c_int(num_machines)))

should be done.
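For readers unfamiliar with the snippet above, here is a minimal, self-contained sketch of what such a wrapper does. It assumes `lib` is an already-loaded LightGBM shared library (for example via `ctypes.CDLL`). The helper names `c_str`, `safe_call`, and `network_init` are illustrative stand-ins written for this sketch, not the Python package's exact internals. An R equivalent would need a similar bridge to the same C API function.

```python
import ctypes


def c_str(string):
    """Encode a Python string as a C `const char*` argument."""
    return ctypes.c_char_p(string.encode("utf-8"))


def safe_call(ret):
    """Raise if a C API call reported failure (non-zero return code)."""
    if ret != 0:
        raise RuntimeError(f"LightGBM C API call failed with status {ret}")


def network_init(lib, machines, local_listen_port, listen_time_out, num_machines):
    """Initialize LightGBM's network layer through the C API.

    `lib` is assumed to expose LGBM_NetworkInit(const char*, int, int, int).
    `machines` may be a comma-separated "ip:port" string or a list of them.
    """
    if isinstance(machines, (list, tuple)):
        machines = ",".join(machines)
    safe_call(lib.LGBM_NetworkInit(
        c_str(machines),
        ctypes.c_int(local_listen_port),
        ctypes.c_int(listen_time_out),
        ctypes.c_int(num_machines),
    ))
```

Passing `lib` in as a parameter keeps the sketch testable without a compiled LightGBM library on hand.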

@jameslamb
Collaborator

jameslamb commented Jan 26, 2021

You're right.
