Can't get UMAP to be distributed with DASK #3991

Closed
Tracked by #4139
nono9212 opened this issue Jun 16, 2021 · 6 comments

@nono9212

Hi,

I am trying to get UMAP to run on two GPUs of my setup. However, despite following the doc here, I can't get the distribution part to work. I changed this line:

cluster = LocalCUDACluster(threads_per_worker=1)

to:

cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES="8,9")

But when creating the UMAP model, it loads on the first GPU (as seen with nvidia-smi):

local_model = UMAP()  # this loads the first GPU

Here is how I coded the rest:

model = cumlUMAP(
    n_neighbors=200,
    n_components=2,
    min_dist=0.05,
    verbose=True)
DISTmodel = daskUMAP(
    client=client,
    model=model
)

result = DISTmodel.fit_transform(data)

So I can't get the calculations to happen on GPUs 8 and 9.
Is that possible? I am not sure whether what I am looking for is already implemented...
Thanks for any advice!

@hcho3 added the Dask / cuml.dask and question labels on Jun 16, 2021
@viclafargue
Contributor

viclafargue commented Jun 16, 2021

Hi @nono9212, thanks for opening the issue.

Distributed UMAP works in the following way:

  1. A single-GPU UMAP model is first trained on a representative sample of the dataset.
  2. This model is then broadcast to all workers, allowing distributed inference on a larger dataset.

The single-GPU instance of UMAP is not connected to the Dask client (nor the cluster) and will always default to the first visible GPU.
To specify the GPUs you would like to use, you can either do it from the command line (CUDA_VISIBLE_DEVICES=8,9 python script.py)
or from Python code (os.environ["CUDA_VISIBLE_DEVICES"] = "8,9", set before training the local UMAP model).
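
Putting this together, a minimal sketch of the full workflow might look like the following (not taken verbatim from this thread): GPU ids 8 and 9 come from the question, and sample_df / dask_df are assumed placeholders for a representative sample and the full distributed dataset.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "8,9"  # set before any CUDA context is created

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
from cuml.manifold import UMAP as cumlUMAP        # single-GPU estimator
from cuml.dask.manifold import UMAP as daskUMAP   # multi-GPU wrapper

cluster = LocalCUDACluster()  # should pick up both visible GPUs, one worker each
client = Client(cluster)

# 1. Train the single-GPU model on a representative sample
#    (this runs on the first visible GPU, i.e. GPU 8).
local_model = cumlUMAP(n_neighbors=200, n_components=2, min_dist=0.05, verbose=True)
local_model.fit(sample_df)

# 2. Broadcast the trained model to the workers and run distributed
#    inference on the full dataset.
dist_model = daskUMAP(model=local_model, client=client)
embedding = dist_model.transform(dask_df)

The key point is that the fit happens locally on one GPU, while the transform is what actually runs across the workers.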

@nono9212
Author

Great, thanks a lot for the explanation. Is there any way of freeing the memory when not using a client?

@viclafargue
Contributor

CuPy/cuDF/Numba allocations, like other Python objects, are released by the Python garbage collector as soon as they go out of scope or are explicitly destroyed. However, the garbage collector only tracks host memory and has no direct view of the GPU memory those objects point to, so GPU memory might not be released in time for new allocations.

To force the release of the memory, you can do the following:

import gc

del obj       # obj can be the single-GPU UMAP model here, for instance
gc.collect()  # releases local (client-side) memory

You can do the same on a Dask cluster with client.run(gc.collect), which releases memory on the workers.
Hope this answers your question.

@cjnolet
Member

cjnolet commented Jul 23, 2021

@nono9212,

One thing to consider when using Dask in a GPU environment is that your client process might be sharing a GPU with one of the workers. You can get around this by explicitly setting it to use a different GPU.
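
For what it's worth, one quick way to see which device each worker got (just a sketch, reusing the client object from the snippets above) is to query the workers' environments:

def worker_devices():
    import os
    return os.environ.get("CUDA_VISIBLE_DEVICES")

print(client.run(worker_devices))  # maps each worker address to its CUDA_VISIBLE_DEVICES

Comparing that output with the GPU the client process is using shows whether they overlap.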

The Python garbage collector should free up any stray objects on the host which, if they were holding allocations to GPU memory, should also clean up the GPU memory. The thread in #4068 might be relevant to this discussion as well.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@viclafargue
Contributor

Closing the issue. Please don't hesitate to re-open if needed.
