Multi-GPU training with PyTorch lightning #3171

Closed
vitkl opened this issue Jan 10, 2023 · 2 comments

Comments

@vitkl
Contributor

vitkl commented Jan 10, 2023

@macwiatrak and I are trying to figure out how to train a Pyro / scvi-tools model on multiple GPUs using PyTorch Lightning.

I tried PyTorch Lightning Trainer(strategy="horovod", accelerator="GPU", devices=2) with the Pyro HorovodOptimizer. However, I get the error "ValueError: Tensor is required to be contiguous.", which gives no hint about what to do next.

Also, https://github.com/pyro-ppl/pyro/blob/dev/examples/svi_horovod.py fails for me on the LSF cluster because it cannot find certain environment variables.

It would be great to get some help figuring out what is needed to "natively" train Pyro models on multiple GPUs using PyTorch Lightning, with Horovod or any other strategy.

We can use https://github.com/BayraktarLab/cell2location as a public test case that should have most of the properties relevant to our current and future projects.

@fritzo

Here is what @adamgayoso thinks in the context of scvi-tools + PyTorch Lightning: scverse/scvi-tools#1226 (comment)

@ordabayevy
Member

I think this can be closed via #3189. @vitkl I haven't tried the "horovod" strategy, but Pyro models seem to work well with the "ddp" strategy.
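
For reference, a minimal sketch of Trainer settings that select the "ddp" strategy, assuming two GPUs as in the original attempt. The dictionary name `ddp_trainer_kwargs` is hypothetical, and the Lightning import is left commented out so the snippet runs without Lightning installed; this is not taken from #3189.

```python
# Hypothetical settings for a two-GPU DDP run (assumption: 2 devices, as in the original post).
ddp_trainer_kwargs = {
    "accelerator": "gpu",  # run on GPUs
    "devices": 2,          # number of GPUs per node
    "strategy": "ddp",     # DistributedDataParallel instead of Horovod
}

# With PyTorch Lightning installed, these would be passed to the Trainer:
# from pytorch_lightning import Trainer
# trainer = Trainer(**ddp_trainer_kwargs)
```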

@vitkl vitkl closed this as completed May 13, 2023
@vitkl
Contributor Author

vitkl commented May 13, 2023

DDP also works for me. The key was implementing a custom distributed batch sampler rather than using the default distributed sampler from PyTorch Lightning.
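
To illustrate what a custom distributed batch sampler can look like, here is a minimal pure-Python sketch. It is not the implementation used in cell2location or scvi-tools; the class name `DistributedBatchSampler`, its parameters, and the choice to drop trailing batches so every rank gets the same number of batches are all assumptions made for the example.

```python
import math
import random
from typing import Iterator, List


class DistributedBatchSampler:
    """Hypothetical sampler: each rank iterates over a disjoint share of index batches."""

    def __init__(self, num_samples: int, batch_size: int, num_replicas: int,
                 rank: int, shuffle: bool = True, seed: int = 0) -> None:
        if not 0 <= rank < num_replicas:
            raise ValueError("rank must be in [0, num_replicas)")
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.num_replicas = num_replicas
        self.rank = rank
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # Call once per epoch so the shuffle changes but stays identical across ranks.
        self.epoch = epoch

    def _all_batches(self) -> List[List[int]]:
        indices = list(range(self.num_samples))
        if self.shuffle:
            # Same seed on every rank, so all ranks build the same batch list.
            random.Random(self.seed + self.epoch).shuffle(indices)
        batches = [indices[i:i + self.batch_size]
                   for i in range(0, len(indices), self.batch_size)]
        # Drop trailing batches so every rank receives the same number of batches.
        usable = len(batches) - len(batches) % self.num_replicas
        return batches[:usable]

    def __iter__(self) -> Iterator[List[int]]:
        # Rank r takes batches r, r + num_replicas, r + 2 * num_replicas, ...
        return iter(self._all_batches()[self.rank::self.num_replicas])

    def __len__(self) -> int:
        return math.ceil(self.num_samples / self.batch_size) // self.num_replicas


# Two simulated ranks over 10 samples with batch size 3, without shuffling.
rank0 = DistributedBatchSampler(10, 3, num_replicas=2, rank=0, shuffle=False)
rank1 = DistributedBatchSampler(10, 3, num_replicas=2, rank=1, shuffle=False)
print(list(rank0))
print(list(rank1))
```

A sampler like this would typically be passed to `torch.utils.data.DataLoader` through its `batch_sampler` argument. Lightning may also need to be told not to replace the sampler automatically; the Trainer flag for that depends on the Lightning version, so check the documentation for the version in use.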
