DDP also works for me. The key was implementing a custom distributed batch sampler rather than using the default distributed sampler from PyTorch Lightning.
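The comment above does not show the sampler itself, so here is a minimal sketch of what a distributed batch sampler can look like. The class name `DistributedBatchSampler` and all of its parameters are hypothetical, not taken from scvi-tools, cell2location, or Lightning. The assumption is that every rank shuffles with the same seed, splits the indices into minibatches, and then keeps every `num_replicas`-th batch, padding by repetition so that all ranks run the same number of steps.

```python
import math
import random


class DistributedBatchSampler:
    """Hypothetical sketch: yields whole minibatches (lists of indices) for one rank."""

    def __init__(self, dataset_size, batch_size, num_replicas, rank,
                 shuffle=True, seed=0):
        if dataset_size <= 0 or batch_size <= 0:
            raise ValueError("dataset_size and batch_size must be positive")
        if not 0 <= rank < num_replicas:
            raise ValueError("rank must be in [0, num_replicas)")
        self.dataset_size = dataset_size
        self.batch_size = batch_size
        self.num_replicas = num_replicas
        self.rank = rank
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0
        # Every rank must see the same number of batches, otherwise
        # collective operations can hang waiting for a missing step.
        n_batches = math.ceil(dataset_size / batch_size)
        self.batches_per_rank = math.ceil(n_batches / num_replicas)

    def set_epoch(self, epoch):
        # Call once per epoch so the shuffle changes but stays identical across ranks.
        self.epoch = epoch

    def __len__(self):
        return self.batches_per_rank

    def __iter__(self):
        indices = list(range(self.dataset_size))
        if self.shuffle:
            # Same seed + epoch on every rank gives the same permutation.
            random.Random(self.seed + self.epoch).shuffle(indices)
        batches = [indices[i:i + self.batch_size]
                   for i in range(0, len(indices), self.batch_size)]
        # Pad by cycling through the batches until the count divides evenly.
        total = self.batches_per_rank * self.num_replicas
        batches = [batches[i % len(batches)] for i in range(total)]
        # Each rank takes a disjoint, strided slice of the batch list.
        return iter(batches[self.rank::self.num_replicas])
```

An object like this could be passed to a PyTorch `DataLoader` through its `batch_sampler` argument. With Lightning, the automatic sampler replacement would presumably need to be switched off so the custom sampler is not overridden; the Trainer flag for that has changed name between Lightning versions, so check the one you have installed.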
@macwiatrak and I are trying to figure out how to train a Pyro / scvi-tools model on multiple GPUs using PyTorch Lightning.
I tried PyTorch Lightning `Trainer(strategy="horovod", accelerator="GPU", devices=2)` with Pyro `HorovodOptimizer`. However, I am getting `ValueError: Tensor is required to be contiguous`, which doesn't really suggest what to do next.
Also, https://github.com/pyro-ppl/pyro/blob/dev/examples/svi_horovod.py fails for me on the LSF cluster because it cannot find certain environment variables.
It would be great to get some help figuring out what's needed to "natively" train Pyro models on multiple GPUs using PyTorch Lightning with Horovod or any other strategy.
We can use https://github.com/BayraktarLab/cell2location as a public test case that should have most of the properties relevant to our current and future projects.
@fritzo
Here is what @adamgayoso thinks in the scvi-tools + PyTorch Lightning context: scverse/scvi-tools#1226 (comment)