Support uneven DDP inputs with pytorch model.join #3325
Comments
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
Interested in this issue! Hopefully some progress is made soon 👍
Interested in this also :)
Is there any progress on this issue? Happy to help in any way.
@rohan-varma that would be great!! Want to try and submit a draft PR? And we can help from there?
@edenlightning Sounds good, I also pinged the Slack channel for any feedback/discussions.
We'd also be very interested in this feature. Let us know if there's anything we can do to help!
The PR #5141 is ready for review, in case anyone wants to take a look.
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
I discussed this more with @rohan-varma - DDP join docs: https://pytorch.org/docs/stable/_modules/torch/nn/parallel/distributed.html#DistributedDataParallel.join
As the LightningModule is wrapped in another module which is then wrapped with DDP, the LightningModule's collective calls do not all go through DDP's `forward`. As a result, any collective call (such as metric syncing or `all_gather`) made outside the `join` context would not be shadowed for the ranks that finish early, and the remaining ranks could hang.
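For reference, a minimal sketch of the raw PyTorch `model.join()` usage those docs describe, assuming a two-process `torchrun` launch; the model, data, and batch counts here are placeholders, not from this thread:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("gloo")  # torchrun provides the rank/world-size env vars
    rank = dist.get_rank()
    model = DDP(torch.nn.Linear(1, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Uneven inputs: rank 0 sees one more batch than the other rank.
    num_batches = 6 if rank == 0 else 5
    batches = [torch.randn(8, 1) for _ in range(num_batches)]

    # join() shadows the collectives of ranks that finish early, so the
    # rank with extra batches does not hang waiting for gradient all-reduces.
    with model.join():
        for batch in batches:
            optimizer.zero_grad()
            loss = model(batch).sum()
            loss.backward()
            optimizer.step()

if __name__ == "__main__":
    main()
```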
agree, I also don't see how this can be supported at the moment.
To recap, the plan would be:
Some sources:
I assume there is, if collectives are being used. For example,
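To make the failure mode concrete, a hedged sketch of such a hang, assuming two ranks with an initialized process group and made-up batch counts:

```python
import torch
import torch.distributed as dist

def sync_losses(rank: int) -> None:
    # Hypothetical uneven inputs: rank 0 runs one extra iteration.
    num_batches = 3 if rank == 0 else 2
    for _ in range(num_batches):
        loss = torch.tensor(1.0)
        # Metric syncing: on rank 0's final iteration there is no matching
        # all_reduce on rank 1 (it already left its loop), so rank 0 blocks.
        dist.all_reduce(loss)
        loss /= dist.get_world_size()
```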
pytorch/pytorch#49180 is great! Hopefully this will clarify the drop_last argument, which has a slightly misleading/incomplete description :) We would indeed need the UnrepeatedDistributedSampler.
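For context, here is a sketch of what such an unrepeated sampler could look like, modeled on `torch.utils.data.DistributedSampler`; this is an illustration of the idea, not necessarily the implementation that landed in Lightning:

```python
import torch
from torch.utils.data import DistributedSampler

class UnrepeatedDistributedSampler(DistributedSampler):
    """Like DistributedSampler, but never pads/repeats samples to even out
    the ranks, so some ranks may receive one sample fewer."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Each rank takes a strided slice of the dataset; no padding.
        self.num_samples = len(range(self.rank, len(self.dataset), self.num_replicas))
        self.total_size = len(self.dataset)

    def __iter__(self):
        if self.shuffle:
            g = torch.Generator()
            g.manual_seed(self.seed + self.epoch)
            indices = torch.randperm(len(self.dataset), generator=g).tolist()
        else:
            indices = list(range(len(self.dataset)))
        # Strided subsample without repeating any index.
        indices = indices[self.rank : self.total_size : self.num_replicas]
        assert len(indices) == self.num_samples
        return iter(indices)
```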
Hi everyone, I'm gathering information on what is needed in order to support this properly.
Is that it? cc @awaelchli, @carmocca. |
@otaj Almost. Additionally, all metrics from
oh, those
if we can capture user calls with that, it might work similarly with
let's check the option with LightningLite first 🦦
Here is the corresponding issue as suggested in planning: #14635 |
hi, any updates on this issue?
Is there a (hacky) method for implementing the PyTorch `model.join` context manager in Lightning in the meantime?
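One workaround sometimes suggested (it sidesteps `join` rather than implementing it) is to truncate every rank to the minimum batch count so the inputs become even again; a sketch, assuming the process group is already initialized:

```python
import torch
import torch.distributed as dist

def min_num_batches(local_num_batches: int) -> int:
    """Agree on the smallest per-rank batch count so every rank stops at the
    same step and no collective is left unmatched."""
    n = torch.tensor(local_num_batches)
    dist.all_reduce(n, op=dist.ReduceOp.MIN)
    return int(n.item())

# Usage: run only the first min_num_batches(len(dataloader)) batches on each
# rank, e.g. by breaking out of the training loop early.
```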
Any updates on this issue? |
See more details: pytorch/pytorch#38174
cc @Borda @tchaton @rohitgr7 @akihironitta @awaelchli