FSDP on the same CNN model requires more memory than DataParallel #1163

Closed

s-reaungamornrat opened this issue Feb 18, 2024 · 0 comments

@s-reaungamornrat

Hi all,

I converted my model training from DataParallel to FSDP based on the PyTorch FSDP documentation and the example at https://github.com/pytorch/examples/blob/main/distributed/FSDP. I changed the model wrapping, the distributed data sampling, the training loop, and checkpoint saving/loading, but I did not change how the loss is calculated, which I believe is consistent with the documentation and the example. However, I found that FSDP training requires more memory than DataParallel. Does anyone know how this could happen? Thread #633 mentions updating the PyTorch FSDP documentation on buffers, but I cannot find the updated documentation. Could anyone point me to it, or explain how to properly set up FSDP for a CNN so that memory usage is actually reduced compared to DataParallel? Thank you very much for your help!
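
For context, below is a minimal sketch of the kind of FSDP wrapping I mean. `TinyCNN` and the size-based auto-wrap threshold are illustrative, not my exact setup. My understanding is that without an `auto_wrap_policy`, FSDP wraps the whole model as a single flat unit, so all parameters are gathered at once and per-GPU memory is not reduced:

```python
import functools

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy


class TinyCNN(nn.Module):
    """Illustrative stand-in for my real CNN."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        # Global average pool over H and W, then classify.
        return self.head(self.features(x).mean(dim=(2, 3)))


def build_fsdp_model():
    # Assumes processes were launched with torchrun, e.g.
    # `torchrun --nproc_per_node=4 train.py`.
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # A size-based policy splits the model into smaller FSDP units
    # that are gathered and freed one at a time during forward and
    # backward, instead of materializing all parameters at once.
    # The 100k-parameter threshold is an illustrative value.
    wrap_policy = functools.partial(
        size_based_auto_wrap_policy, min_num_params=100_000
    )

    model = FSDP(
        TinyCNN().to(local_rank),
        auto_wrap_policy=wrap_policy,
        device_id=local_rank,
    )
    return model
```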
