FSDP on the same CNN model requires more memory than DataParallel #1163

Closed

s-reaungamornrat opened this issue Feb 18, 2024 · 0 comments

@s-reaungamornrat

Hi all,

I converted my model training from DataParallel to FSDP based on the PyTorch FSDP documentation and the example at https://github.com/pytorch/examples/blob/main/distributed/FSDP. I changed the model wrapping, the distributed data sampling, the training loop, and checkpoint saving/loading, but I did not change how the loss is calculated, which I believe is consistent with the documentation and the example. However, I found that FSDP training requires more memory than DataParallel. Does anyone know how this could happen? Thread #633 mentions updating the PyTorch FSDP documentation on buffers, but I cannot find the updated documentation. Could anyone point me to it, or explain how to properly set up FSDP for a CNN so that memory usage is actually reduced compared to DataParallel? Thank you very much for your help!
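
For context, below is a minimal sketch of the kind of FSDP wrapping I mean. `TinyCNN` and the size-based auto-wrap threshold are illustrative, not my exact setup. My understanding is that without an `auto_wrap_policy`, FSDP wraps the whole model as a single flat unit, so all parameters are gathered at once and per-GPU memory is not reduced:

```python
import functools

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy


class TinyCNN(nn.Module):
    """Illustrative stand-in for my real CNN."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        # Global average pool over H and W, then classify.
        return self.head(self.features(x).mean(dim=(2, 3)))


def build_fsdp_model():
    # Assumes processes were launched with torchrun, e.g.
    # `torchrun --nproc_per_node=4 train.py`.
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # A size-based policy splits the model into smaller FSDP units
    # that are gathered and freed one at a time during forward and
    # backward, instead of materializing all parameters at once.
    # The 100k-parameter threshold is an illustrative value.
    wrap_policy = functools.partial(
        size_based_auto_wrap_policy, min_num_params=100_000
    )

    model = FSDP(
        TinyCNN().to(local_rank),
        auto_wrap_policy=wrap_policy,
        device_id=local_rank,
    )
    return model
```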
