Hi, I am trying to run a GPT-2 model with a block size of 2048, and I cannot use a batch size larger than 16 because the activation memory becomes too large.
To reduce activation memory I already use DeepSpeed activation checkpointing on each transformer block, plus AMP.
I saw there is also an option to partition / shard the checkpointed activations, advertised by Megatron, but when I enable it I see no effect at all.
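For reference, this is roughly how I am enabling it; a minimal sketch of my setup, where the `mpu` module comes from Megatron's model-parallel utilities and `ds_config.json` is my DeepSpeed config file (both are assumptions about my particular launch script):

```python
# Sketch: enabling DeepSpeed activation checkpointing with activation partitioning.
# Assumes Megatron-style model parallelism is set up and exposes an `mpu` module.
import deepspeed
from megatron import mpu  # model-parallel utilities (assumed available in my setup)

# ds_config.json is assumed to contain:
# "activation_checkpointing": {
#     "partition_activations": true,
#     "contiguous_memory_optimization": false,
#     "cpu_checkpointing": false
# }
deepspeed.checkpointing.configure(
    mpu,                               # group across which activations would be partitioned
    deepspeed_config="ds_config.json",
    partition_activations=True,
)

# Each transformer block is then checkpointed with DeepSpeed's checkpoint function
# instead of torch.utils.checkpoint.checkpoint, e.g.:
#   hidden_states = deepspeed.checkpointing.checkpoint(block, hidden_states, attention_mask)
```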
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.