
[litgpt benchmark] enable force_recompute_fp8_weight_in_bwd when torchao.float8 is used with FSDP2 #1528

Open · crcrpar wants to merge 1 commit into main from crpa/ao-recompute_fp8_weight_in_bwd

Conversation

@crcrpar (Collaborator) commented Dec 9, 2024

What does this PR do?

As per the title: enable torchao.float8's force_recompute_fp8_weight_in_bwd option when FSDP2 is used.

ref: pytorch/ao@e919558

Benchmarked on 8 H100s with pjnl-20241209. The command used:

torchrun --nproc-per-node 8 --local-ranks-filter 0 --role rank --tee 3 thunder/benchmarks/benchmark_litgpt.py --model_name <MODEL_NAME> --compile inductor --distributed_mode fsdp2 --shard_mode zero2 --use_torchao_fp8_linear true --use_torchao_fp8_allgather true --use_torchao_fp8_precompute_scale_for_fsdp true
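For context, here is a minimal sketch of how these flags plausibly map onto torchao.float8; it is not the benchmark script's actual code. The helper name `configure_float8_for_fsdp2` and its signature are assumptions, while `Float8LinearConfig`, `convert_to_float8_training`, and `precompute_float8_dynamic_scale_for_fsdp` are torchao.float8's public API.

```python
# Hypothetical sketch (not benchmark_litgpt.py itself): wiring the CLI flags
# above into torchao.float8. Helper name and signature are assumptions.
import torch
from torchao.float8 import (
    Float8LinearConfig,
    convert_to_float8_training,
    precompute_float8_dynamic_scale_for_fsdp,
)

def configure_float8_for_fsdp2(model: torch.nn.Module, fp8_allgather: bool) -> torch.nn.Module:
    config = Float8LinearConfig(
        # --use_torchao_fp8_allgather true: all-gather FSDP2 weight shards in fp8.
        enable_fsdp_float8_all_gather=fp8_allgather,
        # What this PR enables: recompute the fp8-cast weight in the backward
        # pass instead of saving it from forward, trading compute for memory.
        force_recompute_fp8_weight_in_bwd=True,
    )
    # --use_torchao_fp8_linear true: swap eligible nn.Linear modules for Float8Linear.
    return convert_to_float8_training(model, config=config)

# --use_torchao_fp8_precompute_scale_for_fsdp true: after each optimizer step,
# precompute the dynamic fp8 scales for the FSDP2-sharded weights, e.g.
#   optimizer.step()
#   precompute_float8_dynamic_scale_for_fsdp(model)
```

The memory savings reported below are consistent with no longer stashing the fp8-cast weight between forward and backward.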

Llama-2-7b-hf:

| branch | perf (tokens/s/GPU) | mem usage (GB) |
| --- | --- | --- |
| main | 13947.29 | 34.26 |
| this PR | 13995.80 | 27.69 |

Llama-3-8B:

| branch | perf (tokens/s/GPU) | mem usage (GB) |
| --- | --- | --- |
| main | 12404.18 | 58.65 |
| this PR | 12414.15 | 51.67 |

cc @crcrpar

ref: pytorch/ao@e919558
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
@crcrpar force-pushed the crpa/ao-recompute_fp8_weight_in_bwd branch from a824ae4 to 9ad3327 on December 17, 2024 at 14:02