Add disable_tensor_cache=True to HPUGraph capture #252

Merged: 3 commits merged into habana_main from private/kzawora/disable_tensor_cache on Sep 10, 2024

Conversation

kzawora-intel commented:

RuntimeErrors are not observed anymore on habana_main when disable_tensor_cache is used. This PR enables disable_tensor_cache.
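
For context, a minimal sketch of what enabling this looks like, assuming the habana_frameworks API exposes `wrap_in_hpu_graph` with a `disable_tensor_cache` keyword (names here are illustrative, not the PR's exact diff):

```python
# Minimal sketch, not the PR's exact diff: wrap a module for HPU graph capture
# with disable_tensor_cache=True so cached intermediate tensors are released
# after capture. Assumes habana_frameworks exposes this keyword.
import torch
import habana_frameworks.torch as htorch

model = torch.nn.Linear(16, 16).to("hpu")  # placeholder module for illustration

# With disable_tensor_cache=True, each captured HPUGraph holds on to less memory.
model = htorch.hpu.wrap_in_hpu_graph(model, disable_tensor_cache=True)
```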

kzawora-intel merged commit 4052bdb into habana_main on Sep 10, 2024 (13 checks passed).
zhouyu5 pushed a commit to zhouyu5/vllm-fork referencing this pull request on Sep 13, 2024: "RuntimeErrors are not observed anymore on habana_main when disable_tensor_cache is used. This PR enables disable_tensor_cache."
michalkuligowski added a commit referencing this pull request on Sep 17, 2024: "After #252, HPUGraph capture takes much less memory, and we can reduce the memory reserved for HPUGraphs. On Llama3.1-8b-Instruct (G2), capturing 100% of prefill and decode graphs at BS=256 now takes 1.566 GB of HBM, far less than the 40% (~30 GB) we reserve by default. This results in a lot of unused (i.e., wasted) memory, which could instead be used for more KV cache blocks."
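
A hedged illustration of how that freed memory could be reclaimed, assuming the fork's VLLM_GRAPH_RESERVED_MEM environment variable (fraction of free HBM set aside for HPUGraph capture) is the relevant knob; the variable name and its 0.4 default are taken from the note above and should be verified against habana_main:

```python
# Sketch under the assumption that VLLM_GRAPH_RESERVED_MEM controls the fraction
# of free HBM reserved for HPUGraph capture (0.4, i.e. 40%, per the commit note).
# Since capture now needs ~1.6 GB rather than ~30 GB, a smaller reservation
# leaves more memory for KV cache blocks.
import os

os.environ["VLLM_GRAPH_RESERVED_MEM"] = "0.1"  # hypothetical lower reservation

from vllm import LLM  # assumes the habana_main vLLM fork is installed

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # model from the commit note
```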
zhouyu5 pushed a commit to zhouyu5/vllm-fork referencing this pull request on Sep 20, 2024, with the same message: "RuntimeErrors are not observed anymore on habana_main when disable_tensor_cache is used. This PR enables disable_tensor_cache."
kzawora-intel added the "habana" label (Issues or PRs submitted by Habana Labs) on Sep 20, 2024.
kzawora-intel deleted the private/kzawora/disable_tensor_cache branch on Oct 7, 2024.