Add disable_tensor_cache=True to HPUGraph capture #252

Merged: 3 commits merged into habana_main from private/kzawora/disable_tensor_cache on Sep 10, 2024

Conversation

kzawora-intel commented:

RuntimeErrors are not observed anymore on habana_main when disable_tensor_cache is used. This PR enables disable_tensor_cache.
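
For context, a minimal sketch of what enabling this looks like, assuming the habana_frameworks API exposes `wrap_in_hpu_graph` with a `disable_tensor_cache` keyword (names here are illustrative, not the PR's exact diff):

```python
# Minimal sketch, not the PR's exact diff: wrap a module for HPU graph capture
# with disable_tensor_cache=True so cached intermediate tensors are released
# after capture. Assumes habana_frameworks exposes this keyword.
import torch
import habana_frameworks.torch as htorch

model = torch.nn.Linear(16, 16).to("hpu")  # placeholder module for illustration

# With disable_tensor_cache=True, each captured HPUGraph holds on to less memory.
model = htorch.hpu.wrap_in_hpu_graph(model, disable_tensor_cache=True)
```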

kzawora-intel merged commit 4052bdb into habana_main on Sep 10, 2024 (13 checks passed).
zhouyu5 pushed a commit to zhouyu5/vllm-fork referencing this pull request on Sep 13, 2024: "RuntimeErrors are not observed anymore on habana_main when disable_tensor_cache is used. This PR enables disable_tensor_cache."
michalkuligowski added a commit referencing this pull request on Sep 17, 2024: "After #252, HPUGraph capture takes much less memory, and we can reduce the memory reserved for HPUGraphs. On Llama3.1-8b-Instruct (G2), capturing 100% of prefill and decode graphs at BS=256 now takes 1.566 GB of HBM, far less than the 40% (~30 GB) we reserve by default. This results in a lot of unused (i.e., wasted) memory, which could instead be used for more KV cache blocks."
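
A hedged illustration of how that freed memory could be reclaimed, assuming the fork's VLLM_GRAPH_RESERVED_MEM environment variable (fraction of free HBM set aside for HPUGraph capture) is the relevant knob; the variable name and its 0.4 default are taken from the note above and should be verified against habana_main:

```python
# Sketch under the assumption that VLLM_GRAPH_RESERVED_MEM controls the fraction
# of free HBM reserved for HPUGraph capture (0.4, i.e. 40%, per the commit note).
# Since capture now needs ~1.6 GB rather than ~30 GB, a smaller reservation
# leaves more memory for KV cache blocks.
import os

os.environ["VLLM_GRAPH_RESERVED_MEM"] = "0.1"  # hypothetical lower reservation

from vllm import LLM  # assumes the habana_main vLLM fork is installed

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # model from the commit note
```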
zhouyu5 pushed a commit to zhouyu5/vllm-fork referencing this pull request on Sep 20, 2024, with the same message: "RuntimeErrors are not observed anymore on habana_main when disable_tensor_cache is used. This PR enables disable_tensor_cache."
kzawora-intel added the "habana" label (Issues or PRs submitted by Habana Labs) on Sep 20, 2024.
kzawora-intel deleted the private/kzawora/disable_tensor_cache branch on Oct 7, 2024.