update llm infer docs #9314
Conversation
Thanks for your contribution!
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           develop    #9314      +/-  ##
===========================================
- Coverage    53.44%   52.41%    -1.04%
===========================================
  Files          664      661        -3
  Lines       109935   108376     -1559
===========================================
- Hits         58757    56801     -1956
- Misses      51178    51575      +397
```

☔ View full report in Codecov by Sentry.
llm/docs/predict/inference.md
@@ -94,6 +95,8 @@ PaddleNLP provides multiple parameters for configuring the inference model and optimizing inference performance

- `block_attn`: Whether to use Block Attention for inference; defaults to False. Block Attention is designed and implemented based on the ideas of PageAttention. While retaining high-performance inference and dynamic insertion, it can dynamically allocate storage for the cache KV, greatly saving GPU memory and improving inference throughput.

- `append_attn`: Building on the Block Attention implementation, Append Attention further optimizes the Attention module by drawing on FlashInfer's implementation, and adds high-performance C4 support, greatly improving inference performance.
It would be best to explain the relationship between the two: are they mutually exclusive, or can they be combined?
I suggest describing the advantages of append_attn and which scenarios it is better suited for. The text only says it draws on FlashInfer, but users may not know what FlashInfer is.
They are mutually exclusive. append_attn should be suitable in all scenarios, as it is an upgraded version of block_attn.
Please add that to the docs as well. The main goal is for users to be able to read and understand it clearly.
done
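The resolution above states that `block_attn` and `append_attn` are mutually exclusive, with `append_attn` being the upgraded path. As a minimal sketch (not PaddleNLP's actual code; the `InferenceConfig` class and `attention_backend` helper are hypothetical), the relationship could be enforced like this:

```python
from dataclasses import dataclass


@dataclass
class InferenceConfig:
    """Hypothetical config mirroring the two attention flags discussed above."""
    block_attn: bool = False
    append_attn: bool = False

    def __post_init__(self):
        # append_attn builds on block_attn's paged cache-KV design,
        # so the two backends are a choose-one, not a stack.
        if self.block_attn and self.append_attn:
            raise ValueError(
                "block_attn and append_attn are mutually exclusive; "
                "append_attn is the upgraded version of block_attn."
            )


def attention_backend(cfg: InferenceConfig) -> str:
    """Return which attention backend the flags select."""
    if cfg.append_attn:
        return "append_attn"
    if cfg.block_attn:
        return "block_attn"
    return "default"
```

This is only an illustration of the reviewer's point that users should pick exactly one of the two flags.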
@@ -43,27 +43,27 @@ BF16 inference

```shell
# dynamic graph inference
-python ./predict/predictor.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --dtype bfloat16 --mode dynamic --inference_model 1 --block_attn 1
+python ./predict/predictor.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --dtype bfloat16 --mode dynamic --inference_model 1 --append_attn 1
```
The docs are user-facing, so please also add CI for the append_attn functionality to avoid cases where users cannot get it to run.
There are some issues with the CI that are being debugged; it will be submitted in the next PR.
LGTM
PR types
Others
PR changes
Docs
Description
update llm infer docs