[NPU][LLM] add README & reformat llama scripts #8642
Conversation
Thanks for your contribution!
Force-pushed from c511f7c to 5419849
LGTM
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

@@            Coverage Diff            @@
##           develop    #8642   +/-   ##
===========================================
+ Coverage    55.80%   55.81%   +0.01%
===========================================
  Files          620      620
  Lines        96642    96599      -43
===========================================
- Hits         53928    53917      -11
+ Misses       42714    42682      -32

☔ View full report in Codecov by Sentry.
LGTM
llm/llama/npu/README.md
Outdated
@@ -0,0 +1,198 @@
## 🚣‍♂️ Running the llama2-13b model with PaddleNLP on NPU 🚣
PaddleNLP has been deeply adapted and optimized for the llama2-13B model on Ascend NPU ([learn about Ascend](https://www.hiascend.com/zh/ecosystem/industry)). The toolkit largely unifies the training and inference entry points across Ascend NPU and GPU, achieving a "seamless switch" experience.
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/npu/llama
Put the files under llm/npu; our refactoring PR has already been merged.
llm/llama/npu/README.md
Outdated
}
```
To keep inference cost minimal, we use a static-graph implementation, so a static-graph model must be exported from the dynamic-graph model produced by training. Run the following command to export it:
```
Change this to:
cd PaddleNLP/llm
python predict/export_model.py --model_name_or_path merged_model --inference_model --output_path ./inference --dtype float16 --device npu --block_attn
At the moment this only runs from inside PaddleNLP/llm.
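The suggested export invocation can be audited as a small shell sketch. The paths (`merged_model`, `./inference`) and flags are taken from the review comment above; actually running the export needs a PaddleNLP checkout and Ascend NPU drivers, so this sketch only assembles and prints the command line:

```shell
# Assemble the export command as a bash array so each flag stays auditable.
# merged_model / ./inference are the paths used in the review comment.
cmd=(python predict/export_model.py
     --model_name_or_path merged_model
     --inference_model
     --output_path ./inference
     --dtype float16
     --device npu
     --block_attn)
echo "${cmd[@]}"
```

As the reviewer notes, the real command must be launched from inside `PaddleNLP/llm` (`cd PaddleNLP/llm` first).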
llm/llama/npu/README.md
Outdated
```
Finally, we run inference with the static-graph model:
```
# run the inference code
Change this to:
# run the inference code
python predict/predictor.py --model_name_or_path ./inference --inference_model --dtype "float16" --mode "static" --block_attn --device npu
https://github.com/PaddlePaddle/PaddleNLP/blob/develop/llm/predict/export_model.py#L104 please change this to
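Similarly, the suggested static-graph inference invocation can be sketched. The flags are taken verbatim from the review comment, and NPU hardware is assumed for a real run, so again only the command line is assembled and printed:

```shell
# Inference command from the review comment: --mode static selects the
# exported static graph and --device npu targets Ascend hardware.
cmd=(python predict/predictor.py
     --model_name_or_path ./inference
     --inference_model
     --dtype float16
     --mode static
     --block_attn
     --device npu)
echo "${cmd[@]}"
```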
Force-pushed from 43c883d to f9f424c
llm/export_npu.sh
Outdated
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/atb/set_env.sh

export PYTHONPATH=../:$PYTHONPATH
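The script sources the Ascend toolkit environment scripts and prepends the parent directory to `PYTHONPATH`. The prepend behaviour can be checked in isolation; the `/usr/local/Ascend/...` paths are the toolkit's default install locations and only exist on a real NPU host, so they are commented out in this sketch:

```shell
# The two Ascend set_env.sh sources only exist on an NPU machine, so they
# are commented out here:
# source /usr/local/Ascend/ascend-toolkit/set_env.sh
# source /usr/local/Ascend/atb/set_env.sh

# Prepend the parent directory so Python imports the local checkout first.
export PYTHONPATH=../:$PYTHONPATH
echo "$PYTHONPATH"
```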
Move this to llm/npu
llm/predict_npu.sh
Outdated
model_path=${1:-"./inference"}

source /usr/local/Ascend/ascend-toolkit/set_env.sh
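The `model_path=${1:-"./inference"}` line uses bash's default-value parameter expansion: the first positional argument selects the exported model directory, falling back to `./inference` when the script is invoked without arguments. A minimal sketch of that pattern:

```shell
# ${1:-default} expands to $1 when a first argument was given, otherwise
# to the default; predict_npu.sh uses it to locate the exported model.
model_path=${1:-"./inference"}
echo "model_path=$model_path"
```

Invoked with no arguments, this prints `model_path=./inference`; passing a path as the first argument overrides the default.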
Move to llm/npu
Force-pushed from f9f424c to 7ab4a98
PR types
Others
PR changes
Others
Description
Add a README for llama training and inference on NPU (910).