supports llama-dybatch-V1 #6676
Conversation
Thanks for your contribution!
Force-pushed from 0b35ec5 to 839692e.
I think most of the work here is excellent; there are a few points I'd like to discuss with you.
Also, when you have time later, please add the related unit tests; anything merged into paddlenlp these days generally needs them.
Usage documentation needs to be added.
Added.
Force-pushed from d43357a to 59467c6.
llm/llama/dybatch/README.md (Outdated)
@@ -0,0 +1,21 @@
# LLaMA DyBatch
At present, model usage under the LLM directory is largely unified: fine-tuning, prediction, and quantization all share one set of scripts.
Can the dynamic-insertion (dybatch) script be unified with them as well?
This part may be hard to unify: various quantization methods are still coming, and putting everything together would be unclear. Alternatively, how about adding a jump link in the main README?
llm/llama/dybatch/utils.py (Outdated)
@@ -0,0 +1,247 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
I suggest moving this into the paddlenlp/transformers directory. @wj-Mcat please help review the organization.
llm/llama/dybatch/export_model.py (Outdated)
@@ -0,0 +1,147 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Please check whether the files under this directory can be removed. Instead, add an --enable_dybatch flag to the files in the llm/llama directory and maintain the behavior via a branch.
Please confirm whether the unit tests should be added under the tests/transformer/llama directory.
Force-pushed from 309fb6d to 5897009.
Force-pushed from 5897009 to 5e31739.
Force-pushed from f2bb78d to dcb4041.
Codecov Report
@@           Coverage Diff            @@
##          develop    #6676    +/-  ##
===========================================
- Coverage    60.85%   60.50%   -0.35%
===========================================
  Files          534      539       +5
  Lines        78870    79322     +452
===========================================
+ Hits         47995    47996       +1
- Misses       30875    31326     +451
@@ -0,0 +1,19 @@
# LLaMA Inference
- Following the llm directory's organization, delete the .sh files here and document the commands as Python commands, distinguishing single-card and multi-card usage.
- Unified multi-card weight-splitting script: @wj-Mcat please take a look.
> Following the llm directory's organization, delete the .sh files here and document the commands as Python commands, distinguishing single-card and multi-card usage.
I have already removed them in the latest commit.
experimental/inference/llama/run.sh (Outdated)
export FLAGS_new_executor_serial_run=1
export FLAGS_allocator_strategy=naive_best_fit
export FLAGS_fraction_of_gpu_memory_to_use=0.95
export FLAGS_use_cutlass_fmha=1
- Non-essential flags (e.g. log-related ones) should be removed.
- Briefly explain what each remaining flag does in the README.
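In the spirit of that suggestion, the README note for the remaining flags might look like the sketch below. The one-line descriptions are our best-effort reading of these Paddle flags, not taken from this thread, so treat them as assumptions to verify:

```shell
# Hedged sketch of the flag documentation requested above; the comments are
# assumed meanings of these Paddle flags, not confirmed by this PR.
export FLAGS_new_executor_serial_run=1          # run the new executor serially (simpler to debug/profile)
export FLAGS_allocator_strategy=naive_best_fit  # use the pre-allocated best-fit GPU allocator
export FLAGS_fraction_of_gpu_memory_to_use=0.95 # fraction of GPU memory the allocator may pre-allocate
export FLAGS_use_cutlass_fmha=1                 # enable the CUTLASS fused multi-head attention kernel
```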
So far, the InferenceModel has been completed:
if paddle.in_dynamic_mode():
    y_is_distributed = y.is_distributed
else:
    y_is_distributed = tensor_parallel_degree > 1
In dynamic graph mode, y.is_distributed holds the real value, but in static graph mode y.is_distributed is always False, which changes the final logits' dimensions and therefore hurts decoding accuracy.
An adaptation for static graph mode is made here.
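That adaptation can be sketched as a plain-Python helper. The function name and signature are hypothetical, introduced here only to illustrate the branch; they are not PaddleNLP API:

```python
# Hypothetical helper illustrating the static-graph adaptation described above.
# In dynamic graph mode, y.is_distributed reflects the real value; in static
# graph mode it is always False, so the tensor-parallel degree is used instead.
def resolve_is_distributed(in_dynamic_mode: bool,
                           y_is_distributed: bool,
                           tensor_parallel_degree: int) -> bool:
    if in_dynamic_mode:
        # Dynamic graph: the attribute can be trusted as-is.
        return y_is_distributed
    # Static graph: infer distribution from whether tensor parallelism is
    # enabled, since the attribute is unreliable there.
    return tensor_parallel_degree > 1
```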
        return None


class DygraphInferencePredictor(BasePredictor):
The naming here can be revised later; my previous understanding was that "dygraph" means dynamic graph while "inference" means static-graph inference.
Note it down as a TODO for yourself.
I don't have a particularly elegant name for this yet:
- DygraphInferencePredictor (middling)
- DygraphinferenceModelPredictor (too long)
- DIPredictor (an abbreviation; what even is that)
If anyone has a better name, please join the discussion.
@@ -242,53 +250,296 @@ def _infer(self, inputs: dict[str, np.ndarray]):
        return decoded_ids


def create_predictor(predictor_args: PredictorArgument, model_args: ModelArgument):
class StaticInferencePredictor(BasePredictor):
This also distinguishes dynamic batch from non-dynamic batch.
There are two flags here:
- mode: dygraph or static
- inference_model: bool
These two flags control the four combinations.
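The two-flag scheme can be sketched as a simple dispatch table. Only DygraphInferencePredictor and StaticInferencePredictor appear in this PR; the other two class names below are hypothetical placeholders for illustration:

```python
# Sketch of selecting among the four predictor variants via the two flags
# discussed above. Only DygraphInferencePredictor and StaticInferencePredictor
# come from the PR; the other two names are placeholders.
def select_predictor(mode: str, inference_model: bool) -> str:
    table = {
        ("dygraph", False): "DygraphPredictor",          # plain dynamic graph
        ("dygraph", True): "DygraphInferencePredictor",  # dynamic graph + InferenceModel
        ("static", False): "StaticGraphPredictor",       # plain static graph
        ("static", True): "StaticInferencePredictor",    # static graph + InferenceModel
    }
    return table[(mode, inference_model)]
```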
dygraph -> dynamic
LGTM
PR types
New features
PR changes
Models
Description
supports llama-dybatch-V1