[LLM INFER] Fix some bugs and chatglm_v2 support block_attn #9271
Conversation
Thanks for your contribution!
Codecov Report

Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #9271    +/-   ##
===========================================
+ Coverage    52.80%   52.89%   +0.08%
===========================================
  Files          660      660
  Lines       106869   106929      +60
===========================================
+ Hits         56434    56561     +127
+ Misses       50435    50368      -67

View full report in Codecov by Sentry.
Force-pushed from f3b2d99 to 7551730 (compare)
else:
    return 8192  # Maximum sequence length.

total_max_length: int = field(
    default=4096, metadata={"help": "Super parameter. Maximum sequence length (encoder+decoder)."}
Has this been confirmed with the colleagues working on NPU?
Confirmed, no problem.
@@ -520,7 +520,7 @@ def _preprocess(self, source):
     alibi_slopes = llm_utils.get_alibi_slopes(self.model_config.n_head)
     inputs["position_ids"] = paddle.to_tensor(alibi_slopes, dtype="float32")
     arange_tensor_encoder = paddle.arange(self.config.total_max_length, dtype=self.config.dtype)
-    alibi = alibi_slopes[None, :, None, None] * arange_tensor_encoder
+    alibi = (alibi_slopes[None, :, None, None] * arange_tensor_encoder).astype(self.config.dtype)
Hmm, is relying on config.dtype safe here? Users can change that value. How about using the dtype of one of the tensors inside instead?
This dtype actually does need to stay consistent with config.dtype.
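The dtype exchange above is about the ALiBi bias construction in the diff: the slopes come back as float32, so the product must be cast to the model's compute dtype. A minimal NumPy sketch of that computation (function names, shapes, and the power-of-two slope formula here are illustrative assumptions, not the PaddleNLP implementation):

```python
import numpy as np

def get_alibi_slopes(n_heads: int) -> np.ndarray:
    # Geometric ALiBi slopes 2^(-8i/n) per head; assumes n_heads is a power of two.
    start = 2.0 ** (-8.0 / n_heads)
    return start ** np.arange(1, n_heads + 1)

def build_alibi_bias(n_heads: int, max_len: int, dtype=np.float16) -> np.ndarray:
    slopes = get_alibi_slopes(n_heads)                 # float64 at this point
    positions = np.arange(max_len, dtype=np.float32)   # encoder position ids
    # Broadcast [1, n_heads, 1, 1] * [max_len] -> [1, n_heads, 1, max_len],
    # then cast so the bias matches the model's compute dtype rather than
    # the slopes' wider dtype (the fix made in this PR).
    alibi = slopes[None, :, None, None] * positions
    return alibi.astype(dtype)

bias = build_alibi_bias(8, 16)
```

Without the final `astype`, the broadcasted product silently promotes to the widest input dtype, which is what the patched line guards against.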
model = Model.from_pretrained(
predictor_args.total_max_length = config.seq_length
if predictor_args.block_attn:
Hmm, I'd suggest putting block_attn into the config as an attribute and letting ChatGLMv2InferenceModel control it itself.
If we change it here, too many models will need the same kind of modification later on.
Strictly speaking, though, this doesn't really belong in each model's Config: if we added it to, say, LlamaConfig, every model's Config would need it too. Let's keep it this way for now; when we refactor later we'll look for a better approach.
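The trade-off being debated, a per-model config attribute versus a predictor-side override, can be sketched as follows (class and function names are hypothetical, not the actual PaddleNLP classes):

```python
from dataclasses import dataclass

@dataclass
class InferenceConfig:
    # Hypothetical config mirroring the discussion: block_attn as a
    # model-level attribute so each InferenceModel could decide for itself.
    seq_length: int = 8192        # model's trained maximum sequence length
    total_max_length: int = 4096  # default encoder+decoder budget
    block_attn: bool = False

def resolve_total_max_length(cfg: InferenceConfig) -> int:
    # Predictor-side override, as done in this PR: with block attention
    # enabled, raise total_max_length to the model's seq_length.
    return cfg.seq_length if cfg.block_attn else cfg.total_max_length
```

Keeping the override in the predictor avoids touching every model's Config class, at the cost of model-specific branches accumulating in the predictor, which is the refactor concern raised above.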
PR types
New features
PR changes
Models
Description