[Inference] Add cutlass gemm dequant op #8909

Merged: 21 commits, Aug 29, 2024

Contributor

@gzy19990617 gzy19990617 commented Aug 9, 2024

PR types

New features

PR changes

Add new cutlass op

Description

Add cutlass gemm dequant op

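  1. Accuracy test
    Parameters: --decode_strategy greedy_search --mode dynamic --quant_type a8w8 --inference_model 1 --batch_size 2 --src_length 128 --max_length 256
    Output with block attention:
image

Output without block attention (without this PR, the second output already contains garbled text):
image

  2. Performance test: average latency 44.9 ms -> 42.6 ms
    Test configuration: L20 GPU, batch_size 2, block attention
    gemm dequant not fused:
Pasted Graphic 3

gemm dequant fused:
Pasted Graphic 4

  3. Tried applying dequant directly after qkv_out, but this raised an error.
    Details here: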
https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/TK3hw_mluo/1-4J_hgwU8mmJN
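
For readers unfamiliar with the operation, the following is a minimal pure-Python sketch of what a fused gemm dequant computes compared with a separate GEMM followed by a dequant pass. It is an illustration only, not the CUTLASS kernel from this PR: the function names, the `[m][k] x [k][n]` layout, and the per-output-channel `scale` vector are assumptions made for the example.

```python
# Illustrative reference only. All names and layouts here are hypothetical
# and are not the API of the op added in this PR.

def gemm_int32(a, b):
    """Integer GEMM: a is [m][k], b is [k][n]; accumulates in Python ints."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def dequant(acc, scale):
    """Separate dequant pass: multiply each output column by its scale."""
    return [[v * s for v, s in zip(row, scale)] for row in acc]


def gemm_dequant_fused(a, b, scale):
    """Fused variant: the scale is applied as each element is finished,
    so the integer accumulator matrix is never stored as a whole."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc * scale[j]  # dequantize in the "epilogue"
    return out


# Small int8-range inputs and an assumed per-output-channel scale.
a = [[1, -2, 3], [4, 5, -6]]
b = [[7, -8], [9, 10], [-11, 12]]
scale = [0.5, 0.25]

unfused = dequant(gemm_int32(a, b), scale)
fused = gemm_dequant_fused(a, b, scale)
print(unfused == fused)  # both paths give the same values
```

Both paths produce the same numbers; the difference on a GPU is that the fused form avoids writing the integer intermediate to memory and reading it back in a second kernel, which is one plausible source of the latency reduction reported in item 2.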

paddle-bot bot commented Aug 9, 2024

Thanks for your contribution!

codecov bot commented Aug 9, 2024

Codecov Report

Attention: Patch coverage is 0% with 13 lines in your changes missing coverage. Please review.

Project coverage is 53.76%. Comparing base (a18e220) to head (aa0fdd0).
Report is 216 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...erimental/transformers/fused_transformer_layers.py | 0.00% | 13 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8909      +/-   ##
===========================================
+ Coverage    53.58%   53.76%   +0.18%     
===========================================
  Files          652      652              
  Lines       105169   104513     -656     
===========================================
- Hits         56354    56193     -161     
+ Misses       48815    48320     -495     


csrc/gpu/cutlass_kernels/gemm_dequant.cu: two review threads (outdated, resolved)
Collaborator

@DrownFish19 DrownFish19 left a comment

LGTM

@DrownFish19 DrownFish19 changed the title Add cutlass gemm dequant op [Inference] Add cutlass gemm dequant op Aug 29, 2024
@wawltor wawltor merged commit c28caf7 into PaddlePaddle:develop Aug 29, 2024
10 of 12 checks passed
Mangodadada pushed a commit to Mangodadada/PaddleNLP that referenced this pull request Sep 10, 2024
* change gpu name

* add cutlass gemm_dequant op

* add cutlass gemm_dequant op

* fix format

* fix fused_transformer_layers

* fix layer

* fix layer

* fix format

* fix format

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review