[bloom] Add kv cache support for flash attention & fix bugs #7735
Conversation
Thanks for your contribution!
Codecov Report

Attention:

```diff
@@           Coverage Diff            @@
##           develop    #7735   +/-  ##
========================================
  Coverage    57.29%   57.30%
========================================
  Files          584      584
  Lines        87646    87628    -18
========================================
- Hits         50219    50215     -4
+ Misses       37427    37413    -14
```

☔ View full report in Codecov by Sentry.
LGTM
LGTM
```diff
@@ -17,10 +17,10 @@

 def bloom_postprocess_past_key_value(past_key_values):
     # (layer_num, bs, head_num/tensor_parallel_degree, prefixlen, head_dim)*2
-    past_key_values = paddle.transpose(past_key_values, perm=[2, 0, 3, 1, 4]).split(2)
+    keys, values = paddle.transpose(past_key_values, perm=[2, 0, 1, 3, 4]).split(2)
```
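For readers following the layout change: the new perm keeps the sequence axis ahead of the head axis, which matches the `(bs, seq_len, head_num, head_dim)` ordering flash attention kernels expect. Below is a minimal sketch of the shape transformation; the incoming cache layout `(bs, prefix_len, layer_num * 2, head_num, head_dim)` and all sizes are assumptions for illustration, not taken from the diff:

```python
import paddle

# Illustrative sizes only: 2 layers, batch 1, prefix length 4, 8 heads, head_dim 64.
layer_num, bs, prefix_len, head_num, head_dim = 2, 1, 4, 8, 64

# Assumed incoming layout: (bs, prefix_len, layer_num * 2, head_num, head_dim).
past_key_values = paddle.randn([bs, prefix_len, layer_num * 2, head_num, head_dim])

# perm=[2, 0, 1, 3, 4] -> (layer_num * 2, bs, prefix_len, head_num, head_dim);
# split(2) along axis 0 then separates the stacked keys and values.
keys, values = paddle.transpose(past_key_values, perm=[2, 0, 1, 3, 4]).split(2)

print(keys.shape)    # [2, 1, 4, 8, 64]
print(values.shape)  # [2, 1, 4, 8, 64]
```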
@lugimzzz please take a look at this part. We previously trained ptuning with bloom and the accuracy was aligned. Will this adjustment affect the currently aligned version, and does the inference side need to change accordingly?
As long as training has been tested it is fine; confirmed that inference is not affected.
```diff
@@ -3,6 +3,7 @@ inference-predict:
   mode: dynamic
   max_length: 40
   batch_size: 2
+  use_flash_attention: false
```
Should this config be set to true?
I wrote a separate unit test and added the `use_flash_attention: true` configuration there.
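The PR's actual test lives in the repo; purely to illustrate the kind of parity check being described, a sketch might look like the following. The shapes, tolerances, and the naive reference implementation are assumptions, and running the fused path requires a GPU Paddle build with flash attention support:

```python
import unittest

import numpy as np
import paddle
import paddle.nn.functional as F


def naive_attention(q, k, v):
    """Reference softmax attention over (bs, seq_len, head_num, head_dim) inputs."""
    qt, kt, vt = (paddle.transpose(x, [0, 2, 1, 3]) for x in (q, k, v))
    scores = paddle.matmul(qt, kt, transpose_y=True) * (qt.shape[-1] ** -0.5)
    out = paddle.matmul(F.softmax(scores, axis=-1), vt)
    return paddle.transpose(out, [0, 2, 1, 3])


class FlashAttentionParityTest(unittest.TestCase):
    def test_flash_matches_naive(self):
        paddle.seed(42)
        shape = [2, 16, 8, 64]  # (bs, seq_len, head_num, head_dim)
        q, k, v = (paddle.randn(shape, dtype="float16") for _ in range(3))

        # Fused kernel path (dispatches to flash attention on supported GPUs).
        flash_out = F.scaled_dot_product_attention(q, k, v)
        ref_out = naive_attention(q, k, v)

        np.testing.assert_allclose(
            flash_out.astype("float32").numpy(),
            ref_out.astype("float32").numpy(),
            rtol=1e-3,
            atol=1e-3,
        )
```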
LGTM
* Add kv cache support for flash attention
* Update chatglm flash attention version check
* Add test for flash attention
* Fix unitest bug
* Add flash attention to predictor
* Add flash attention2
* Add flash attention unitests
* fix prefix decoder
* remove unused comments
* Update unitest
* Update unitest
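To make the headline change above concrete: kv-cache support means past keys/values are concatenated along the sequence axis before the fused attention call, so each decode step only projects its new token. The helper below is a hypothetical sketch of that pattern, not the PR's actual API:

```python
import paddle
import paddle.nn.functional as F


def attend_with_cache(q, k, v, cache_k=None, cache_v=None):
    # All tensors use the (bs, seq_len, head_num, head_dim) layout that
    # the fused flash attention kernel expects.
    if cache_k is not None:
        # Prepend the cached keys/values along the sequence axis.
        k = paddle.concat([cache_k, k], axis=1)
        v = paddle.concat([cache_v, v], axis=1)
    out = F.scaled_dot_product_attention(q, k, v)
    # Return the grown cache so the next decode step can reuse it.
    return out, k, v
```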
PR types
PR changes
Description
TODO: