[ZeroPadding] revert zero_padding #8973 #9003
Conversation
Thanks for your contribution!
Codecov Report

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           develop    #9003      +/-   ##
===========================================
- Coverage    54.78%   54.14%   -0.65%
===========================================
  Files          647      650       +3
  Lines       102502   103871    +1369
===========================================
+ Hits         56160    56237      +77
- Misses       46342    47634    +1292
```

View full report in Codecov by Sentry.
```diff
@@ -53,42 +53,7 @@ class ZeroPadding:
         ]

-    @classmethod
-    def _pad_batch_records_to_max_length(cls, batch_records, max_length, pad_token=0):
```
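For context, a helper with this signature would typically right-pad every record in a batch to `max_length`. A minimal hypothetical sketch of that behavior (this is not the removed implementation; the `-100` label ignore index is an assumption):

```python
@classmethod
def _pad_batch_records_to_max_length(cls, batch_records, max_length, pad_token=0):
    # Right-pad each record's token ids (and labels, if present) to max_length.
    for record in batch_records:
        pad_len = max_length - len(record["input_ids"])
        record["input_ids"] = record["input_ids"] + [pad_token] * pad_len
        if "labels" in record:
            # -100 is the usual loss ignore index; the removed code may have differed.
            record["labels"] = record["labels"] + [-100] * pad_len
    return batch_records
```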
Was this code moved somewhere else, or is it no longer needed at all?
It is no longer needed; we now use the tokenizer's pad to handle the padding instead.
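A minimal sketch of padding through the tokenizer rather than a custom helper, assuming a PaddleNLP tokenizer with a Hugging-Face-style `pad` method; the checkpoint name and batch contents are purely illustrative:

```python
from paddlenlp.transformers import AutoTokenizer

# Illustrative checkpoint; any PaddleNLP-supported tokenizer works the same way.
tokenizer = AutoTokenizer.from_pretrained("facebook/llama-7b")

batch = [
    {"input_ids": [1, 2, 3, 4]},
    {"input_ids": [5, 6]},
]

# Pad every example to a fixed length using the tokenizer's own pad token,
# which replaces the removed _pad_batch_records_to_max_length helper.
padded = tokenizer.pad(batch, padding="max_length", max_length=8)
print(padded["input_ids"])
```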
```python
    padding = "max_length"
else:
    max_length = None
    padding = True
```
Hmm, in what case do we not pad to the maximum length? Do you mean the maximum length changes between batches? (That case has repeatedly been reported to cause GPU memory leaks.)
But if zero padding is not used, there is no need to pad to the maximum length.
sequence_parallel should pad to the maximum length; otherwise the varying intermediate sequence length may not be divisible by tensor_parallel_degree.
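A hedged sketch of the padding decision being discussed (flag and variable names are illustrative, not the exact code in `llm/run_finetune.py`): pad to a fixed `max_length` when sequence parallelism or zero padding is enabled, so the sequence length stays constant and divisible by `tensor_parallel_degree`; otherwise pad dynamically to the longest example in each batch.

```python
# Illustrative flag names; the real script reads these from its argument dataclasses.
if training_args.sequence_parallel or data_args.zero_padding:
    # Fixed-length padding keeps every batch's sequence length constant and
    # divisible by tensor_parallel_degree, avoiding shape changes across steps.
    max_length = data_args.max_length
    padding = "max_length"
else:
    # Dynamic padding: pad only up to the longest sequence in the batch.
    max_length = None
    padding = True
```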
lgtm
PR types
Bug fixes

PR changes
Others

Description
Set `max_length` for `sequence_parallel` in `llm/run_finetune.py`.