
[DEV] Support sync params in tensor parallel config #8311


@From00 (Collaborator) commented Apr 23, 2024

PR types

New features

PR changes

Others

Description

Migrates #8306: adds support for configuring a forced synchronization strategy for mp (tensor-parallel) parameters.


paddle-bot bot commented Apr 23, 2024

Thanks for your contribution!


CLAassistant commented Apr 23, 2024

CLA assistant check
All committers have signed the CLA.


codecov bot commented Apr 23, 2024

Codecov Report

Attention: Patch coverage is 0%, with 12 lines in your changes missing coverage. Please review.

Project coverage is 55.32%. Comparing base (beb433a) to head (ef09db4).
Report is 5 commits behind head on develop.

❗ Current head ef09db4 differs from the pull request's most recent head 4d6e782. Consider uploading reports for commit 4d6e782 to get more accurate results.

File                                 Patch %   Lines
paddlenlp/trainer/training_args.py   0.00%     12 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8311      +/-   ##
===========================================
- Coverage    55.33%   55.32%   -0.01%     
===========================================
  Files          614      614              
  Lines        95341    95353      +12     
===========================================
  Hits         52753    52753              
- Misses       42588    42600      +12     


@ForFishes (Member) left a comment


LGTM

@From00 force-pushed the dev-suport-sync-param-name-for-tensor-parallel-configs branch from ef09db4 to 69f7d22 on April 23, 2024 13:18
@From00 force-pushed the dev-suport-sync-param-name-for-tensor-parallel-configs branch from 69f7d22 to 4d6e782 on April 23, 2024 13:30
@ZHUI (Collaborator) left a comment


LGTM

Please make sure that the official (stable) Paddle release does not raise errors when these features are not enabled.

Comment on lines +528 to +530
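sync_param : in the optimizer stage, use broadcast to synchronize all parameters with is_distributed=False
sync_grad : in the optimizer stage, use broadcast to synchronize the gradients of all parameters with is_distributed=False
sync_moment : in the optimizer stage, use broadcast to synchronize the momentum of all parameters with is_distributed=False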
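A minimal sketch of what these three switches amount to, written in plain Python rather than against Paddle: each tensor-parallel rank is modeled as one slot in a list, and "broadcast" is simulated by copying the source rank's value into every slot. `FakeParam`, `broadcast_`, and `sync_non_distributed` are hypothetical names for illustration only, not Paddle or PaddleNLP APIs, and the real optimizer's ordering and communication details are not modeled.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    # Hypothetical stand-in for a framework parameter (not a Paddle class).
    name: str
    is_distributed: bool  # True for tensor-parallel shards that differ per rank
    value: list           # one entry per mp rank: that rank's local copy
    grad: list
    moment: list


def broadcast_(per_rank, src=0):
    """Simulated broadcast: overwrite every rank's copy with the src rank's copy."""
    v = per_rank[src]
    per_rank[:] = [v] * len(per_rank)


def sync_non_distributed(params, sync_param=False, sync_grad=False,
                         sync_moment=False, src=0):
    """Broadcast the selected tensors of every is_distributed=False parameter."""
    for p in params:
        if p.is_distributed:
            # Shards legitimately hold different slices on each rank; skip them.
            continue
        if sync_param:
            broadcast_(p.value, src)
        if sync_grad:
            broadcast_(p.grad, src)
        if sync_moment:
            broadcast_(p.moment, src)


# Two mp ranks whose replicated LayerNorm weight has drifted slightly.
ln = FakeParam("layer_norm_0.w_0", False, [1.0, 1.0001], [0.1, 0.2], [0.01, 0.02])
col = FakeParam("linear_0.w_0", True, [3.0, 4.0], [0.3, 0.4], [0.03, 0.04])
sync_non_distributed([ln, col], sync_param=True, sync_grad=True)
print(ln.value, ln.grad, ln.moment)  # [1.0, 1.0] [0.1, 0.1] [0.01, 0.02]
print(col.value)                     # [3.0, 4.0]
```

In this model, only the tensors whose switch is on are made identical across ranks, and sharded (is_distributed=True) parameters are never touched.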
Collaborator


Aren't these enabled by default?

Collaborator Author


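When PaddleNLP does not configure these parameters, the existing logic is unaffected.
In the framework, sync_param is enabled by default while sync_grad and sync_moment are disabled by default, and the names of the synchronized parameters default to sync_param_name = ["embedding", "layer_norm", ".b_"]; no other parameters are synchronized.
When PaddleNLP enables these switches, all parameters are forcibly synchronized.
This is explained in comments in the code: https://github.com/PaddlePaddle/PaddleNLP/pull/8311/files#diff-477a2a51a1a5694f5db999c8695a3f6ec8fd4f08ded299fb66176651e9d6ebadR1100-R1102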
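The difference between the framework default and the forced mode described in this reply can be sketched as a parameter-selection filter. This assumes a simple substring match of each `sync_param_name` entry against the parameter name and uses illustrative Paddle-style names such as `linear_0.b_0`; `params_to_sync` and `force_all` are hypothetical and do not correspond to a real Paddle function or flag.

```python
# Framework default for sync_param_name, as quoted in this discussion.
DEFAULT_SYNC_PARAM_NAME = ("embedding", "layer_norm", ".b_")


def params_to_sync(params, sync_param_name=DEFAULT_SYNC_PARAM_NAME, force_all=False):
    """Return names of is_distributed=False params that would be broadcast.

    Assumes substring matching on the parameter name (a simplification).
    force_all models enabling the PaddleNLP switches, which per the
    discussion synchronize every such parameter regardless of its name.
    """
    selected = []
    for name, is_distributed in params:
        if is_distributed:
            continue  # tensor-parallel shards are never broadcast
        if force_all or any(tag in name for tag in sync_param_name):
            selected.append(name)
    return selected


params = [
    ("embedding_0.w_0", False),
    ("layer_norm_0.w_0", False),
    ("linear_0.b_0", False),   # bias: matches ".b_"
    ("linear_1.w_0", False),   # replicated weight matching no default pattern
    ("linear_2.w_0", True),    # tensor-parallel shard
]
print(params_to_sync(params))
# ['embedding_0.w_0', 'layer_norm_0.w_0', 'linear_0.b_0']
print(params_to_sync(params, force_all=True))
# ['embedding_0.w_0', 'layer_norm_0.w_0', 'linear_0.b_0', 'linear_1.w_0']
```

Under this model, a replicated weight like `linear_1.w_0` is only kept consistent across mp ranks when the forced mode is on.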

@From00 From00 requested a review from wawltor April 25, 2024 01:37
@wawltor wawltor merged commit ae06f0c into PaddlePaddle:develop Apr 26, 2024
8 of 9 checks passed
Labels: None yet
Projects: None yet

5 participants