
[Auto Parallel] fix enable_delay_scale_loss for static auto parallel … #68525

Merged

Conversation

zhangyuqin1998
Contributor

@zhangyuqin1998 zhangyuqin1998 commented Sep 29, 2024

PR Category

Auto Parallel

PR Types

Bug fixes

Description

Fixes the enable_delay_scale_loss behavior in dynamic- and static-graph semi-auto parallel. Auto parallel uses the enable_delay_scale_loss logic by default.
In the manual dynamic-graph enable_delay_scale_loss logic, gradients are first reduced across the sp/dp/sharding parallel groups, and the reduced result is then divided by the number of accumulation steps. In the current auto parallel implementation, however, each accumulated gradient is first divided by the number of accumulation steps and only then reduced across the sp/dp/sharding parallel groups. When the gradients are small, this ordering risks losing numerical precision (see the sketch after the list below).

Therefore, this PR:
(1) adapts the dynamic-graph auto parallel logic to force the communication to be triggered before the optimizer update and to scale the gradients only afterwards;
(2) adapts the static-graph auto parallel logic by updating auto_parallel_gradient_merge_pass so that the gradient scaling is moved to after the reduce communication.
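To make the precision concern concrete, here is a minimal NumPy sketch. It is illustrative only, not the pass implementation: the all-reduce sum is emulated by a plain sum over ranks, and the gradient value, group size, and accumulation step count are made up.

```python
import numpy as np

acc_steps = np.float16(32)   # gradient accumulation steps (illustrative)
dp_group = 8                 # ranks in the dp/sharding parallel group (illustrative)
# the same small accumulated gradient on every rank, stored in fp16
per_rank_grad = np.full(dp_group, 6e-7, dtype=np.float16)

# current static auto parallel order: scale each rank's grad first, then reduce
scale_then_reduce = (per_rank_grad / acc_steps).sum()

# manual dynamic-graph order (what this PR switches to): reduce first, scale once
reduce_then_scale = per_rank_grad.sum() / acc_steps

print(scale_then_reduce)  # 0.0      -- the per-rank values underflow in fp16
print(reduce_then_scale)  # ~1.2e-07 -- close to the exact value of ~1.5e-07
```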

Pcard-76459


paddle-bot bot commented Sep 29, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@zhangyuqin1998 zhangyuqin1998 force-pushed the fix_enable_delay_scale_loss branch from daab76a to 1bed945 on October 1, 2024 04:48
JZ-LIANG
JZ-LIANG previously approved these changes Oct 8, 2024
Contributor

@JZ-LIANG JZ-LIANG left a comment


LGTM

@@ -636,6 +636,94 @@ def parse_program(
return grad_to_gradient_merge


def _find_trival_optimizer_ops(block):
Contributor


Identifying optimizer ops here by name string alone makes it easy to miss ops in the future; as a follow-up, consider maintaining a single fixed opt_op_name_list for this.
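A rough sketch of what that suggestion could look like (the list contents and the helper below are hypothetical, not code from this PR):

```python
# Keep the known optimizer op names in one shared list instead of matching
# name strings inline; supporting a new optimizer then only requires editing the list.
_OPT_OP_NAME_LIST = ["adam", "adamw", "sgd", "momentum", "lamb"]

def _find_optimizer_ops(block):
    """Return the ops in `block` whose type is a known optimizer op."""
    return [op for op in block.ops if op.type in _OPT_OP_NAME_LIST]
```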

@zhangyuqin1998 zhangyuqin1998 force-pushed the fix_enable_delay_scale_loss branch from 5c6d75d to d7c8913 on October 8, 2024 08:44
Contributor

@From00 From00 left a comment


LGTM

@jeff41404
Contributor

Looking at this API alone, adding a parameter with a default value is a compatible upgrade.
It would still be good to check whether existing usages in the toolkits need to be updated at the same time. For example, paddlenlp's auto_trainer.py already calls shard_optimizer (without changing the call it defaults to gradient_accumulation_steps=1); if a user also passes a gradient_accumulation_steps other than 1 when launching training, would that be affected?
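For context, a schematic illustration of the concern (the layer/optimizer construction and the step count are made up, this is not PaddleNLP code, and it assumes the gradient_accumulation_steps parameter this PR adds to shard_optimizer):

```python
import paddle
import paddle.distributed as dist

layer = paddle.nn.Linear(8, 8)
opt = paddle.optimizer.AdamW(parameters=layer.parameters())

# unchanged call site (e.g. an old auto_trainer.py): the new parameter silently
# stays at its default, gradient_accumulation_steps=1, ...
opt_default = dist.shard_optimizer(opt)

# ... even if the user launched training with a larger accumulation setting,
# so the call site needs to forward the configured value explicitly:
opt_synced = dist.shard_optimizer(opt, gradient_accumulation_steps=8)
```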

@zhangyuqin1998
Contributor Author

zhangyuqin1998 commented Oct 12, 2024

> Looking at this API alone, adding a parameter with a default value is a compatible upgrade. It would still be good to check whether existing usages in the toolkits need to be updated at the same time. For example, paddlenlp's auto_trainer.py already calls shard_optimizer (without changing the call it defaults to gradient_accumulation_steps=1); if a user also passes a gradient_accumulation_steps other than 1 when launching training, would that be affected?

A corresponding change has also been made in auto_trainer: PaddlePaddle/PaddleNLP#9217. Users who call it themselves will not run into problems either.

@jeff41404
Contributor

> Looking at this API alone, adding a parameter with a default value is a compatible upgrade. It would still be good to check whether existing usages in the toolkits need to be updated at the same time. For example, paddlenlp's auto_trainer.py already calls shard_optimizer (without changing the call it defaults to gradient_accumulation_steps=1); if a user also passes a gradient_accumulation_steps other than 1 when launching training, would that be affected?

> A corresponding change has also been made in auto_trainer: PaddlePaddle/PaddleNLP#9217. Users who call it themselves will not run into problems either.

ok, thanks

Contributor

@jeff41404 jeff41404 left a comment


LGTM

Member

@SigureMo SigureMo left a comment


LGTMeow 🐾 for type annotation update

@From00 From00 merged commit 88d4de6 into PaddlePaddle:develop Oct 12, 2024
27 checks passed
zhangyuqin1998 added a commit to zhangyuqin1998/Paddle that referenced this pull request Oct 28, 2024