Gradient Fuse Allreduce #45643

JZ-LIANG · 2022-09-01T07:47:10Z

PR types

Performance optimization

PR changes

Others

Describe

In-place gradient coalescing for fused allreduce.
This avoids the memory copy and the temporary gradient buffer.
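To illustrate what in-place coalescing buys, here is a minimal, framework-free Python sketch. It is not Paddle's implementation: the `FusedGradBuffer` class and the `write_grad` and `fused_allreduce_sum` functions are hypothetical, and the allreduce is simulated by summing the buffers of several simulated ranks inside one process. Each parameter's gradient is a zero-copy `memoryview` slice of one contiguous buffer, so a single collective over the fused buffer updates every gradient directly. No gradient is copied into a temporary buffer, and nothing has to be copied back.

```python
from array import array


class FusedGradBuffer:
    """Hypothetical sketch: all gradients of one rank live in one contiguous buffer."""

    def __init__(self, numels):
        # One flat float64 buffer sized for every gradient in the group.
        self.buffer = array("d", [0.0] * sum(numels))
        view = memoryview(self.buffer)
        self.grads = []
        offset = 0
        for n in numels:
            # Each gradient is a zero-copy view into the fused buffer.
            self.grads.append(view[offset:offset + n])
            offset += n


def write_grad(grad_view, values):
    """Stand-in for the backward pass writing a gradient in place."""
    for i, v in enumerate(values):
        grad_view[i] = v


def fused_allreduce_sum(buffers):
    """Simulated allreduce(sum): one collective over whole fused buffers, in place."""
    for i in range(len(buffers[0])):
        total = sum(buf[i] for buf in buffers)
        for buf in buffers:
            buf[i] = total


# Two simulated ranks, each holding gradients for params with 2 and 3 elements.
rank0 = FusedGradBuffer([2, 3])
rank1 = FusedGradBuffer([2, 3])
write_grad(rank0.grads[0], [1.0, 2.0])
write_grad(rank0.grads[1], [3.0, 4.0, 5.0])
write_grad(rank1.grads[0], [10.0, 20.0])
write_grad(rank1.grads[1], [30.0, 40.0, 50.0])

# A single fused collective instead of one allreduce per gradient.
fused_allreduce_sum([rank0.buffer, rank1.buffer])

print(rank0.grads[0].tolist())  # [11.0, 22.0]
print(rank1.grads[1].tolist())  # [33.0, 44.0, 55.0]
```

Because the views alias the fused buffer, reading a gradient after the collective sees the reduced value directly. A copy-based scheme would instead concatenate the gradients into a temporary buffer, reduce it, and split the result back out, paying for two extra copies and the extra memory.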

Bugfix for parameter initialization in the FP16 pass.

paddle-bot · 2022-09-01T07:47:16Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

aoyulong

LGTM

* bugfix (PaddlePaddle#45332)
* dist embedding support lookup table v1
* add unitest
* customize wait_comm
* group gradients
* bugfix
* update program

* [AutoParallel] adapt gradient merge pass (#45915)
  * adapt gradient merge
  * fix op_role
  * fix strategy
* [Auto Parallel] Gradient Fuse Allreduce (#45643)
  * bugfix (#45332)
  * dist embedding support lookup table v1
  * add unitest
  * customize wait_comm
  * group gradients
  * bugfix
  * update program
* [Auto Parallel] Improve the APIs (#45776)
  * [Auto Parallel] Use c++ dist attr in the completion process
  * [Auto Parallel] Add minor changes
  * [Auto Parallel] Use c++ dist attr in the completion process
  * [Auto Parallel] Add minor changes
  * [Auto Parallel] Add the serialization process for dist attrs
  * [Auto Parallel] Remove unnecessary comments
  * [Auto Parallel] Fix some bugs
  * [Auto Parallel] Fix the code style
  * [Auto Parallel] Remove unnecessary impls
  * [Auto Parallel] Fix the importing error
  * [Auto Parallel] Fix the copy from bugs of op dist attr
  * [Auto Parallel] Replace the use of constexpr if
  * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh
  * [Auto Parallel] Change API of the completion unittest
  * [Auto Parallel] Fix the bug when set_attr an int
  * [Auto Parallel] Add the unittest for the serialization
  * [Auto Parallel] Add some unit tests
  * [Auto Paralle] Unify the strategy
  * [Auto Parallel] Improve the engine api
  * [Auto Parallel] Reset the changes made to the framework
  * [Auto Parallel] Change the engine unittest
  * [Auto Parallel] Update API of the completion and partitioner
  * [Auto Parallel] Update unit tests using engine api
  * update shard annotation
  * [Auto Parallel] Remove the modifications of other modules
  * [Auto Parallel] Add docs for APIs
  * add new strategy
  * [Auto Parallel] Replace the logger
  * [Auto Parallel] Restore the test_program.py
  * [Auto Parallel] Change the import rules
  * [Auto Parallel] Add the examples for Engine
  * [Auto Parallel] Do some minor changes
  * [Auto Parallel] Remove yaml dependency
  * [Auto Parallel] Fix the unittests
  * add valid after train
  * bug fix

  Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
  Co-authored-by: caozhou <caozhou@radi.ac.cn>
  Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
* [Auto Parallel] Bugfix allreduce fuse for MP (#46086)
  * bugfix
  * bugfix
  * typos fixed
* update strategy (#46138)

Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
Co-authored-by: caozhou <caozhou@radi.ac.cn>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>

JZ-LIANG added 9 commits August 24, 2022 10:08

bugfix (PaddlePaddle#45332)

b1007f2

dist embedding support lookup table v1

a3e4a2f

add unitest

27b50e8

Merge remote-tracking branch 'upstream/develop' into AutoParallel/dist-embedding-support-3d-input

7a1662a

customize wait_comm

ba35eeb

group gradients

ea24b07

Merge remote-tracking branch 'upstream/develop' into AutoParallel/dp_overlap_support_weight_sharding

bc9e802

Merge branch 'AutoParallel/dp_overlap_support_weight_sharding' into AutoParallel/dp_allreduce_fuse

5577126

bugfix

13e31b5

JZ-LIANG and others added 11 commits September 2, 2022 09:36

update program

8ca8a38

disable when sharding

06f467d

engine with profile

2fb0e28

improve recompute ckpts

04ec849

update fp16 pass

3f5207b

bugfix

8d148c2

bugfix

af1c77d

bugfix

02d2057

remvoe local changed

3c07d13

Merge remote-tracking branch 'upstream/develop' into AutoParallel/dp_allreduce_fuse

ff6c597

enable fuse

5aee3f1

aoyulong approved these changes Sep 14, 2022

JZ-LIANG merged commit 201d99d into PaddlePaddle:develop Sep 14, 2022

JZ-LIANG deleted the AutoParallel/dp_allreduce_fuse branch September 14, 2022 05:57

JZ-LIANG changed the title ~~[Auto Parallel] Gradient Fuse Allreduce~~ Gradient Fuse Allreduce Jun 25, 2024