
Add mp_all_reduce asynchronize overlap. #55662

Merged: 15 commits into PaddlePaddle:develop on Aug 16, 2023

Conversation

@GhostScreaming (Contributor) commented on Jul 24, 2023

PR types: Others

PR changes: Others

Description:

PCard-70444.
Add mp_all_reduce asynchronize overlap.
Use environment flags to control the following behaviors (a usage sketch follows this list):

  1. export Flags_mp_aysnc_allreduce=True to turn on mp async all_reduce.
  2. export Flags_skip_mp_c_identity=True to skip the two c_identity operators in dygraph mode.
  3. export Flags_fused_linear_param_grad_add=True to enable fused_linear_param_grad_add in the ColumnParallelLinear backward pass with mp async all_reduce.
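A minimal usage sketch of the three flags (hedged: flag spellings are taken verbatim from this description; the fleet setup and parallel degrees below are illustrative assumptions, not part of this PR). The flags are plain environment variables, so they can equally be set with `export` before launching the job:

```python
# Sketch only: flag names follow the PR description; hybrid_configs values
# and the overall setup are illustrative assumptions.
import os

# Set the flags before Paddle reads them (equivalent to `export ...=True` in the shell).
os.environ["Flags_mp_aysnc_allreduce"] = "True"           # overlap mp all_reduce with computation
os.environ["Flags_skip_mp_c_identity"] = "True"           # skip the two c_identity ops in dygraph mode
os.environ["Flags_fused_linear_param_grad_add"] = "True"  # fused grad add in ColumnParallelLinear backward

import paddle.distributed.fleet as fleet

# Illustrative tensor-parallel setup; the degrees depend on the actual job.
strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {"dp_degree": 1, "mp_degree": 2, "pp_degree": 1}
fleet.init(is_collective=True, strategy=strategy)
```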

@paddle-ci-bot (bot) commented on Aug 2, 2023

Sorry to inform you that commit 0524262's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@FeixLiu changed the title from "[WIP] Add mp_all_reduce asynchronize overlap." to "Add mp_all_reduce asynchronize overlap." on Aug 11, 2023
Review thread on the new backward code (diff context, truncated):

```python
        task.wait()
        return dx, dw, dbias
    else:
        dw = paddle.matmul(
```
Reviewer (Contributor) commented:

This branch introduces some reshape calls; could that make some models slower?

@GhostScreaming (Contributor, author) replied:

The reshape here only changes the logical shape of the data; no data movement happens. Measured results show essentially no performance impact, and the kernel execution times in the timeline are identical.
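A simplified sketch of the point above (shapes and variable names are illustrative, not taken from the PR diff): flattening the activation and the upstream gradient to 2-D before the matmul that produces dw only rewrites the tensors' logical shape, so no copy kernel is expected to show up in the timeline.

```python
# Illustrative only; the real code lives in the mp layers touched by this PR.
import paddle

x = paddle.randn([8, 512, 1024])    # activation: [batch, seq, hidden]
dy = paddle.randn([8, 512, 4096])   # upstream grad: [batch, seq, out]

# Reshape rewrites the logical shape; per the comment above, no data is moved.
x2d = paddle.reshape(x, [-1, x.shape[-1]])      # [batch*seq, hidden]
dy2d = paddle.reshape(dy, [-1, dy.shape[-1]])   # [batch*seq, out]

# Weight gradient: contract over the flattened (batch*seq) dimension.
dw = paddle.matmul(x2d, dy2d, transpose_x=True)  # [hidden, out]
print(dw.shape)  # [1024, 4096]
```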

Two resolved review threads on python/paddle/distributed/fleet/layers/mpu/mp_layers.py (outdated).
@FeixLiu (Contributor) previously approved these changes on Aug 11, 2023:

LGTM

Resolved review thread on paddle/fluid/pybind/distributed_py.cc (outdated).
@FeixLiu (Contributor) previously approved these changes on Aug 14, 2023:

LGTM

@FeixLiu (Contributor) previously approved these changes on Aug 15, 2023:

LGTM, with a TODO: remove the three flags and move them to MpConfig in distributed_strategy.proto.

@LiYuRio (Contributor) left a comment:

LGTM for communication

@Xreki merged commit 6b1dfb5 into PaddlePaddle:develop on Aug 16, 2023.
@From00 mentioned this pull request on Sep 18, 2023.
5 participants