
Add mp_all_reduce asynchronize overlap. #55662

Merged: 15 commits into PaddlePaddle:develop on Aug 16, 2023

Conversation

@GhostScreaming (Contributor) commented on Jul 24, 2023

PR types: Others

PR changes: Others

Description:

PCard-70444.
Add mp_all_reduce asynchronize overlap.
Use environment flags to control the following behaviors (a usage sketch follows this list):

  1. export Flags_mp_aysnc_allreduce=True to turn on mp async all_reduce.
  2. export Flags_skip_mp_c_identity=True to skip the two c_identity operators in dygraph mode.
  3. export Flags_fused_linear_param_grad_add=True to enable fused_linear_param_grad_add in the ColumnParallelLinear backward pass with mp async all_reduce.
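A minimal usage sketch of the three flags (hedged: flag spellings are taken verbatim from this description; the fleet setup and parallel degrees below are illustrative assumptions, not part of this PR). The flags are plain environment variables, so they can equally be set with `export` before launching the job:

```python
# Sketch only: flag names follow the PR description; hybrid_configs values
# and the overall setup are illustrative assumptions.
import os

# Set the flags before Paddle reads them (equivalent to `export ...=True` in the shell).
os.environ["Flags_mp_aysnc_allreduce"] = "True"           # overlap mp all_reduce with computation
os.environ["Flags_skip_mp_c_identity"] = "True"           # skip the two c_identity ops in dygraph mode
os.environ["Flags_fused_linear_param_grad_add"] = "True"  # fused grad add in ColumnParallelLinear backward

import paddle.distributed.fleet as fleet

# Illustrative tensor-parallel setup; the degrees depend on the actual job.
strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {"dp_degree": 1, "mp_degree": 2, "pp_degree": 1}
fleet.init(is_collective=True, strategy=strategy)
```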

@paddle-ci-bot (bot) commented on Aug 2, 2023

Sorry to inform you that commit 0524262's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@FeixLiu changed the title from "[WIP] Add mp_all_reduce asynchronize overlap." to "Add mp_all_reduce asynchronize overlap." on Aug 11, 2023
Review thread on the new backward code (diff context, truncated):

```python
        task.wait()
        return dx, dw, dbias
    else:
        dw = paddle.matmul(
```
Reviewer (Contributor) commented:

This branch introduces some reshape calls; could that make some models slower?

@GhostScreaming (Contributor, author) replied:

The reshape here only changes the logical shape of the data; no data movement happens. Measured results show essentially no performance impact, and the kernel execution times in the timeline are identical.
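A simplified sketch of the point above (shapes and variable names are illustrative, not taken from the PR diff): flattening the activation and the upstream gradient to 2-D before the matmul that produces dw only rewrites the tensors' logical shape, so no copy kernel is expected to show up in the timeline.

```python
# Illustrative only; the real code lives in the mp layers touched by this PR.
import paddle

x = paddle.randn([8, 512, 1024])    # activation: [batch, seq, hidden]
dy = paddle.randn([8, 512, 4096])   # upstream grad: [batch, seq, out]

# Reshape rewrites the logical shape; per the comment above, no data is moved.
x2d = paddle.reshape(x, [-1, x.shape[-1]])      # [batch*seq, hidden]
dy2d = paddle.reshape(dy, [-1, dy.shape[-1]])   # [batch*seq, out]

# Weight gradient: contract over the flattened (batch*seq) dimension.
dw = paddle.matmul(x2d, dy2d, transpose_x=True)  # [hidden, out]
print(dw.shape)  # [1024, 4096]
```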

Two resolved review threads on python/paddle/distributed/fleet/layers/mpu/mp_layers.py (outdated).
@FeixLiu (Contributor) previously approved these changes on Aug 11, 2023:

LGTM

Resolved review thread on paddle/fluid/pybind/distributed_py.cc (outdated).
@FeixLiu (Contributor) previously approved these changes on Aug 14, 2023:

LGTM

@FeixLiu (Contributor) previously approved these changes on Aug 15, 2023:

LGTM, with a TODO: remove the three flags and move them to MpConfig in distributed_strategy.proto.

@LiYuRio (Contributor) left a comment:

LGTM for communication

@Xreki merged commit 6b1dfb5 into PaddlePaddle:develop on Aug 16, 2023.
@From00 mentioned this pull request on Sep 18, 2023.
5 participants