Add mp_all_reduce asynchronize overlap. #55662
Conversation
Sorry to inform you that 0524262's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
            task.wait()
            return dx, dw, dbias
        else:
            dw = paddle.matmul(
This branch introduces some reshapes. Could that make some models slower?
The reshape here only changes the logical shape of the data; no data movement is involved. Measurements show no performance impact, and the kernel execution times in the timeline are identical.
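For readers skimming the thread, here is a minimal sketch of the overlap pattern the diff above is discussing: the all_reduce on dx is launched asynchronously so the dw matmul (and the metadata-only reshapes mentioned here) run while the communication is in flight. The function name and shapes are illustrative assumptions, not the PR's actual code; paddle.matmul, paddle.distributed.all_reduce with sync_op=False, and task.wait() are real Paddle APIs.

```python
import paddle
import paddle.distributed as dist

def mp_linear_backward_sketch(x, weight, dy):
    """Hypothetical backward of a column-parallel linear (illustration only)."""
    # Input gradient needs an all_reduce across the model-parallel group.
    dx = paddle.matmul(dy, weight, transpose_y=True)
    task = dist.all_reduce(dx, sync_op=False)  # async: returns a task handle

    # Weight gradient: the reshapes only change the logical shape (no data
    # movement), so this matmul overlaps with the in-flight all_reduce.
    x2d = x.reshape([-1, x.shape[-1]])
    dy2d = dy.reshape([-1, dy.shape[-1]])
    dw = paddle.matmul(x2d, dy2d, transpose_x=True)

    task.wait()  # ensure dx's all_reduce has finished before dx is used
    return dx, dw
```

With Flags_fused_linear_param_grad_add enabled, the dw computation would presumably go through the fused param-grad-add path instead, but the overlap structure stays the same.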
LGTM
LGTM
LGTM with TODO: remove the three flags and move them into MpConfig in distributed_strategy.proto.
LGTM for communication
PR types
Others
PR changes
Others
Description
PCard-70444.
Add mp_all_reduce asynchronize overlap.
Use environment flags to control the following behaviors (a sketch of how they might be consumed follows the list):
1. export Flags_mp_aysnc_allreduce=True to turn on mp async all_reduce.
2. export Flags_skip_mp_c_identity=True to skip two c_identity operators in dygraph mode.
3. export Flags_fused_linear_param_grad_add=True to enable fused_linear_param_grad_add in ColumnParallelLinear backward with mp async all_reduce.
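A hedged sketch of how these flags could be read at runtime. The helper flag_on is a hypothetical illustration, not Paddle's implementation; the flag names, including the aysnc spelling, are copied verbatim from this PR.

```python
import os

def flag_on(name):
    # Environment variables arrive as strings; treat "True"/"true"/"1" as on.
    return os.getenv(name, "False").lower() in ("true", "1")

mp_async_allreduce = flag_on("Flags_mp_aysnc_allreduce")    # overlap all_reduce with dw matmul
skip_mp_c_identity = flag_on("Flags_skip_mp_c_identity")    # skip the two c_identity ops (dygraph)
fused_grad_add = flag_on("Flags_fused_linear_param_grad_add")

# Per the description, fused_linear_param_grad_add applies in the
# ColumnParallelLinear backward together with mp async all_reduce.
use_fused_grad_add = mp_async_allreduce and fused_grad_add
```

In practice the flags would be set in the launch environment, e.g. Flags_mp_aysnc_allreduce=True python -m paddle.distributed.launch train.py.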