MP overlap for 1f1b #57446
Conversation
… backward-forward-overlap-for-1f1b
Your PR was submitted successfully. Thank you for your contribution to this open-source project!
LGTM overall
@@ -354,6 +355,17 @@ def _apply_post_optimization(
        )
        params_grads = self._pass_context.get_attr("params_grads")

        mp_async_allreduce_in_backward = os.getenv(
            "FLAGS_mp_async_allreduce_in_backward"
You could use a config entry as the switch instead, e.g.:
config["use_sharding"] = self._strategy.sharding.enable
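A minimal sketch of that suggestion, assuming a strategy field named mp_async_allreduce_in_backward existed (the PR currently reads the FLAGS_mp_async_allreduce_in_backward environment variable instead):

# Hypothetical sketch: drive the switch from the dist strategy rather than
# an environment variable. "mp_async_allreduce_in_backward" as a strategy
# attribute is an assumed name, not an existing one.
config = {}
config["use_sharding"] = self._strategy.sharding.enable
config["mp_async_allreduce_in_backward"] = getattr(
    self._strategy, "mp_async_allreduce_in_backward", False
)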
@@ -34,6 +36,14 @@
]


# NOTE: Here a stream is just a name;
# it is up to the executor to create the actual streams given that name.
class AutoParallelStreamType(Enum):
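Since the diff is truncated here, a sketch of what such a stream-type enum might contain; the member names and string values below are assumptions, not necessarily what this PR adds:

from enum import Enum


class AutoParallelStreamType(Enum):
    # Each value is only a stream name; the executor decides how to map a
    # name onto a concrete device stream when building the plan.
    CALC_STREAM = "default"         # default computation stream (assumed)
    MP_STREAM = "auto_parallel_mp"  # dedicated MP communication stream (assumed)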
The data-parallel allreduce stream (created in data_parallel_pass) may not be controlled by this type.
LGTM for pass_utils.py
LGTM for cost model and cluster
forward_job = core.Job("forward")
forward_job.set_micro_batch_id(forward_micro_batch_id)
job_list.append(forward_job)
for job_type in self.jobs_in_stable_phase:
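A rough sketch of the surrounding logic, assuming jobs_in_stable_phase holds job-type names such as "backward" and "forward" (the loop bound and counters below are illustrative, not the exact PR code):

# Illustrative sketch: in the stable 1F1B phase, each step appends one
# backward job and one forward job for the in-flight micro-batches.
job_list = []
forward_micro_batch_id = 0
backward_micro_batch_id = 0
for _ in range(num_steps_in_stable_phase):  # assumed variable name
    for job_type in self.jobs_in_stable_phase:  # e.g. ["backward", "forward"]
        job = core.Job(job_type)
        if job_type.startswith("backward"):
            job.set_micro_batch_id(backward_micro_batch_id)
        else:
            job.set_micro_batch_id(forward_micro_batch_id)
        job_list.append(job)
    backward_micro_batch_id += 1
    forward_micro_batch_id += 1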
The 1F1B-Overlap pass and the 1F1B pass can be decoupled, since their schedules are different.
LGTM for pipeline_scheduler_pass
LGTM overall
* B-F overlap
* Add column_parallel_linear_backward_overlapping
* Add cost model
* Insert reshape for ColumnParallelLinearBackwardOverlappingPass
* Add cross-program event dependency
* Refine split program in _backward_forward_overlap
* Add empirical op cost
* Add NOTE
* Remove some redundant codes
* Remove some redundant codes
* Fix UTs
PR types
Performance optimization
PR changes
Others
Description
PCard-71568
This PR implements the following two optimizations in the static-graph semi-automatic parallel mode, aiming to improve end-to-end performance of large models by hiding MP communication with multiple streams.
[Overlapping the backward phase with the forward of another micro-batch under 1F1B]
The gains in the current state are modest: less than 1/3 of the backward allreduce communication can be hidden, and small-scale end-to-end tests show only about a 1% gain. Unlocking more headroom depends on resolving the following issues:
Major feature work
(1) Develop a more accurate cost model that precisely estimates the time of each op, to support fine-grained multi-stream op scheduling (a rough sketch follows this list).
(2) Develop a cost-model-based adaptive op-splitting mechanism that splits forward ops whose cost far exceeds the backward communication into multiple finer-grained ops, so that scheduling an op early does not lengthen the backward computation and enlarge the pipeline bubble.
(3) Design and implement a stream-priority assignment scheme to reduce multi-stream scheduling and synchronization overhead on the device side.
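As a rough illustration of points (1) and (2), a hypothetical sketch of an empirical per-op cost table driving the split decision; all op types, numbers, and function names below are made-up assumptions, not the PR's actual cost model:

# Hypothetical sketch: an empirical per-op cost table (microseconds) used to
# decide when a forward op should be split for finer-grained overlap.
EMPIRICAL_OP_COST_US = {
    "matmul_v2": 120.0,
    "c_allreduce_sum": 80.0,
    "elementwise_add": 10.0,
}


def estimate_op_cost(op_type, default_us=20.0):
    """Return the empirical cost estimate for an op type, in microseconds."""
    return EMPIRICAL_OP_COST_US.get(op_type, default_us)


def should_split_forward_op(op_type, backward_comm_cost_us, ratio=2.0):
    """Split a forward op when it runs much longer than the backward
    communication it is supposed to hide, to avoid stretching the backward
    computation and enlarging the pipeline bubble."""
    return estimate_op_cost(op_type) > ratio * backward_comm_cost_us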
Smaller adaptations
(1) Run PP recv communication on its own stream, so that a forward recv does not block compute ops on the computation stream (see the sketch after this list).
(2) Eliminate redundant c_identity ops, so that a c_identity copy whose cost exceeds the MP communication cannot disrupt the scheduling.
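A hedged sketch of how such ops could be routed onto a non-default stream, assuming the executor honors a per-op execution_stream dist attribute (an assumption here) together with the AutoParallelStreamType names from pass_utils.py:

# Hedged sketch: move forward recv and backward MP allreduce ops off the
# default computation stream. The "execution_stream" dist attribute is an
# assumed interface, not necessarily the exact one this PR uses.
for op in main_program.global_block().ops:
    if op.type in ("recv_v2", "c_allreduce_sum"):
        op.dist_attr.execution_stream = AutoParallelStreamType.MP_STREAM.value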
[Overlapping MP communication with matmul_grad computation in the backward phase]
Compared with the fine-grained backward-forward scheduling and overlap above, overlapping MP communication with matmul_grad computation is only a small incidental optimization that aligns with the dynamic-graph implementation: #55662
This is already supported via column_parallel_linear_backward_overlapping (a conceptual sketch follows below), with about a 1% gain for GPT-3 6.7B under MP2-PP4; gains on larger-scale jobs remain to be measured. Neither optimization is in its final form: since the related code changes are extensive, this PR is merged first to avoid mutual dependencies and conflicts with other static-graph optimization work, with further iteration and tuning to follow.
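To illustrate the idea behind column_parallel_linear_backward_overlapping: in a column-parallel linear layer, grad_x is a partial sum that needs an MP allreduce, while grad_w needs no communication, so the allreduce can run asynchronously underneath the grad_w matmul. A conceptual dynamic-graph-style sketch, assuming recent Paddle collective APIs (the function and argument names are illustrative; the pass itself rewrites the static-graph program instead):

import paddle
import paddle.distributed as dist


def column_parallel_linear_backward(x, weight, grad_out, mp_group):
    # grad_x = grad_out @ W^T is a partial sum and must be allreduced across
    # the MP group; sync_op=False launches it asynchronously and returns a
    # task handle.
    grad_x = paddle.matmul(grad_out, weight, transpose_y=True)
    task = dist.all_reduce(grad_x, group=mp_group, sync_op=False)
    # grad_w = X^T @ grad_out needs no communication, so this matmul overlaps
    # with the in-flight allreduce on the communication stream.
    grad_w = paddle.matmul(x, grad_out, transpose_x=True)
    task.wait()
    return grad_x, grad_w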