[Docs] AndSonder Add usage documentation for visualizing the static-graph auto-parallel timing diagram #53

AndSonder · 2023-11-06T12:42:12Z

No description provided.

…ade/UsageDocs/1-visualize-flow-parallel-timing-diagram-in-static-graph-mode.md

From00

Here is an earlier user guide for the paddle profiler that you can use as a reference: https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/performance_improving/profiling_model.html

From00 · 2023-12-07T07:47:34Z

...nFrameworkUpgrade/UsageDocs/1-visualize-flow-parallel-timing-diagram-in-static-graph-mode.md

@@ -0,0 +1,119 @@
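+# User Guide for the Static-Graph Auto-Parallel Timing Diagram Visualization Tool
+
+Because large models currently take a long time to train, visualizing the timing diagram of distributed training is very important for debugging and analyzing the training process. At present no tool directly reports the run intervals of different Jobs on each GPU device; this visualization tool provides that capability.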


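The background description needs revising:

This tool shows the timing diagram of pipeline parallelism, not of distributed training in general.

The sentence "Because large models currently take a long time to train, visualizing the timing diagram of distributed training is very important for debugging and analyzing the training process" has no clear causal logic: in the small-model era, a profiler was just as important a means of performance analysis.

The statement "At present no tool directly reports the run intervals of different Jobs on each GPU device" is also likely to confuse readers. Most people are used to profiling GPU performance with Nsight, so their first reaction here will be that Nsight is exactly such a tool. I suggest explaining clearly why a new tool had to be developed instead of using Nsight, and what advantages the new tool has over Nsight.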

From00 · 2023-12-07T07:50:14Z

...nFrameworkUpgrade/UsageDocs/1-visualize-flow-parallel-timing-diagram-in-static-graph-mode.md

+
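+## 1. Generate the timing diagram data
+
+This tool integrates the visualization feature into command-line arguments. The following uses the LLama training script in PaddleNLP as an example: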


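This is a general-purpose tool, not one meant only for LLama. The documentation targets users who want to analyze their own performance with this tool, not users who are running the LLama model with PaddleNLP. It should therefore focus on how to invoke the tool through the Paddle framework rather than on how to invoke it in PaddleNLP.
Of course, once the Paddle-based invocation has been covered, you can briefly mention that the PaddleNLP LLama model wraps it in an easier-to-use command-line interface.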

From00 · 2023-12-07T08:27:32Z

...nFrameworkUpgrade/UsageDocs/1-visualize-flow-parallel-timing-diagram-in-static-graph-mode.md

+Then run the visualization script on any one of the machines, setting the `--log_dir` argument to the `log_dir` directory and enabling the `--multi_machine` argument.
+
+```bash
+python python/paddle/distributed/auto_parallel/static/profiler_helper_static.py --devices 0,1 --log_dir /home/workspace/PaddleNLP/llm/llama/output/llama_7b_pp2_mp4_st_log/multi_machine_logs --multi_machine


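This tool targets pipeline-parallel scenarios and focuses on how the pipeline-parallel subgraphs are actually arranged and scheduled. After the subgraphs have been analyzed, the next, finer-grained step of problem analysis still has to be done with Nsight. I suggest the document describe the usage scenarios and the typical workflow.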
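
As a rough illustration of the subgraph-level scheduling question described in this comment, the sketch below computes how much of a training step each device spends idle between pipeline jobs (the pipeline "bubble"). The record layout `(device, job, start_ms, end_ms)` and the sample numbers are hypothetical; they are not the tool's actual log format.

```python
from collections import defaultdict

# Hypothetical job records: (device, job name, start_ms, end_ms).
# This layout is an assumption for illustration, not the tool's real log format.
JOBS = [
    ("gpu0", "forward0", 0.0, 10.0),
    ("gpu0", "forward1", 10.0, 20.0),
    ("gpu0", "backward0", 40.0, 60.0),
    ("gpu1", "forward0", 10.0, 20.0),
    ("gpu1", "backward0", 20.0, 40.0),
    ("gpu1", "forward1", 40.0, 50.0),
]


def idle_fraction(jobs):
    """Return {device: fraction of the whole step the device spent idle}."""
    step_start = min(start for _, _, start, _ in jobs)
    step_end = max(end for _, _, _, end in jobs)
    step_len = step_end - step_start

    busy = defaultdict(float)
    for device, _, start, end in jobs:
        busy[device] += end - start  # assumes jobs on one device never overlap

    return {device: 1.0 - total / step_len for device, total in busy.items()}


if __name__ == "__main__":
    for device, frac in sorted(idle_fraction(JOBS).items()):
        print(f"{device}: idle {frac:.0%} of the step")
```

A large idle fraction at this level only shows that a device is waiting; finding out why a particular job is slow would still require a kernel-level profiler.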

From00 · 2023-12-11T06:17:19Z

...nFrameworkUpgrade/UsageDocs/1-visualize-flow-parallel-timing-diagram-in-static-graph-mode.md

@@ -0,0 +1,164 @@
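+# User Guide for the Static-Graph Pipeline-Parallel Timing Diagram Visualization Tool
+
+The PaddlePaddle framework provides a pipeline-parallel timing diagram visualization tool that collects, aggregates, and displays pipeline-parallel information while a model runs.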


Suggested change

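The PaddlePaddle framework provides a pipeline-parallel timing diagram visualization tool that collects, aggregates, and displays pipeline-parallel information while a model runs.

The PaddlePaddle framework provides a pipeline-parallel timing diagram visualization tool that collects, aggregates, and displays the scheduling information of pipeline-parallel subgraphs while a model runs.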

From00 · 2023-12-11T06:31:26Z

...nFrameworkUpgrade/UsageDocs/1-visualize-flow-parallel-timing-diagram-in-static-graph-mode.md

+
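+The PaddlePaddle framework provides a pipeline-parallel timing diagram visualization tool that collects, aggregates, and displays pipeline-parallel information while a model runs.
+
+Although GPU performance profilers such as nsight already exist, no tool currently provides precise run-time intervals for the individual tasks (Jobs) executed on different GPU devices. Tools such as nsight set breakpoints on the CPU, but because GPU programs run asynchronously, CPU-side breakpoints often cannot accurately measure the run intervals of GPU tasks.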


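Although Nsight can be used on GPU devices to analyze and optimize model performance, it has the following shortcomings in distributed pipeline-parallel scenarios:

Because GPU programs run asynchronously, the NVTX markers added on the CPU side for pipeline-parallel subgraphs do not correspond directly to the intervals in which the operators actually execute on the GPU. When analyzing a pipeline-parallel job, you often have to pick out the first and last operators of each pipeline subgraph by hand from a long sequence of kernel executions before you can reconstruct how the model was actually pipelined.

Pipeline-parallel jobs often run across multiple machines: the full model is split into several subgraphs that are assigned to different machines. Nsight can only collect and display the scheduling information of each machine separately and cannot present the overall pipeline-parallel schedule across machines in one view.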
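
The first shortcoming can be made concrete with a small simulation. The sketch below models a single in-order GPU stream: the CPU launches kernels quickly and returns, while each kernel only starts once the stream is free. All names and numbers are hypothetical; the code uses neither Paddle nor NVTX and only illustrates why the CPU-side interval of a job can differ from its GPU-side execution interval.

```python
def gpu_intervals(launches):
    """Simulate one in-order GPU stream.

    launches: list of (job, cpu_launch_time, kernel_duration), sorted by launch time.
    Returns a list of (job, gpu_start, gpu_end), one entry per kernel.
    """
    results = []
    stream_free_at = 0.0
    for job, launch_time, duration in launches:
        # A kernel starts once it has been launched AND the stream is free.
        start = max(launch_time, stream_free_at)
        end = start + duration
        results.append((job, start, end))
        stream_free_at = end
    return results


def span_per_job(events):
    """Collapse (job, start, end) events into {job: (first_start, last_end)}."""
    spans = {}
    for job, start, end in events:
        lo, hi = spans.get(job, (start, end))
        spans[job] = (min(lo, start), max(hi, end))
    return spans


# Hypothetical numbers: the CPU issues one launch per time unit,
# while each kernel takes 5 time units on the GPU.
LAUNCHES = [
    ("forward", 0.0, 5.0),
    ("forward", 1.0, 5.0),
    ("backward", 2.0, 5.0),
    ("backward", 3.0, 5.0),
]

# CPU-side view: interval between the first and last launch of each job.
cpu_spans = span_per_job((job, t, t) for job, t, _ in LAUNCHES)
# GPU-side view: interval in which the job's kernels actually ran.
gpu_spans = span_per_job(gpu_intervals(LAUNCHES))

if __name__ == "__main__":
    for job in cpu_spans:
        print(job, "CPU marker:", cpu_spans[job], "GPU execution:", gpu_spans[job])
```

In this toy setup the CPU-side interval of the `backward` job ends before its first kernel has even started on the GPU, which is the mismatch the comment describes.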

From00 · 2023-12-11T06:33:59Z

...nFrameworkUpgrade/UsageDocs/1-visualize-flow-parallel-timing-diagram-in-static-graph-mode.md

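+In PaddleNLP, this tool integrates the visualization feature into command-line arguments. The following uses the LLama training script in PaddleNLP as an example: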
+
+```bash
+task_name="llama_7b_pp2_mp4_st"


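This launch command is the one used for internal experiments. For external readers, I suggest demonstrating how to enable the profiler based on the launch method that the PaddleNLP repository recommends to users.

The PaddleNLP llama readme currently contains only the recommended dynamic-graph launch command, so I added some pre-use preparation steps modeled on the readme.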

From00 · 2023-12-15T05:58:01Z

...nFrameworkUpgrade/UsageDocs/1-visualize-flow-parallel-timing-diagram-in-static-graph-mode.md

+python -u  -m paddle.distributed.launch \
+     --gpus "0,1,2,3" \
+     --log_dir "output/$task_name""_log" \
+     run_pretrain_auto.py \


The location of this script in PaddleNLP has changed; it needs to be updated to auto_parallel/run_pretrain_auto.py

From00

LGTM

add Docs/10_StaticGraph_Semi-AutomaticParallel_ExecutionFrameworkUpgr…

477eacb

…ade/UsageDocs/1-visualize-flow-parallel-timing-diagram-in-static-graph-mode.md

AndSonder assigned AndSonder and From00 and unassigned AndSonder Nov 6, 2023

AndSonder mentioned this pull request Nov 7, 2023

[WeeklyReports] 2023.10.25~2023.11.07 weekly report summary #54

Closed

22 tasks

AndSonder added 2 commits December 6, 2023 10:20

Merge branch 'main' of https://github.com/PFCCLab/Camp into doc2

1c6fb11

update

31b9e7d

AndSonder mentioned this pull request Dec 6, 2023

[WeeklyReports] 2023.11.22~2023.12.05 weekly report summary #102

Closed

20 tasks

From00 reviewed Dec 7, 2023

apply suggestions from review

046f393

AndSonder requested a review from From00 December 8, 2023 07:19

From00 reviewed Dec 11, 2023

apply suggestions from code review

a6c017d

AndSonder requested a review from From00 December 11, 2023 07:01

From00 reviewed Dec 15, 2023

Update 1-visualize-flow-parallel-timing-diagram-in-static-graph-mode.md

57881ce

AndSonder requested a review from From00 December 15, 2023 08:18

From00 approved these changes Dec 18, 2023

From00 merged commit 71fca6a into PFCCLab:main Dec 18, 2023
1 check passed

AndSonder mentioned this pull request Dec 19, 2023

[Auto Parallel] Add doc for visualize flow parallel timing diagram tool in static graph mode PaddlePaddle/docs#6404

Merged
