
Encode communicator groups in Chakra traces #140

Merged
5 commits merged into mlcommons:main on Sep 6, 2024

Conversation

Contributor

@JoongunPark commented Jul 24, 2024

Summary

Encoding communicator groups in Chakra traces is essential for accurately simulating collective communication when multiple communicator groups are present. With the latest PyTorch version, communicator groups can be collected in both Chakra host traces (PyTorch execution traces) and Chakra device traces (Kineto traces). In Chakra host traces, a process_group:init operator lists the communicator groups available in the run. In addition, every collective communication operator carries fields in its attributes that correlate it with a communicator group; the pg_name field is used for this correlation. Chakra device traces likewise include communicator group information in ncclDevKernel_* operators.

Below is an example with AllReduce.

{
  "ph": "X",
  "cat": "kernel",
  "name": "ncclDevKernel_AllReduce_Sum_bf16_RING_LL(ncclDevKernelArgsStorage<4096ul>)",
  "pid": 0,
  "tid": 60,
  "args": {
    "External id": 14728,
    "queued": 0,
    "device": 0,
    "context": 1,
    "stream": 60,
    "correlation": 136816,
    "registers per thread": 96,
    "shared memory": 89296,
    "blocks per SM": 0.222222,
    "warps per SM": 3.777778,
    "grid": [24, 1, 1],
    "block": [544, 1, 1],
    "est. achieved occupancy %": 0,
    "Collective name": "allreduce",
    "In msg nelems": 6291456,
    "Out msg nelems": 6291456,
    "Group size": 2,
    "dtype": "BFloat16",
    "In split size": "[]",
    "Out split size": "[]",
    "Process Group Name": "27",
    "Process Group Description": "undefined",
    "Process Group Ranks": "[0, 1]"
  }
}

The kernel's args include "Group size," "Process Group Name," "Process Group Description," and "Process Group Ranks."

Most of the information, except for Process Group Name, is redundant since it is already defined in the metadata, as shown in the example below.

"distributedInfo": {
  "backend": "nccl",
  "rank": 0,
  "world_size": 8,
  "pg_count": 67,
  "pg_config": [
    {"pg_name": "0", "pg_desc": "default_pg", "backend_config": "cuda:nccl", "pg_size": 8, "ranks": [0, 1, 2, 3, 4, 5, 6, 7]},
    {"pg_name": "1", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 2, "ranks": [0, 2]},
    {"pg_name": "2", "pg_desc": "undefined", "backend_config": "cpu:gloo,cuda:gloo", "pg_size": 2, "ranks": [0, 2]},
    {"pg_name": "9", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 2, "ranks": [0, 2]},
    {"pg_name": "10", "pg_desc": "undefined", "backend_config": "cpu:gloo,cuda:gloo", "pg_size": 2, "ranks": [0, 2]},
    {"pg_name": "17", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 1, "ranks": [0]},
    {"pg_name": "25", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 4, "ranks": [0, 1, 4, 5]},
    {"pg_name": "27", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 2, "ranks": [0, 1]},
    {"pg_name": "31", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 2, "ranks": [0, 4]},
    {"pg_name": "32", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 2, "ranks": [0, 4]},
    {"pg_name": "33", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 1, "ranks": [0]},
    {"pg_name": "43", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 4, "ranks": [0, 1, 2, 3]},
    {"pg_name": "45", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 4, "ranks": [0, 1, 2, 3]},
    {"pg_name": "47", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 2, "ranks": [0, 1]},
    {"pg_name": "51", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 1, "ranks": [0]},
    {"pg_name": "59", "pg_desc": "undefined", "backend_config": "cuda:nccl", "pg_size": 2, "ranks": [0, 2]},
    {"pg_name": "60", "pg_desc": "undefined", "backend_config": "cpu:gloo,cuda:gloo", "pg_size": 2, "ranks": [0, 2]}
  ],
  "nccl_version": "2.22.3"
}
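
For illustration only (not part of this PR), below is a minimal Python sketch that resolves each kernel's "Process Group Name" against pg_config. The file path follows the Test Plan below, and the assumption that the kernel events and distributedInfo live in the same Kineto trace file under a traceEvents list is mine:

import json

# Kineto device trace for rank 0; path taken from the Test Plan below.
with open("gpt3_126m_1.1.0-chakra.0.0.4/kineto_0.json") as f:
    trace = json.load(f)

# Map pg_name -> communicator group metadata from distributedInfo.pg_config.
pg_by_name = {pg["pg_name"]: pg for pg in trace["distributedInfo"]["pg_config"]}

# Correlate each NCCL kernel with its communicator group via "Process Group Name".
for event in trace.get("traceEvents", []):
    args = event.get("args", {})
    pg_name = args.get("Process Group Name")
    if event.get("cat") == "kernel" and pg_name in pg_by_name:
        pg = pg_by_name[pg_name]
        print(f'{event["name"]}: ranks={pg["ranks"]}, pg_size={pg["pg_size"]}')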

This PR allows users to identify the process_group:init operator by explicitly classifying that node as a METADATA node. It also explicitly encodes pg_name as an attribute of collective communication operators. Finally, it updates ETFeeder so that simulators can parse and access the pg_name field easily.

Test Plan

Generate Chakra HDT traces.

for rank in 0 1 2 3 4 5 6 7; do
    chakra_trace_link --chakra-host-trace gpt3_126m_1.1.0-chakra.0.0.4/et_${rank}.json --chakra-device-trace gpt3_126m_1.1.0-chakra.0.0.4/kineto_${rank}.json --output-file gpt3_126m_1.1.0-chakra.0.0.4/rank_${rank}.json   
    chakra_converter PyTorch --input gpt3_126m_1.1.0-chakra.0.0.4/rank_${rank}.json --output gpt3_126m_1.1.0-chakra.0.0.4/rank.${rank}.et
done
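
As a quick sanity check on the linked traces (an aside, not part of the original test plan), one can scan rank_0.json for nodes that carry a pg_name attribute. The layout assumed below, a top-level "nodes" list where each node has an "attrs" list of name/value pairs, reflects my understanding of the PyTorch execution trace schema and may differ between schema versions:

import json

# Linked host+device trace produced by chakra_trace_link above.
with open("gpt3_126m_1.1.0-chakra.0.0.4/rank_0.json") as f:
    et = json.load(f)

# Assumed layout: each node carries an "attrs" list of {"name": ..., "value": ...}.
for node in et.get("nodes", []):
    attrs = {a.get("name"): a.get("value") for a in node.get("attrs", [])}
    if "pg_name" in attrs:
        print(node.get("name"), "-> pg_name:", attrs["pg_name"])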

Check through Jsonizer

for rank in 0 1 2 3 4 5 6 7; do
    chakra_jsonizer --input_filename gpt3_126m_1.1.0-chakra.0.0.4/rank_${rank}.json --output_filename gpt3_126m_1.1.0-chakra.0.0.4/rank.${rank}.json
done

Test ETFeeder with ASTRA-Sim

mv ../gpt3_126m_1.1.0-chakra.0.0.4/ ./gpt3 
./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware --workload-configuration=/home/un-gpu/Project/jpark/astra-sim/gpt3/rank --system-configuration=./inputs/system/Ring.json --network-configuration=./inputs/network/analytical/Ring.yml --remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json

Code in ASTRA-Sim using ETFeeder

  // Print the communicator group (pg_name) of every GPU collective node.
  if (!node->is_cpu_op() && node->type() == ChakraNodeType::COMM_COLL_NODE) {
    if (!node->pg_name().empty()) {
      cout << "Node Name: " << node->name() << endl;
      cout << "Process Group Name: " << node->pg_name() << endl;
    }
  }

Trace

The traces were collected with PyTorch execution trace schema 1.1.0-chakra.0.0.4.
gpt3_126m_1.1.0-chakra.0.0.4.zip

@JoongunPark requested a review from a team as a code owner on July 24, 2024 at 18:15

github-actions bot commented Jul 24, 2024

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@TaekyungHeo added the enhancement (New feature or request) label on Jul 24, 2024
@TaekyungHeo changed the title from "Encoding Process Group information in Chakra traces" to "Encode communicator groups in Chakra traces" on Jul 26, 2024
Contributor

TaekyungHeo commented Jul 26, 2024

Thank you for your contribution, @JoongunPark.

  1. I have updated the PR summary. Please review it and update it if needed.
  2. If you recall the PyTorch version that you used, please add it to the PR summary. Some users want to know the exact version number.
  3. I have merged "Identify process group init nodes as METADATA nodes" (#109) into this PR. Please check if the updated PR works for ASTRA-sim.
  4. Please rebase your PR to the latest main branch.

@JoongunPark
Contributor Author

Thank you for your review, @TaekyungHeo!
All of your action items have been addressed.
Please check that the changes meet the requirements!

Below is the output from the trace linker and converter.

[2024-07-26 15:51:34,182] trace_link.py:48 [INFO]: Linking process successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank_0.json.
[2024-07-26 15:51:34,182] trace_link.py:49 [INFO]: Please run the chakra_converter for further postprocessing.
INFO [07/26/2024 03:51:40 PM] Conversion successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank.0.et.
[2024-07-26 15:51:50,943] trace_link.py:48 [INFO]: Linking process successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank_1.json.
[2024-07-26 15:51:50,943] trace_link.py:49 [INFO]: Please run the chakra_converter for further postprocessing.
INFO [07/26/2024 03:51:56 PM] Conversion successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank.1.et.
[2024-07-26 15:52:07,224] trace_link.py:48 [INFO]: Linking process successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank_2.json.
[2024-07-26 15:52:07,224] trace_link.py:49 [INFO]: Please run the chakra_converter for further postprocessing.
INFO [07/26/2024 03:52:12 PM] Conversion successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank.2.et.
[2024-07-26 15:52:22,995] trace_link.py:48 [INFO]: Linking process successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank_3.json.
[2024-07-26 15:52:22,995] trace_link.py:49 [INFO]: Please run the chakra_converter for further postprocessing.
INFO [07/26/2024 03:52:28 PM] Conversion successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank.3.et.
[2024-07-26 15:52:39,483] trace_link.py:48 [INFO]: Linking process successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank_4.json.
[2024-07-26 15:52:39,483] trace_link.py:49 [INFO]: Please run the chakra_converter for further postprocessing.
INFO [07/26/2024 03:52:45 PM] Conversion successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank.4.et.
[2024-07-26 15:52:56,729] trace_link.py:48 [INFO]: Linking process successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank_5.json.
[2024-07-26 15:52:56,729] trace_link.py:49 [INFO]: Please run the chakra_converter for further postprocessing.
INFO [07/26/2024 03:53:02 PM] Conversion successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank.5.et.
[2024-07-26 15:53:13,582] trace_link.py:48 [INFO]: Linking process successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank_6.json.
[2024-07-26 15:53:13,582] trace_link.py:49 [INFO]: Please run the chakra_converter for further postprocessing.
INFO [07/26/2024 03:53:19 PM] Conversion successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank.6.et.
[2024-07-26 15:53:30,121] trace_link.py:48 [INFO]: Linking process successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank_7.json.
[2024-07-26 15:53:30,121] trace_link.py:49 [INFO]: Please run the chakra_converter for further postprocessing.
INFO [07/26/2024 03:53:35 PM] Conversion successful. Output file is available at gpt3_126m_1.1.0-chakra.0.0.4/rank.7.et.

I also verified that ASTRA-Sim prints pg_name using the code above.

Node Name:  ncclDevKernel_SendRecv(ncclDevKernelArgsStorage<4096ul>)
Process Group Name: 31
Node Name:  ncclDevKernel_SendRecv(ncclDevKernelArgsStorage<4096ul>)
Process Group Name: 34
Node Name:  ncclDevKernel_SendRecv(ncclDevKernelArgsStorage<4096ul>)
Process Group Name: 40
Node Name:  ncclDevKernel_SendRecv(ncclDevKernelArgsStorage<4096ul>)
Process Group Name: 37
Node Name:  ncclDevKernel_AllReduce_Sum_bf16_RING_LL(ncclDevKernelArgsStorage<4096ul>)
Process Group Name: 28

...

Contributor Author

JoongunPark commented Aug 27, 2024

Hello!
I wanted to check whether there are any further discussions or concerns regarding this PR. This update is crucial for proper communication simulation support in ASTRA-Sim, and I hope we can proceed with merging it soon.

@srinivas212 merged commit 73edb74 into mlcommons:main on Sep 6, 2024
9 checks passed
@github-actions bot locked and limited conversation to collaborators on Sep 6, 2024