
Refactoring PyTorch2Chakra converter for better readability and dependency handling #19

Merged · 13 commits from et_converter into main · Feb 6, 2024

Conversation

TaekyungHeo (Contributor) commented Feb 6, 2024

Summary

This pull request significantly refactors the et_converter module, with a particular focus on enhancing the PyTorch2Chakra converter:

  • Shifted towards an object-oriented approach for improved readability and maintainability.
  • Revised the logic for data dependency identification:
    • Moved away from using tensor input-output relationships.
    • Now follows control dependencies to accurately encode data dependencies.
  • Introduced utility functions for:
    • Cycle detection, to validate that the encoded dependencies form a valid DAG.
    • Execution simulation, to make it easier to dry-run the final Chakra traces.
  • Enhanced documentation: detailed logging and comments now explain each step of the conversion from PyTorch traces to final Chakra traces, addressing previous bugs and limitations.
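To illustrate the cycle-detection utility mentioned above, here is a minimal, hypothetical sketch. The `ChakraNode` class and its `data_deps` field are simplified stand-ins, not the converter's actual types; the real implementation may differ. The idea is simply that after dependencies are encoded, the graph must be checked for back edges before simulation is possible:

```python
from dataclasses import dataclass, field

@dataclass
class ChakraNode:
    # Hypothetical minimal node; the real converter's node class
    # carries more fields (name, duration, control deps, etc.).
    id: int
    data_deps: list = field(default_factory=list)  # IDs this node depends on

def has_cycle(nodes: dict) -> bool:
    """Detect a cycle in the dependency graph via iterative DFS.

    `nodes` maps node ID -> ChakraNode. Returns True if following
    data_deps edges ever revisits a node on the current DFS path.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / done
    color = {nid: WHITE for nid in nodes}
    for start in nodes:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(nodes[start].data_deps))]
        while stack:
            nid, deps = stack[-1]
            advanced = False
            for dep in deps:
                if color[dep] == GRAY:
                    return True  # back edge: dependency cycle found
                if color[dep] == WHITE:
                    color[dep] = GRAY
                    stack.append((dep, iter(nodes[dep].data_deps)))
                    advanced = True
                    break
            if not advanced:
                color[nid] = BLACK
                stack.pop()
    return False
```

An iterative DFS (rather than recursion) avoids hitting Python's recursion limit on traces with hundreds of thousands of nodes, as in the log below.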

Test Plan

$ cd ~/param/train/comms/pt
$ pip install .
$ cd ../../compute/python
$ pip install -r requirements.txt
$ python setup.py install
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_0.json --kineto-file ~/llama_kineto/worker0_step_12.1697596714999.pt.trace.json --output-file ~/rank0.json

$ cd ~/chakra
$ pip install .
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank0.json --output_filename ~/rank0.chakra --num_dims 1
$ tail debug.log
INFO [02/06/2024 09:50:41 AM] GPU Node ID 107127 completed at 7276140us
INFO [02/06/2024 09:50:41 AM] Issuing GPU Node ID 107132 (void at::native::vectorized_elementwise_kernel<4, at::native::sqrt_kernel_cuda(at::TensorIteratorBase&)::{lambda()#2}::operator()() const::{lambda()#4}::operator()() const::{lambda(c10::BFloat16)#1}, at::detail::Array<char*, 2> >(int, at::native::sqrt_kernel_cuda(at::TensorIteratorBase&)::{lambda()#2}::operator()() const::{lambda()#4}::operator()() const::{lambda(c10::BFloat16)#1}, at::detail::Array<char*, 2>)) at 7276140us with duration 3us
INFO [02/06/2024 09:50:41 AM] GPU Node ID 107132 completed at 7276143us
INFO [02/06/2024 09:50:41 AM] Issuing GPU Node ID 107135 (void at::native::vectorized_elementwise_kernel<4, at::native::BUnaryFunctor<c10::BFloat16, c10::BFloat16, c10::BFloat16, at::native::binary_internal::MulFunctor<float> >, at::detail::Array<char*, 2> >(int, at::native::BUnaryFunctor<c10::BFloat16, c10::BFloat16, c10::BFloat16, at::native::binary_internal::MulFunctor<float> >, at::detail::Array<char*, 2>)) at 7276143us with duration 2us
INFO [02/06/2024 09:50:41 AM] GPU Node ID 107135 completed at 7276145us
INFO [02/06/2024 09:50:41 AM] Issuing GPU Node ID 107138 (void at::native::vectorized_elementwise_kernel<4, at::native::CUDAFunctor_add<c10::BFloat16>, at::detail::Array<char*, 3> >(int, at::native::CUDAFunctor_add<c10::BFloat16>, at::detail::Array<char*, 3>)) at 7276145us with duration 3us
INFO [02/06/2024 09:50:41 AM] GPU Node ID 107138 completed at 7276148us
INFO [02/06/2024 09:50:41 AM] Issuing GPU Node ID 107141 (void at::native::vectorized_elementwise_kernel<4, at::native::addcdiv_cuda_kernel(at::TensorIteratorBase&, c10::Scalar const&)::{lambda()#2}::operator()() const::{lambda()#9}::operator()() const::{lambda(c10::BFloat16, c10::BFloat16, c10::BFloat16)#1}, at::detail::Array<char*, 4> >(int, at::native::addcdiv_cuda_kernel(at::TensorIteratorBase&, c10::Scalar const&)::{lambda()#2}::operator()() const::{lambda()#9}::operator()() const::{lambda(c10::BFloat16, c10::BFloat16, c10::BFloat16)#1}, at::detail::Array<char*, 4>)) at 7276148us with duration 3us
INFO [02/06/2024 09:50:41 AM] GPU Node ID 107141 completed at 7276151us
INFO [02/06/2024 09:50:41 AM] Simulation of Chakra node execution completed.
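The log above shows the execution-simulation utility issuing each node once its dependencies complete and reporting issue/completion timestamps. A rough sketch of that behavior is below; the `SimNode` shape, field names, and serial single-timeline model are assumptions for illustration, not the converter's actual code:

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)

@dataclass
class SimNode:
    # Hypothetical node shape; field names are assumed for this sketch.
    id: int
    name: str
    duration_us: int
    data_deps: list = field(default_factory=list)  # IDs this node depends on

def simulate(nodes: dict) -> int:
    """Issue each node after all of its dependencies have completed,
    serially on one timeline; return the final timestamp in us."""
    completed = {}            # node ID -> completion time (us)
    remaining = dict(nodes)   # node ID -> SimNode not yet executed
    now = 0
    while remaining:
        ready = [n for n in remaining.values()
                 if all(d in completed for d in n.data_deps)]
        if not ready:
            # Nothing can run but work remains: the graph has a cycle.
            raise RuntimeError("unresolved dependencies (cycle?)")
        for n in ready:
            logging.info("Issuing Node ID %d (%s) at %dus with duration %dus",
                         n.id, n.name, now, n.duration_us)
            now += n.duration_us
            completed[n.id] = now
            logging.info("Node ID %d completed at %dus", n.id, now)
            del remaining[n.id]
    logging.info("Simulation of Chakra node execution completed.")
    return now
```

A simulation that drains the whole graph without deadlocking is a quick end-to-end sanity check that the encoded dependencies are complete and acyclic.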

@TaekyungHeo TaekyungHeo requested a review from a team as a code owner February 6, 2024 14:48

github-actions bot commented Feb 6, 2024: All contributors have signed the MLCommons CLA ✍️ ✅

@TaekyungHeo TaekyungHeo changed the title Et converter Refactoring PyTorch2Chakra Converter for Better Readability and Dependency Handling Feb 6, 2024
@TaekyungHeo TaekyungHeo changed the title Refactoring PyTorch2Chakra Converter for Better Readability and Dependency Handling Refactoring PyTorch2Chakra converter for better readability and dependency handling Feb 6, 2024
@srinivas212 srinivas212 merged commit 5925827 into main Feb 6, 2024
5 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 6, 2024
@TaekyungHeo TaekyungHeo deleted the et_converter branch February 6, 2024 23:00
3 participants