
Refactoring PyTorch2Chakra converter for better readability and dependency handling #19

Merged · 13 commits from et_converter into main · Feb 6, 2024

Conversation

TaekyungHeo (Contributor) commented Feb 6, 2024

Summary

This pull request significantly refactors the et_converter module, with a particular focus on enhancing the PyTorch2Chakra converter:

  • Shifted towards an object-oriented approach for improved readability and maintainability.
  • Revised the logic for data dependency identification:
    • Moved away from using tensor input-output relationships.
    • Now follows control dependencies to accurately encode data dependencies.
  • Introduced utility functions for:
    • Cycle detection, to validate that the encoded dependencies form a valid DAG.
    • Execution simulation, to make it easier to dry-run the final Chakra traces.
  • Enhanced documentation: detailed logging and comments now explain each step of the conversion from PyTorch traces to final Chakra traces, addressing previous bugs and limitations.
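To illustrate the cycle-detection utility mentioned above, here is a minimal, hypothetical sketch. The `ChakraNode` class and its `data_deps` field are simplified stand-ins, not the converter's actual types; the real implementation may differ. The idea is simply that after dependencies are encoded, the graph must be checked for back edges before simulation is possible:

```python
from dataclasses import dataclass, field

@dataclass
class ChakraNode:
    # Hypothetical minimal node; the real converter's node class
    # carries more fields (name, duration, control deps, etc.).
    id: int
    data_deps: list = field(default_factory=list)  # IDs this node depends on

def has_cycle(nodes: dict) -> bool:
    """Detect a cycle in the dependency graph via iterative DFS.

    `nodes` maps node ID -> ChakraNode. Returns True if following
    data_deps edges ever revisits a node on the current DFS path.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / done
    color = {nid: WHITE for nid in nodes}
    for start in nodes:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(nodes[start].data_deps))]
        while stack:
            nid, deps = stack[-1]
            advanced = False
            for dep in deps:
                if color[dep] == GRAY:
                    return True  # back edge: dependency cycle found
                if color[dep] == WHITE:
                    color[dep] = GRAY
                    stack.append((dep, iter(nodes[dep].data_deps)))
                    advanced = True
                    break
            if not advanced:
                color[nid] = BLACK
                stack.pop()
    return False
```

An iterative DFS (rather than recursion) avoids hitting Python's recursion limit on traces with hundreds of thousands of nodes, as in the log below.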

Test Plan

$ cd ~/param/train/comms/pt
$ pip install .
$ cd ../../compute/python
$ pip install -r requirements.txt
$ python setup.py install
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_0.json --kineto-file ~/llama_kineto/worker0_step_12.1697596714999.pt.trace.json --output-file ~/rank0.json

$ cd ~/chakra
$ pip install .
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank0.json --output_filename ~/rank0.chakra --num_dims 1
$ tail debug.log
INFO [02/06/2024 09:50:41 AM] GPU Node ID 107127 completed at 7276140us
INFO [02/06/2024 09:50:41 AM] Issuing GPU Node ID 107132 (void at::native::vectorized_elementwise_kernel<4, at::native::sqrt_kernel_cuda(at::TensorIteratorBase&)::{lambda()#2}::operator()() const::{lambda()#4}::operator()() const::{lambda(c10::BFloat16)#1}, at::detail::Array<char*, 2> >(int, at::native::sqrt_kernel_cuda(at::TensorIteratorBase&)::{lambda()#2}::operator()() const::{lambda()#4}::operator()() const::{lambda(c10::BFloat16)#1}, at::detail::Array<char*, 2>)) at 7276140us with duration 3us
INFO [02/06/2024 09:50:41 AM] GPU Node ID 107132 completed at 7276143us
INFO [02/06/2024 09:50:41 AM] Issuing GPU Node ID 107135 (void at::native::vectorized_elementwise_kernel<4, at::native::BUnaryFunctor<c10::BFloat16, c10::BFloat16, c10::BFloat16, at::native::binary_internal::MulFunctor<float> >, at::detail::Array<char*, 2> >(int, at::native::BUnaryFunctor<c10::BFloat16, c10::BFloat16, c10::BFloat16, at::native::binary_internal::MulFunctor<float> >, at::detail::Array<char*, 2>)) at 7276143us with duration 2us
INFO [02/06/2024 09:50:41 AM] GPU Node ID 107135 completed at 7276145us
INFO [02/06/2024 09:50:41 AM] Issuing GPU Node ID 107138 (void at::native::vectorized_elementwise_kernel<4, at::native::CUDAFunctor_add<c10::BFloat16>, at::detail::Array<char*, 3> >(int, at::native::CUDAFunctor_add<c10::BFloat16>, at::detail::Array<char*, 3>)) at 7276145us with duration 3us
INFO [02/06/2024 09:50:41 AM] GPU Node ID 107138 completed at 7276148us
INFO [02/06/2024 09:50:41 AM] Issuing GPU Node ID 107141 (void at::native::vectorized_elementwise_kernel<4, at::native::addcdiv_cuda_kernel(at::TensorIteratorBase&, c10::Scalar const&)::{lambda()#2}::operator()() const::{lambda()#9}::operator()() const::{lambda(c10::BFloat16, c10::BFloat16, c10::BFloat16)#1}, at::detail::Array<char*, 4> >(int, at::native::addcdiv_cuda_kernel(at::TensorIteratorBase&, c10::Scalar const&)::{lambda()#2}::operator()() const::{lambda()#9}::operator()() const::{lambda(c10::BFloat16, c10::BFloat16, c10::BFloat16)#1}, at::detail::Array<char*, 4>)) at 7276148us with duration 3us
INFO [02/06/2024 09:50:41 AM] GPU Node ID 107141 completed at 7276151us
INFO [02/06/2024 09:50:41 AM] Simulation of Chakra node execution completed.
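The log above shows the execution-simulation utility issuing each node once its dependencies complete and reporting issue/completion timestamps. A rough sketch of that behavior is below; the `SimNode` shape, field names, and serial single-timeline model are assumptions for illustration, not the converter's actual code:

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)

@dataclass
class SimNode:
    # Hypothetical node shape; field names are assumed for this sketch.
    id: int
    name: str
    duration_us: int
    data_deps: list = field(default_factory=list)  # IDs this node depends on

def simulate(nodes: dict) -> int:
    """Issue each node after all of its dependencies have completed,
    serially on one timeline; return the final timestamp in us."""
    completed = {}            # node ID -> completion time (us)
    remaining = dict(nodes)   # node ID -> SimNode not yet executed
    now = 0
    while remaining:
        ready = [n for n in remaining.values()
                 if all(d in completed for d in n.data_deps)]
        if not ready:
            # Nothing can run but work remains: the graph has a cycle.
            raise RuntimeError("unresolved dependencies (cycle?)")
        for n in ready:
            logging.info("Issuing Node ID %d (%s) at %dus with duration %dus",
                         n.id, n.name, now, n.duration_us)
            now += n.duration_us
            completed[n.id] = now
            logging.info("Node ID %d completed at %dus", n.id, now)
            del remaining[n.id]
    logging.info("Simulation of Chakra node execution completed.")
    return now
```

A simulation that drains the whole graph without deadlocking is a quick end-to-end sanity check that the encoded dependencies are complete and acyclic.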

@TaekyungHeo TaekyungHeo requested a review from a team as a code owner February 6, 2024 14:48

github-actions bot commented Feb 6, 2024: All contributors have signed the MLCommons CLA ✍️ ✅

@TaekyungHeo TaekyungHeo changed the title Et converter Refactoring PyTorch2Chakra Converter for Better Readability and Dependency Handling Feb 6, 2024
@TaekyungHeo TaekyungHeo changed the title Refactoring PyTorch2Chakra Converter for Better Readability and Dependency Handling Refactoring PyTorch2Chakra converter for better readability and dependency handling Feb 6, 2024
@srinivas212 srinivas212 merged commit 5925827 into main Feb 6, 2024
5 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 6, 2024
@TaekyungHeo TaekyungHeo deleted the et_converter branch February 6, 2024 23:00
3 participants