PerfFlowAspect for Tensorflow applications #95

Open
hariharan-devarajan opened this issue Mar 27, 2023 · 0 comments

hariharan-devarajan commented Mar 27, 2023

There are a couple of limitations in PerfFlowAspect's support for AI frameworks.

Dynamic graph-based input pipelines use internal functions

In TensorFlow, an input pipeline defined with tf.data creates a graph of internal TensorFlow functions mixed with the user's custom functions. An example of such code is:

self._dataset = tf.data.TFRecordDataset(filenames=_file_list, buffer_size=self._args.transfer_size)
self._dataset = self._dataset.shard(num_shards=self._args.comm_size, index=self._args.my_rank)
self._dataset = self._dataset.map(self._tf_parse_function, num_parallel_calls=self._args.computation_threads)
self._dataset = self._dataset.map(self._decode_image, num_parallel_calls=self._args.computation_threads)
self._dataset = self._dataset.batch(batch_size, drop_remainder=True)

Here, internal functions such as tf.data.TFRecordDataset, .shard, and .batch cannot be captured using PerfFlowAspect. These calls only construct the graph; the TensorFlow framework executes the corresponding operations later, potentially on separate threads.
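The gap between graph construction and execution can be illustrated with a toy pipeline that does not use TensorFlow at all. Everything in this sketch is hypothetical: `trace` is a stand-in for a function-level decorator, and `LazyDataset` is a simplified analogy that runs plain Python callables on worker threads, which is not how tf.data executes its operations internally.

```python
import functools
import threading
from concurrent.futures import ThreadPoolExecutor

events = []  # (function name, thread name) records; stand-in for a trace file


def trace(func):
    """Hypothetical stand-in for a function-level tracing decorator."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append((func.__name__, threading.current_thread().name))
        return func(*args, **kwargs)
    return wrapper


class LazyDataset:
    """Toy lazy pipeline: map() only records a stage; nothing runs yet."""

    def __init__(self, items, stages=()):
        self._items = list(items)
        self._stages = tuple(stages)

    def map(self, fn):
        # Graph construction: return a new dataset with one more stage.
        return LazyDataset(self._items, self._stages + (fn,))

    def run(self, workers=2):
        # Execution: the recorded stages run later, on worker threads.
        def apply_all(item):
            for fn in self._stages:
                item = fn(item)
            return item

        with ThreadPoolExecutor(max_workers=workers,
                                thread_name_prefix="worker") as pool:
            return list(pool.map(apply_all, self._items))


@trace
def decode(x):
    return x * 2


@trace
def build_pipeline():
    return LazyDataset(range(4)).map(decode)


dataset = build_pipeline()
events_after_build = list(events)  # only the build call has been recorded
results = dataset.run()            # decode is recorded here, on worker threads
```

In this toy, tracing `build_pipeline` captures only the construction step, while the actual work shows up later under different thread names.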

Adding application-specific args

AI applications keep counters for step, epoch, and image index, which can highlight exactly where a bottleneck occurs. These application-centric args are often stored in class variables and are hard to handle with global decorators such as those used by PerfFlowAspect. One approach is to have PerfFlowAspect use a stateful object that contains the decorator and can store and update local class variables or counters in an args dict. This feature would let applications filter their timeline in the Perfetto UI based on these args.
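A minimal sketch of what such a stateful object might look like is given here. It is not PerfFlowAspect's API: `StatefulTracer`, its `update` method, and the shape of the recorded event dict are all assumptions made for illustration.

```python
import functools
import time


class StatefulTracer:
    """Hypothetical tracer object that is itself usable as a decorator."""

    def __init__(self):
        self.args = {}    # current application counters (step, epoch, ...)
        self.events = []  # recorded events; a real tool would write a trace

    def update(self, **kwargs):
        # Called by the application whenever a counter changes.
        self.args.update(kwargs)

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*a, **kw):
            snapshot = dict(self.args)  # args as they were at call time
            start = time.perf_counter()
            try:
                return func(*a, **kw)
            finally:
                self.events.append({
                    "name": func.__name__,
                    "dur_us": (time.perf_counter() - start) * 1e6,
                    "args": snapshot,
                })
        return wrapper


tracer = StatefulTracer()


class Trainer:
    @tracer
    def train_step(self, batch):
        return sum(batch)

    def fit(self, epochs, steps):
        for epoch in range(epochs):
            for step in range(steps):
                tracer.update(epoch=epoch, step=step)
                self.train_step([1, 2, 3])


Trainer().fit(epochs=2, steps=2)
```

Each recorded event carries a copy of the counters that were current when the call started, so a viewer could filter events by, for example, `epoch` or `step`.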

Profiling code blocks

How can we use PerfFlowAspect to profile specific blocks of code in our application? One approach could be to expose a context manager with __enter__ and __exit__ methods so that application developers can use with statements.
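One possible shape for such a context manager is sketched here. `Region` and its `sink` list are hypothetical names, not part of PerfFlowAspect; a real implementation would emit to the tool's trace output rather than to a Python list.

```python
import time


class Region:
    """Hypothetical context manager that times a named block of code."""

    def __init__(self, name, sink, **args):
        self.name = name
        self.sink = sink  # list that receives one event dict per block
        self.args = args  # optional application-specific args
        self._start = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.sink.append({
            "name": self.name,
            "dur_us": (time.perf_counter() - self._start) * 1e6,
            "args": dict(self.args),
            "failed": exc_type is not None,
        })
        return False  # never swallow exceptions raised inside the block


block_events = []

with Region("load_batch", block_events, epoch=0, step=3):
    data = [i * i for i in range(1000)]
```

Because `__exit__` returns `False`, an exception raised inside the block is still recorded as an event and then propagates to the caller.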
