There are a couple of limitations in PerfFlowAspect's support for AI frameworks.
Dynamic graph-based input pipelines use internal functions
In the case of TensorFlow, an input pipeline defined using `tf.data` creates a graph of internal TensorFlow functions mixed with the user's custom functions. In such code, internal functions such as `tf.data.TFRecordDataset`, `.shard`, or `batch` cannot be captured using PerfFlowAspect. This is because these calls only construct the graph; the TensorFlow framework executes the functions later, potentially on separate threads.
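To make the problem concrete without depending on TensorFlow, here is a minimal, framework-free analogy. `LazyPipeline`, `trace`, `internal_scale`, and `user_parse` are all hypothetical names invented for this sketch; `trace` is a toy stand-in for a tracing decorator, not PerfFlowAspect's actual API. It shows that decorating the function that builds the pipeline records only graph construction, while the real work runs later on another thread and is visible only where a user function happens to be decorated.

```python
import functools
import threading

events = []  # (function name, thread name) pairs recorded by the toy tracer


def trace(func):
    """Toy stand-in for a tracing decorator: records who ran, and where."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append((func.__name__, threading.current_thread().name))
        return func(*args, **kwargs)
    return wrapper


class LazyPipeline:
    """Hypothetical deferred pipeline: map() only records a stage."""

    def __init__(self, source):
        self.source = source
        self.stages = []

    def map(self, fn):
        self.stages.append(fn)  # graph construction; nothing executes yet
        return self

    def run(self):
        """Execute the recorded stages on a separate worker thread."""
        results = []

        def work():
            for item in self.source:
                for stage in self.stages:
                    item = stage(item)
                results.append(item)

        worker = threading.Thread(target=work, name="pipeline-worker")
        worker.start()
        worker.join()
        return results


def internal_scale(x):
    # Stands in for a framework-internal op that the user cannot decorate.
    return x * 10


@trace
def user_parse(x):
    # User-supplied function; it is decorated, so its calls are recorded.
    return x + 1


@trace
def build_pipeline():
    # Returns immediately: only the graph is built here.
    return LazyPipeline([1, 2, 3]).map(internal_scale).map(user_parse)


pipeline = build_pipeline()
events_after_build = list(events)  # contains only the build_pipeline call
output = pipeline.run()            # stages execute now, on "pipeline-worker"
```

In this sketch the internal stage never appears in `events`, and the user function appears only under the worker thread's name, which mirrors the gap described above.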
Adding application-specific args
AI applications have counters for step, epoch, and image index, which can highlight exactly where a bottleneck exists. These are application-centric args that are often stored in class variables and are hard to handle with global decorators as used by PerfFlowAspect. One approach is to make PerfFlowAspect use a stateful object that contains the decorator and can store and update local class variables or counters in an args dict. This feature would let applications filter their timeline in the Perfetto UI based on these args.
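One way the stateful-object idea could look is sketched below. `StatefulTracer`, its `update` and `trace` methods, and the event dictionary layout are assumptions made for illustration; none of them are existing PerfFlowAspect interfaces. The object owns both the decorator and an `args` dict, so each recorded event carries the step and epoch that were current when the function was called.

```python
import functools
import time


class StatefulTracer:
    """Hypothetical tracer that owns a decorator and an args dict."""

    def __init__(self):
        self.args = {}    # application-specific counters (step, epoch, ...)
        self.events = []  # recorded events, kept in memory for this sketch

    def update(self, **kwargs):
        """Store or overwrite application-specific args."""
        self.args.update(kwargs)

    def trace(self, func):
        """Decorator that attaches a snapshot of the args to each event."""
        @functools.wraps(func)
        def wrapper(*a, **kw):
            snapshot = dict(self.args)  # copy, so later updates do not leak in
            start = time.perf_counter()
            try:
                return func(*a, **kw)
            finally:
                self.events.append({
                    "name": func.__name__,
                    "dur": time.perf_counter() - start,
                    "args": snapshot,
                })
        return wrapper


tracer = StatefulTracer()


@tracer.trace
def train_step(batch):
    return sum(batch)


for epoch in range(2):
    for step, batch in enumerate([[1, 2], [3, 4]]):
        tracer.update(epoch=epoch, step=step)
        train_step(batch)
```

Because the snapshot is taken at call time, each event keeps the counters that applied to it, which is what filtering a timeline by step or epoch would need.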
Profiling code blocks
How can we use PerfFlowAspect to profile specific blocks of code in our application? One approach could be to expose a context manager with `__enter__` and `__exit__` methods so that application developers can use `with` statements.
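A minimal sketch of such a context manager follows. `ProfileBlock` and its `sink` parameter are hypothetical names chosen for this example, not part of PerfFlowAspect; a real implementation would presumably emit trace events rather than append to a list.

```python
import time


class ProfileBlock:
    """Hypothetical context manager that times a named block of code."""

    def __init__(self, name, sink):
        self.name = name
        self.sink = sink  # any list-like object that collects records

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Record the duration even if the block raised an exception.
        self.sink.append({
            "name": self.name,
            "dur": time.perf_counter() - self.start,
        })
        return False  # do not suppress exceptions from the block


records = []
with ProfileBlock("preprocess", records):
    total = sum(i * i for i in range(1000))
```

Returning `False` from `__exit__` keeps the application's error handling unchanged, so wrapping a block for profiling does not alter its behavior.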