Hatchet analysis crossvariant #298
Conversation
I’m struggling to understand the implementation of ExtractCommonSubtree. At least it had me reading the Hatchet documentation.
generic_exc_metrics = gf.exc_metrics
generic_inc_metrics = gf.inc_metrics
Maybe we can pass those to the final init directly?
Those metrics are new class variables copied over from the graphframe argument, and eventually used as arguments to initialize the superclass via super().__init__(). You can see that this initialization is all copies, sometimes explicit as in generic_dataframe=gf.dataframe.copy(). This allows me to create a new graphframe with a call like
gf1 = GenericFrame(ht.GraphFrame.from_caliperreader(f1))
What GenericFrame does is rename the root node to "Variant" and set all the attributes that make it a "Node". This allows Hatchet to compare completely different trees (since the root node itself is different), while assuming that the tree structure underneath the two trees is completely identical. If they are not identical, then we ExtractCommonSubtree so we can compare apples to apples.
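A minimal sketch of the root-renaming idea in plain Python (not the actual Hatchet API; trees here are nested dicts and the node names are made up for illustration):

```python
# Hypothetical sketch: two profiles whose roots differ only by variant name
# become directly comparable once each root is renamed to a common label.

def rename_root(tree, new_name="Variant"):
    """Return a copy of the tree with its single root node renamed."""
    (old_root,) = tree.keys()          # a profile tree has exactly one root
    return {new_name: tree[old_root]}

base_seq = {"Base_Seq": {"Basic_DAXPY": 1.0, "Basic_COPY": 2.0}}
raja_seq = {"RAJA_Seq": {"Basic_DAXPY": 1.2, "Basic_COPY": 2.1}}

# After renaming, the two trees share an identical root and structure,
# so a node-by-node comparison lines up.
assert rename_root(base_seq) == {"Variant": {"Basic_DAXPY": 1.0, "Basic_COPY": 2.0}}
assert rename_root(base_seq).keys() == rename_root(raja_seq).keys()
```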
generic_graph = gf.graph.copy()
generic_exc_metrics = gf.exc_metrics
generic_inc_metrics = gf.inc_metrics
generic_default_metric = gf.default_metric  # in newer Hatchet
Unused here.
No, they are used in the super call:
super().__init__(generic_graph, generic_dataframe, generic_exc_metrics, generic_inc_metrics)
Are you talking about the default metric? We don't use that in ExtractCommonSubtree anymore, since the metric can take one of two forms:
#metric = "sum#inclusive#sum#time.duration"
metric = "Min time/rank"
The first is a legacy metric name which, BTW, still exists within Caliper and is used in all of the timing calculations, while the second form is a human-readable alias processed as a first-class citizen within Hatchet. But we keep the default metric as a class variable to maintain a self-similar structure with the inherited GraphFrame.
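To illustrate the two metric forms, here is a hedged sketch of selecting whichever column a given recording exposes. The two metric strings come from the discussion above; the helper function and the column lists are hypothetical:

```python
# Hypothetical sketch: prefer the human-readable alias when present,
# otherwise fall back to the legacy Caliper metric name.

LEGACY = "sum#inclusive#sum#time.duration"   # legacy Caliper metric name
ALIAS = "Min time/rank"                      # human-readable alias in Hatchet

def pick_metric(columns):
    """Return the first known time-metric column found, alias first."""
    for candidate in (ALIAS, LEGACY):
        if candidate in columns:
            return candidate
    raise KeyError("no known time metric among dataframe columns")

assert pick_metric(["name", "Min time/rank"]) == ALIAS
assert pick_metric(["name", LEGACY]) == LEGACY
```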
ii = generic_dataframe.index[0]
# fr = ht.frame.Frame({'name': 'Variant', 'type': 'region'})
fr = ht.graphframe.Frame({'name': 'Variant', 'type': 'region'})
nn = ht.graphframe.Node(fr)
setattr(nn, '_hatchet_nid', ii._hatchet_nid)
setattr(nn, '_depth', ii._depth)
setattr(nn, 'children', ii.children)
Could you explain what this function is doing exactly? I get that it’s sort of re-initializing the dataframe, but how and why?
Explained above, but I should illustrate the rationale with actual code and output. Say I don't use GenericFrame and instead call Hatchet directly on Base_Seq.cali and RAJA_Seq.cali, like so:
#!/usr/bin/env python3
import hatchet as ht
gf1 = ht.GraphFrame.from_caliperreader("RAJA_Seq.cali")
gf2 = ht.GraphFrame.from_caliperreader("Base_Seq.cali")
gf3 = gf2/gf1
print(gf3.tree())
and the output (partial screenshots):
the left frame, i.e. gf2 (with red arrows indicating nodes that occur only in the left frame),
and the right frame gf1 (with green arrows indicating nodes only in the right frame, which also show up as NaNs).
Now add in GenericFrame like so
metric = "Min time/rank"
gf4 = GenericFrame(gf1)
gf5 = GenericFrame(gf2)
gf6 = gf5/gf4
print(gf6.tree(metric_column=metric))
with its screenshot showing the two variants are slightly different, with RAJA incurring more overhead as expected.
Success!
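The NaNs in the raw gf2/gf1 division can be reproduced conceptually in plain Python (not the Hatchet implementation; node names and times are made up): nodes present in only one frame have no counterpart to divide by.

```python
import math

def divide(left, right):
    """Element-wise ratio over the union of node names; NaN where a node is missing."""
    return {
        name: left[name] / right[name] if name in left and name in right else math.nan
        for name in left.keys() | right.keys()
    }

base = {"Base_Seq": 10.0, "Basic_DAXPY": 2.0}   # "Base_Seq" root only in the left frame
raja = {"RAJA_Seq": 11.0, "Basic_DAXPY": 2.5}   # "RAJA_Seq" root only in the right frame

ratios = divide(base, raja)
assert ratios["Basic_DAXPY"] == 0.8              # shared node: meaningful ratio
assert math.isnan(ratios["Base_Seq"])            # left-only node -> NaN
assert math.isnan(ratios["RAJA_Seq"])            # right-only node -> NaN
```

Renaming both roots to a common "Variant" label, as GenericFrame does, removes the root mismatch so the shared subtree divides cleanly.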
if nn._depth == 3:
    if common_subtree.dataframe.loc[nn, metric] < m3:
        m3 = common_subtree.dataframe.loc[nn, metric]
I am not able to extract the purpose of this.
"Propagates the minimum tuning of a set of tunings when extracting common subtrees"
It looks like nodes with depth > 3 are ignored.
But then, a post-order traversal will go through all the level-3 nodes of a given level-2 node. During this phase, m3 will be set to the minimal value encountered. What does "metric" (or tuning) represent, and why extract the minimum?
The rest appears to be accumulating values from higher to lower levels.
Looking at the screenshot above, the tunings for Algorithm_MEMCPY are default and library. When we compare different subtrees, the leaf nodes will have different names for the tunings of the respective variant, so the best value of the tuning set, which is the minimum time, is propagated up the tree. A direct comparison doesn't make sense when comparing RAJA_Seq to RAJA_CUDA, where the CUDA algorithms could be run with a bunch of different block sizes (so lots of tunings) and Seq only has default. Which one to compare then? We prefer to propagate the minimum in both trees. Otherwise the trees would be too dissimilar, Hatchet would again give red/green arrows indicating the trees are too different at the subtree, and nonsense would be propagated. In Caliper, an algorithm.tuning is the actual algorithm getting timed. Each algorithm.tuning is really an independent algorithm with a new name, which we designate as algorithm.tuning.
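A minimal sketch of this min-of-tunings collapse in plain Python (not the script's actual traversal; algorithm names, tuning names, and times are made up):

```python
# Hypothetical sketch: collapse each algorithm's tunings to the best
# (minimum-time) tuning so variants with different tuning sets stay comparable.

def best_tunings(variant):
    """Map each algorithm to the minimum time over its tunings."""
    return {algo: min(tunings.values()) for algo, tunings in variant.items()}

raja_cuda = {"Algorithm_MEMCPY": {"default": 3.0, "block_128": 2.0, "block_256": 1.5}}
base_seq  = {"Algorithm_MEMCPY": {"default": 4.0}}

# Both variants now expose one number per algorithm, regardless of how many
# tunings each one ran, so the trees line up node for node.
assert best_tunings(raja_cuda) == {"Algorithm_MEMCPY": 1.5}
assert best_tunings(base_seq) == {"Algorithm_MEMCPY": 4.0}
```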
s2 = m3
s1 += s2
common_subtree.dataframe.loc[nn, metric] = s2
m3 = sys.float_info.max
If there is a level 2 node with no child, it looks like this node will get s2 = sys.float_info.max.
I must be missing something.
No, we're just resetting m3 for the next pass. When you do a reduction (i.e. for a minimum), you initialize the value to float max, so the next value seen must be less than that, and subsequent values are compared against the new minimum. When we arrive at nn._depth == 2, we're done with level 3 for that algorithm, so m3 gets reset. Since we rejected some tunings that were not the minimum, we now have to redo all of the timing up the tree until we get to the root. Everything that is not at level 3 is summed as it would be originally.
What you're seeing at s2 = m3 is the result of the tree traversal at the prior level: the traversal visits all of the level-3 nodes before the depth reaches level 2, so s2 will be set to the minimum value from level 3, not float max. The Caliper tree structure is shown in the screenshot above.
Every algorithm has at least one tuning, usually default, so there is no childless node at level 2.
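The reset pattern can be sketched in isolation as a plain-Python post-order walk (not the script's actual code; the depth/time pairs are made up). Because post-order visits all level-3 tuning nodes of an algorithm before the level-2 algorithm node itself, m3 holds the tunings' minimum when the algorithm node is reached, and is then reset for the next algorithm:

```python
import sys

# (depth, time) pairs in post-order: tunings first, then their algorithm node.
postorder = [(3, 5.0), (3, 2.0), (2, None), (3, 7.0), (2, None)]

m3 = sys.float_info.max          # reduction identity for a minimum
algorithm_times = []
for depth, time in postorder:
    if depth == 3:
        if time < m3:
            m3 = time            # track the minimum tuning at level 3
    elif depth == 2:
        algorithm_times.append(m3)   # all tuning children already visited
        m3 = sys.float_info.max      # reset for the next algorithm's tunings

assert algorithm_times == [2.0, 7.0]
```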
Also, in general there are no node levels > 3, since the tree structure stops at level 3. Technically, however, there could be a case where MPI barrier timing is injected via an additional service setup, and we could possibly see column metrics at level 4, which we currently don't care about. They'll be summed into the tuning at level 3 by Caliper anyway, and are usually minuscule in the grand scheme of things.
Thank you @jonesholger for the detailed explanation.
In fact, I think your comments would make a great piece of documentation for this script. The script will be read by developers wondering why the associated CI job failed, but it requires some knowledge of both the Caliper three-level output and Hatchet tree structure management, which makes it not so easy to decipher.
Note that I haven’t been able to access LC GitLab lately to investigate the failures.
Thank you @adrienbernede for your comments regarding providing documentation for the script. I should take this to heart, but also consider adding some of this to the RAJAPerf tutorial that I'm currently working on. I know you're super busy, but you are becoming quite adept at RAJAPerf, and I was hoping to get you to review my tutorial at some point when I get most everything fleshed out. Do you have or use Docker in a "local" context? If not, I'm considering putting it all on BinderHub, which will launch JupyterLab automatically with the entire environment ready to go. In addition, I'm lobbying to get Hatchet improved so we don't have to use these support modules in order to compare across variants, but that is a much longer conversation. There is also the counter-argument that I should change RAJAPerf-Caliper to fit modern Hatchet, but that would increase the internal complexity and decrease the robustness of the Caliper implementation, which lives in the Base class, so I'm resisting. Besides, I think this sort of scripting is best left to Python vs. C++.
Allows cross-variant comparisons such as RAJA_OpenMP against Base_OpenMP.