
Hatchet analysis crossvariant #298

Merged

Conversation

@jonesholger (Contributor) commented Jan 23, 2023

Allows cross-variant comparisons, such as RAJA_OpenMP against Base_OpenMP.

  • Extracts common subtrees only when the number of nodes in the two trees differs
  • Adds a tolerance factor that produces a threshold when multiplied by the baseline inclusive sum
  • Adds a pass/fail check of report - baseline > threshold, where threshold = baseline * tolerance (default tolerance = 0.05); see the sketch after this list
  • Does a metric check using GraphFrame .inc_metrics, looking for either 'min#inclusive#sum#time.duration' or its alias 'Min time/rank' found in newer Hatchet
  • Switches from optparse to argparse, since optparse is deprecated
  • Propagates the minimum tuning of a set of tunings when extracting common subtrees
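A minimal sketch of that pass/fail check (the helper name and the example values are hypothetical; the real logic lives in scripts/gitlab/hatchet-analysis.py):

    import sys

    def check_regression(report_sum, baseline_sum, tolerance=0.05):
        # Fail when the report's inclusive sum exceeds the baseline's
        # inclusive sum by more than baseline * tolerance.
        threshold = baseline_sum * tolerance
        return (report_sum - baseline_sum) <= threshold

    if __name__ == "__main__":
        # Example: a 3% slowdown passes under the default 5% tolerance.
        ok = check_regression(report_sum=1.03, baseline_sum=1.00)
        sys.exit(0 if ok else 1)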

@adrienbernede (Member) left a comment:

I'm struggling to understand the implementation of the ExtractCommonSubtree. At least it had me reading the Hatchet documentation.

scripts/gitlab/hatchet-analysis.py (thread resolved)
Comment on lines +30 to +31

    generic_exc_metrics = gf.exc_metrics
    generic_inc_metrics = gf.inc_metrics
@adrienbernede (Member):

Maybe we can pass those in the final init directly?

@jonesholger (Contributor, Author):

Those metrics are new class variables copied over from the GraphFrame argument and eventually used as arguments to initialize the superclass via super().__init__(). You can see that this initialization is all copies, sometimes explicit as in generic_dataframe=gf.dataframe.copy(). This allows me to create a new graphframe with this type of call:

    gf1 = GenericFrame(ht.GraphFrame.from_caliperreader(f1))

What GenericFrame does is rename the root node to "Variant" and set all the attributes that make it a "Node". This allows Hatchet to compare completely different trees (since the root node itself is different), while assuming that the tree structures underneath the two roots are completely identical. If they are not identical, we call ExtractCommonSubtree so we can compare apples to apples.
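For orientation, a simplified, hypothetical sketch of that pattern (the real class is quoted piecewise in the threads below):

    import hatchet as ht

    class GenericFrame(ht.GraphFrame):
        # Copy the wrapped GraphFrame's pieces and rename its root so that
        # two different variants share a common root named "Variant".
        def __init__(self, gf):
            generic_dataframe = gf.dataframe.copy()
            generic_graph = gf.graph.copy()
            # ... rename the root node of the copied graph/dataframe to
            # "Variant" here (see the diff hunks quoted below) ...
            super().__init__(generic_graph, generic_dataframe,
                             gf.exc_metrics, gf.inc_metrics)

Usage then mirrors the call quoted above, e.g. gf1 = GenericFrame(ht.GraphFrame.from_caliperreader(f1)).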

    generic_graph = gf.graph.copy()
    generic_exc_metrics = gf.exc_metrics
    generic_inc_metrics = gf.inc_metrics
    generic_default_metric = gf.default_metric  # in newer Hatchet
@adrienbernede (Member):

Unused here.

@jonesholger (Contributor, Author):

No, they are used in the super call:

    super().__init__(generic_graph, generic_dataframe, generic_exc_metrics, generic_inc_metrics)

Are you talking about the default metric? We don't use that in ExtractCommonSubtree anymore, since the metric can take one of two forms:

    # metric = "sum#inclusive#sum#time.duration"
    metric = "Min time/rank"

The first is the legacy metric, which still exists within Caliper and is used in all of the timing calculations; the second form is a human-readable alias, but it is processed as a first-class citizen within Hatchet.

But we keep the default metric as a class variable to maintain a self-similar structure with the inherited GraphFrame.
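A hedged sketch of that dual-form check (the helper name pick_metric is hypothetical):

    def pick_metric(gf):
        # Accept either the newer Hatchet alias or the legacy Caliper name.
        for name in ("Min time/rank", "min#inclusive#sum#time.duration"):
            if name in gf.inc_metrics:
                return name
        raise KeyError("no known inclusive metric among %s" % gf.inc_metrics)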

Comment on lines +34 to +40

    ii = generic_dataframe.index[0]
    # fr = ht.frame.Frame({'name': 'Variant', 'type' : 'region'})
    fr = ht.graphframe.Frame({'name': 'Variant', 'type': 'region'})
    nn = ht.graphframe.Node(fr)
    setattr(nn, '_hatchet_nid', ii._hatchet_nid)
    setattr(nn, '_depth', ii._depth)
    setattr(nn, 'children', ii.children)
@adrienbernede (Member):

Could you explain what this function is doing exactly? I get that it's sort of re-initializing the dataframe, but how and why?

@jonesholger (Contributor, Author):

Explained above, but I should illustrate the rationale with actual code and output. Say I don't use GenericFrame and instead call Hatchet directly on Base_Seq.cali and RAJA_Seq.cali, like so:

    #!/usr/bin/env python3
    import hatchet as ht

    gf1 = ht.GraphFrame.from_caliperreader("RAJA_Seq.cali")
    gf2 = ht.GraphFrame.from_caliperreader("Base_Seq.cali")

    gf3 = gf2 / gf1

    print(gf3.tree())

And the output (partial screenshots):

The left frame, i.e. gf2 (red arrows marking nodes that occur only in the left frame):

[screenshot omitted]

The right frame, gf1 (green arrows marking nodes only in the right frame, which also show up as NaNs):

[screenshot omitted]

Now add in GenericFrame, like so:

    metric = "Min time/rank"
    gf4 = GenericFrame(gf1)
    gf5 = GenericFrame(gf2)
    gf6 = gf5 / gf4
    print(gf6.tree(metric_column=metric))

Its screenshot shows the two variants are slightly different, with RAJA incurring more overhead, as expected:

[screenshot omitted]

Success!

Comment on lines +64 to +66

    if nn._depth == 3:
        if common_subtree.dataframe.loc[nn, metric] < m3:
            m3 = common_subtree.dataframe.loc[nn, metric]
@adrienbernede (Member) commented Feb 17, 2023:

I am not able to extract the purpose of this.

> Propagates the minimum tuning of a set of tunings when extracting common subtrees

It looks like nodes with depth > 3 are ignored. But then, a post-order traversal will go through all the level-3 nodes of a given level-2 node. During this phase, m3 will be set to the minimal value encountered. What does "metric" (or tuning) represent, and why extract the minimum?

The rest appears to be accumulating values from higher to lower levels.

@jonesholger (Contributor, Author) commented Mar 3, 2023:

Looking at the screenshot above, the tunings for Algorithm_MEMCPY are default and library. When we compare different subtrees, the leaf nodes will have different tuning names for the respective variant, so the best value of the tuning set, which is the minimum time, is propagated up the tree. Comparing tunings directly doesn't make sense when comparing RAJA_Seq to RAJA_CUDA, where the CUDA algorithms could be run with a bunch of different block sizes (so lots of tunings) while Seq only has default. Which one would we compare then? We prefer to propagate the minimum in both trees. Otherwise the trees would be too dissimilar, Hatchet would again give red/green arrows flagging the subtrees as too different, and nonsense would be propagated. In Caliper, an algorithm.tuning is the actual algorithm getting timed. Each algorithm.tuning is really an independent algorithm with a new name, which we designate as algorithm.tuning.
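A toy illustration of that propagation (the numbers are invented):

    # Tunings of Algorithm_MEMCPY in two hypothetical trees; only the
    # fastest tuning is kept, so both variants expose one comparable value.
    base_tunings = {"default": 1.2, "library": 0.9}  # seconds, invented
    raja_tunings = {"default": 1.1}

    print(min(base_tunings.values()))  # 0.9 propagated up the Base tree
    print(min(raja_tunings.values()))  # 1.1 propagated up the RAJA tree

Reducing both trees this way keeps their shapes identical, which is what the comparison needs.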

    s2 = m3
    s1 += s2
    common_subtree.dataframe.loc[nn, metric] = s2
    m3 = sys.float_info.max
@adrienbernede (Member) commented Feb 17, 2023:

If there is a level-2 node with no child, it looks like this node will get s2 = sys.float_info.max. I must be missing something.

@jonesholger (Contributor, Author):

No, we're just resetting m3 for the next pass. When you do a reduction (i.e. for a minimum), you initialize the value to float max, so the next value seen must be less than that, and subsequent values are compared against the new minimum. When we arrive at nn._depth == 2, we're done with level 3 for that algorithm, so m3 gets reset. Since we rejected some tunings that were not the minimum, we now have to redo all of the timing up the tree until we get to the root. Everything that is not at level 3 is summed as it would be originally.

What you're seeing at s2 = m3 is the result of the tree traversal at the prior level: the traversal visits all of the level-3 nodes before the depth is level 2, so s2 will be set to the minimum value from level 3, not float max. The Caliper tree structure is:

[screenshot of the Caliper tree structure omitted]

Every algorithm has at least one tuning, usually default, so there is no childless node at level 2.
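Putting the two quoted hunks together, a hedged sketch of the whole pass (the graph.traverse(order="post") call and the surrounding loop are assumptions; the real script also re-accumulates sums at the levels above 3 on the way to the root):

    import sys

    def propagate_min_tuning(common_subtree, metric):
        m3 = sys.float_info.max  # running minimum over one algorithm's tunings
        s1 = 0.0                 # re-accumulated total toward the root
        for nn in common_subtree.graph.traverse(order="post"):
            if nn._depth == 3:
                # Reduction: remember the fastest tuning seen so far.
                m3 = min(m3, common_subtree.dataframe.loc[nn, metric])
            elif nn._depth == 2:
                # All tunings of this algorithm have been visited; keep the minimum.
                s2 = m3
                s1 += s2
                common_subtree.dataframe.loc[nn, metric] = s2
                m3 = sys.float_info.max  # reset the reduction for the next algorithm
        return s1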

@jonesholger (Contributor, Author):

Also, in general there are no node levels > 3, since the tree structure stops at level 3. Technically, however, there could be a case where MPI barrier timing is injected via additional service setup, and we could possibly see column metrics at level 4, which we currently don't care about. They'll be summed into the tuning at level 3 by Caliper anyway, and are usually minuscule in the grand scheme of things.

@adrienbernede (Member) left a comment:

Thank you @jonesholger for the detailed explanation.

In fact, I think your comments would make a great piece of documentation for this script. The script will be read by developers wondering why the associated CI job failed, but it requires some knowledge of both Caliper's three-level output and Hatchet's tree-structure management, which makes it not so easy to decipher.

@adrienbernede (Member):

Note that I haven't been able to access LC GitLab lately to investigate the failures.

@adrienbernede merged commit 3993588 into woptim/caliper-integration and deleted the hatchet-analysis-crossvariant branch on March 8, 2023.
@jonesholger (Contributor, Author):

Thank you @adrienbernede for your comments regarding documentation for the script. I should take this to heart, but I'm also considering adding some of this to the RAJAPerf tutorial that I'm currently working on. I know you're super busy, but you are becoming quite adept at RAJAPerf, and I was hoping to get you to review my tutorial at some point, once I have most everything fleshed out. Do you have or use Docker in a "local" context? If not, I'm considering putting it all on BinderHub, which will launch JupyterLab automatically with the entire environment ready to go.

In addition, I'm lobbying to get Hatchet improved so we don't have to use these support modules in order to compare across variants. But that is a much longer conversation. There is also the counter-argument that I should change RAJAPerf-Caliper to fit modern Hatchet, but this would increase the internal complexity and decrease the robustness of the Caliper implementation, which lives in the Base class, so I'm resisting. Besides, I think this sort of scripting is best left to Python rather than C++.
