-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce compilation memory of executables #5647
Reduce compilation memory of executables #5647
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool. I would not have guessed most of that would matter. (The DataBox one makes sense in hindsight, at least.)
My only suggested change is optional, so if you don't want to make it let me know and I'll approve.
tmpl::get_source<tmpl::_1>>>...>, | ||
tmpl::filter< | ||
typename DataBox<tmpl::list<Tags...>>::edge_list, | ||
tmpl::bind<tmpl::list_contains, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] I recently added a tmpl::lazy::list_contains
, if you want to avoid the tmpl::bind
. I don't know whether bind has a performance impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I compiled with this change it seemed to be the same or ever so slightly less RAM (but the difference could just be noise), so I just threw it in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also surprised how much moving the PARSE_ERROR
s from ParseOptions.hpp
to the cpp makes a difference.
3ad5784
to
268ddc5
Compare
Applied suggested fixup and rebased! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also edit the title of the PR? Your changes here will actually benefit all of our executables, not just the BBH one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the changes in CI runtime that you mentioned in the PR description: Keep in mind that CI runtimes depend heavily on caches from previous runs. To measure it accurately you could (1) delete all your GitHub Actions caches on your fork, (2) push current develop to a new branch (not called "develop"), (3) push your changes to a new branch. CI will run on both new branches without using caches. (I'm not requesting you do this, just suggesting a method to measure the CI runtime accurately)
const std::string tensor_name = db::tag_name<TensorToObserveTag>(); | ||
// Loop over ObservableTensorTags and see if it was requested to be observed. | ||
// This approach allows us to delay evaluating any compute tags until they're | ||
// actually needed for observing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should remain at the ObservableTensorTags
pack expansion below.
Is there actually any benefit in factoring out this impl
function? It's still templated on the whole box and on the ObservableTensorTags
. If it makes no compile-time difference then I'd keep this code inlined in the tmpl::for_each
lambda function below to reduce complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the lambda, I pulled it out because it was a big part of the memory savings for the ObserveNorms
changes relative to its total savings, though the total savings weren't huge. With the current change I have (pulling out the tmpl::for_each
into a function AND factoring out check_norm_is_observable
and fill_norm_values_and_names
), it saved 0.21 GB and 24.5s for clang-13 Release and 0.17GB and 27.9s for gcc-9 Release. Now if I put the tmpl::for_each
back in (but still keep check_norm_is_observable
and fill_norm_values_and_names
factored out), it adds back 0.08GB (of 0.21GB) and 13s (of 24.5s) for clang and 0.12 GB (of 0.17GB) and 14s (of 27.9s) for gcc. (note: The timing may have some noise, I didn't run it multiple times to get an average)
Let me know how you feel about the tradeoff, but I will move the comment back either way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep as-is. the tmpl::for_each
would be a 50%+ regression, which is huge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Sounds like tmpl::for_each
is a lot more expensive than a parameter pack expansion. Might be an avenue for more optimizations in other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to merge this now, go ahead and move the comment back to the right place in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I moved the comment back down
Please don't spend time trying to measure CI. That's a lot of effort for specific numbers that I don't think we need. The results are quite clear: this makes major improvements. I think the effort is better spent getting this merged and working on more improvements. |
Done to reduce compile time and compile memory usage
Done to reduce memory usage during compilation
Done to reduce memory usage during compilation
Done to reduce memory usage during compilation
Done to reduce compile time and compile memory usage
268ddc5
to
3c03848
Compare
Addresses #5472
Proposed changes
This PR includes several changes that were found to decrease peak RAM usage during compilation of
EvolveGhBinaryBlackHole
. See Further comments below for details on the compilation impact of each commit and overall.Upgrade instructions
Code review checklist
make doc
to generate the documentation locally intoBUILD_DIR/docs/html
.Then open
index.html
.code review guide.
bugfix
ornew feature
if appropriate.Further comments
Effects of changes on compiling
EvolveGhBinaryBlackHole
:The original commits that generated the stats below are backed up on this branch, where clang-13 Release and gcc-9 Release were used to compile on wheeler014.
Each commit below is applied on top of the one above it, so the relative effect of one commit applied on top of the ones before it is the difference from the commit above it. For example, applying
32a2f1d
after9769906
saved an additional 0.2 GB of RAM for clang and 1.54 GB for gcc.Effects of changes on CI:
Some build times and the sizes of the build directories improved a little. See effects on CI recorded here.