
Reduce compilation memory of executables #5647

Merged

Conversation

macedo22
Contributor

@macedo22 macedo22 commented Dec 4, 2023

Addresses #5472

Proposed changes

This PR includes several changes that were found to decrease peak RAM usage during compilation of EvolveGhBinaryBlackHole. See Further comments below for details on the compilation impact of each commit and overall.

Upgrade instructions

Code review checklist

  • The code is documented and the documentation renders correctly. Run
    make doc to generate the documentation locally into BUILD_DIR/docs/html.
    Then open index.html.
  • The code follows the stylistic and code quality guidelines listed in the
    code review guide.
  • The PR lists upgrade instructions and is labeled bugfix or
    new feature if appropriate.

Further comments

Effects of changes on compiling EvolveGhBinaryBlackHole:
The original commits that generated the stats below are backed up on this branch; the builds were clang-13 Release and gcc-9 Release, compiled on wheeler014.

Each commit below is applied on top of the one above it, so the incremental effect of a commit is the difference from the row above it. For example, applying 32a2f1d after 9769906 saved an additional 0.2 GB of RAM for clang and 1.54 GB for gcc.

| Modification | Commit hash | clang peak RAM (GB) | gcc peak RAM (GB) | clang user time (mm:ss) | gcc user time (mm:ss) |
| --- | --- | --- | --- | --- | --- |
| (baseline) | b479ee1 | 11.36 | 19.00 | 8:32 | 13:50 |
| Factor ObserveNorms::operator() | 9769906 | 11.15 | 18.84 | 8:07 | 13:22 |
| Rework DataBox::reset_compute_items_after_mutate | 32a2f1d | 10.95 | 17.30 | 8:05 | 13:37 |
| Factor get_template_parameters_as_string | 8bb55a9 | 10.85 | 17.14 | 7:52 | 13:38 |
| Rework Options help text printing | 3448d08 | 10.69 | 16.38 | 7:49 | 13:21 |
| Factor Options::Parser::parse | 3ad5784 | 9.98 | 15.60 | 6:55 | 11:04 |
| Total saved from baseline commit | | 1.38 | 3.40 | 1:37 | 2:46 |

Effects of changes on CI:
Some build times and the sizes of the build directories improved a little. See effects on CI recorded here.

Member

@wthrowe wthrowe left a comment


Very cool. I would not have guessed most of that would matter. (The DataBox one makes sense in hindsight, at least.)

My only suggested change is optional, so if you don't want to make it let me know and I'll approve.

tmpl::get_source<tmpl::_1>>>...>,
tmpl::filter<
typename DataBox<tmpl::list<Tags...>>::edge_list,
tmpl::bind<tmpl::list_contains,
Member


[optional] I recently added a tmpl::lazy::list_contains, if you want to avoid the tmpl::bind. I don't know whether bind has a performance impact.
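A toy sketch of the substitution being suggested, under the assumption that tmpl::lazy::list_contains takes the same (List, Element) arguments as the eager tmpl::list_contains; the tag names, the `wanted` list, and the use of tmpl::pin are made up for illustration and are not the actual DataBox code:

```cpp
// Toy sketch only -- not the actual DataBox code. Assumes SpECTRE's
// Utilities/TMPL.hpp and that tmpl::lazy::list_contains mirrors the eager
// tmpl::list_contains<List, Element>.
#include <type_traits>
#include "Utilities/TMPL.hpp"

struct TagA {};
struct TagB {};
struct TagC {};
using wanted = tmpl::list<TagA, TagC>;
using all_tags = tmpl::list<TagA, TagB, TagC>;

// Eager metafunction adapted for placeholder use via tmpl::bind:
using kept_with_bind =
    tmpl::filter<all_tags,
                 tmpl::bind<tmpl::list_contains, tmpl::pin<wanted>, tmpl::_1>>;

// The same filter with the lazy form; no tmpl::bind wrapper is needed:
using kept_with_lazy =
    tmpl::filter<all_tags,
                 tmpl::lazy::list_contains<tmpl::pin<wanted>, tmpl::_1>>;

static_assert(std::is_same_v<kept_with_bind, tmpl::list<TagA, TagC>>);
static_assert(std::is_same_v<kept_with_lazy, tmpl::list<TagA, TagC>>);
```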

Contributor Author


I compiled with this change and it seemed to use the same or ever so slightly less RAM (the difference could just be noise), so I just threw it in.

@knelli2 knelli2 self-requested a review December 4, 2023 17:44
@knelli2 knelli2 added the priority critical for progress label Dec 4, 2023
knelli2
knelli2 previously approved these changes Dec 4, 2023
Contributor

@knelli2 knelli2 left a comment


Also surprised how much moving the PARSE_ERRORs from ParseOptions.hpp to the cpp makes a difference.
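As a generic illustration of why this helps (this is not SpECTRE's actual ParseOptions code; parse_with_check and parse_error are made-up names): moving the error-formatting and throwing out of a header-only function template into an ordinary function means that code is compiled once, rather than being re-instantiated for every template argument in every translation unit that includes the header.

```cpp
// Generic sketch, not the actual SpECTRE code.
// Header: keep the template thin and forward the heavy, type-independent
// work to a non-template function defined in a .cpp file.
#include <stdexcept>
#include <string>

// Declared in the header, defined once in a .cpp file (shown below).
[[noreturn]] void parse_error(const std::string& context,
                              const std::string& what);

template <typename T>
T parse_with_check(const std::string& context, const std::string& input) {
  if (input.empty()) {
    // One out-of-line copy of the error path, shared by every T.
    parse_error(context, "no value was given");
  }
  return T{};  // placeholder for the real conversion, omitted here
}

// .cpp file: the string concatenation and throw are compiled exactly once.
[[noreturn]] void parse_error(const std::string& context,
                              const std::string& what) {
  throw std::runtime_error("While parsing " + context + ": " + what);
}
```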

@macedo22
Contributor Author

macedo22 commented Dec 5, 2023

Applied suggested fixup and rebased!

knelli2
knelli2 previously approved these changes Dec 5, 2023
Contributor

@knelli2 knelli2 left a comment


Could you also edit the title of the PR? Your changes here will actually benefit all of our executables, not just the BBH one.

wthrowe
wthrowe previously approved these changes Dec 5, 2023
Member

@nilsvu nilsvu left a comment


About the changes in CI runtime that you mentioned in the PR description: Keep in mind that CI runtimes depend heavily on caches from previous runs. To measure it accurately you could (1) delete all your GitHub Actions caches on your fork, (2) push current develop to a new branch (not called "develop"), (3) push your changes to a new branch. CI will run on both new branches without using caches. (I'm not requesting you do this, just suggesting a method to measure the CI runtime accurately)

const std::string tensor_name = db::tag_name<TensorToObserveTag>();
// Loop over ObservableTensorTags and see if it was requested to be observed.
// This approach allows us to delay evaluating any compute tags until they're
// actually needed for observing.
Member


This comment should remain at the ObservableTensorTags pack expansion below.

Is there actually any benefit in factoring out this impl function? It's still templated on the whole box and on the ObservableTensorTags. If it makes no compile-time difference then I'd keep this code inlined in the tmpl::for_each lambda function below to reduce complexity.

Contributor Author

@macedo22 macedo22 Dec 6, 2023


Regarding the lambda: I pulled it out because it accounted for a large share of the ObserveNorms memory savings, even though those savings weren't huge overall. With the current change (pulling the tmpl::for_each out into a function AND factoring out check_norm_is_observable and fill_norm_values_and_names), it saved 0.21 GB and 24.5 s for clang-13 Release and 0.17 GB and 27.9 s for gcc-9 Release. If I put the tmpl::for_each back in (but keep check_norm_is_observable and fill_norm_values_and_names factored out), it adds back 0.08 GB (of 0.21 GB) and 13 s (of 24.5 s) for clang and 0.12 GB (of 0.17 GB) and 14 s (of 27.9 s) for gcc. (Note: the timings may have some noise; I didn't run multiple times to get an average.)

Let me know how you feel about the tradeoff, but I will move the comment back either way!

Member


Keep as-is. The tmpl::for_each would be a 50%+ regression, which is huge.

Member

@nilsvu nilsvu Dec 6, 2023


Yep. Sounds like tmpl::for_each is a lot more expensive than a parameter pack expansion. Might be an avenue for more optimizations in other places.
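A minimal sketch of the two patterns being compared; the helper and tag names are made up, it assumes tmpl::for_each and tmpl::type_from from SpECTRE's Utilities/TMPL.hpp, and it is not the actual ObserveNorms code:

```cpp
// Illustrative sketch only; not the ObserveNorms implementation.
#include "Utilities/TMPL.hpp"

template <typename Tag, typename Box>
void check_one_tag(const Box& /*box*/) {}  // made-up per-tag helper

// Variant 1: tmpl::for_each instantiates the list traversal plus a generic
// lambda call operator once per tag.
template <typename... ObservableTensorTags, typename Box>
void loop_with_for_each(const Box& box) {
  tmpl::for_each<tmpl::list<ObservableTensorTags...>>([&box](auto tag_v) {
    using tag = tmpl::type_from<decltype(tag_v)>;
    check_one_tag<tag>(box);
  });
}

// Variant 2: a fold expression over the parameter pack performs the same
// per-tag calls without the extra lambda and list-traversal templates.
template <typename... ObservableTensorTags, typename Box>
void loop_with_pack_expansion(const Box& box) {
  (check_one_tag<ObservableTensorTags>(box), ...);
}
```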

Member


If you want to merge this now, go ahead and move the comment back to the right place in a follow-up.

Contributor Author


Sounds good. I moved the comment back down.

@nilsdeppe
Member

Please don't spend time trying to measure CI. That's a lot of effort for specific numbers that I don't think we need. The results are quite clear: this makes major improvements. I think the effort is better spent getting this merged and working on more improvements.

Done to reduce compile time and compile memory usage
Done to reduce memory usage during compilation
Done to reduce memory usage during compilation
Done to reduce memory usage during compilation
Done to reduce compile time and compile memory usage
@macedo22 macedo22 dismissed stale reviews from wthrowe and knelli2 via 3c03848 December 6, 2023 22:23
@macedo22 macedo22 force-pushed the bbh_compilation_improvements_1 branch from 268ddc5 to 3c03848 December 6, 2023 22:23
@macedo22 macedo22 requested a review from nilsvu December 6, 2023 22:24
@macedo22 macedo22 changed the title from "Reduce compilation memory of EvolveGhBinaryBlackHole" to "Reduce compilation memory of executables" Dec 6, 2023
@nilsvu nilsvu merged commit a26db87 into sxs-collaboration:develop Dec 7, 2023
22 checks passed
Labels
enhancement, priority critical for progress
5 participants