Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize and vectorize TOMS748 #5578

Merged
merged 11 commits into from
Nov 14, 2023

Conversation

nilsdeppe
Copy link
Member

Proposed changes

I've had this on a branch for a long time, decided to clean it up. This is necessary for vectorizing FixConservatives and primitive recovery. FixConservatives was ~10x after vectorization, primitive recovery I don't remember if I did as thorough of a comparison, but I remember at least a 5x speedup. Our GRMHD simulations (haven't checked GHMHD) spend ~50% of their time in primitive recovery, so hopefully once that's all in we should see a big reduction in runtime there.

Upgrade instructions

Code review checklist

  • The code is documented and the documentation renders correctly. Run
    make doc to generate the documentation locally into BUILD_DIR/docs/html.
    Then open index.html.
  • The code follows the stylistic and code quality guidelines listed in the
    code review guide.
  • The PR lists upgrade instructions and is labeled bugfix or
    new feature if appropriate.

Further comments

@nilsdeppe nilsdeppe requested a review from kidder October 20, 2023 13:30
Copy link
Member

@wthrowe wthrowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as a bonus, this should get rid of most of the stupid clang-tidy warnings from Boost.

@@ -6,6 +6,10 @@ option(USE_XSIMD "Use xsimd if it is available" OFF)
if(USE_XSIMD)
find_package(xsimd REQUIRED)

if(xsimd_VERSION VERSION_LESS 11.0.1)
message(FATAL_ERROR "xsimd must be at least 11.0.1, got ${xsimd_VERSION}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update installation requirements documentation.

xsimd
)
# target_link_libraries(
# Blaze
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to make sure leaving this was intentional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll add a comment. Basically, I haven't got all bugs out when doing Blaze+XSIMD in our case. I'm not sure if it's some padding assumption or what.

@@ -108,7 +109,8 @@ class Tensor<X, Symm, IndexList<Indices...>> {
std::is_same_v<X, ComplexModalVector> or
std::is_same_v<X, DataVector> or std::is_same_v<X, ModalVector> or
is_spin_weighted_of_v<ComplexDataVector, X> or
is_spin_weighted_of_v<ComplexModalVector, X>,
is_spin_weighted_of_v<ComplexModalVector, X> or
simd::is_batch<X>::value,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the error message. (Maybe remove the list of types from it so the information isn't duplicated.)

@@ -6,15 +6,465 @@

#pragma once

#include <boost/math/tools/roots.hpp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[arbitrary line] This is derived from boost code, so you need to include the boost license and copyright statement somewhere.

// `max() * (1. + value)`
const auto mask = fabs(simd::select(mask0, denom, static_cast<T>(0)) *
std::numeric_limits<T>::max()) <= fabs(num);
return simd::select(mask0 and mask, r, num / denom);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does num / denom not FPE here?

Copy link
Member Author

@nilsdeppe nilsdeppe Oct 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not had an issue, but I'll add a comment with code that could be changed to prevent that. This might have something to do with retiring instructions, etc. but SIMD things like that is something I don't have a complete grasp on. If we have issues, it'll be easy to guard against :)

Edit: I just stuck it in... It's one mask, not super expensive to compute.

// Main entry point and logic for Toms Algorithm 748
// root finder.
if (UNLIKELY(simd::any(ax >= bx))) {
throw std::domain_error("Lower bound is larger than upper bound");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would allowing ax == bx cause any problems with the implementation or need special handling? If it would it's probably not worth allowing, but if not it seems mathematically OK (as long as the single-point bracket is a root, of course).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming something subtle breaks in the implementation. The Boost implementation doesn't allow it so I just did that rather than possibly creating some weird edge case

u = simd::select(fabs_fa_less_fabs_fb_mask, a, b);
fu = simd::select(fabs_fa_less_fabs_fb_mask, fa, fb);
const T b_minus_a = b - a;
// need to mask out fb==fa case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a TODO? I don't see any code handling this.

const T b_minus_a = b - a;
// need to mask out fb==fa case
c = simd::fnma(static_cast<T>(2) * (fu / (fb - fa)), b_minus_a, u);
// c = u - static_cast<T>(2) * (fu / (fb - fa)) * b_minus_a;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove comment or move before the line it's documenting.

* \throws `convergence_error` if, for any index, the requested tolerance is not
* met after `max_iterations` iterations.
p * met after `max_iterations` iterations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops.

cmake/SetupXsimd.cmake Show resolved Hide resolved
@@ -42,6 +42,7 @@ target_link_libraries(
Options
RelativisticEulerSolutions
Serialization
Simd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make PRIVATE and add PRIVATE RootFinding please.

@nilsdeppe
Copy link
Member Author

Pushed fixups. Thanks for the reviews!

Copy link
Member

@wthrowe wthrowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other changes look good.

// (C) Copyright John Maddock 2006.
// Use, modification and distribution are subject to the
// Boost Software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing the actual license file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the file. Much to my surprise it looks like installed versions of Boost also don't necessarily include it, but I agree we should include the license for it.

@nilsdeppe
Copy link
Member Author

I added a couple more fixups that change the way the size() function works and also adds a make_sequence function

@nilsdeppe nilsdeppe force-pushed the simd_toms748 branch 3 times, most recently from 0f17a3a to db87fc6 Compare October 24, 2023 14:03
@nilsdeppe
Copy link
Member Author

@nilsvu could you please take a look at the fixups? @wthrowe I changed a couple small features in the simd lib in the last fixups (and a linking issue) if you have a chance to take a look. The linking issue was what was causing CI to fail.

@wthrowe
Copy link
Member

wthrowe commented Oct 24, 2023

Looks good.

Copy link
Member

@nilsvu nilsvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can squash

@nilsvu
Copy link
Member

nilsvu commented Oct 24, 2023

Any idea why the root finder in the tabulated EOS test failed to converge in the macOS CI test?

@nilsdeppe
Copy link
Member Author

nilsdeppe commented Oct 24, 2023

No, I have no idea why it doesn't work on macOS. I tried reproducing on my machine (Linux) with the same seeds, etc. and can't.

Edit: I'm pushing a fixup commit that should print out more info (tolerances and result) so we can get some idea of how well it did.

@nilsdeppe nilsdeppe force-pushed the simd_toms748 branch 2 times, most recently from 8735fed to eeb8ead Compare October 26, 2023 02:08
@nilsdeppe
Copy link
Member Author

@nilsvu @wthrowe this should be ready now...

Copy link
Member

@wthrowe wthrowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give a (very short) summary of what you changed since the last time we looked at this? Was it just adjusting tolerances or similar?

f, lower_bound, upper_bound, f_at_lower_bound, f_at_upper_bound, tol,
max_iters);
ignore_filter, max_iters);
if (max_iters >= max_iterations) {
throw convergence_error(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only change this throw to have extra info in it.

@nilsdeppe
Copy link
Member Author

Okay, I moved the file. Sorry, this was a goof on my end. I don't know why I put it in the wrong directory. I then interpreted @nilsvu's comment as "Why not put it inside the header file?" which is clearly not what it asked. Sorry for the confusion!

wthrowe
wthrowe previously approved these changes Oct 27, 2023
@nilsvu
Copy link
Member

nilsvu commented Oct 30, 2023

I ran the tabulated EOS test 1e4 times without fail on macOS (Apple Silicon though, not GitHub's Intel chips). Not sure what caused the convergence issue on CI but I can't reproduce it so can't do anything about it :)

@nilsvu
Copy link
Member

nilsvu commented Oct 30, 2023

Edit: GitHub isn't surfacing this: #5578 (comment)

@nilsdeppe
Copy link
Member Author

Rebased on latest develop since this was a bit old

wthrowe
wthrowe previously approved these changes Nov 8, 2023
@nilsdeppe
Copy link
Member Author

I've pushed a fixup that resolves the random failures in some of the tests like Test_Tabulated3D.cpp. I also pushed a commit that adds several CAPTUREs to test_Tabulated3D.cpp. I rebased as well.

Copy link
Member

@wthrowe wthrowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Squash.

@nilsdeppe
Copy link
Member Author

Rebased and squashed. Thanks for the reviews!

@nilsdeppe
Copy link
Member Author

CI failures are:

  • 2x executable build timeouts
  • 1x Burgers Step failure

@nilsdeppe nilsdeppe merged commit 0945e8c into sxs-collaboration:develop Nov 14, 2023
19 of 22 checks passed
@nilsdeppe nilsdeppe deleted the simd_toms748 branch November 14, 2023 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants