Force inline par_for_inner #967

jdolence · 2023-10-24T18:17:56Z

PR Summary

This PR just changes KOKKOS_INLINE_FUNCTION to KOKKOS_FORCEINLINE_FUNCTION for all of our par_for_inner overloads.

In a downstream code, we found that a particular loop was failing to vectorize when using the par_for_inner that corresponds to a single simd for loop, whereas just using the raw simd for loop directly resulted in vectorization (and a 40% speedup of the whole code!). Changing INLINE to FORCEINLINE on the par_for_inner resolves this, suggesting that the compiler was making the obnoxious choice not to inline and then (presumably) failing to vectorize what I can only guess it thought was a function call.
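For readers who don't know the pattern, here is a minimal host-only sketch of the kind of wrapper being discussed. It is not the actual Parthenon code. `SKETCH_FORCEINLINE`, `inner_loop_sketch`, and `scale_add_sketch` are hypothetical names, and the macro is only an assumed stand-in for what `KOKKOS_FORCEINLINE_FUNCTION` provides on host compilers. The real overload wraps a simd for loop; the simd pragma is left out here to keep the sketch portable.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a force-inline macro (assumption: this mirrors
// the intent of KOKKOS_FORCEINLINE_FUNCTION on host, not its definition).
#if defined(_MSC_VER)
#define SKETCH_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCEINLINE inline
#endif

// Simplified analogue of a single-loop par_for_inner overload: it only
// forwards each index in [il, iu] to the callable. If the compiler declines
// to inline this wrapper, the loop body is hidden behind a call and may not
// be vectorized; forcing the inline exposes the body to the vectorizer.
template <typename Function>
SKETCH_FORCEINLINE void inner_loop_sketch(const int il, const int iu,
                                          const Function &function) {
  for (int i = il; i <= iu; ++i) {
    function(i);
  }
}

// Example caller: y[i] += a * x[i], written through the wrapper.
inline void scale_add_sketch(std::vector<double> &y,
                             const std::vector<double> &x, const double a) {
  assert(x.size() == y.size());
  const int n = static_cast<int>(y.size());
  inner_loop_sketch(0, n - 1, [&](const int i) { y[i] += a * x[i]; });
}
```

Whether a given compiler vectorizes the loop still depends on its flags and cost model, so a vectorization report (for example `-fopt-info-vec` with GCC or `-Rpass=loop-vectorize` with Clang) is the way to check.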

PR Checklist

Code passes cpplint
New features are documented.
Adds a test for any bugs fixed. Adds tests for new features.
Code is formatted
Changes are summarized in CHANGELOG.md
CI has been triggered on Darwin for performance regression tests.
Docs build
(@lanl.gov employees) Update copyright on changed files

pdmullen

I approve, although I expect you may get some pushback from others in the collab 😉

lroberts36

LGTM. Amazed that this change provides a 40% speedup.

jdolence · 2023-10-24T18:50:45Z

> I approve, although I expect you may get some pushback from others in the collab 😉

@pgrete any objections?

Yurlungur

This is one of the wackier failures of the compiler I've seen.

Yurlungur · 2023-10-24T19:39:03Z

This seems like a very trivial change in keeping with the original intent, so I'm pressing the button

pgrete · 2023-10-27T12:18:09Z

IIRC the original motivation why we went away from force inline (what we had originally) to just inline was that the (now legacy) Intel compiler was not able to compile the code any more.
Might be worth double checking.

Yurlungur · 2023-10-27T14:48:45Z

> IIRC the original motivation why we went away from force inline (what we had originally) to just inline was that the (now legacy) Intel compiler was not able to compile the code any more. Might be worth double checking.

@jdolence did you run your tests of this performance with legacy intel?

jdolence added 2 commits October 24, 2023 12:10

INLINE to FORCEINLINE on all par_for_inner functions

081b14c

update changelog

5145bd8

jdolence changed the title ~~Force inline par_for_inner~~ WIP: Force inline par_for_inner Oct 24, 2023

jdolence changed the title ~~WIP: Force inline par_for_inner~~ Force inline par_for_inner Oct 24, 2023

jdolence requested review from lroberts36, jonahm-LANL, pgrete, bprather and pdmullen October 24, 2023 18:40

pdmullen approved these changes Oct 24, 2023

update copyright

790cebb

lroberts36 approved these changes Oct 24, 2023

Yurlungur approved these changes Oct 24, 2023

Yurlungur enabled auto-merge October 24, 2023 19:38

Yurlungur merged commit abfae20 into develop Oct 24, 2023
49 checks passed

bprather mentioned this pull request Nov 7, 2023

Forceinline par_for_inner: add meshblock version #972

Merged

8 tasks
