
Unify `pika::spinlock` and `pika::concurrency::detail::spinlock` into one implementation #672

Merged

msimberg merged 2 commits from unify-spinlocks-never-yield into main on May 10, 2023

Conversation

msimberg
Contributor

@msimberg msimberg commented May 4, 2023

Fixes #517, i.e. uses a non-yielding spinlock everywhere a spinlock was used previously. The implementations were almost identical but used slightly different yielding strategies. The yielding one (`pika::spinlock`) used `yield_while`, allowing the user-level thread to yield. The non-yielding one (`pika::concurrency::detail::spinlock`) slept the OS thread after one iteration. This now uses the `pika::spinlock` strategy everywhere, except that yielding is disallowed when spinning. The name `pika::spinlock` is removed and only the `detail` one remains.
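The unified strategy can be sketched roughly as follows. This is a hypothetical illustration, not pika's actual code: the class name, the backoff threshold, and the sleep duration are all made up for the example. The key property it demonstrates is the one described above: the lock spins (optionally sleeping the OS thread as backoff) but never yields the calling user-level task.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical sketch of a non-yielding spinlock. On contention it
// busy-spins, and after a number of failed attempts it briefly sleeps
// the OS thread as backoff. Crucially, the user-level task is never
// yielded while spinning.
class non_yielding_spinlock
{
    std::atomic<bool> locked_{false};

public:
    void lock()
    {
        // Test-and-test-and-set: spin on a relaxed load first so that
        // waiters do not hammer the cache line with read-modify-write
        // operations while the lock is held.
        std::size_t attempts = 0;
        while (locked_.load(std::memory_order_relaxed) ||
            locked_.exchange(true, std::memory_order_acquire))
        {
            // Backoff threshold and sleep duration are arbitrary here.
            if (++attempts > 64)
            {
                std::this_thread::sleep_for(std::chrono::microseconds(10));
            }
        }
    }

    bool try_lock()
    {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock()
    {
        locked_.store(false, std::memory_order_release);
    }
};
```

Because the calling task never yields, a thread blocked on such a lock stays visibly spinning on its worker thread, which is what makes deadlocks easier to spot (as noted later in this thread) but also makes it unsafe to guard long critical sections with it.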

Note that we used to have three spinlock implementations. Now we still have two. The remaining one is a very basic one with near zero dependencies (no lock registration, no ITT support) that is still used in things like the configuration maps and resource partitioner.
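The remaining "near zero dependencies" flavor can be illustrated with a plain test-and-set lock. Again, this is a hypothetical sketch rather than pika's implementation: it shows the shape of a minimal spinlock with no lock registration, no ITT instrumentation, and no backoff, of the kind usable very early during startup.

```cpp
#include <atomic>

// Hypothetical minimal spinlock: a bare test-and-set on
// std::atomic_flag. No instrumentation, no registration, no backoff.
// Only acceptable because the critical sections it guards (e.g.
// filling configuration maps) are tiny.
class basic_spinlock
{
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;

public:
    void lock()
    {
        while (flag_.test_and_set(std::memory_order_acquire))
        {
            // Busy-wait until the holder clears the flag.
        }
    }

    void unlock()
    {
        flag_.clear(std::memory_order_release);
    }
};
```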

This doesn't change performance in either direction in DLA-Future, but it's a prerequisite for having a semi-blocking barrier (eth-cscs/DLA-Future#833).

We will need to make corresponding changes in pika-algorithms since it uses `pika::spinlock` in a few places.

@msimberg msimberg requested a review from aurianer as a code owner May 4, 2023 13:08
@msimberg msimberg self-assigned this May 4, 2023
@msimberg msimberg requested a review from biddisco as a code owner May 4, 2023 13:08
@msimberg msimberg added this to the 0.16.0 milestone May 4, 2023
@msimberg
Contributor Author

msimberg commented May 4, 2023

bors try

bors bot added a commit that referenced this pull request May 4, 2023
@pika-bot
Collaborator

pika-bot commented May 4, 2023

Performance test report

pika Performance

Comparison

| Benchmark | NO-EXECUTOR |
|-|-|
| Future Overhead - Create Thread Hierarchical - Latch | --- |

Info

| Property | Before | After |
|-|-|-|
| pika Commit | 97100fb | 4202bc6 |
| pika Datetime | 2023-04-24T15:38:14+00:00 | 2023-05-04T13:08:21+00:00 |
| Datetime | 2023-04-24T17:46:47.209658+02:00 | 2023-05-04T15:17:53.243905+02:00 |
| Compiler | /apps/daint/SSL/pika/spack/lib/spack/env/clang/clang++ 11.0.1 | /apps/daint/SSL/pika/spack/lib/spack/env/clang/clang++ 11.0.1 |
| Hostname | nid00269 | nid00750 |
| Clustername | daint | daint |
| Envfile | | |

Explanation of Symbols

| Symbol | Meaning |
|-|-|
| = | No performance change (confidence interval within ±1%) |
| (=) | Probably no performance change (confidence interval within ±2%) |
| (+)/(-) | Very small performance improvement/degradation (≤1%) |
| +/- | Small performance improvement/degradation (>1%) |
| ++/-- | Large performance improvement/degradation (>5%) |
| +++/--- | Very large performance improvement/degradation (>10%) |
| ? | Probably no change, but quite large uncertainty (confidence interval within ±5%) |
| ?? | Unclear result, very large uncertainty (±10%) |
| ??? | Something unexpected… |

@msimberg
Contributor Author

msimberg commented May 4, 2023

> This doesn't change performance in either direction in DLA-Future

However:

| Benchmark | NO-EXECUTOR |
|-|-|
| Future Overhead - Create Thread Hierarchical - Latch | --- |

This is worth some more investigation before we go ahead with this...

@msimberg msimberg marked this pull request as draft May 4, 2023 13:20
@bors
Contributor

bors bot commented May 4, 2023

try

Build failed:

@msimberg
Contributor Author

msimberg commented May 4, 2023

This is also new: https://cdash.cscs.ch/test/80173581.

@msimberg
Contributor Author

msimberg commented May 5, 2023

I think the performance regression actually comes from #670, not this PR. I'm going to revert that one instead for the moment.

@msimberg msimberg force-pushed the unify-spinlocks-never-yield branch from d5515cd to b74968e Compare May 5, 2023 08:48
@msimberg msimberg marked this pull request as ready for review May 5, 2023 09:25
@msimberg
Contributor Author

msimberg commented May 5, 2023

I rebased this on main after #670 was reverted, and the performance test is back to normal. I would go ahead with this. However, there may still be some spinlocks that cover sections that are too big (there are some timeouts in the tests now).

@msimberg
Contributor Author

msimberg commented May 5, 2023

bors try

bors bot added a commit that referenced this pull request May 5, 2023
@bors
Contributor

bors bot commented May 5, 2023

try

Build failed:

@msimberg
Contributor Author

msimberg commented May 8, 2023

bors try

bors bot added a commit that referenced this pull request May 8, 2023
@bors
Contributor

bors bot commented May 8, 2023

try

Build failed:

@msimberg
Contributor Author

msimberg commented May 9, 2023

I think this is now in better shape. I still need to rerun benchmarks in DLA-Future, but the tests seem happier. In the last commit I disabled another place where yielding could happen (in `set_thread_state`, 737786b). We do have to keep an eye out for potential deadlocks after these changes. On the bright side, I actually think it'll be easier to detect deadlocks (at least with the spinlock) because the threads will just keep spinning instead of switching to another thread.

@msimberg msimberg force-pushed the unify-spinlocks-never-yield branch from 737786b to 0e21aa2 Compare May 9, 2023 13:47
@msimberg
Contributor Author

msimberg commented May 9, 2023

Rebased after #679 was merged.

bors try

bors bot added a commit that referenced this pull request May 9, 2023
@bors
Contributor

bors bot commented May 9, 2023

try

Build failed:

@msimberg
Contributor Author

bors merge

bors bot added a commit that referenced this pull request May 10, 2023
672: Unify `pika::spinlock` and `pika::concurrency::detail::spinlock` into one implementation r=msimberg a=msimberg

Fixes #517, i.e. uses a non-yielding everywhere where a spinlock was used previously. The implementations were almost identical, but used a slightly different yielding strategy. The yielding one (`pika::spinlock`) used `yield_while` allowing yielding of the user-level thread. The non-yielding one (`pika::concurrency::detail::spinlock`) was sleeping the OS-thread after one iteration. This now uses the `pika::spinlock` strategy everywhere, except that yielding is disallowed when spinning. The name `pika::spinlock` is removed and only the `detail` one remains.

Note that we used to have _three_ spinlock implementations. Now we still have two. The remaining one is a very basic one with near zero dependencies (no lock registration, no ITT support) that is still used in things like the configuration maps and resource partitioner.

This doesn't change performance in either direction in DLA-Future, but it's a prerequisite for having a semi-blocking barrier (eth-cscs/DLA-Future#833).

We will need to make corresponding changes in pika-algorithms since it uses `pika::spinlock` in a few places.

Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi>
@bors
Contributor

bors bot commented May 10, 2023

Build failed:

@msimberg
Contributor Author

bors merge

bors bot added a commit that referenced this pull request May 10, 2023
@bors
Contributor

bors bot commented May 10, 2023

Build failed:

@msimberg msimberg merged commit d98281e into main May 10, 2023
@bors bors bot deleted the unify-spinlocks-never-yield branch May 10, 2023 08:19
bors bot added a commit that referenced this pull request May 17, 2023
689: Fix deadlocks in `condition_variable` r=msimberg a=msimberg

This fixes deadlocks that started appearing after #672 which disabled yielding for `spinlock`.

(One of) the deadlock(s) was the following scenario:

| thread 1 | thread 2 |
|-|-|
| wait_until | |
| take lock | |
| add self to cv queue | | 
| release lock | |
| timed suspend | |
| | notify_all |
| | take lock |
| timed resume | attempt to set thread 1 to pending |
| attempt to take lock | fail because thread 1 is active |
| spin trying to take lock | spin waiting for thread 1 to not be active |
| deadlock | deadlock |

This PR changes `notify_all` to not hold the lock while resuming threads that need to be woken up. I see no reason to keep the lock for that time, since there is in any case a delay between setting a thread to `pending` and that thread actually being run by a worker thread, and the latter already does _not_ happen under the lock. This PR just relaxes that constraint further. It also significantly reduces the time the lock is held in `notify_all`. I'm quite sure this change is safe, but we'll need to keep watching for failures in CI in case I've missed something.

I've also reverted the change from #672 that made `set_thread_state` never yield. Since the lock in `notify_all` is no longer held while resuming threads, it is again safe to yield in `set_thread_state`.

I think spurious wakeups were probably possible before this change, but if they weren't, they are now definitely possible with pika's `condition_variable`.

Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi>
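The `notify_all` change described in that commit message can be sketched as follows. This is a hypothetical toy model, not pika's `condition_variable`: waiters are represented as plain resume callbacks, and all names are invented for the example. It shows the essential move: detach the whole wait queue while holding the lock, then resume the waiters only after the lock is released.

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical toy condition variable. notify_all holds the internal
// lock only long enough to swap the wait queue out, then resumes each
// waiter outside the lock. A resumed waiter that immediately needs
// the lock therefore no longer races with a notifier still holding it,
// which is the deadlock scenario tabulated above.
class toy_condition_variable
{
    std::mutex mtx_;
    std::vector<std::function<void()>> waiters_;

public:
    void add_waiter(std::function<void()> resume)
    {
        std::lock_guard<std::mutex> lk(mtx_);
        waiters_.push_back(std::move(resume));
    }

    void notify_all()
    {
        std::vector<std::function<void()>> pending;
        {
            // Critical section: only the queue swap happens here.
            std::lock_guard<std::mutex> lk(mtx_);
            pending.swap(waiters_);
        }
        // Resume outside the lock.
        for (auto& resume : pending)
        {
            resume();
        }
    }
};
```

One consequence of resuming outside the lock is that wakeups can race with state changes, so callers must treat any wakeup as possibly spurious and re-check their predicate in a loop, which matches the commit message's note about spurious wakeups now definitely being possible.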
Successfully merging this pull request may close these issues.

Attempt to use non-yielding spinlocks in short critical sections