
Unify `pika::spinlock` and `pika::concurrency::detail::spinlock` into one implementation #672

Merged

msimberg merged 2 commits from unify-spinlocks-never-yield into main on May 10, 2023

Conversation

msimberg
Contributor

@msimberg msimberg commented May 4, 2023

Fixes #517, i.e. uses a non-yielding spinlock everywhere a spinlock was used previously. The implementations were almost identical but used slightly different yielding strategies. The yielding one (`pika::spinlock`) used `yield_while`, allowing the user-level thread to yield. The non-yielding one (`pika::concurrency::detail::spinlock`) slept the OS thread after one iteration. This now uses the `pika::spinlock` strategy everywhere, except that yielding is disallowed when spinning. The name `pika::spinlock` is removed and only the `detail` one remains.
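The unified strategy can be sketched roughly as follows. This is a hypothetical illustration, not pika's actual code: the class name, the backoff threshold, and the sleep duration are all made up for the example. The key property it demonstrates is the one described above: the lock spins (optionally sleeping the OS thread as backoff) but never yields the calling user-level task.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical sketch of a non-yielding spinlock. On contention it
// busy-spins, and after a number of failed attempts it briefly sleeps
// the OS thread as backoff. Crucially, the user-level task is never
// yielded while spinning.
class non_yielding_spinlock
{
    std::atomic<bool> locked_{false};

public:
    void lock()
    {
        // Test-and-test-and-set: spin on a relaxed load first so that
        // waiters do not hammer the cache line with read-modify-write
        // operations while the lock is held.
        std::size_t attempts = 0;
        while (locked_.load(std::memory_order_relaxed) ||
            locked_.exchange(true, std::memory_order_acquire))
        {
            // Backoff threshold and sleep duration are arbitrary here.
            if (++attempts > 64)
            {
                std::this_thread::sleep_for(std::chrono::microseconds(10));
            }
        }
    }

    bool try_lock()
    {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock()
    {
        locked_.store(false, std::memory_order_release);
    }
};
```

Because the calling task never yields, a thread blocked on such a lock stays visibly spinning on its worker thread, which is what makes deadlocks easier to spot (as noted later in this thread) but also makes it unsafe to guard long critical sections with it.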

Note that we used to have three spinlock implementations. Now we still have two. The remaining one is a very basic one with near zero dependencies (no lock registration, no ITT support) that is still used in things like the configuration maps and resource partitioner.
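The remaining "near zero dependencies" flavor can be illustrated with a plain test-and-set lock. Again, this is a hypothetical sketch rather than pika's implementation: it shows the shape of a minimal spinlock with no lock registration, no ITT instrumentation, and no backoff, of the kind usable very early during startup.

```cpp
#include <atomic>

// Hypothetical minimal spinlock: a bare test-and-set on
// std::atomic_flag. No instrumentation, no registration, no backoff.
// Only acceptable because the critical sections it guards (e.g.
// filling configuration maps) are tiny.
class basic_spinlock
{
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;

public:
    void lock()
    {
        while (flag_.test_and_set(std::memory_order_acquire))
        {
            // Busy-wait until the holder clears the flag.
        }
    }

    void unlock()
    {
        flag_.clear(std::memory_order_release);
    }
};
```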

This doesn't change performance in either direction in DLA-Future, but it's a prerequisite for having a semi-blocking barrier (eth-cscs/DLA-Future#833).

We will need to make corresponding changes in pika-algorithms since it uses `pika::spinlock` in a few places.

@msimberg msimberg requested a review from aurianer as a code owner May 4, 2023 13:08
@msimberg msimberg self-assigned this May 4, 2023
@msimberg msimberg requested a review from biddisco as a code owner May 4, 2023 13:08
@msimberg msimberg added this to the 0.16.0 milestone May 4, 2023
@msimberg
Contributor Author

msimberg commented May 4, 2023

bors try

bors bot added a commit that referenced this pull request May 4, 2023
@pika-bot
Collaborator

pika-bot commented May 4, 2023

Performance test report

pika Performance

Comparison

| Benchmark | NO-EXECUTOR |
|-|-|
| Future Overhead - Create Thread Hierarchical - Latch | --- |

Info

| Property | Before | After |
|-|-|-|
| pika Commit | 97100fb | 4202bc6 |
| pika Datetime | 2023-04-24T15:38:14+00:00 | 2023-05-04T13:08:21+00:00 |
| Datetime | 2023-04-24T17:46:47.209658+02:00 | 2023-05-04T15:17:53.243905+02:00 |
| Compiler | /apps/daint/SSL/pika/spack/lib/spack/env/clang/clang++ 11.0.1 | /apps/daint/SSL/pika/spack/lib/spack/env/clang/clang++ 11.0.1 |
| Hostname | nid00269 | nid00750 |
| Clustername | daint | daint |
| Envfile | | |

Explanation of Symbols

| Symbol | Meaning |
|-|-|
| = | No performance change (confidence interval within ±1%) |
| (=) | Probably no performance change (confidence interval within ±2%) |
| (+)/(-) | Very small performance improvement/degradation (≤1%) |
| +/- | Small performance improvement/degradation (>1%) |
| ++/-- | Large performance improvement/degradation (>5%) |
| +++/--- | Very large performance improvement/degradation (>10%) |
| ? | Probably no change, but quite large uncertainty (confidence interval within ±5%) |
| ?? | Unclear result, very large uncertainty (±10%) |
| ??? | Something unexpected… |

@msimberg
Contributor Author

msimberg commented May 4, 2023

> This doesn't change performance in either direction in DLA-Future

However:

| Benchmark | NO-EXECUTOR |
|-|-|
| Future Overhead - Create Thread Hierarchical - Latch | --- |

This is worth some more investigation before we go ahead with this...

@msimberg msimberg marked this pull request as draft May 4, 2023 13:20
@bors
Contributor

bors bot commented May 4, 2023

try

Build failed:

@msimberg
Contributor Author

msimberg commented May 4, 2023

This is also new: https://cdash.cscs.ch/test/80173581.

@msimberg
Contributor Author

msimberg commented May 5, 2023

I think the performance regression actually comes from #670, not this PR. I'm going to revert that one instead for the moment.

@msimberg msimberg force-pushed the unify-spinlocks-never-yield branch from d5515cd to b74968e Compare May 5, 2023 08:48
@msimberg msimberg marked this pull request as ready for review May 5, 2023 09:25
@msimberg
Contributor Author

msimberg commented May 5, 2023

I rebased this on main after #670 was reverted, and the performance test is back to normal. I would go ahead with this. However, there may still be some spinlocks that cover sections that are too big (there are some timeouts in the tests now).

@msimberg
Contributor Author

msimberg commented May 5, 2023

bors try

bors bot added a commit that referenced this pull request May 5, 2023
@bors
Contributor

bors bot commented May 5, 2023

try

Build failed:

@msimberg
Contributor Author

msimberg commented May 8, 2023

bors try

bors bot added a commit that referenced this pull request May 8, 2023
@bors
Contributor

bors bot commented May 8, 2023

try

Build failed:

@msimberg
Contributor Author

msimberg commented May 9, 2023

I think this is now in better shape. I still need to rerun benchmarks in DLA-Future, but the tests seem happier. In the last commit I disabled another place where yielding could happen (in `set_thread_state`, 737786b). We do have to keep an eye out for potential deadlocks after these changes. On the bright side, I actually think it'll be easier to detect deadlocks (at least with the spinlock) because the threads will just keep spinning instead of switching to another thread.

@msimberg msimberg force-pushed the unify-spinlocks-never-yield branch from 737786b to 0e21aa2 Compare May 9, 2023 13:47
@msimberg
Contributor Author

msimberg commented May 9, 2023

Rebased after #679 was merged.

bors try

bors bot added a commit that referenced this pull request May 9, 2023
@bors
Contributor

bors bot commented May 9, 2023

try

Build failed:

@msimberg
Contributor Author

bors merge

bors bot added a commit that referenced this pull request May 10, 2023
672: Unify `pika::spinlock` and `pika::concurrency::detail::spinlock` into one implementation r=msimberg a=msimberg

Fixes #517, i.e. uses a non-yielding everywhere where a spinlock was used previously. The implementations were almost identical, but used a slightly different yielding strategy. The yielding one (`pika::spinlock`) used `yield_while` allowing yielding of the user-level thread. The non-yielding one (`pika::concurrency::detail::spinlock`) was sleeping the OS-thread after one iteration. This now uses the `pika::spinlock` strategy everywhere, except that yielding is disallowed when spinning. The name `pika::spinlock` is removed and only the `detail` one remains.

Note that we used to have _three_ spinlock implementations. Now we still have two. The remaining one is a very basic one with near zero dependencies (no lock registration, no ITT support) that is still used in things like the configuration maps and resource partitioner.

This doesn't change performance in either direction in DLA-Future, but it's a prerequisite for having a semi-blocking barrier (eth-cscs/DLA-Future#833).

We will need to make corresponding changes in pika-algorithms since it uses `pika::spinlock` in a few places.

Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi>
@bors
Contributor

bors bot commented May 10, 2023

Build failed:

@msimberg
Contributor Author

bors merge

bors bot added a commit that referenced this pull request May 10, 2023
@bors
Contributor

bors bot commented May 10, 2023

Build failed:

@msimberg msimberg merged commit d98281e into main May 10, 2023
@bors bors bot deleted the unify-spinlocks-never-yield branch May 10, 2023 08:19
bors bot added a commit that referenced this pull request May 17, 2023
689: Fix deadlocks in `condition_variable` r=msimberg a=msimberg

This fixes deadlocks that started appearing after #672 which disabled yielding for `spinlock`.

(One of) the deadlock(s) was the following scenario:

| thread 1 | thread 2 |
|-|-|
| wait_until | |
| take lock | |
| add self to cv queue | | 
| release lock | |
| timed suspend | |
| | notify_all |
| | take lock |
| timed resume | attempt to set thread 1 to pending |
| attempt to take lock | fail because thread 1 is active |
| spin trying to take lock | spin waiting for thread 1 to not be active |
| deadlock | deadlock |

This PR changes `notify_all` to not hold the lock while resuming threads that need to be woken up. I see no reason to keep the lock for that time, since there is in any case a delay between setting a thread to `pending` and that thread actually being run by a worker thread, and the latter already does _not_ happen under the lock. This PR just relaxes that constraint further. It also significantly reduces the time the lock is held in `notify_all`. I'm quite sure this change is safe, but we'll need to keep watching for failures in CI in case I've missed something.

I've also reverted the change from #672 that made `set_thread_state` never yield. Since the lock in `notify_all` is no longer held while resuming threads, it is again safe to yield in `set_thread_state`.

I think spurious wakeups were probably possible before this change, but if they weren't, they are now definitely possible with pika's `condition_variable`.

Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi>
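The `notify_all` change described in that commit message can be sketched as follows. This is a hypothetical toy model, not pika's `condition_variable`: waiters are represented as plain resume callbacks, and all names are invented for the example. It shows the essential move: detach the whole wait queue while holding the lock, then resume the waiters only after the lock is released.

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical toy condition variable. notify_all holds the internal
// lock only long enough to swap the wait queue out, then resumes each
// waiter outside the lock. A resumed waiter that immediately needs
// the lock therefore no longer races with a notifier still holding it,
// which is the deadlock scenario tabulated above.
class toy_condition_variable
{
    std::mutex mtx_;
    std::vector<std::function<void()>> waiters_;

public:
    void add_waiter(std::function<void()> resume)
    {
        std::lock_guard<std::mutex> lk(mtx_);
        waiters_.push_back(std::move(resume));
    }

    void notify_all()
    {
        std::vector<std::function<void()>> pending;
        {
            // Critical section: only the queue swap happens here.
            std::lock_guard<std::mutex> lk(mtx_);
            pending.swap(waiters_);
        }
        // Resume outside the lock.
        for (auto& resume : pending)
        {
            resume();
        }
    }
};
```

One consequence of resuming outside the lock is that wakeups can race with state changes, so callers must treat any wakeup as possibly spurious and re-check their predicate in a loop, which matches the commit message's note about spurious wakeups now definitely being possible.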
Successfully merging this pull request may close these issues.

Attempt to use non-yielding spinlocks in short critical sections