Fix deadlocks in condition_variable
#689
Only hold lock while swapping the queue of waiting threads into a local queue. Retake lock if there's an error and the queue has to be spliced back into the member variable queue.
This change was only introduced to never yield while holding a lock in condition_variable::notify_all. However, that was relaxed to not hold a lock while resuming threads so we can again allow yielding in set_thread_state.
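The first commit message above describes a pattern that can be sketched with standard-library types. This is only an illustration under stated assumptions, not pika's implementation: `waiter_queue`, `waiter`, `add` and `size` are hypothetical names, `std::mutex` stands in for pika's `spinlock`, and "resuming a thread" is modelled as calling a stored callback. The lock is held only for the swap into a local queue; it is retaken only if resuming throws, in which case the waiters that were not yet resumed are spliced back.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a condition variable's waiter queue.
class waiter_queue
{
public:
    // Calling a waiter models "resuming" a suspended thread.
    using waiter = std::function<void()>;

    void add(waiter w)
    {
        std::lock_guard<std::mutex> l(mtx_);
        queue_.push_back(std::move(w));
    }

    // Resumes all current waiters and returns how many were resumed.
    std::size_t notify_all()
    {
        std::list<waiter> local;
        {
            // The lock is held only while swapping the queues.
            std::lock_guard<std::mutex> l(mtx_);
            local.swap(queue_);
        }

        std::size_t resumed = 0;
        try
        {
            while (!local.empty())
            {
                waiter w = std::move(local.front());
                local.pop_front();
                w();    // runs without the lock held
                ++resumed;
            }
        }
        catch (...)
        {
            // On error, retake the lock and put the not-yet-resumed waiters
            // back at the front of the member queue. The waiter that threw
            // has already been popped and is not restored in this sketch.
            std::lock_guard<std::mutex> l(mtx_);
            queue_.splice(queue_.begin(), local);
            throw;
        }
        return resumed;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> l(mtx_);
        return queue_.size();
    }

private:
    mutable std::mutex mtx_;
    std::list<waiter> queue_;
};
```

Because no lock is held while a waiter runs, a waiter may itself take the lock (for example to re-register via `add`) without blocking on the notifier.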
bors try |
tryBuild failed: |
73f49cc to 17028a4
bors try |
bors try- |
17028a4 to 449c0ad
bors try |
tryBuild failed: |
449c0ad to a46a7ce
There were zero hangs in |
bors try |
bors merge |
689: Fix deadlocks in `condition_variable` r=msimberg a=msimberg

This fixes deadlocks that started appearing after #672 which disabled yielding for `spinlock`. (One of) the deadlock(s) was the following scenario:

| thread 1 | thread 2 |
|-|-|
| wait_until | |
| take lock | |
| add self to cv queue | |
| release lock | |
| timed suspend | |
| | notify_all |
| | take lock |
| timed resume | attempt to set thread 1 to pending |
| attempt to take lock | fail because thread 1 is active |
| spin trying to take lock | spin waiting for thread 1 to not be active |
| deadlock | deadlock |

This PR changes `notify_all` to not hold the lock while resuming threads that need to be woken up. I see no reason to keep the lock for that time since there is anyway a delay between setting a thread to `pending` and the thread actually being run by a worker thread, with the latter _not_ happening under a lock already right now. This PR just relaxes that constraint further. It also significantly reduces the time the lock is held in `notify_all`. I'm quite sure this change is safe but we'll need to continue looking out for failures in CI in case I've missed something.

I've also reverted the change in `set_thread_state` to never yield from #672. Since the lock in `notify_all` is no longer held while resuming threads it's again safe to yield in `set_thread_state`.

I think spurious wakeups were probably possible before this change, but if they weren't they're now definitely possible with pika's `condition_variable`.

Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi>
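Since the description notes that spurious wakeups are possible, callers are best served by waiting on a predicate rather than on the notification alone. The sketch below uses `std::condition_variable` and `std::mutex` as stand-ins; that pika's `condition_variable` is used the same way is an assumption here, and `flag`, `wait_ready` and `set_ready` are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical one-shot flag guarded by a mutex and a condition variable.
struct flag
{
    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;

    void wait_ready()
    {
        std::unique_lock<std::mutex> l(mtx);
        // The predicate overload re-checks `ready` after every wakeup,
        // so a spurious wakeup simply goes back to waiting.
        cv.wait(l, [this] { return ready; });
    }

    void set_ready()
    {
        {
            // The state change happens under the lock...
            std::lock_guard<std::mutex> l(mtx);
            ready = true;
        }
        // ...but the notification itself does not need the lock.
        cv.notify_all();
    }
};
```

With this shape the correctness of the waiter depends only on the guarded state (`ready`), not on how many times or when it is woken.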
Build failed: |
bors merge |
Build failed: |
This also happened: https://cdash.cscs.ch/test/83546786. More investigation needed after this is merged. bors merge |
Build failed: |
bors merge |
Build failed: |