Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix waiting in sync_wait #976

Merged
merged 1 commit into from
Jan 19, 2024
Merged

Fix waiting in sync_wait #976

merged 1 commit into from
Jan 19, 2024

Conversation

msimberg
Copy link
Contributor

This fixes a bug that has been there since the beginning of pika in sync_wait. More details are in the commit message, but briefly the notifying thread (one that has received one of the set_* channels) can end up releasing the lock after the sync_wait operation state (including the mutex that is unlocked) is released. This was an overeager optimization that I thought was a good idea at the time, in an attempt to avoid taking the lock sometimes.

An example of the data race (between the destructor of the the operation state and the unlocking of the mutex, a write) reported by thread sanitizer:

WARNING: ThreadSanitizer: data race (pid=2582947)
  Write of size 8 at 0x7b24000049b0 by main thread:
    #0 operator delete(void*, unsigned long) <null> (libtsan.so.2+0x880ce)
    #1 pika::detail::intrusive_ptr_release(pika::detail::condition_variable_data*) /home/mjs/src/pika/libs/pika/synchronization/src/detail/condition_variable.cpp:250 (libpika.so.0+0x1ce2a5)
    #2 pika::memory::intrusive_ptr<pika::detail::condition_variable_data>::~intrusive_ptr() /home/mjs/src/pika/libs/pika/memory/include/pika/memory/intrusive_ptr.hpp:82 (std_thread_scheduler_test+0x459230)
    #3 pika::concurrency::detail::cache_aligned_data_derived<pika::memory::intrusive_ptr<pika::detail::condition_variable_data>, std::integral_constant<bool, true> >::~cache_aligned_data_derived() /home/mjs/src/pika/libs/pika/concurrency/include/pika/concurrency/cache_line_data.hpp:108 (std_thread_scheduler_t
est+0x459230)
    #4 pika::condition_variable::~condition_variable() /home/mjs/src/pika/libs/pika/synchronization/include/pika/synchronization/condition_variable.hpp:64 (std_thread_scheduler_test+0x459230)
    #5 pika::sync_wait_detail::sync_wait_receiver_impl<pika::schedule_from_detail::schedule_from_sender_impl<pika::ensure_started_detail::ensure_started_sender_impl<pika::schedule_from_detail::schedule_from_sender_impl<pika::just_detail::just_sender_impl<pika::util::detail::pack_c<unsigned long, 0ul>, int>::j
ust_sender_type, pika::execution::experimental::std_thread_scheduler&>::schedule_from_sender_type, std::allocator<int> >::ensure_started_sender_type, pika::execution::experimental::std_thread_scheduler>::schedule_from_sender_type>::sync_wait_receiver_type::shared_state::~shared_state() /home/mjs/src/pika/libs
/pika/execution/include/pika/execution/algorithms/sync_wait.hpp:134 (std_thread_scheduler_test+0x459230)
    #6 auto pika::this_thread::experimental::tag_fallback_invoke<pika::schedule_from_detail::schedule_from_sender_impl<pika::ensure_started_detail::ensure_started_sender_impl<pika::schedule_from_detail::schedule_from_sender_impl<pika::just_detail::just_sender_impl<pika::util::detail::pack_c<unsigned long, 0ul
>, int>::just_sender_type, pika::execution::experimental::std_thread_scheduler&>::schedule_from_sender_type, std::allocator<int> >::ensure_started_sender_type, pika::execution::experimental::std_thread_scheduler>::schedule_from_sender_type, 42, 0>(pika::this_thread::experimental::sync_wait_t, pika::schedule_f
rom_detail::schedule_from_sender_impl<pika::ensure_started_detail::ensure_started_sender_impl<pika::schedule_from_detail::schedule_from_sender_impl<pika::just_detail::just_sender_impl<pika::util::detail::pack_c<unsigned long, 0ul>, int>::just_sender_type, pika::execution::experimental::std_thread_scheduler&>:
:schedule_from_sender_type, std::allocator<int> >::ensure_started_sender_type, pika::execution::experimental::std_thread_scheduler>::schedule_from_sender_type&&) /home/mjs/src/pika/libs/pika/execution/include/pika/execution/algorithms/sync_wait.hpp:234 (std_thread_scheduler_test+0x459230)
    #7 pika::functional::detail::tag_fallback_invoke_result<pika::functional::detail::tag_fallback_invoke_t_ns::tag_fallback_invoke_t const, pika::this_thread::experimental::sync_wait_t, pika::schedule_from_detail::schedule_from_sender_impl<pika::ensure_started_detail::ensure_started_sender_impl<pika::schedul
e_from_detail::schedule_from_sender_impl<pika::just_detail::just_sender_impl<pika::util::detail::pack_c<unsigned long, 0ul>, int>::just_sender_type, pika::execution::experimental::std_thread_scheduler&>::schedule_from_sender_type, std::allocator<int> >::ensure_started_sender_type, pika::execution::experimenta
l::std_thread_scheduler>::schedule_from_sender_type&&>::type pika::functional::detail::tag_base_ns::tag_fallback<pika::this_thread::experimental::sync_wait_t>::operator()<pika::schedule_from_detail::schedule_from_sender_impl<pika::ensure_started_detail::ensure_started_sender_impl<pika::schedule_from_detail::s
chedule_from_sender_impl<pika::just_detail::just_sender_impl<pika::util::detail::pack_c<unsigned long, 0ul>, int>::just_sender_type, pika::execution::experimental::std_thread_scheduler&>::schedule_from_sender_type, std::allocator<int> >::ensure_started_sender_type, pika::execution::experimental::std_thread_sc
heduler>::schedule_from_sender_type, void>(pika::schedule_from_detail::schedule_from_sender_impl<pika::ensure_started_detail::ensure_started_sender_impl<pika::schedule_from_detail::schedule_from_sender_impl<pika::just_detail::just_sender_impl<pika::util::detail::pack_c<unsigned long, 0ul>, int>::just_sender_t
ype, pika::execution::experimental::std_thread_scheduler&>::schedule_from_sender_type, std::allocator<int> >::ensure_started_sender_type, pika::execution::experimental::std_thread_scheduler>::schedule_from_sender_type&&) const /home/mjs/src/pika/libs/pika/tag_invoke/include/pika/functional/detail/tag_fallback
_invoke.hpp:265 (std_thread_scheduler_test+0x459230)
    #8 test_ensure_started() /home/mjs/src/pika/libs/pika/executors/tests/unit/std_thread_scheduler.cpp:654 (std_thread_scheduler_test+0x459230)
    #9 main /home/mjs/src/pika/libs/pika/executors/tests/unit/std_thread_scheduler.cpp:1280 (std_thread_scheduler_test+0x41be06)

  Previous atomic write of size 1 at 0x7b24000049b0 by thread T36:
    #0 std::__atomic_base<bool>::exchange(bool, std::memory_order) /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-13.2.0/include/c++/13.2.0/bits/atomic_base.h:523 (std_thread_scheduler_test+0x462ab6)
    #1 std::atomic<bool>::exchange(bool, std::memory_order) /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-13.2.0/include/c++/13.2.0/atomic:120 (std_thread_scheduler_test+0x462ab6)
    #2 pika::concurrency::detail::spinlock::acquire_lock() /home/mjs/src/pika/libs/pika/concurrency/include/pika/concurrency/spinlock.hpp:104 (std_thread_scheduler_test+0x462ab6)
    #3 pika::concurrency::detail::spinlock::lock() /home/mjs/src/pika/libs/pika/concurrency/include/pika/concurrency/spinlock.hpp:67 (std_thread_scheduler_test+0x462ab6)
    #4 std::unique_lock<pika::concurrency::detail::spinlock>::lock() /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-13.2.0/include/c++/13.2.0/bits/unique_lock.h:141 (std_thread_scheduler_test+0x462ab6)
    #5 std::unique_lock<pika::concurrency::detail::spinlock>::unique_lock(pika::concurrency::detail::spinlock&) /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-13.2.0/include/c++/13.2.0/bits/unique_lock.h:71 (std_thread_scheduler_test+0x462ab6)
    #6 pika::condition_variable::notify_one(pika::error_code&) /home/mjs/src/pika/libs/pika/synchronization/include/pika/synchronization/condition_variable.hpp:68 (std_thread_scheduler_test+0x462ab6)
    #7 pika::sync_wait_detail::sync_wait_receiver_impl<pika::schedule_from_detail::schedule_from_sender_impl<pika::ensure_started_detail::ensure_started_sender_impl<pika::schedule_from_detail::schedule_from_sender_impl<pika::just_detail::just_sender_impl<pika::util::detail::pack_c<unsigned long, 0ul>, int>::j
ust_sender_type, pika::execution::experimental::std_thread_scheduler&>::schedule_from_sender_type, std::allocator<int> >::ensure_started_sender_type, pika::execution::experimental::std_thread_scheduler>::schedule_from_sender_type>::sync_wait_receiver_type::signal_set_called() /home/mjs/src/pika/libs/pika/exec
ution/include/pika/execution/algorithms/sync_wait.hpp:177 (std_thread_scheduler_test+0x462fba)

Remove shortcut to check if lock can be avoided as it can lead to the shared state being freed
before the notifying thread has released the lock. The following can happen with the shortcut:

- thread 2 takes lock to notify condition variable
- thread 2 sets set_called to true
- thread 1 sees set_called == true
- thread 1 skips the wait and returns from sync_wait, releasing the shared state
- thread 2 releases the lock (with a write), which was stored in the shared state

By keeping always taking the lock we guarantee that the notifying thread will always release the
lock before the waiting thread returns from the wait.
@msimberg msimberg added this to the 0.22.0 milestone Jan 18, 2024
@msimberg msimberg self-assigned this Jan 18, 2024
@msimberg
Copy link
Contributor Author

cscs-ci run

@aurianer aurianer added this pull request to the merge queue Jan 19, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 19, 2024
@msimberg msimberg added this pull request to the merge queue Jan 19, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 19, 2024
@msimberg msimberg added this pull request to the merge queue Jan 19, 2024
Merged via the queue into main with commit 660b86d Jan 19, 2024
65 of 66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants