-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement <stop_token> and jthread. #1196
Conversation
<stop_token>
and jthread
.
stl/inc/stop_token
Outdated
~stop_source() { | ||
const auto _Local = _State; | ||
if (_Local != nullptr) { | ||
if ((_Local->_Stop_sources.fetch_sub(2, memory_order_relaxed) >> 1) == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In barrier
( #1057 ), I used named things:
inline constexpr ptrdiff_t _Barrier_arrival_token_mask = 1;
inline constexpr ptrdiff_t _Barrier_value_mask = ~_Barrier_arrival_token_mask;
inline constexpr ptrdiff_t _Barrier_value_shift = 1;
inline constexpr ptrdiff_t _Barrier_invalid_token = 0;
inline constexpr ptrdiff_t _Barrier_value_step = 1 << _Barrier_value_shift;
inline constexpr ptrdiff_t _Barrier_max = (1ULL << (sizeof(ptrdiff_t) * CHAR_BIT - 2)) - 1;
I suggest that either you add named constants or I remove them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With named things it can go without shift, though anyway more verbose:
if ((_Local->_Stop_sources.fetch_sub(_Stop_source_count_step, memory_order_relaxed) == _Stop_source_count_step)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name all the things. Ask Kvothe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have great ideas for names here. Do you have suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(fetch_sub(_Stop_sources_count_step, mo_relaxed) >> _Stop_sources_count_shift) == 1
or
(fetch_sub(_Stop_sources_count_step`, mo_relaxed) >> _Stop_sources_count_mask) == _Stop_sources_count_step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm usually the arguer for long names and naming more things but I don't think they actually clarify what's going on here: the reader still needs to be intimately familiar with the bit-packing we're doing here. That is to say, a reader can't determine that the line is correct with the names but not determine that it's correct without the names.
void operator()() const noexcept { | ||
_This->notify_all(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to take out a lock on the internal mutex before calling notify_all()
here to avoid a missed wake-up.
eg. consider in the wait()
implementation below that the call to .request_stop()
occurs after the check of _Stoken.stop_requested()
and before the call to wait(_Lck)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
notify_all
already takes the internal mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need to take the internal mutex before creating _Cb below...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would explain why this was added to cv_any and not plain cv.
Although I think there might be a spec defect here: If the registration for cancel is supposed to be part of the atomic operation that includes unlocking _Lck
and going to sleep, the spec doesn't say that right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is broken in the spec. I asked SG1 to comment:
{
condition_variable_any cv;
mutex m;
jthread j{[&](stop_token token) {
unique_lock lck{m};
cv.wait(lck, token, [] { return false; });
}};
} // destroy j
This program can deadlock under the current spec; there is nothing forbidding the following execution:
T1: launches T2
T2: takes m
T2: [thread.condvarany.intwait]/2 "Registers for the duration of this call *this
to get notified on a stop request on stoken
during this call", stop is not requested so execution continues
T2: while (!stoken.stop_requested()) {
T2: if (pred())
(pred
returns false
)
T1: j.~jthread
T1: request_stop()
, calls cv.notify_all()
T2: wait(lck)
T1: join()
Deadlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This program can deadlock under the current spec
Can you elaborate on what causes the deadlock in this example?
I'm struggling to see where the deadlock occurs.
AFAICT, if request_stop()
, and thus cv.notify_all()
, has returned then T1 is no longer holding any resources that T2 is waiting on.
Maybe we need to take the internal mutex before creating _Cb below...
You don't want to do that. The callback might execute inline during construction of _Cb, which would then call notify_all(), which then also takes out a lock on the internal mutex. This would be UB to have the calling thread attempt to lock the mutex twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on what causes the deadlock in this example?
After the execution I enumerated, T2 is waiting on the CV, and T1 is waiting on T2, and nothing will ever notify the CV. Note that when T1 tries to notify_all
, T2 isn't waiting on the CV yet.
if request_stop(), and thus cv.notify_all(), has returned then T1 is no longer holding any resources that T2 is waiting on.
Correct, T1 is not holding any such resources. T2 isn't waiting on resources. But T2 is waiting to be notified.
You don't want to do that. The callback might execute inline during construction of _Cb, which would then call notify_all(), which then also takes out a lock on the internal mutex.
I see, we need something more complex :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reference implementation indeed does something a bit tricker to ensure it doesn't miss wakeup notifications.
See https://github.com/josuttis/jthread/blob/master/source/condition_variable_any2.hpp#L216-L238
It first locks the internal mutex and then checks stoken.stop_requested()
.
This prevents the missed wakeup if request_stop()
is called between the evaluation of pred()
and wait(lck)
.
The registered callback will block until wait()
releases the internal mutex before calling .notify_all()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reference implementation indeed does something a bit tricker to ensure it doesn't miss wakeup notifications.
I'm aggressively trying to not look at the reference implementation because licensing scary.
It first locks the internal mutex and then checks stoken.stop_requested().
Right, that's the exact fix. But nothing in the spec requires that right now.
auto worker_id = worker.get_id(); | ||
chrono::steady_clock::time_point started_at; | ||
{ | ||
// timing assumption that the main thread will try to destroy cb within request_wait_length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to be an issue for the test harness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope not? I'm not sure how else you could test this.
…microsoft#1196 (comment) This is broken in the spec. I asked SG1 to comment: ``` { condition_variable_any cv; mutex m; jthread j{[&](stop_token token) { unique_lock lck{m}; cv.wait(lck, token, [] { return false; }); }}; } // destroy j ``` This program can deadlock under the current spec; there is nothing forbidding the following execution: T1: launches T2 T2: takes `m` T2: [thread.condvarany.intwait]/2 "Registers for the duration of this call `*this` to get notified on a stop request on `stoken` during this call", stop is not requested so execution continues T2: `while (!stoken.stop_requested()) {` T2: `if (pred())` (`pred` returns `false`) T1: `j.~jthread` T1: `request_stop()`, calls `cv.notify_all()` T2: `wait(lck)` T1: `join()` Deadlock.
I don't think that makes this meaningfully clearer; if anything I think it makes it less clear, since the constants don't allow you to edit this without understanding the bit packing scheme in its entirety, and verifying the correctness of handling of the bitpacking scheme requires looking at the constants, and introducing named versions introduces an extra layer of 'indirection' for reviewers. (I'm posting this just after trying to grok the barrier PR :)) I don't think the constants are 'problematic' to the point where I'd ask to remove them in the other PR but I do think they are 'problemtic' enough that I don't want to add new ones here. |
_Lck.unlock(); | ||
_Relock(_Lck); | ||
_Unlock_guard<_Lock> _Unlock_outer{_Lck}; | ||
(void) _Unlock_outer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this discarded void
cast? Did a compiler complain about _Unlock_outer
being unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implements P0660R10, P1869R1, resolves microsoftGH-32, and VSO-951574. <atomic> / <memory>: Extract a new type _Locked_pointer which implements the low order bit tricks for atomic<shared_ptr> as a reusable component. yvals_core.h / *_feature_test_macros/test.cpp: Add feature test macro __cpp_lib_jthread and note that the feature is implemented. CMakeLists.txt: Add stop_token header. This will require copy_crt / setup changes in msvc. `<stop_token>` : New header. FIXME This needs a writeup after review. _Stop_callback_base forms a doubly-linked list of callbacks to invoke when stop is requested. It uses a function pointer rather than a virtual function for reduced RTTI codegen costs and to save an indirection of a vtbl access; the compiler being able to devirtualize this case is unlikely. (Using a function pointer rather than virtual was suggested by @lewissbaker ) _Stop_state is the shared state acting as the 'pipe' between stop sources and stop tokens. It has separate reference counts for sources and tokens, and all sources share 1 token count in the same way that all shared_ptrs share one weak count. The stop requested bit is merged into the sources counter to make atomically answering stop_possible possible without tricks. The counters are separate uint32_ts instead of one uint64_t to make operations manipulating them more efficient on 32 bit platforms, even though that means we might have to do a 1-2 more atomic ops on 64 bit platforms in some cases. The tricky part here is primarily how we implement the dtor effects for stop_callback. We can't use a straightforward bool in the stop_callback indicating whether the callback has been called, because once we invoke the callback the memory on which the stop_callback is stored might be gone. To resolve this, we have the thread processing request_stop publish the callback it is currently retiring, so a wait_callback destructor, upon seeing that another thread is executing the callback, can wait until the currently being executed callback is something else. `<thread>`: Extract the thread startup machinery from thread's ctor; notably, this introduces another layer of forwards in the constructing thread, but doesn't introduce any additional stack frames in the thread procedure, preserving the debugger experience. Implement jthread with a member thread and stop_source. Note that the standard currently makes the move assignment operator join on self move assign which is surprising; I asked L(E)WG if that is intentional. `<condition_variable>`: Implement the new interruptable waits; this code is effectively copied from the standard. I filed a defect report because [thread.condvarany.intwait]/7 depicts a call to a nonexistent variable `cv`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve-with-suggestions.
# Conflicts: # stl/inc/yvals_core.h # tests/std/tests/VSO_0157762_feature_test_macros/test.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P0660R10_jthread_and_cv_any
is failing to compilestl/CMakeLists.txt
still needs to be sorted
Will be ready (modulo MS-internal setup etc. changes) after fixing this. Thanks!
Thanks for implementing these features! 🚀 |
Congratulations on your final STL contribution! 😭💔 |
😭 Probably not final final though |
Implements P0660R10, P1869R1, resolves GH-32, and VSO-951574.
<atomic>
/<memory>
: Extract a new type _Locked_pointer which implements the low order bit tricks foratomic<shared_ptr>
as a reusable component.yvals_core.h
/*_feature_test_macros/test.cpp
: Add feature test macro__cpp_lib_jthread
and note that the feature is implemented.CMakeLists.txt: Add stop_token header. This will require copy_crt / setup changes in msvc.
<stop_token>
: New header._Stop_callback_base
forms a doubly-linked list of callbacks to invoke when stop is requested. It uses a function pointer rather than a virtual function for reduced RTTI codegen costs and to save an indirection of a vtbl access; the compiler being able to devirtualize this case is unlikely. (Using a function pointer rather than virtual was suggested by @lewissbaker )_Stop_state
is the shared state acting as the 'pipe' between stop sources and stop tokens. It has separate reference counts for sources and tokens, and all sources share 1 token count in the same way that all shared_ptrs share one weak count. The stop requested bit is merged into the sources counter to make atomically answering stop_possible possible without tricks. The counters are separate uint32_ts instead of one uint64_t to make operations manipulating them more efficient on 32 bit platforms, even though that means we might have to do a 1-2 more atomic ops on 64 bit platforms in some cases.The tricky part here is primarily how we implement the dtor effects for stop_callback. We can't use a straightforward bool in the stop_callback indicating whether the callback has been called, because once we invoke the callback the memory on which the stop_callback is stored might be gone. To resolve this, we have the thread processing request_stop publish the callback it is currently retiring, so a wait_callback destructor, upon seeing that another thread is executing the callback, can wait until the currently being executed callback is something else.
<thread>
:Extract the thread startup machinery from thread's ctor; notably, this introduces another layer of forwards in the constructing thread, but doesn't introduce any additional stack frames in the thread procedure, preserving the debugger experience.
Implement jthread with a member thread and stop_source. Note that the standard currently makes the move assignment operator join on self move assign which is surprising; I asked L(E)WG if that is intentional.
<condition_variable>
:Implement the new interruptable waits. I wanted to copy this directly from the standard, but there's a race condition (see comment in the review) which is presumably intended to not happen but which the spec doesn't forbid right now. I asked SG1 folks to take a look at that part of the spec but don't have a defect number right now.