Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement <stop_token> and jthread. #1196

Merged
merged 21 commits into from
Sep 22, 2020

Conversation

BillyONeal
Copy link
Member

@BillyONeal BillyONeal commented Aug 14, 2020

Implements P0660R10, P1869R1, resolves GH-32, and VSO-951574.

<atomic> / <memory>: Extract a new type _Locked_pointer which implements the low order bit tricks for atomic<shared_ptr> as a reusable component.

yvals_core.h / *_feature_test_macros/test.cpp: Add feature test macro __cpp_lib_jthread and note that the feature is implemented.

CMakeLists.txt: Add stop_token header. This will require copy_crt / setup changes in msvc.

<stop_token> : New header.

_Stop_callback_base forms a doubly-linked list of callbacks to invoke when stop is requested. It uses a function pointer rather than a virtual function for reduced RTTI codegen costs and to save an indirection of a vtbl access; the compiler being able to devirtualize this case is unlikely. (Using a function pointer rather than virtual was suggested by @lewissbaker )

_Stop_state is the shared state acting as the 'pipe' between stop sources and stop tokens. It has separate reference counts for sources and tokens, and all sources share 1 token count in the same way that all shared_ptrs share one weak count. The stop requested bit is merged into the sources counter to make atomically answering stop_possible possible without tricks. The counters are separate uint32_ts instead of one uint64_t to make operations manipulating them more efficient on 32 bit platforms, even though that means we might have to do a 1-2 more atomic ops on 64 bit platforms in some cases.

The tricky part here is primarily how we implement the dtor effects for stop_callback. We can't use a straightforward bool in the stop_callback indicating whether the callback has been called, because once we invoke the callback the memory on which the stop_callback is stored might be gone. To resolve this, we have the thread processing request_stop publish the callback it is currently retiring, so a wait_callback destructor, upon seeing that another thread is executing the callback, can wait until the currently being executed callback is something else.

<thread>:
Extract the thread startup machinery from thread's ctor; notably, this introduces another layer of forwards in the constructing thread, but doesn't introduce any additional stack frames in the thread procedure, preserving the debugger experience.

Implement jthread with a member thread and stop_source. Note that the standard currently makes the move assignment operator join on self move assign which is surprising; I asked L(E)WG if that is intentional.

<condition_variable>:
Implement the new interruptable waits. I wanted to copy this directly from the standard, but there's a race condition (see comment in the review) which is presumably intended to not happen but which the spec doesn't forbid right now. I asked SG1 folks to take a look at that part of the spec but don't have a defect number right now.

@BillyONeal BillyONeal added the cxx20 C++20 feature label Aug 14, 2020
@BillyONeal BillyONeal added this to the Conformance milestone Aug 14, 2020
@BillyONeal BillyONeal requested a review from a team as a code owner August 14, 2020 11:19
@BillyONeal BillyONeal changed the title Implement <stop_token> and jthread. Implement <stop_token> and jthread. Aug 14, 2020
stl/inc/atomic Outdated Show resolved Hide resolved
~stop_source() {
const auto _Local = _State;
if (_Local != nullptr) {
if ((_Local->_Stop_sources.fetch_sub(2, memory_order_relaxed) >> 1) == 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In barrier ( #1057 ), I used named things:

inline constexpr ptrdiff_t _Barrier_arrival_token_mask = 1;
inline constexpr ptrdiff_t _Barrier_value_mask         = ~_Barrier_arrival_token_mask;
inline constexpr ptrdiff_t _Barrier_value_shift        = 1;
inline constexpr ptrdiff_t _Barrier_invalid_token      = 0;
inline constexpr ptrdiff_t _Barrier_value_step         = 1 << _Barrier_value_shift;
inline constexpr ptrdiff_t _Barrier_max                = (1ULL << (sizeof(ptrdiff_t) * CHAR_BIT - 2)) - 1;

I suggest that either you add named constants or I remove them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With named things it can go without shift, though anyway more verbose:

if ((_Local->_Stop_sources.fetch_sub(_Stop_source_count_step, memory_order_relaxed) == _Stop_source_count_step)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name all the things. Ask Kvothe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have great ideas for names here. Do you have suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(fetch_sub(_Stop_sources_count_step, mo_relaxed)  >> _Stop_sources_count_shift) == 1

or

(fetch_sub(_Stop_sources_count_step`, mo_relaxed)  >> _Stop_sources_count_mask) == _Stop_sources_count_step

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm usually the arguer for long names and naming more things but I don't think they actually clarify what's going on here: the reader still needs to be intimately familiar with the bit-packing we're doing here. That is to say, a reader can't determine that the line is correct with the names but not determine that it's correct without the names.

Comment on lines +144 to +164
void operator()() const noexcept {
_This->notify_all();
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to take out a lock on the internal mutex before calling notify_all() here to avoid a missed wake-up.

eg. consider in the wait() implementation below that the call to .request_stop() occurs after the check of _Stoken.stop_requested() and before the call to wait(_Lck).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notify_all already takes the internal mutex.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need to take the internal mutex before creating _Cb below...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would explain why this was added to cv_any and not plain cv.

Although I think there might be a spec defect here: If the registration for cancel is supposed to be part of the atomic operation that includes unlocking _Lck and going to sleep, the spec doesn't say that right now.

Copy link
Member Author

@BillyONeal BillyONeal Aug 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is broken in the spec. I asked SG1 to comment:

{
    condition_variable_any cv;
    mutex m;
    jthread j{[&](stop_token token) {
        unique_lock lck{m};
        cv.wait(lck, token, [] { return false; });
    }};
} // destroy j

This program can deadlock under the current spec; there is nothing forbidding the following execution:

T1: launches T2
T2: takes m
T2: [thread.condvarany.intwait]/2 "Registers for the duration of this call *this to get notified on a stop request on stoken during this call", stop is not requested so execution continues
T2: while (!stoken.stop_requested()) {
T2: if (pred()) (pred returns false)
T1: j.~jthread
T1: request_stop(), calls cv.notify_all()
T2: wait(lck)
T1: join()
Deadlock.

Copy link

@lewissbaker lewissbaker Aug 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This program can deadlock under the current spec

Can you elaborate on what causes the deadlock in this example?
I'm struggling to see where the deadlock occurs.
AFAICT, if request_stop(), and thus cv.notify_all(), has returned then T1 is no longer holding any resources that T2 is waiting on.

Maybe we need to take the internal mutex before creating _Cb below...

You don't want to do that. The callback might execute inline during construction of _Cb, which would then call notify_all(), which then also takes out a lock on the internal mutex. This would be UB to have the calling thread attempt to lock the mutex twice.

Copy link
Member Author

@BillyONeal BillyONeal Aug 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on what causes the deadlock in this example?

After the execution I enumerated, T2 is waiting on the CV, and T1 is waiting on T2, and nothing will ever notify the CV. Note that when T1 tries to notify_all, T2 isn't waiting on the CV yet.

if request_stop(), and thus cv.notify_all(), has returned then T1 is no longer holding any resources that T2 is waiting on.

Correct, T1 is not holding any such resources. T2 isn't waiting on resources. But T2 is waiting to be notified.

You don't want to do that. The callback might execute inline during construction of _Cb, which would then call notify_all(), which then also takes out a lock on the internal mutex.

I see, we need something more complex :/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference implementation indeed does something a bit tricker to ensure it doesn't miss wakeup notifications.
See https://github.com/josuttis/jthread/blob/master/source/condition_variable_any2.hpp#L216-L238

It first locks the internal mutex and then checks stoken.stop_requested().
This prevents the missed wakeup if request_stop() is called between the evaluation of pred() and wait(lck).
The registered callback will block until wait() releases the internal mutex before calling .notify_all().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference implementation indeed does something a bit tricker to ensure it doesn't miss wakeup notifications.

I'm aggressively trying to not look at the reference implementation because licensing scary.

It first locks the internal mutex and then checks stoken.stop_requested().

Right, that's the exact fix. But nothing in the spec requires that right now.

stl/inc/stop_token Outdated Show resolved Hide resolved
stl/inc/stop_token Outdated Show resolved Hide resolved
stl/inc/stop_token Outdated Show resolved Hide resolved
tests/std/tests/VSO_0157762_feature_test_macros/test.cpp Outdated Show resolved Hide resolved
stl/inc/stop_token Outdated Show resolved Hide resolved
stl/inc/stop_token Show resolved Hide resolved
stl/inc/atomic Outdated Show resolved Hide resolved
stl/inc/atomic Outdated Show resolved Hide resolved
auto worker_id = worker.get_id();
chrono::steady_clock::time_point started_at;
{
// timing assumption that the main thread will try to destroy cb within request_wait_length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to be an issue for the test harness?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope not? I'm not sure how else you could test this.

tests/std/tests/P0660R10_jthread_and_cv_any/test.cpp Outdated Show resolved Hide resolved
tests/std/include/new_counter.hpp Outdated Show resolved Hide resolved
tests/std/include/new_counter.hpp Outdated Show resolved Hide resolved
tests/std/include/new_counter.hpp Show resolved Hide resolved
stl/inc/condition_variable Outdated Show resolved Hide resolved
stl/inc/yvals_core.h Outdated Show resolved Hide resolved
tests/std/include/new_counter.hpp Outdated Show resolved Hide resolved
tests/std/include/new_counter.hpp Show resolved Hide resolved
tests/std/include/new_counter.hpp Show resolved Hide resolved
stl/inc/thread Outdated Show resolved Hide resolved
stl/inc/thread Show resolved Hide resolved
stl/inc/thread Show resolved Hide resolved
BillyONeal added a commit to BillyONeal/STL that referenced this pull request Aug 18, 2020
…microsoft#1196 (comment)

This is broken in the spec. I asked SG1 to comment:

```
{
    condition_variable_any cv;
    mutex m;
    jthread j{[&](stop_token token) {
        unique_lock lck{m};
        cv.wait(lck, token, [] { return false; });
    }};
} // destroy j
```

This program can deadlock under the current spec; there is nothing forbidding the following execution:

T1: launches T2
T2: takes `m`
T2: [thread.condvarany.intwait]/2 "Registers for the duration of this call `*this` to get notified on a stop request on `stoken` during this call", stop is not requested so execution continues
T2: `while (!stoken.stop_requested()) {`
T2: `if (pred())` (`pred` returns `false`)
T1: `j.~jthread`
T1: `request_stop()`, calls `cv.notify_all()`
T2: `wait(lck)`
T1: `join()`
Deadlock.
@BillyONeal
Copy link
Member Author

BillyONeal commented Aug 24, 2020

(fetch_sub(_Stop_sources_count_step, mo_relaxed)  >> _Stop_sources_count_shift) == 1

or

(fetch_sub(_Stop_sources_count_step, mo_relaxed)  >> _Stop_sources_count_mask) == _Stop_sources_count_step

I don't think that makes this meaningfully clearer; if anything I think it makes it less clear, since the constants don't allow you to edit this without understanding the bit packing scheme in its entirety, and verifying the correctness of handling of the bitpacking scheme requires looking at the constants, and introducing named versions introduces an extra layer of 'indirection' for reviewers. (I'm posting this just after trying to grok the barrier PR :))

I don't think the constants are 'problematic' to the point where I'd ask to remove them in the other PR but I do think they are 'problemtic' enough that I don't want to add new ones here.

stl/inc/atomic Outdated Show resolved Hide resolved
stl/inc/atomic Outdated Show resolved Hide resolved
_Lck.unlock();
_Relock(_Lck);
_Unlock_guard<_Lock> _Unlock_outer{_Lck};
(void) _Unlock_outer;
Copy link
Member

@CaseyCarter CaseyCarter Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this discarded void cast? Did a compiler complain about _Unlock_outer being unused?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/std/tests/P0660R10_jthread_and_cv_any/test.cpp Outdated Show resolved Hide resolved
Implements P0660R10, P1869R1, resolves microsoftGH-32, and VSO-951574.

<atomic> / <memory>: Extract a new type _Locked_pointer which implements the low order bit tricks for atomic<shared_ptr> as a reusable component.

yvals_core.h / *_feature_test_macros/test.cpp: Add feature test macro __cpp_lib_jthread and note that the feature is implemented.

CMakeLists.txt: Add stop_token header. This will require copy_crt / setup changes in msvc.

`<stop_token>` : New header. FIXME This needs a writeup after review.

_Stop_callback_base forms a doubly-linked list of callbacks to invoke when stop is requested. It uses a function pointer rather than a virtual function for reduced RTTI codegen costs and to save an indirection of a vtbl access; the compiler being able to devirtualize this case is unlikely. (Using a function pointer rather than virtual was suggested by @lewissbaker )

_Stop_state is the shared state acting as the 'pipe' between stop sources and stop tokens. It has separate reference counts for sources and tokens, and all sources share 1 token count in the same way that all shared_ptrs share one weak count. The stop requested bit is merged into the sources counter to make atomically answering stop_possible possible without tricks. The counters are separate uint32_ts instead of one uint64_t to make operations manipulating them more efficient on 32 bit platforms, even though that means we might have to do a 1-2 more atomic ops on 64 bit platforms in some cases.

The tricky part here is primarily how we implement the dtor effects for stop_callback. We can't use a straightforward bool in the stop_callback indicating whether the callback has been called, because once we invoke the callback the memory on which the stop_callback is stored might be gone. To resolve this, we have the thread processing request_stop publish the callback it is currently retiring, so a wait_callback destructor, upon seeing that another thread is executing the callback, can wait until the currently being executed callback is something else.

`<thread>`:
Extract the thread startup machinery from thread's ctor; notably, this introduces another layer of forwards in the constructing thread, but doesn't introduce any additional stack frames in the thread procedure, preserving the debugger experience.

Implement jthread with a member thread and stop_source. Note that the standard currently makes the move assignment operator join on self move assign which is surprising; I asked L(E)WG if that is intentional.

`<condition_variable>`:
Implement the new interruptable waits; this code is effectively copied from the standard. I filed a defect report because [thread.condvarany.intwait]/7 depicts a call to a nonexistent variable `cv`.
Copy link
Member

@CaseyCarter CaseyCarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve-with-suggestions.

stl/CMakeLists.txt Outdated Show resolved Hide resolved
stl/inc/stop_token Show resolved Hide resolved
stl/inc/stop_token Show resolved Hide resolved
stl/inc/yvals_core.h Outdated Show resolved Hide resolved
stl/inc/atomic Show resolved Hide resolved
stl/inc/thread Show resolved Hide resolved
tests/std/include/new_counter.hpp Outdated Show resolved Hide resolved
tests/std/include/new_counter.hpp Outdated Show resolved Hide resolved
tests/std/tests/P0660R10_jthread_and_cv_any/test.cpp Outdated Show resolved Hide resolved
tests/std/tests/P0660R10_jthread_and_cv_any/test.cpp Outdated Show resolved Hide resolved
# Conflicts:
#	stl/inc/yvals_core.h
#	tests/std/tests/VSO_0157762_feature_test_macros/test.cpp
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • P0660R10_jthread_and_cv_any is failing to compile
  • stl/CMakeLists.txt still needs to be sorted

Will be ready (modulo MS-internal setup etc. changes) after fixing this. Thanks!

@StephanTLavavej StephanTLavavej self-assigned this Sep 22, 2020
@StephanTLavavej StephanTLavavej merged commit 9e264ab into microsoft:master Sep 22, 2020
@StephanTLavavej
Copy link
Member

Thanks for implementing these features! 🚀

@CaseyCarter
Copy link
Member

Congratulations on your final STL contribution! 😭💔

@BillyONeal
Copy link
Member Author

Congratulations on your final STL contribution! 😭💔

😭

Probably not final final though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cxx20 C++20 feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

P0660R10 <stop_token> And jthread
6 participants