Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: refactor thread stopping mechanism #26757

Closed
wants to merge 2 commits into from
Closed

Conversation

addaleax
Copy link
Member

  • Follow style guide for naming, e.g. use lower_snake_case
    for simple setters/getters.
  • For performance, use atomics instead of a mutex, and inline
    the corresponding getter/setter pair.
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

- Follow style guide for naming, e.g. use lower_snake_case
  for simple setters/getters.
- For performance, use atomics instead of a mutex, and inline
  the corresponding getter/setter pair.
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Mar 18, 2019
src/env-inl.h Outdated
}

void AsyncRequest::set_stopped(bool flag) {
stopped_.store(flag, std::memory_order_relaxed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so: the heavy usage of muexes around the worker code where multi-thread data access was expected, was almost always to ensure data consistency by flushing cache lines? (in other words, writes in one thread is made visible to other threads instantly). If so, std::memory_order_relaxed constraint is insufficient to ensure that? we might need at least memory_order_acquire ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I retract above question - when I tested with a small code, I see mfence or sync instruction being added with memory_order_relaxed itself; so pls ignore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So … my line of thinking was that the syscalls behind uv_async_send() would themselves present full memory barriers. I’ll check later and verify that that’s indeed correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax - that may be true; but we do have set_stopped is_stopped calls that the threads can call but do not involve syscalls? however:

int foo(bool flag) {
  stopped_.store(flag, std::memory_order_relaxed);
}

this is what I tested and this is what I see in the generated code:

(gdb) x/30i 0x40053e
   0x40053e <_ZNSt11atomic_bool5storeEbSt12memory_order>:	push   rbp
   0x40053f <_ZNSt11atomic_bool5storeEbSt12memory_order+1>:	mov    rbp,rsp
   0x400542 <_ZNSt11atomic_bool5storeEbSt12memory_order+4>:	sub    rsp,0x30
   0x400546 <_ZNSt11atomic_bool5storeEbSt12memory_order+8>:	mov    QWORD PTR [rbp-0x28],rdi
   0x40054a <_ZNSt11atomic_bool5storeEbSt12memory_order+12>:	mov    eax,esi
   0x40054c <_ZNSt11atomic_bool5storeEbSt12memory_order+14>:	mov    DWORD PTR [rbp-0x30],edx
   0x40054f <_ZNSt11atomic_bool5storeEbSt12memory_order+17>:	mov    BYTE PTR [rbp-0x2c],al
   0x400552 <_ZNSt11atomic_bool5storeEbSt12memory_order+20>:	movzx  eax,BYTE PTR [rbp-0x2c]
   0x400556 <_ZNSt11atomic_bool5storeEbSt12memory_order+24>:	mov    rdx,QWORD PTR [rbp-0x28]
   0x40055a <_ZNSt11atomic_bool5storeEbSt12memory_order+28>:	mov    QWORD PTR [rbp-0x8],rdx
   0x40055e <_ZNSt11atomic_bool5storeEbSt12memory_order+32>:	mov    BYTE PTR [rbp-0x9],al
   0x400561 <_ZNSt11atomic_bool5storeEbSt12memory_order+35>:	and    BYTE PTR [rbp-0x9],0x1
   0x400565 <_ZNSt11atomic_bool5storeEbSt12memory_order+39>:	mov    eax,DWORD PTR [rbp-0x30]
   0x400568 <_ZNSt11atomic_bool5storeEbSt12memory_order+42>:	mov    DWORD PTR [rbp-0x10],eax
   0x40056b <_ZNSt11atomic_bool5storeEbSt12memory_order+45>:	mov    eax,DWORD PTR [rbp-0x10]
   0x40056e <_ZNSt11atomic_bool5storeEbSt12memory_order+48>:	mov    esi,0xffff
   0x400573 <_ZNSt11atomic_bool5storeEbSt12memory_order+53>:	mov    edi,eax
   0x400575 <_ZNSt11atomic_bool5storeEbSt12memory_order+55>:	
    call   0x40052a <_ZStanSt12memory_orderSt23__memory_order_modifier>
   0x40057a <_ZNSt11atomic_bool5storeEbSt12memory_order+60>:	mov    DWORD PTR [rbp-0x14],eax
   0x40057d <_ZNSt11atomic_bool5storeEbSt12memory_order+63>:	movzx  edx,BYTE PTR [rbp-0x9]
   0x400581 <_ZNSt11atomic_bool5storeEbSt12memory_order+67>:	mov    rax,QWORD PTR [rbp-0x8]
   0x400585 <_ZNSt11atomic_bool5storeEbSt12memory_order+71>:	mov    BYTE PTR [rax],dl
   0x400587 <_ZNSt11atomic_bool5storeEbSt12memory_order+73>:	mfence 
   0x40058a <_ZNSt11atomic_bool5storeEbSt12memory_order+76>:	leave  
   0x40058b <_ZNSt11atomic_bool5storeEbSt12memory_order+77>:	ret    

please note that mfence at 0x400587 that settles the matter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please note that mfence at 0x400587 that settles the matter?

On x64 it does, yes – to be honest, I don’t know how the different memory order modes are implemented on different platforms? It seems like this disassembled implementation simply ignores the order argument?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   0x400529 <_Z3foob+35>:	ret    
   0x40052a <_ZStanSt12memory_orderSt23__memory_order_modifier>:	push   rbp

@addaleax - if you look at the continuity of the instructions, looks like these (the atomic* helpers) are not static APIs, but compiler-generated code, on the fly; so it is possible that only necessary code was generated, on a per compilation unit basis?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my line of thinking was that the syscalls behind uv_async_send() would themselves present full memory barriers.

Libuv doesn't promise that. uv_async_send() can (at least in theory) elide the system call.

I'm kind of surprised the compiler emits an mfence. It's not needed on x64 (nor any other architecture, I think?) because aligned loads and stores are always atomic. It might just be a compiler bug; I wouldn't depend on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gireeshpunathil I wouldn’t think so, the linker should be able to elide multiple copies of that variable into a single one.

@bnoordhuis Yeah, thanks. I’ve removed the memory_order_relaxed bit.

@@ -381,7 +381,8 @@ void Worker::OnThreadStopped() {
Worker::~Worker() {
Mutex::ScopedLock lock(mutex_);

CHECK(stopped_ || env_ == nullptr || env_->GetAsyncRequest()->IsStopped());
CHECK(stopped_);
CHECK_NULL(env_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, there was a control flow that takes to Worker destructor without nullifying env_ , not able to figure that out now; do you know?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gireeshpunathil I think that would be a bug – the child thread is not allowed to exist at this point (and the next CHECK verifies that the thread has been joined), and the child thread in turn owns the Environment.

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious, does the overhead of locking/unlocking the mutex show up in profiles anywhere?

(I mean, I could imagine it does but I can also imagine it doesn't. I'd like to be convinced by numbers. :-))

src/env-inl.h Outdated
}

void AsyncRequest::set_stopped(bool flag) {
stopped_.store(flag, std::memory_order_relaxed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my line of thinking was that the syscalls behind uv_async_send() would themselves present full memory barriers.

Libuv doesn't promise that. uv_async_send() can (at least in theory) elide the system call.

I'm kind of surprised the compiler emits an mfence. It's not needed on x64 (nor any other architecture, I think?) because aligned loads and stores are always atomic. It might just be a compiler bug; I wouldn't depend on it.

@addaleax
Copy link
Member Author

addaleax commented Mar 19, 2019

I'm curious, does the overhead of locking/unlocking the mutex show up in profiles anywhere?

This PR was created because it does :) It doesn’t have a huge impact, but this code is run a lot, and it might make up 1 % or so of the runtime for some processes.

Copy link
Member

@mhdawson mhdawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@addaleax
Copy link
Member Author

@addaleax addaleax added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Mar 21, 2019
@addaleax
Copy link
Member Author

@BridgeAR
Copy link
Member

Landed in d812dbb 🎉

@BridgeAR BridgeAR closed this Mar 21, 2019
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Mar 21, 2019
- Follow style guide for naming, e.g. use lower_snake_case
  for simple setters/getters.
- For performance, use atomics instead of a mutex, and inline
  the corresponding getter/setter pair.

PR-URL: nodejs#26757
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Franziska Hinkelmann <franziska.hinkelmann@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
targos pushed a commit to targos/node that referenced this pull request Mar 27, 2019
- Follow style guide for naming, e.g. use lower_snake_case
  for simple setters/getters.
- For performance, use atomics instead of a mutex, and inline
  the corresponding getter/setter pair.

PR-URL: nodejs#26757
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Franziska Hinkelmann <franziska.hinkelmann@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
targos pushed a commit that referenced this pull request Mar 27, 2019
- Follow style guide for naming, e.g. use lower_snake_case
  for simple setters/getters.
- For performance, use atomics instead of a mutex, and inline
  the corresponding getter/setter pair.

PR-URL: #26757
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Franziska Hinkelmann <franziska.hinkelmann@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants