
Huge number of unix-systemcalls under macOS #2474

Closed
gin66 opened this issue Jan 8, 2018 · 16 comments

Comments

@gin66

gin66 commented Jan 8, 2018

Currently I am developing a socks-proxy in Pony, which does a kind of load balancing across several SSH SOCKS channels to three servers. It works well and has been fun to write with actors.

My only concern is that this small workload shows up at around 10% CPU in macOS's Activity Monitor, while the actual SSH proxies (which do all the work, including encryption) stay around 1% or less. On a MacBook this makes a real difference in battery life, so this load is not acceptable.

To rule out my own code as the culprit, I compared the echo-server from the examples directory with a Python counterpart.

Test method is to call:

echo a|nc 127.0.0.1 <actual listen port>

and for ten requests:

for i in 1 2 3 4 5 6 7 8 9 10;do echo a|nc 127.0.0.1 <actual listen port>;done

Result is:

  • pony: ~4400 syscalls for the first invocation and ~4800 syscalls for the second invocation
  • python: ~21 syscalls for the first invocation and ~210 for the second invocation

The Python numbers are as they should be. Even Pony's actor system should not lead to thousands of syscalls. Apparently the Pony runtime has efficiency issues that need to be solved for it to really be a high-performance language.

> ponyc --version
0.21.2 [release]
compiled with: llvm 3.9.1 -- Apple LLVM version 9.0.0 (clang-900.0.39.2)

Little side note: PONY is in PYthON :-)

@SeanTAllen
Member

These are wildly different programs that happen to perform the same end task, an echo server.

However, correct me if I'm wrong, but your Python version is single-threaded and probably uses synchronous IO. Is that correct?

The Pony version would be using multiple native threads to potentially multiplex many different actors concurrently. Additionally, it operates using async io.

A program designed the way the Pony one is will have far more system call overhead than the synchronous, single-threaded version.

All other things being equal, the program designed in the fashion of the Pony one should also be able to handle far more concurrent requests.

If you'd like to strike up a conversation about specifics, the mailing list and IRC are available.
I'd suggest coming up with a breakdown of the system calls in question, and someone could walk you through why those system calls are made.

For example, depending on the workload, Pony programs can end up making a large number of system calls to sleep. Why? Scheduler threads put themselves to sleep for a while if they can't find any work to do. The longer they sleep, the fewer system calls. Also, the longer they sleep, the longer it takes to respond to an increase in workload. There's a tradeoff there that should be discussed based on the merits of each approach.

If you have specific efficiency concerns and data to discuss, those can be fruitful conversations. I'm closing this issue because there's nothing actionable here, but to reiterate, I encourage you to dig deeper into what you are seeing and avail yourself of IRC and the mailing list to learn more about the Pony runtime and become an active member of discussions about the tradeoffs we have to make.

@gin66
Author

gin66 commented Jan 10, 2018

@SeanTAllen As you proposed, I have collected some data regarding my concerns. I ran strace under Linux, because under macOS I could not get dtrace to work (even with sudo). The result is pretty clear: for the actual communication I count 24 syscalls for Pony, which is in line with the 21 for Python. So this confirms my expectation that Pony and Python should be on par for this workload. It is also nice to see the two threads at work under the different pids:

[pid 12488] nanosleep({tv_sec=0, tv_nsec=100000},  <unfinished ...>
[pid 12487] epoll_wait(3,  <unfinished ...>
[pid 12488] <... nanosleep resumed> NULL) = 0
[pid 12488] accept4(5, NULL, NULL, SOCK_NONBLOCK) = 6
[pid 12488] accept4(5, NULL, NULL, SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12488] epoll_ctl(3, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLONESHOT|EPOLLET, {u32=3437656896, u64=140036551375680}}) = 0
[pid 12487] <... epoll_wait resumed> [{EPOLLIN|EPOLLOUT, {u32=3437656896, u64=140036551375680}}], 64, -1) = 1
[pid 12487] epoll_wait(3,  <unfinished ...>
[pid 12488] recvfrom(6, "a\n", 64, 0, NULL, NULL) = 2
[pid 12488] recvfrom(6, 0x7f5ccce67040, 64, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12488] epoll_ctl(3, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLRDHUP|EPOLLONESHOT|EPOLLET, {u32=3437656896, u64=140036551375680}}) = 0
[pid 12488] write(1, "connection accepted\n", 20connection accepted
) = 20
[pid 12488] write(1, "data received, looping it back\n", 31data received, looping it back
) = 31
[pid 12488] recvfrom(6, 0x7f5ccce67040, 64, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12488] epoll_ctl(3, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLRDHUP|EPOLLONESHOT|EPOLLET, {u32=3437656896, u64=140036551375680}}) = 0
[pid 12488] writev(6, [{iov_base="server says: ", iov_len=13}], 1) = 13
[pid 12488] writev(6, [{iov_base="a\n", iov_len=2}], 1) = 2
[pid 12488] nanosleep({tv_sec=0, tv_nsec=100000}, NULL) = 0
[pid 12488] nanosleep({tv_sec=0, tv_nsec=100000}, NULL) = 0
[pid 12488] nanosleep({tv_sec=0, tv_nsec=100000}, NULL) = 0
[pid 12488] nanosleep({tv_sec=0, tv_nsec=100000}, NULL) = 0
[pid 12488] nanosleep({tv_sec=0, tv_nsec=100000}, NULL) = 0
[pid 12488] nanosleep({tv_sec=0, tv_nsec=100000}, NULL) = 0
[pid 12488] nanosleep({tv_sec=0, tv_nsec=100000},  <unfinished ...>
[pid 12487] <... epoll_wait resumed> [{EPOLLIN|EPOLLRDHUP, {u32=3437624896, u64=140036551343680}}], 64, -1) = 1
[pid 12488] <... nanosleep resumed> NULL) = 0
[pid 12487] epoll_wait(3,  <unfinished ...>
[pid 12488] recvfrom(6, "", 64, 0, NULL, NULL) = 0
[pid 12488] shutdown(6, SHUT_WR)        = 0
[pid 12488] epoll_ctl(3, EPOLL_CTL_DEL, 6, NULL) = 0
[pid 12488] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 12487] <... epoll_wait resumed> [{EPOLLIN, {u32=3731315712, u64=140036845034496}}], 64, -1) = 1
[pid 12488] close(6 <unfinished ...>
[pid 12487] epoll_wait(3,  <unfinished ...>
[pid 12488] <... close resumed> )       = 0
[pid 12488] write(1, "server closed\n", 14server closed
) = 14

The strace dump also contains the explanation for the other ~4000 syscalls from my first post: they are all nanosleep waits (at 100 µs per sleep, an idle scheduler thread can issue on the order of 10,000 nanosleep calls per second). You already indicated this with your remark that Pony programs, depending on the workload, can end up making a large number of system calls to sleep. For reference, here is the code that increases the sleep time in steps from 100 µs up to 30 ms. This is the selection part, with the Windows-specific code removed for clarity:

  if(yield)
  {
    // A billion cycles is roughly half a second, depending on clock speed.
    if((tsc2 - tsc) > 10000000000)
    {
      // If it has been 10 billion cycles, pause 30 ms.
      ts.tv_nsec = 30000000;
    } else if((tsc2 - tsc) > 3000000000) {
      // If it has been 3 billion cycles, pause 10 ms.
      ts.tv_nsec = 10000000;
    } else if((tsc2 - tsc) > 1000000000) {
      // If it has been 1 billion cycles, pause 1 ms.
      ts.tv_nsec = 1000000;
    }
    else
    {
      // Otherwise, pause for 100 microseconds
      ts.tv_nsec = 100000;
    }
  }

Now my question is quite simple: can those busy sleep loops be avoided, e.g. by using blocking pthread_mutex/pthread_cond calls or a semaphore? From a quick glance at the code, I assume a few changes in the scheduler could be sufficient, plus a command line option to enable pthread usage.
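
For illustration, here is a minimal sketch of what I have in mind, assuming a hypothetical mutex/condition pair (work_mut, work_cond) and a work_available flag; these names are placeholders and not actual ponyc code:

#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state for this sketch; these names are not ponyc code. */
static pthread_mutex_t work_mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
static bool work_available = false;

/* Instead of the nanosleep ladder above, a scheduler thread with nothing
 * to steal would block here until another thread signals new work. */
static void wait_for_work(void)
{
  pthread_mutex_lock(&work_mut);
  while(!work_available) /* loop guards against spurious wakeups */
    pthread_cond_wait(&work_cond, &work_mut);
  work_available = false;
  pthread_mutex_unlock(&work_mut);
}

/* Called by whichever thread makes new work visible, e.g. after a queue push. */
static void signal_work(void)
{
  pthread_mutex_lock(&work_mut);
  work_available = true;
  pthread_cond_signal(&work_cond);
  pthread_mutex_unlock(&work_mut);
}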

If the wakeup from a blocked condition is immediate, a Pony program could actually even be faster with this solution. But I have not found clear information on the internet about how quickly a thread resumes after an unlock (whether it is immediate, or happens an OS-defined scheduling interval later).

The reason I am concerned is that I own a MacBook which often runs on battery, and my intention is to run my socks-proxy constantly in the background. If minor internet traffic makes the CPU jump to >10% for a while, I lose hours of battery time. And I fear that macOS services a 100 µs sleep with the CPU running at full speed, which is not good for the battery at all.

@SeanTAllen
Member

I'm not sure I follow. If I understand correctly, you are suggesting a semaphore/condition that would allow a scheduler to know that "work is available"?

dipinhora added a commit to dipinhora/ponyc that referenced this issue Jan 10, 2018
Change dynamic scheduler scaling implementation in order to resolve
the hangs encountered in ponylang#2451.

The previous implementation assumed that signalling to wake a thread
was a reliable operation. Apparently, that's not necessarily true
(see https://en.wikipedia.org/wiki/Spurious_wakeup and
https://askldjd.com/2010/04/24/the-lost-wakeup-problem/). Seeing
as we couldn't find any other explanation for why the previous
implementation was experiencing hangs, I've assumed it is either
because of lost wake ups or spurious wake ups and redesigned the
logic accordingly.

Now, when a thread is about to suspend, it will decrement the
`active_scheduler_count` and then suspend. When it wakes up, it will
check to see if the `active_scheduler_count` is at least as big as
its `index`. If the `active_scheduler_count` isn't big enough, the
thread will suspend itself again immediately. If it is big enough,
it will resume. Threads no longer modify `active_scheduler_count`
when they wake up.

`active_scheduler_count` must now be modified by the thread that is
waking up another thread prior to sending the wake up notification.
Additionally, since we're now assuming that wake up signals can be
lost, we now send multiple wake up notifications just in case. While
this is somewhat wasteful, it is better than being in a situation
where some threads aren't woken up at all (i.e. a hang).

This commit also includes a change inspired by ponylang#2474. Now, *all*
scheduler threads can suspend as long as there is at least one
noisy actor registered with the ASIO subsystem. If there are no
noisy actors registered with the ASIO subsystem then scheduler 0
is not allowed to suspend itself.
@gin66
Author

gin66 commented Jan 13, 2018

In the meantime I have forked ponyc. In this fork I have added an atomic variable prey_count, which contains the total number of actors available to thieves. Keeping it correct seems to require only adding atomic add/sub to every mpmcq push/pop for all queues. On a push to the inject queue I currently just wake up a thread, without further consideration. In steal() the prey_count is checked and, depending on it, the thread continues stealing, wakes up more threads, or goes to sleep.

As I do not understand every aspect of the scheduler, I now ask for your comments. This change completely avoids the busy sleep, which is my primary goal; a rough sketch of the bookkeeping follows the comment below.

At the definition of prey_count I have added this comment, which should help explain the idea:

/*
 * prey_count contains the number of available messages in any queue.
 *
 * As this is concurrently modified by all threads, the value is transient.
 *
 * The idea behind it is:
 *    If a thread is finished with one message and there is no prey,
 *    then the thread should go to a blocking sleep.
 *    If there is exactly one prey, then the thread should get it.
 *    If there is more than one prey, another thread can help with
 *    processing, so a thread should be woken up.
 *
 * prey_count modifications should happen together with mpmc accesses.
 * Thus a nested pop should not lead to double modifications.
 *
 * Outlook: Hopefully the BLOCK/UNBLOCK/CNF/ACK messages can be removed
 *          completely.
 */
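
And here is that rough sketch of the bookkeeping (simplified; prey_count is real in my fork, but wake_one_thread, suspend_self and try_steal are placeholder names, not the actual ponyc functions):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_uint_fast64_t prey_count;

/* Placeholder declarations for this sketch only. */
static void wake_one_thread(void);
static void suspend_self(void);
static void try_steal(void);

static void on_mpmcq_push(void)
{
  /* One more message is visible to thieves. */
  atomic_fetch_add_explicit(&prey_count, 1, memory_order_relaxed);
}

static void on_mpmcq_pop(void)
{
  atomic_fetch_sub_explicit(&prey_count, 1, memory_order_relaxed);
}

static void on_inject_push(void)
{
  /* A push to the inject queue just wakes one thread, without further consideration. */
  on_mpmcq_push();
  wake_one_thread();
}

static void steal_loop(void)
{
  while(true)
  {
    uint_fast64_t prey = atomic_load_explicit(&prey_count, memory_order_relaxed);

    if(prey == 0)
      suspend_self();      /* no prey: blocking sleep instead of the nanosleep ladder */
    else
    {
      if(prey > 1)
        wake_one_thread(); /* more than one prey: another thread can help */
      try_steal();
    }
  }
}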

In the meantime I have tested it with my socks-proxy and I am very pleased with the improvement in CPU load. However, something is still not working right, because after some time I received this error:

terminated by signal SIGSEGV (Address boundary error)

Running with lldb reveals:

* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00007fff5d86967c libsystem_pthread.dylib`pthread_cond_signal + 24
libsystem_pthread.dylib`pthread_cond_signal:
->  0x7fff5d86967c <+24>: cmpq   $0x434f4e44, (%rbx)       ; imm = 0x434F4E44
    0x7fff5d869683 <+31>: jne    0x7fff5d86968c            ; <+40>
    0x7fff5d869685 <+33>: xorl   %r14d, %r14d
    0x7fff5d869688 <+36>: movb   $0x1, %al
Target 0: (pony) stopped.

The compiler is built with:

make default_pic=true dtrace=true scheduler_scaling_pthreads=false

Now modifying the Makefile, because pthreads are still in use... found it: USE_SCHEDULER_SCALING_PTHREADS is hardcoded for macOS.

I have pushed the change to GitHub, and the socks-proxy is still running. Happily, the CPU load appears to be even a bit lower.

@SeanTAllen
Member

SeanTAllen commented Jan 13, 2018

This change will probably have a really awful performance impact for highly concurrent, heavily loaded servers.

@gin66
Author

gin66 commented Jan 13, 2018

From my point of view, a better worked-out implementation of this change would only slightly impact the performance of a highly concurrent, heavily loaded server. As long as work is available, which is generally the case for such a server, no thread will ever sleep, and the change has no further impact besides the atomic counting. On the positive side, it allows the scheduler to be simplified while keeping the work-stealing principle, and that simplification may be enough to compensate for the atomic counting overhead. But this is just my opinion; it is not backed up by quantitative data. My use case is an application on a battery-powered computer, and going from a CPU load of 10% to now below 1% makes a real difference.

@winksaville
Contributor

I am wholeheartedly in favor of a scheduler that does not need a busy sleep. As pointed out, a busy sleep is unacceptable for some systems, such as one of my target applications: using Pony for embedded systems.

If a single scheduler isn't able to provide a solution, maybe we'll need pluggable schedulers or conditional compilation. In any case, I hope we can coalesce on a solution.

@SeanTAllen
Member

@winksaville @gin66 please see the work that @dipinhora has been doing, which builds on the generalized runtime backpressure work that I did: #2483

@gin66 I appreciate your enthusiasm, but your atomic count will have a huge impact on performance, especially as it sits in one of the hottest paths in the runtime. The more scheduler threads there are, the more contention there is on that atomic variable, and it is touched on every message send; that's a really large impact. The goal is laudable, but putting a contended atomic variable in an extreme hot path is not a solution that the core team would support.
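
To illustrate the concern with a toy example (not runtime code): every send would end up doing a read-modify-write on the same cache line from every scheduler thread, so the line bounces between cores even when the actors involved never interact.

#include <stdatomic.h>

/* Toy illustration only: one counter shared by all scheduler threads. */
static atomic_uint_fast64_t prey_count;

static inline void on_message_send(void)
{
  /* Even a "relaxed" atomic increment acquires the cache line in exclusive
   * mode. With N scheduler threads all sending messages, the line ping-pongs
   * between N cores, and the cost of every send grows with N. */
  atomic_fetch_add_explicit(&prey_count, 1, memory_order_relaxed);
}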

@gin66
Author

gin66 commented Jan 13, 2018

@SeanTAllen Unfortunately I am not familiar with the impact of atomic add/sub on today's multi-core CPUs with their caching, pipelining and L1/L2/L3 hierarchies, and I cannot come up with any measurements to prove myself right or wrong either. For my specific use case I simply value battery runtime over performance. @winksaville's idea of a user (i.e. software developer) selectable scheduler would resolve this conflict of interest. If I see it correctly, @dipinhora's work is already in the master branch; that version has been the basis of my fork, and its CPU load is still much higher than with my own draft proposal, so I will not use it.

As I currently have no better idea to solve the CPU load issue, I will rewrite my socks-proxy in Rust. With that language I have the freedom to set priorities depending on the application's needs; it's just that the language itself is much more complicated than Pony. It's a real pity.

@SeanTAllen
Member

@gin66 I pointed you at an open PR; it is not on master. I think your fork is fine for your use case. It's not a good long-term solution, but we are working on one. There's no need to switch to Rust; you'll just need to maintain your fork for a while as we bring down the busy wait.

@dipinhora
Contributor

dipinhora commented Jan 13, 2018

@gin66 PR #2483 goes a long way towards your goal of lowering syscalls because it will shut down all scheduler threads when possible (depending on workload). I've tried to architect it with performance in mind, so it should have minimal impact on a normal busy workload. It will probably be a few days to a week before that PR is hopefully merged into master.

It would be great if you could run your application and benchmark the CPU usage and syscalls with the PR changes to confirm that it helps your use case in the meantime. My understanding is that it should significantly lower that 10% idle CPU usage but I'm not 100% sure of that.

Also, there's still room for improvement beyond what that PR accomplishes and it would be awesome if you and @winksaville are able to help and give more specifics as to your use cases and needs so we can all brainstorm to find an appropriate solution. I believe that the current runtime scheduling mechanism is fairly versatile and we should be able to reach the performance targets required without pluggable schedulers or negatively impacting high load applications.

@winksaville
Contributor

winksaville commented Jan 13, 2018 via email

@dipinhora
Contributor

@winksaville That is what should happen once #2483 is merged (assuming no more missed edge cases or bugs), since it will suspend all scheduler threads as long as there is at least one actor subscribed with the ASIO subsystem. If no actors are subscribed with the ASIO subsystem, then a lack of work means quiescence, so scheduler thread 0 isn't allowed to suspend in that case: it needs to stay awake to handle work and to detect quiescence.

Also, @gin66, thank you for the inspiration regarding the suspending of all threads as long as at least one actor is subscribed with the ASIO subsystem. That functionality is a direct result of this ticket and your use case.

@winksaville
Contributor

I wonder: what if we had a mode that disabled quiescence detection and put the burden on the app to decide when it's time to exit? Would that allow all threads to suspend? In an embedded system I can envision interrupts sending messages directly to actors, so detecting quiescence might be difficult. Although I have no idea how quiescence is detected, so this may be a nonsensical question.

@dipinhora
Contributor

@winksaville Given the relationship between the ASIO subsystem and the scheduler threads (along with the dynamic scheduler changes PR logic), I think that is effectively already in place (assuming I'm understanding the scenario correctly).

The output in my example comment on the PR (#2483 (comment)) shows a situation where an echo server is created to listen on a specific port (i.e. an actor is waiting for an async notification from the OS). In this scenario, all of the scheduler threads suspend (including sched 0). This is because the ASIO subsystem is waiting for an OS notification via either epoll or kqueue or iocp. Once the ASIO subsystem receives the OS notification of an event, it will wake up one of the scheduler threads to handle the notification. If there is no more work to do, the scheduler thread would suspend again, waiting to be woken by the ASIO subsystem again.

Quiescence would only occur if there are no actors registered with the ASIO subsystem. This is in the actor's control, because it has to explicitly tell the ASIO subsystem that it doesn't want to wait for any more notifications. Once there are no more actors registered with the ASIO subsystem, quiescence detection can proceed; it relies on the block/cnf/ack messages between the scheduler threads that scheduler thread 0 is responsible for coordinating.
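
To put that flow into a rough sketch (the function names below are placeholders for this illustration, not the actual runtime API):

#include <stdbool.h>

/* Placeholder declarations for this sketch only. */
static bool have_local_or_stealable_work(void);
static bool asio_has_noisy_actors(void);   /* any actor still waiting on I/O? */
static void suspend_until_woken(void);     /* woken by the ASIO thread or another scheduler */
static void run_quiescence_protocol(void); /* block/cnf/ack coordination on sched 0 */

static void scheduler_idle(int index)
{
  if(have_local_or_stealable_work())
    return;                        /* keep running actors */

  if(index == 0 && !asio_has_noisy_actors())
  {
    /* No noisy actors and no work: sched 0 must stay awake to drive
     * quiescence detection and terminate the program. */
    run_quiescence_protocol();
  }
  else
  {
    /* At least one actor is registered with the ASIO subsystem (or this is
     * not sched 0), so this thread may suspend; the ASIO thread wakes a
     * scheduler when epoll/kqueue/iocp reports an event. */
    suspend_until_woken();
  }
}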

@winksaville
Contributor

OK, over time I'll get more familiar; for now things sound more than good enough, thanks.

dipinhora added a commit to dipinhora/ponyc that referenced this issue Jan 14, 2018
dipinhora added a commit to dipinhora/ponyc that referenced this issue Jan 19, 2018
Change dynamic scheduler scaling implementation in order to resolve
the hangs encountered in ponylang#2451.

The previous implementation assumed that signalling to wake a thread
was a reliable operation. Apparently, that's not necessarily true
(see https://en.wikipedia.org/wiki/Spurious_wakeup and
https://askldjd.com/2010/04/24/the-lost-wakeup-problem/). Seeing
as we couldn't find any other explanation for why the previous
implementation was experiencing hangs, I've assumed it is either
because of lost wake ups or spurious wake ups and redesigned the
logic accordingly.

Now, when a thread is about to suspend, it will decrement the
`active_scheduler_count` and then suspend. When it wakes up, it will
check to see if the `active_scheduler_count` is at least as big as
its `index`. If the `active_scheduler_count` isn't big enough, the
thread will suspend itself again immediately. If it is big enough,
it will resume. Threads no longer modify `active_scheduler_count`
when they wake up.

`active_scheduler_count` must now be modified by the thread that is
waking up another thread prior to sending the wake up notification.
Additionally, since we're now assuming that wake up signals can be
lost, we now send multiple wake up notifications just in case. While
this is somewhat wasteful, it is better than being in a situation
where some threads aren't woken up at all (i.e. a hang).

Additionally, only use `scheduler_count_changing` for `signals`
implementation of dynamic scheduler scaling. `pthreads`
implementation now uses a mutex (`sched_mut`) in its place.
We also now change logic to only unlock mutex in `pthreads`
implementation once threads have been woken to avoid potential
lost wake ups. This isn't an issue for the `signals` implementation
and the unlocking of `scheduler_count_changing` can remain where it
is prior to threads being woken up.

This commit also splits out scheduler block/unblock message handling
logic into their own functions (this is so that sched 0 can call those
functions directly instead of sending messages to itself).

This commit also includes a change inspired by ponylang#2474. Now, *all*
scheduler threads can suspend as long as there is at least one
noisy actor registered with the ASIO subsystem. If there are no
noisy actors registered with the ASIO subsystem then scheduler 0
is not allowed to suspend itself because it is responsible for
quiescence detection.

Lastly, this commit adds logic to allow a scheduler thread to suspend
even if it has already sent a scheduler block message so that we can
now suspend scheduler threads in most scenarios.
SeanTAllen pushed a commit that referenced this issue Jan 20, 2018