Fix excess work stealing under low loads (#2254)
Fixes #1787

Interestingly, all the info needed to solve this was already in the
issue a while ago, but it wasn't until @slfritchie added his comments in
#1787 (comment)
that it all clicked for me.

The excess CPU time is from us doing too much work stealing. In a normal
scenario, with nothing to do, we'd not do anything for a long time
and we'd end up sleeping for quite a while.

With the timer that goes off every few seconds as seen in the issue,
that isn't what happens. We regularly get woken and end up in a work
stealing cycle.

Then, due to the lack of an `else` block for yielding, on OSX, we'd
nanosleep for 0, which is the same as an immediate return. To see what
the impact of that would be on any platform, change:

```c
  // 10m cycles is about 3ms
  if((tsc2 - tsc) < 10000000)
    return;
```

to

```c
  // 10m cycles is about 3ms
  if((tsc2 - tsc) < 1000000000)
    return;
```

This is effectively what we were running. That's a lot more
work stealing. And, note the increased CPU usage. The reason this was
happening more on OSX is that on Linux, a nanosleep of 0 will sleep for at
least a bit. Here we remove the variability and do a small nanosleep
that will be the same across all platforms.
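
For anyone who wants to check the `nanosleep` difference on their own machine, here is a minimal sketch (not part of the runtime). The helper name `timed_nanosleep` is made up for illustration. It times a single sleep with a monotonic clock, so a request of 0 can be compared against the 100 microsecond request this commit introduces. The numbers depend on the platform and the load, so none are quoted here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

// Hypothetical helper: sleep for `nsec` nanoseconds and return how long the
// call actually took, in nanoseconds, measured with CLOCK_MONOTONIC.
int64_t timed_nanosleep(long nsec)
{
  // tv_nsec must be in the range [0, 999999999]
  assert(nsec >= 0 && nsec < 1000000000L);

  struct timespec req = {0, nsec};
  struct timespec start, end;

  clock_gettime(CLOCK_MONOTONIC, &start);

  // If a signal interrupts the sleep, resume with the remaining time
  while(nanosleep(&req, &req) == -1 && errno == EINTR)
  {
  }

  clock_gettime(CLOCK_MONOTONIC, &end);

  return ((int64_t)(end.tv_sec - start.tv_sec) * INT64_C(1000000000)) +
    (int64_t)(end.tv_nsec - start.tv_nsec);
}
```

Comparing `timed_nanosleep(0)` with `timed_nanosleep(100000)` over many iterations should show how much a zero-length sleep varies on a given platform, while the 100 microsecond request always waits at least that long.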
SeanTAllen authored and Theodus committed Sep 29, 2017
1 parent af61c63 commit ec0ce07
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions src/libponyrt/sched/cpu.c
@@ -314,6 +314,11 @@ void ponyint_cpu_core_pause(uint64_t tsc, uint64_t tsc2, bool yield)
       // If it has been 1 billion cycles, pause 1 ms.
       ts.tv_nsec = 1000000;
     }
+    else
+    {
+      // Otherwise, pause for 100 microseconds
+      ts.tv_nsec = 100000;
+    }
   }

   DTRACE1(CPU_NANOSLEEP, ts.tv_nsec);
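
To put the added `else` branch in context, here is a simplified model of how the sleep length is chosen before and after this change. It is a sketch, not the actual `ponyint_cpu_core_pause`: it only models the two thresholds quoted in this commit (10 million and 1 billion cycles) and leaves out any longer tiers above the visible diff context. The names `pause_ns_before`, `pause_ns_after`, `pause_examples` and `NO_PAUSE` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical sentinel: return immediately, nanosleep is never called
#define NO_PAUSE (-1L)

// Before the fix: with yield set and fewer than 1 billion cycles elapsed,
// nothing assigns a sleep length, so it stays at 0.
long pause_ns_before(uint64_t elapsed_cycles, bool yield)
{
  // 10m cycles is about 3ms
  if(elapsed_cycles < 10000000)
    return NO_PAUSE;

  long ns = 0;

  if(yield)
  {
    if(elapsed_cycles > 1000000000)
      ns = 1000000; // 1 billion cycles: pause 1 ms
  }

  return ns;
}

// After the fix: the else branch guarantees a small, fixed sleep.
long pause_ns_after(uint64_t elapsed_cycles, bool yield)
{
  // 10m cycles is about 3ms
  if(elapsed_cycles < 10000000)
    return NO_PAUSE;

  long ns = 0;

  if(yield)
  {
    if(elapsed_cycles > 1000000000)
      ns = 1000000; // 1 billion cycles: pause 1 ms
    else
      ns = 100000; // otherwise: pause 100 microseconds
  }

  return ns;
}

// Usage example: 500 million elapsed cycles while yielding
void pause_examples(void)
{
  assert(pause_ns_before(500000000, true) == 0);
  assert(pause_ns_after(500000000, true) == 100000);
}
```

In this model the only inputs whose result changes are those with `yield` set and between 10 million and 1 billion cycles elapsed, which is the window where the zero-length sleep was happening.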