workers: implement --max-worker-threads command line option #32606

jasnell · 2020-04-02T00:13:27Z

There are two commits here:

Adds a simple CPUInfo utility to make using uv_cpu_info just a tad easier
Implements the --max-worker-threads command line flag.

From the second commit message:

    Creating too many active worker threads at one time can
    lead to significant performance degradation of the entire
    Node.js process. This adds a worker thread counter that
    will cause a warning to be emitted if exceeded. Workers
    can still be created beyond the limit, however. The warning
    is similar in spirit to the too many event handlers warning
    emitted by EventEmitter.

    By default, the limit is one less than four times the total
    number of CPUs available calculated at system start. The
    `--max-worker-threads` command-line option can be used to
    set a non-default value. The option is permitted in
    `NODE_OPTIONS` and must be a positive number greater than
    zero.

    The counter and the option are per-process in order to
    account for Workers that create their own Workers.

    The warning will be emitted once each time the limit
    is exceeded, so may be emitted more than once per process.
    That is, if the limit is 2, and 5 workers are created, only
    a single warning will be emitted. If the number of active
    workers falls back below 2 and is subsequently exceeded
    again, the warning will be emitted again.
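The warn-once-per-crossing behaviour described in the commit message can be modelled in a few lines of JavaScript. This is only a sketch: the actual change is written in C++ (src/node_worker.cc), and `WorkerLimitTracker`, `onWorkerStarted`, `onWorkerStopped`, and the `warn` callback are hypothetical names chosen for illustration. It assumes, as the message states, that the warning re-arms once the active count drops back below the limit.

```javascript
'use strict';

// Hypothetical model of the counter described above; not the PR's C++ code.
class WorkerLimitTracker {
  constructor(limit, warn) {
    this.limit = limit;   // maximum number of active workers before warning
    this.warn = warn;     // callback standing in for process.emitWarning()
    this.active = 0;      // current number of active workers
    this.warned = false;  // true while the current "excess" has been reported
  }

  onWorkerStarted() {
    this.active++;
    // Warn only on the first crossing; workers are still allowed to start.
    if (this.active > this.limit && !this.warned) {
      this.warned = true;
      this.warn(`Too many active worker threads (${this.active}).`);
    }
  }

  onWorkerStopped() {
    this.active--;
    // Re-arm the warning once the count falls back below the limit.
    if (this.active < this.limit) this.warned = false;
  }
}

const warnings = [];
const tracker = new WorkerLimitTracker(2, (msg) => warnings.push(msg));
for (let i = 0; i < 5; i++) tracker.onWorkerStarted();
console.log(warnings.length); // 1: five workers, limit 2, a single warning
```

With a limit of 2, starting five workers yields one warning; a second warning appears only after the count has dropped below 2 and then exceeded the limit again.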

/cc @addaleax

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

src/util.h

src/node_options.cc

src/node_worker.cc

test/parallel/test-worker-max-count.js

doc/api/cli.md

src/node_options.cc

addaleax · 2020-04-02T00:43:06Z

I’m adding the semver-major label because this adds a warning where previously none was emitted.

Like I said above, I think it’s worth considering not turning this on by default either… I’ll think about that a bit more.

/cc @nodejs/workers

mscdex · 2020-04-02T00:55:56Z

I'm not a fan of this being activated by default. I think the warning should only be emitted when someone explicitly enables the feature (via command line option or whatever future method).

gireeshpunathil · 2020-04-02T01:11:47Z

it may not be fair to assert that too many threads are created on program error?
it may not be fair to assume too many threads are indicative of unbounded growth?
it may not be fair to assume too many threads are indicative of starvation either, it might depend on the workload?
because the warning criteria depend on the number of cores, the same code may behave differently in different environments, which can cause concern for users?

I propose:

print it only once: gives an indication that it is worthwhile to check the thread creation logic
increase the threshold to a much larger value,
and make it a constant, or a function of a constant as well

jasnell · 2020-04-02T04:13:50Z

Ok, updated such that:

The warning is off by default. Setting explicitly to 0 disables it.
Setting to any value < 0 causes it to be auto-calculated to 4 times the number of CPUs
Setting to any value > 0 sets the limit explicitly to that value.
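The three cases above can be summarised as a small mapping from the option value to the effective limit. This is a sketch under stated assumptions, not the PR's option-parsing code: `resolveWorkerLimit` is a hypothetical name, and `os.cpus().length` stands in for whatever CPU count the C++ side obtains through uv_cpu_info.

```javascript
'use strict';

const os = require('os');

// Hypothetical helper: maps the --max-worker-threads value to an effective
// limit, where 0 means "warning disabled".
function resolveWorkerLimit(value, cpuCount = os.cpus().length) {
  if (value === 0) return 0;            // default: the warning is off
  if (value < 0) return 4 * cpuCount;   // auto-calculate from the CPU count
  return value;                         // explicit positive limit
}

console.log(resolveWorkerLimit(0, 8));   // 0
console.log(resolveWorkerLimit(-1, 8));  // 32
console.log(resolveWorkerLimit(5, 8));   // 5
```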

src/node_worker.cc

jasnell · 2020-04-02T15:14:36Z

@addaleax ... ok, took another round of edits to clean up the atomics usage. Quite a bit more reliable now. Also added a test for the nested workers emitting warning on the main thread. Please take another look when you have a moment :-)

test/parallel/test-worker-max-count-disabled.js

nodejs-github-bot · 2020-04-02T16:13:16Z

CI: https://ci.nodejs.org/job/node-test-pull-request/30372/

nodejs-github-bot · 2020-04-02T16:35:58Z

CI: https://ci.nodejs.org/job/node-test-pull-request/30377/

nodejs-github-bot · 2020-04-02T17:29:01Z

CI: https://ci.nodejs.org/job/node-test-pull-request/30379/

Utility helper that makes working with uv_cpu_info easier Signed-off-by: James M Snell <jasnell@gmail.com>


nodejs-github-bot · 2020-04-02T17:44:12Z

CI: https://ci.nodejs.org/job/node-test-pull-request/30381/

jasnell · 2020-04-02T19:41:35Z

Failures in CI appear to be a bug introduced by #32531 that @addaleax is investigating. Once that is fixed I will run CI again.

addaleax · 2020-04-02T22:04:28Z

@jasnell #32623 should unblock this

bnoordhuis · 2020-04-02T22:07:59Z

I can see why having a circuit breaker for runaway threads is useful but:

Heuristics based on the number of CPUs just seems wrong. Might as well set it to 1 + rand() % 15, that's probably no worse on average.
How does this interact with child_process.fork()?

jasnell · 2020-04-02T22:20:10Z

> Heuristics based on the number of CPUs just seems wrong. Might as well set it to 1 + rand() % 15, that's probably no worse on average.

The auto-calculation there is only one of the options in this. I'm kicking off a performance study that is going to be looking at multiple kinds of workloads so we can hopefully narrow in on a better heuristic. One thing we could definitely do to give us wiggle room on the heuristic is to mark this experimental for the time being.

> How does this interact with child_process.fork()?

It doesn't for the time being. Definitely open to ideas there.

bnoordhuis · 2020-04-03T10:14:14Z

I spent a lot of time thinking about auto-tuning in the context of libuv's thread pool and I came to the conclusion that it's hopeless.

A program doesn't have enough insight into the system to make educated guesses. It's hard even for the kernel and that has a perfect view of system utilization (but still misses the ability to predict the future.)

jasnell · 2020-04-03T16:27:05Z

> I spent a lot of time thinking about auto-tuning in the context of libuv's thread pool and I came to the conclusion that it's hopeless.

Yep, which is why this prioritizes allowing the user to set a threshold and only uses a warning that still allows the Workers to be created. It's a diagnostic option... which, btw, we could handle in other ways if the warning is not sufficient.

Currently, there is no way of actively tracking the total number of Workers created across the process (async_hooks only provides detail on the Workers created in the current thread). Another approach we could take is to add some diagnostic tracking APIs to either the worker_threads or perf_hooks modules that would allow the main thread to report on Workers across the process.

jasnell · 2020-04-03T17:15:11Z

Marking this in-progress while discussion is ongoing to keep it from landing until resolved

lundibundi

Definitely +1 on the idea but I agree that the default should be 0 as any guesses here are nothing more than guesses IMO.

Also, perhaps we can name this something "weaker" than "max-worker-threads" as for me the name implies that we won't be able to create more than the specified amount of workers and not that we will just get a warning about it?

lundibundi · 2020-04-06T10:31:34Z

src/node_worker.cc

+    // too much CPU contention. The default max-worker-threads is
+    // 4 times the total number of CPUs available but may be set


Suggested change

-    // too much CPU contention. The default max-worker-threads is
-    // 4 times the total number of CPUs available but may be set
+    // too much CPU contention. By default max-worker-threads
+    // check is disabled but may be set

lundibundi · 2020-04-06T10:34:07Z

src/util.h

+      uv_free_cpu_info(info_, count_);
+  }
+  int count() const { return count_; }
+  operator bool() const {


Nit: I think we usually try to put empty lines in between methods. Could you also run make format-cpp for consistency?

jasnell · 2020-04-07

Sigh...

    C:\Users\jasne\Projects\node>vcbuild format-cpp
    Error: invalid command line option `format-cpp`.

Yeah, when I update this I'll switch over to the Linux box and tweak the formatting.

lundibundi · 2020-04-06T10:35:52Z

test/parallel/test-worker-max-count-auto.js

+
+// Check that when --max-worker-threads is negative,
+// the option value is auto-calculated based on the
+// number of CPUs


Nit: (here and below)

Suggested change

-// number of CPUs
+// number of CPUs.

lundibundi · 2020-04-06T10:43:37Z

test/parallel/test-worker-max-count-disabled.js

+    list.push(makeWorker(workers));
+  await Promise.all(list);
+  workers.forEach((i) => i.terminate());
+}


Nit: simplification and a bit fewer promises (and same applies to the other test if you decide to change)

function makeWorker() {
  return new Promise((res) => {
    const worker = new Worker(expr, { eval: true });
    worker.once('online', () => res(worker));
  });
}

async function doTest() {
  const promises = [];
  for (let n = 0; n < cpu_count; n++)
    promises.push(makeWorker());
  return Promise.all(promises)
    .then((workers) => workers.forEach((i) => i.terminate()));
}

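addaleax · 2020-04-06

If we do that, then I’d go one step further and use

// `once` is the promise-returning helper exported by the 'events' module.
const { once } = require('events');

async function makeWorker() {
  const worker = new Worker(expr, { eval: true });
  await once(worker, 'online');
  return worker;
}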

🙂

jasnell · 2020-04-07T13:46:53Z

All, I'm going to take this a slightly different direction. As I mentioned above, the intent here is largely to improve process-wide visibility of the number of workers that are being created since, currently, they are only visible to the immediate parent thread that created them. Instead of emitting a warning, here's what I'm thinking:

Via the perf_hooks API, introduce a new WorkerHook that will receive a notification for any descendant Workers created and will provide access to a process-wide counter of the total number of active Workers.

The tracker takes its design inspiration from async hooks. However, unlike async hooks, it is not limited to reporting on only the handles associated with the thread's event loop.

const { WorkerHook } = require('perf_hooks');
const hook = new WorkerHook({
  init(id, parent_id, handled) {
    console.log(`Worker ${id} created by thread ${parent_id}.`);
    // handled is described below...
  },
  destroy(id, parent_id, handled) {
    console.log(`Worker ${id} destroyed`);
  }
});
hook.enable();

// Atomically get the number of workers process-wide without creating a hook
console.log(WorkerHook.processWorkerCount);

Let's say that Main Thread creates Worker-1, which in turn creates Worker-2. Assuming both Main Thread and Worker-1 each create their own WorkerHook instances, when Worker-2 is created, the Worker-1 hook will be notified first, then the main thread hook.

The handled argument in the hooks indicates if the event was dispatched successfully to a WorkerHook in a descendant thread. So, in the above case... if we assume:

Worker-1:

const hook = new WorkerHook({
  init(id, parent_id, handled) { /** handled will be false **/ }
});
hook.enable();

Main Thread:

const hook = new WorkerHook({
  init(id, parent_id, handled) { /** handled will be true **/ }
});
hook.enable();

However... if you disable the Worker-1 hook or omit its init() handler, the handled argument in the Main Thread will be false.

If multiple WorkerHook instances are created within a single thread, they are invoked in the order they are created, and the handled argument will reflect the status accordingly.
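The dispatch order and the `handled` flag can be illustrated with a small in-memory model. To be clear about assumptions: `WorkerHook` is only a proposal in this thread and does not exist in perf_hooks, so nothing below touches real threads. `ThreadModel`, `addHook`, and `spawn` are hypothetical names, and the rule that `handled` becomes true after any earlier enabled hook with an `init()` handler has run is one reading of the description above.

```javascript
'use strict';

// Purely illustrative model of the proposed dispatch semantics.
class ThreadModel {
  constructor(name, parent = null) {
    this.name = name;
    this.parent = parent;
    this.hooks = []; // hooks in creation order
  }

  // Register a hook with an init() callback; returns a handle that can be
  // disabled by setting `enabled` to false.
  addHook(init) {
    const hook = { init, enabled: true };
    this.hooks.push(hook);
    return hook;
  }

  // Model "this thread creates a worker": notify this thread's hooks first,
  // then each ancestor's hooks, passing along whether an earlier hook ran.
  spawn(name) {
    const child = new ThreadModel(name, this);
    let handled = false;
    for (let t = this; t !== null; t = t.parent) {
      for (const hook of t.hooks) {
        if (!hook.enabled || typeof hook.init !== 'function') continue;
        hook.init(name, this.name, handled);
        handled = true;
      }
    }
    return child;
  }
}

const log = [];
const main = new ThreadModel('main');
const worker1 = main.spawn('worker-1'); // no hooks registered yet
main.addHook((id, parentId, handled) => log.push(['main', id, handled]));
const w1Hook = worker1.addHook(
  (id, parentId, handled) => log.push(['worker-1', id, handled]));

worker1.spawn('worker-2');  // worker-1 hook sees false, main hook sees true
w1Hook.enabled = false;
worker1.spawn('worker-3');  // only the main hook runs, and it sees false
console.log(log);
```

In this model the Worker-1 hook is always notified before the main-thread hook, and disabling it makes the main thread observe `handled === false`, matching the two cases described above.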

The API here serves two key goals:

Be capable of tracking the total number of active workers process-wide in a cross-platform way.
Be capable of having a rough idea of what part of your code is creating those workers.

Several discussion points:

It's not super important whether this lives in the perf_hooks, async_hooks, or worker_threads module. I picked perf_hooks because I'm also considering adding a histogram that tracks lifetime duration of workers process-wide, but that's not in scope currently and might be handled a different way (via trace events).
I'm considering an option that would include the stack trace at the point of Worker creation/destruction in the handler. The option would be off by default to limit the performance hit. Essentially: const hook = new WorkerHook({ trace: true, init(id, parent_id, handled, stacktrace) { /** ... **/ } }). The other option here, however, would be to introduce a --trace-worker command-line option that, similar to --trace-sync-io, would emit a stack trace to the console upon worker creation and destruction. Definitely would like opinions on this part. There are OS-specific utilities that can provide this type of insight, but none of them have visibility into the JavaScript layer where the Workers are actually created.

jasnell · 2020-05-06T15:00:20Z

Closing this until I can get back to it.

nodejs-github-bot added the c++ and lib / src labels Apr 2, 2020

jasnell requested review from addaleax and lundibundi April 2, 2020 00:13

jasnell force-pushed the max-worker-count branch from b851f4c to 1d2c945 Compare April 2, 2020 00:16

jasnell changed the title from "Max worker count" to "workers: implement --max-worker-threads command line option" Apr 2, 2020

jasnell added the semver-minor and worker labels Apr 2, 2020

jasnell force-pushed the max-worker-count branch from 1d2c945 to d1e4236 Compare April 2, 2020 00:21

addaleax reviewed Apr 2, 2020

doc/api/cli.md (outdated)

addaleax added the semver-major label and removed the semver-minor label Apr 2, 2020

addaleax reviewed Apr 2, 2020

src/node_options.cc (outdated)

jasnell force-pushed the max-worker-count branch from 495f2a0 to cc4bfab Compare April 2, 2020 01:05

jasnell requested a review from addaleax April 2, 2020 04:13

addaleax added the semver-minor label and removed the semver-major label Apr 2, 2020

addaleax approved these changes Apr 2, 2020

src/node_worker.cc (outdated)

src/node_worker.cc (outdated)

src/node_worker.cc (outdated)

addaleax approved these changes Apr 2, 2020

test/parallel/test-worker-max-count-disabled.js

test/parallel/test-worker-max-count-disabled.js (outdated)

jasnell force-pushed the max-worker-count branch from 1ce6f54 to 1327c0f Compare April 2, 2020 16:35

src: add CPUInfo utility class

05f0dfb

Utility helper that makes working with uv_cpu_info easier Signed-off-by: James M Snell <jasnell@gmail.com>

jasnell added 2 commits April 2, 2020 10:34

fixup! src: implement --max-worker-threads warning limit

2c99cd5

jasnell force-pushed the max-worker-count branch from 29bc7cb to 2c99cd5 Compare April 2, 2020 17:42

benjamingr approved these changes Apr 3, 2020

jasnell added the wip label Apr 3, 2020

lundibundi approved these changes Apr 6, 2020

fhinkel approved these changes Apr 21, 2020

jasnell closed this May 6, 2020
