
Improve the calculation for the number of workers in the worker pool #5041

Merged: 1 commit merged into staging on Jun 18, 2024

Conversation

@andiflabs (Contributor) commented Jun 13, 2024

Summary

This commit changes the default value for `nodeWorkersMax` to the number of CPU cores, minus one, capped at 16. Before this change, `nodeWorkersMax` was hardcoded to 6, which resulted in limited performance on systems that have more than 6 cores.

As a consequence of this change, most users with an empty configuration will observe a larger number of workers spawned. Users who run on systems with SMT enabled and with a low number of cores may observe a lower number of workers.

For example: on a system with 2 cores, with SMT enabled (4 logical CPUs in total), and an empty configuration: before this change, the number of workers would have been 4 - 1 = 3; after this change, it is 2 - 1 = 1.

The rationale behind using the number of CPU cores rather than the number of logical CPUs is that the worker pool should mostly be used for CPU-bound, CPU-intensive workloads. For these kinds of workloads, SMT is known to be detrimental to performance (workloads that are primarily I/O-bound should not use the worker pool and would benefit from async code instead).

This commit also changes the default value for `nodeWorkers` to be based on the number of processing units available to the process, rather than the number of logical CPUs. On most systems, this will not result in any difference; the change only impacts users who limit the Node process through sandboxing techniques like CPU affinity masks or cgroups.

Some examples to understand the impact of this change (a minimal sketch of the calculation follows this list):

  • users with nodeWorkers set in their configuration: no change
  • users with nodeWorkers/nodeWorkersMax not set, 2 CPU cores, SMT enabled (4 CPU threads):
    number of workers before this change = min(4 - 1, 6) = 3
    number of workers after this change = min(4 - 1, 2 - 1, 16) = 1
  • users with nodeWorkers/nodeWorkersMax not set, 8 CPU cores, SMT enabled (16 CPU threads):
    number of workers before this change = min(16 - 1, 6) = 6
    number of workers after this change = min(16 - 1, 8 - 1, 16) = 7
  • users with nodeWorkers/nodeWorkersMax not set, 8 CPU cores, SMT disabled (8 CPU threads):
    number of workers before this change = min(8 - 1, 6) = 6
    number of workers after this change = min(8 - 1, 8 - 1, 16) = 7
  • users with nodeWorkers/nodeWorkersMax not set, 32 CPU cores, SMT enabled (64 CPU threads):
    number of workers before this change = min(64 - 1, 6) = 6
    number of workers after this change = min(64 - 1, 32 - 1, 16) = 16
  • users with nodeWorkers/nodeWorkersMax not set, 32 CPU cores, SMT enabled (64 CPU threads), scheduler affinity set to 4 CPUs:
    number of workers before this change = min(64 - 1, 6) = 6
    number of workers after this change = min(4 - 1, 32 - 1, 16) = 3
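
For reference, a minimal sketch of the calculation these examples walk through (function and parameter names are hypothetical, not the PR's actual code):

```typescript
// New default: min(availableParallelism - 1, physicalCores - 1, 16).
// `parallelism` is what os.availableParallelism() reports for the process;
// `physicalCores` is the physical (non-SMT) core count.
function defaultWorkerCount(parallelism: number, physicalCores: number): number {
  const hardCap = 16
  return Math.min(parallelism - 1, physicalCores - 1, hardCap)
}

console.log(defaultWorkerCount(16, 8))  // 8 cores, SMT on: min(15, 7, 16) = 7
console.log(defaultWorkerCount(4, 2))   // 2 cores, SMT on: min(3, 1, 16) = 1
console.log(defaultWorkerCount(64, 32)) // 32 cores, SMT on: min(63, 31, 16) = 16
```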

Testing Plan

Documentation

Does this change require any updates to the Iron Fish Docs (e.g. the RPC API Reference)? If yes, link a related documentation pull request for the website.

[ ] Yes

Breaking Change

Is this a breaking change? If yes, add notes below on why this is breaking and label it with breaking-change-rpc or breaking-change-sdk.

[ ] Yes

}
// TODO: ideally we should use `os.availableParallelism()` here, instead of
// `os.cpus().length`, but it seems like Typescript doesn't have the type
// bindings for it
Member:
I thought this too, but I think it was only added in 18.14.0: https://nodejs.org/api/os.html#osavailableparallelism and we technically support all versions of 18. You could check if it exists at runtime and call it, else fall back to os.cpus().length

andiflabs (Author):
Somehow I was convinced we jumped to Node 20, but no, you're right: we're still on Node 18
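
A minimal sketch of that suggested runtime check (helper name hypothetical; the `any` cast works around `@types/node` versions that do not yet declare the function):

```typescript
import * as os from 'os'

// Prefer os.availableParallelism() (added in Node 18.14.0) and fall back to
// os.cpus().length on earlier Node 18 releases.
function availableParallelism(): number {
  const fn = (os as any).availableParallelism
  return typeof fn === 'function' ? fn() : os.cpus().length
}
```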

@andiflabs force-pushed the andrea/better-num-workers branch from 9aef7cf to a7f81df on June 14, 2024 21:32
@andiflabs marked this pull request as ready for review June 15, 2024 21:13
@andiflabs requested a review from a team as a code owner June 15, 2024 21:13
@andiflabs force-pushed the andrea/better-num-workers branch from a7f81df to 17106cb on June 17, 2024 17:37
@@ -90,13 +90,24 @@ export type ConfigOptions = {
   /**
    * The number of CPU workers to use for long-running node operations, like creating
    * transactions and verifying blocks. 0 disables workers (this is likely to cause
-   * performance issues), and -1 auto-detects based on the number of CPU cores.
+   * performance issues), and a negative value resolves to the amount of parallelism units
+   * available to the process (as returned by `os.availableParallelism()`) minus one.
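
Put as code, the semantics described by the updated doc comment would be roughly this (function and parameter names are hypothetical, not taken from the PR):

```typescript
// Rough semantics of the doc comment above: 0 disables workers, a negative
// value resolves to the parallelism available to the process, minus one.
function resolveNodeWorkers(nodeWorkers: number, parallelism: number): number {
  return nodeWorkers < 0 ? parallelism - 1 : nodeWorkers
}
```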
Contributor:
While correct, node runners are often DevOps folks / generally technical people, whereas this documentation is engineering-focused. I think this is fine, but I try not to refer to Node documentation or anything DevOps people may struggle with.

andiflabs (Author):
I didn't realize this was user-visible documentation; I'll correct that

Contributor:
We usually copy these docs into the website, but it's manual and we don't need to. I think this is fine if we don't copy them.

@andiflabs force-pushed the andrea/better-num-workers branch from 17106cb to 0d4a81e on June 18, 2024 17:48
@NullSoldier (Contributor) commented Jun 18, 2024

This PR looks good, but I want to add some context for discussion, because I know node runners who won't be able to start their nodes now may come here with questions. Technically this PR makes perfect sense, but business-wise there are some considerations.

First, users have a wide variety of configurations. We have users with high mem/low CPU, high CPU/low mem, and low mem/low CPU. During our testnet phases we required users to run nodes to accumulate testnet activity points. This means that during our testnet phase, we had a large range of low-end VPSes being rented to run Iron Fish nodes that probably had very little memory, and often many more cores. On top of that, there were a few users with 100-core machines running nodes who would brag and show off their htop.

Now onto the next consideration: Node is a memory-hungry beast. Unfortunately, because JS is single-threaded, if you want parallelism you need to use a worker pool of forked Node processes. That means our worker pool essentially spawns a new Node runtime for each worker, which takes 100+ MB per worker spawned. If you have 10 physical cores, you'll need enough memory for the main node process plus 9 workers.
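
As rough arithmetic (the 100 MB figure is the estimate above, not a measurement):

```typescript
// Back-of-the-envelope memory overhead for the worker pool.
const physicalCores = 10
const workers = physicalCores - 1 // one core left for the main node process
const perWorkerMB = 100           // assumed cost of each forked Node runtime
console.log(`worker pool overhead: roughly ${workers * perWorkerMB}+ MB`) // 900+ MB
```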

Now back to the original purpose of maxNodeWorkers. During our testnet we had many users who simply could not start their nodes when we increased the amount of parallelism, because their systems would scale up and unexpectedly run out of memory. We introduced the cap because we made a trade-off: we decided, arbitrarily, that 6 workers was enough.

The reason we wanted to cap it was to make most users' nodes run without crashing, and because we wanted developers to develop with an understanding of what it feels like to run a low-end or mid-range node, and have sympathy for the user. We found that when we capped the node's performance to that of an expected average user's machine, engineers contributed optimizations much more frequently. We also no longer received reports of OOM crashes on startup from the worker pool spawning workers.

So this might not be a problem anymore, but I think the PR should still address the original reason maxNodeWorkers existed, and make the case either that it isn't needed anymore or that our end users won't be impacted by this change. I think some may no longer be able to start their nodes without tweaking the configuration, and we can decide that's OK.

@NullSoldier reopened this Jun 18, 2024
@andiflabs force-pushed the andrea/better-num-workers branch from 0d4a81e to 815c2ed on June 18, 2024 21:21
@andiflabs force-pushed the andrea/better-num-workers branch from 815c2ed to c237236 on June 18, 2024 21:24
@andiflabs merged commit c237236 into staging Jun 18, 2024
13 checks passed
@andiflabs deleted the andrea/better-num-workers branch June 18, 2024 22:07