WIP: optimize process spawning on Linux #118750
Closed
This PR optimizes process spawning on Linux by avoiding allocations and sorting when copying environment variables. The code is still very much WIP, as I'm not sure where I should put this optimized code (`process_unix`, `process_common`?).

I wasn't able to get large speedups by just removing the `BTreeMap`, so I went all the way and tried to remove as many allocations as possible. I created a simple benchmark here that spawns 8k processes, each with ~200 environment variables. Locally, on Linux `5.15.0` with glibc `2.35`, the benchmark goes from `~1.4s` to `~1s` after the changes. This is of course a massive stress test for command spawning; in most situations, the performance improvements will be negligible.

One behaviour change of this PR is that the environment variables are no longer sorted before being passed to the spawned command, but I don't think that's very important on Linux.

The motivation for this PR is that process spawning on Linux can be quite slow when there are a lot of environment variables, you don't use `Command::env_clear()`, and you set some environment variables. In that case, all existing environment variables are copied (several times), which can be a bottleneck if there are many of them. This has occurred to me in HyperQueue, which spawns a lot of processes, and it turned out to be a bottleneck in some cases. Some discussion of this problem has happened on Zulip.
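
The slow path described above can be modelled roughly as follows. This is a simplified sketch and not the standard library's actual implementation: `capture_env` is a hypothetical function, and the `clear` flag only stands in for the effect of `Command::env_clear()`.

```rust
use std::collections::BTreeMap;
use std::ffi::OsString;

/// Simplified model of building a child's environment.
/// `parent` is the inherited environment, `clear` models `env_clear()`,
/// and `changes` lists explicit overrides (`None` removes a variable).
fn capture_env(
    parent: impl Iterator<Item = (OsString, OsString)>,
    clear: bool,
    changes: &[(&str, Option<&str>)],
) -> BTreeMap<OsString, OsString> {
    let mut result = BTreeMap::new();
    if !clear {
        // Without clearing, every inherited variable is copied into the map.
        for (key, value) in parent {
            result.insert(key, value);
        }
    }
    for &(key, value) in changes {
        match value {
            Some(value) => {
                result.insert(OsString::from(key), OsString::from(value));
            }
            None => {
                result.remove(&OsString::from(key));
            }
        }
    }
    result
}
```

Passing `std::env::vars_os()` as `parent` with `clear` set to `false` and a single override copies the whole parent environment on every call. With around 200 variables and thousands of spawns, that is the cost this PR targets.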