Batch invokes of linters (and other dynamically sized process invokes) #13462
Comments
Sorry for the trouble! We really ought to be generating batches in this case for performance reasons anyway: we only support either "run for all" or "run per file" (https://www.pantsbuild.org/docs/reference-lint#advanced-options), and the performance sweet spot (not to mention the size that would avoid issues like this) is likely to be batches of a size somewhere in between.
Assuming that we do batching, it seems like there is also an opportunity to align with #9964, and to dynamically batch based on how much parallelism is available. But dynamically choosing batch sizes based on parallelism would require manipulating cache keys (because otherwise hosts with different numbers of cores would never get remote cache hits)... and that is a risky business, which also requires figuring out how to safely split the outputs of tools (or to require that batched processes opt out of capturing outputs when they succeed, for example).
So it turns out there is a "hack" that can be used in the meantime. Setting the relevant parallelism options for each tool amounts to a "poor man's batch": it is faster than not setting it (obviously), but also faster than the per-file-caching alternative. Depending on how long fixing the issue might take, it might be worth finding/writing clever little plugins to set the num jobs smartly based on CPU count.
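For illustration only, here is a rough sketch of the "set num jobs from CPU count" idea. The flag spellings and the division of cores across concurrently running partitions are assumptions for the sake of the example, not something Pants configures for you:

```python
import os


def jobs_flags(num_partitions: int) -> dict[str, list[str]]:
    """Pick a per-tool job count based on available cores.

    Splitting the available cores across the partitions that run concurrently
    avoids oversubscribing the machine when several tools run at once.
    """
    cores = os.cpu_count() or 1
    jobs = max(1, cores // max(1, num_partitions))
    # Flag spellings are assumptions; check each tool's own docs for its real option.
    return {
        "flake8": [f"--jobs={jobs}"],
        "pylint": [f"--jobs={jobs}"],
        "isort": [f"--jobs={jobs}"],
    }


print(jobs_flags(num_partitions=3))
```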
@thejcannon: Thanks for the report: that's very useful information. I don't think that I had realized that so many Python linters supported parallelization arguments, but it makes sense. Will post a summary soon.
FWIW as well, I'd expect those linters to have better parallelization in-tool than out-of-tool, because they know what can/can't be shared across processes and have the ability to play clever tricks. That being said, out-of-tool parallelization still beats no parallelization 😉
There are a few angles to this issue and to #9964. In the long term, we want to lower the per-process overhead (from 1. sandbox creation, 2. process startup, 3. inability to reuse shared inputs) to the point where running per-file is clearly the right choice to maximize cache hit rates (by reducing/reusing sandboxes, and even reusing processes in sandboxes with nailgun). But on the (long) road to zero per-process overhead, we should implement batching, both to fix this issue (avoiding "Argument list too long" errors) and to improve performance by parallelizing across multiple smaller processes.
Because batching inherently affects the output of processes, it will require use of an explicit API. While it would be very interesting to be able to automatically batch all single-file processes, we cannot generically split outputs, and it would also mean doing lots of redundant work. AFAICT, there is no need for the rust code's cooperation in this, so I'm planning to implement it as a new datatype:

```python
@dataclass
class BatchProcess:
    # A process representing all inputs, but with a partial `args` list which will be filled in
    # with (portions of) the `xargs` list.
    process_template: Process
    # A list of arguments which will be partitioned as needed and appended to the `args`.
    xargs: tuple[str, ...]
```

Future implementations could support templating into other positions in the `args`. This will get us a small amount of additional parallelism in some cases, but as @thejcannon pointed out, the vast majority of the processes we invoke support their own parallelism, and so batching alone will leave most of that on the table. To take advantage of that internal parallelism, we will likely also implement #9964: expect more there soon.
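To make the intended semantics concrete, here is a minimal sketch (with a toy stand-in `Process` rather than Pants' real `Process` type, and not the eventual rule code) of how a batch of `xargs` might be split and appended to the template's argv:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """Toy stand-in for the real Process type: just an argv."""
    argv: tuple[str, ...]


def expand_batches(
    template_argv: tuple[str, ...], xargs: tuple[str, ...], batch_size: int
) -> list[Process]:
    """Split `xargs` into chunks and append each chunk to the template argv."""
    return [
        Process(argv=template_argv + tuple(xargs[i : i + batch_size]))
        for i in range(0, len(xargs), batch_size)
    ]


# e.g. a linter invoked over 5 files in batches of 2:
for proc in expand_batches(("flake8",), ("a.py", "b.py", "c.py", "d.py", "e.py"), batch_size=2):
    print(proc.argv)
```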
I like the idea! Some thoughts (which also apply to #9964 so I'll repeat there):
Additionally, for this issue (and not #9964), it might also be prudent to warn if this is enabled for a tool that is supported via #9964, since the latter will almost certainly result in faster runs.
(Also duping with #9964) As a single datapoint, on my 64-core machine with formatters
So I expect this issue to land less performant than bullet 3 in each case (which would be #9964), but not by much. |
With enough similar metrics you might be able to deprecate `per-file-caching` 🤔
To prepare to add batching for formatters as part of #13462, this change removes the need to implement per-backend `@rules` that pipeline the `FmtRequest`s that apply to a particular language. Instead, targets are grouped by which `FmtRequest`s apply to them, and then those requests are run sequentially.

There will be further changes to the formatting API in support of #13462, so this API is not final.

[ci skip-rust] [ci skip-build-wheels]
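As a rough illustration of the grouping described above (the request-type names and the `is_applicable` predicate are stand-ins, not the actual plugin API):

```python
from collections import defaultdict
from typing import Callable


def group_by_applicable(
    request_types: list[str],
    targets: list[str],
    is_applicable: Callable[[str, str], bool],
) -> dict[frozenset[str], list[str]]:
    """Group targets by the set of formatter request types that apply to them."""
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for target in targets:
        applicable = frozenset(rt for rt in request_types if is_applicable(rt, target))
        if applicable:
            groups[applicable].append(target)
    return dict(groups)


# e.g. two hypothetical formatter requests that both apply only to Python files:
print(
    group_by_applicable(
        ["BlackRequest", "IsortRequest"],
        ["app/a.py", "app/b.py", "docs/readme.md"],
        lambda request_type, target: target.endswith(".py"),
    )
)
```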
As described in #13462, there are correctness concerns around not breaking large batches of files into smaller batches in `lint` and `fmt`. But there are other reasons to batch, including improving the performance of linters which don't support internal parallelism (by breaking them into multiple processes which _can_ be parallelized).

This change adds a function to sequentially partition a list of items into stable batches, and then uses it to create batches by default in `lint` and `fmt`. Sequential partitioning was chosen rather than bucketing by hash, because it was easier to reason about in the presence of minimum and maximum bucket sizes. Additionally, this implementation is at the level of the `lint` and `fmt` goals themselves (rather than within individual `lint`/`fmt` `@rule` sets, as originally suggested [on the ticket](#13462 (comment))), because that reduces the effort of implementing a linter or formatter, and would likely ease doing further "automatic"/declarative partitioning in those goals (by `Field` values, for example).

`./pants --no-pantsd --no-local-cache --no-remote-cache-read fmt lint ::` runs about 4% faster than on main.

Fixes #13462.

[ci skip-build-wheels]
…ility. (#14210)

As a follow-up to #14186, this change improves the stability (and thus cache hit rates) of batching by removing the minimum bucket size. It also fixes an issue in the tests, and expands the range that they test.

As mentioned in the expanded comments: capping bucket sizes (in either the `min` or the `max` direction) can cause streaks of bucket changes: when a bucket hits a `min`/`max` threshold and ignores a boundary, it increases the chance that the next bucket will trip a threshold as well. Although it would be most stable to remove the `max` threshold entirely, it is necessary to resolve the correctness issue of #13462. But we _can_ remove the `min` threshold, and so this change does that.

[ci skip-rust] [ci skip-build-wheels]
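For intuition, here is a much-simplified sketch of sequential partitioning under these constraints (sorted input, a size cap, no minimum size). The real `partition_sequentially` chooses batch boundaries more carefully to keep them stable when files are added or removed; this only shows the overall shape:

```python
def partition_sequentially(items: list[str], size_target: int) -> list[list[str]]:
    """Partition sorted items into consecutive batches of at most `size_target`."""
    batches: list[list[str]] = []
    current: list[str] = []
    for item in sorted(items):
        current.append(item)
        if len(current) >= size_target:
            batches.append(current)
            current = []
    if current:  # a trailing batch may be smaller: there is deliberately no minimum size
        batches.append(current)
    return batches


# e.g. 7 files with a size target of 3 -> batches of 3, 3, and 1:
print(partition_sequentially([f"src/f{i}.py" for i in range(7)], size_target=3))
```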
Pants does not limit the number of files passed to black and isort at the same time, which leads to errors like
Error launching process: Os { code: 7, kind: Other, message: "Argument list too long" }
Pants version
2.7.0
OS
MacOS
Additional info
We have about 4500 python files in our repo, so when I run a lint across the whole repo at once I get something like
"Run Black on 4434 files.",