-
-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Lint files in parallel if many files exist #42
Conversation
If this happens implicitly, how will it affect cross-file linting and caching like eslint-plugin-import uses? |
The cache will be created in each worker thread individually. It's not ideal because of an overhead, but it should work (each worker should lint over a hundred files with using the cache). |
And the worker threads are born and died in a |
In the case of the import plugin on large codebases, we’re talking about like a 30s overhead minimum. I do not think multithreading should ever be implicit or the default, at least not until long after there’s been a cross-thread “caching” stage that plugins can migrate to use so they can insulate themselves from parralelization. |
Doesn't multithreading reduce time over 30s? In my environment, it has reduced over 10s with a little 697 files. I guess it reduces more time on a large codebase. |
I'm saying that the caching alone can easily take 30s. Parallelizing the linting (without an explicit pre-lint caching stage for plugins) does not speed up the overall run, it just makes that 30s happen more than once, which taxes machine resources, likely slowing down the overall run. |
If your environment is single-core, don't worry, the default |
The number of cores isn’t the issue; the number of other things my computer is doing is the issue. |
So... do you mean that don't want ESLint to use CPU 100% by default to lint? |
Even if caching stage increases 30s, but if other stages reduce over 30s, then total time is reduced. It's reasonable. However, it needs more CPU and memory resource than single-thread. This is a trade-off. This RFC is that ESLint uses more CPU resource if many files exist. Indeed, using 100% may be overkill. |
The tradeoff is fine. I’m asking to not force a different tradeoff by default until there’s been time for plugins to migrate to a semantic api that makes the tradeoff explicit and makes it possible to avoid duplicate work and race conditions when linting in parallel. |
I don't think duplicate work is a big problem because it's about efficiency. The result that it reduces time does not change. We can consider adding APIs for more efficiency later. Are there race condition problems on plugins? |
There can always be race conditions when formerly synchronous things become suddenly async. As for it reducing time, parallelization definitely will not always be faster (although for most cases it will be). However, if the change I’m suggesting happens first, then the switch to async by default can be made safe instead of just “use this command line flag to unbreak your lint run” |
@mysticatea Just as a background, the issue @ljharb is talking about is that eslint-plugin-import uses some trickery to build a list of all modules so that it can verify that you are only importing existing files. That is done once (I think on the first file linted), and then cached and provided to all of the rest of the files. This is not really intentionally supported functionality by ESLint, but plugin-import relies on it heavily. |
But
I know that I measured TypeScript repo that is using both
Since the number of files is 366, the estimated proper concurrency was 3. On my PC, it made 35% faster (23200ms → 14810ms) without breakage. |
@mysticatea you're assuming that the only changes an eslint rule makes are a |
Also, sharding the files doesn't fix anything, since the entire codebase must be parsed for any eslint-plugin-import rules that use ExportMap. |
Ah, I got it. Indeed, writing the same file is race condition. Thank you!
|
Thank you for the feedback. I have updated this RFC as separating to step 1 and step 2.
Also, I updated the "Drawbacks" section and the "Backwards Compatibility Analysis" section. |
I would still strongly recommend Step 0.5 as, release a minor version of eslint that adds an explicit stage for plugins to run code that must be in a single thread, prior to all linting beginning - that way eslint-plugin-import can use that stage, and avoid these race conditions entirely. |
I think that it should be another RFC because (1) it's an enough rare case, (2) probably it will be large spec, and (3) honestly I have not understood what kind of API can help your case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @mysticatea that a step 0.5 as suggested by @ljharb would probably be better of as another RFC.
@mysticatea Have you considered whether a worker approach like this could maybe have some positive security impacts? Can one limit what these workers are allowed to do and that way maybe limit the possible damage a compromised eslint-plugin could do? I know it's not the purpose of this RFC, but it could be an interesting additional reason to push for an approach like this.
|
||
This change needs a major release. | ||
|
||
ESLint gets using worker threads by default if target files are many. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the fact that in some cases the linting will happen in the main thread and in other in worker threads create a scenario where some plugins work well in just one of the two cases?
To mitigate that, would it maybe be preferable to always use workers in Step 2, even when just a single worker is required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because worker_threads
got stable recently, this RFC uses child_process
if Node.js is older. Spawning child processes spent time (about 1 sec) on Windows. Anyway, ESLint has to find config files, parse it, and verify it in all of every worker and the main thread, and the cost is not small.
Therefore, I don't think good if ESLint always uses workers.
can we split the RFC to 2:
this is as the same strategy as |
which issues are you referring to? |
there are mainly 2 issues mentioned in the discussion:
maybe there are more, but none of these is a blocker - users can choose to not use the option, and it's the default. :) |
Ultimately it's up to you all as maintainers, but personally I think you need to at least have an early plan of attack for how you might solve these issues before moving on this RFC. The reason I think this is because |
Better to not even offer the option, than to inflict frustration on a significant percentage of eslint’s users when they attempt to use the option. |
I'm 👍 to having the option - we cannot take care of all the cases, given eslint is how widely used in the community. we can get feedback from early adopters, to help us improve the feature. maybe at some point, it can be the default option. |
All the cases need not be taken care of - but it seems prudent to first take care of the one case that we all already know about, that arguably constitutes the majority of eslint usage, to provide time for package authors to adapt to the new solution, and also time for users to update to it. In other words, #42 (comment) :-) |
Also, esprint already exists, if anybody wants to lint files in parallel today (and accept any extra overhead that'll produce if you're using plugins like My $0.02 is that it's probably best for parallel linting to exist in a fork or an alternative CLI, until it's ready for general use. |
TSC Summary: ESLint assumes that each rule and source file can be processed independently. If we were to ship parallel linting without supporting this use case, the duplicated initialization could make parallel linting slower than single-threaded linting with these plugins. The large number of ESLint users who also use one of these plugins would not benefit from parallel linting. This API would need to provide the plugin with the config and list of files to be linted, and parallel linting would need a way to share the result with workers. TSC Question: Are we interested in an RFC for a plugin initialization API? If so, is that functionality a prerequisite for parallel linting? |
I don't think it's a problem for ESLint to remain stateless; but I think adding some form of lifecycle hooks/API to help stateful modules interop with ESLint would be a huge win. I have thought about this and informally put the idea forward a little while back - eslint/eslint#13525. Unfortunately I haven't had the spare time to spend develop the idea enough to write an RFC. |
@btmills can you update this with discussion from last TSC meeting? |
I've opened eslint/eslint#14139 to collect requirements and ideas for a plugin initialization API. @bradzacher, @ljharb, and anyone else who's thought about hooks for plugins to do pre-linting initialization work, please head over there to make sure we're considering all the use cases. Once we've come up with a direction in that issue, we'll open a separate RFC detailing the design for that API. It will be complementary to this RFC and should explicitly address how it will work with parallel linting. We agreed in the TSC meeting to complete a plugin initialization API before finalizing parallel linting. That way, when we release parallel linting, there won't be any snags for |
Will the rule API be |
With all of the internal changes, we are going to need to rethink how to move forward with this, so closing. |
ESLint
Class ReplacingCLIEngine
#40.Summary
This RFC adds the supports for parallel linting by
worker_threads
/child_process
modules.Related Issues
ESLint
Class ReplacingCLIEngine
#40 - New: Moving to Asynchronous APIWhat are differences from RFC 4 and RFC 11?
--concurrency 1
option.