-
-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New: added draft for async/parallel proposal #4
Conversation
38b8690
to
d50284a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the RFC! I've left a few initial comments -- I think it would be good if you could go into more detail on what design you're proposing.
Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points. | ||
|
||
1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). | ||
2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by this point -- to me it seems like the use of promises and threads is orthogonal given that they don't address the same problem. A Promise is a value representing the result of an asynchronous operation, whereas multithreading is a mechanism to make computation faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the concerns were that
promises
would not be faster per say depending on the implementation.
My naive take (I haven't spent any time looking into this yet) is that if we can figure out a way to have each promise represent not just the linting but also the file I/O operations that this would still be a win.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@not-an-aardvark to my understanding, promises are just a fancy way of dealing with callbacks. So, under the hood, they could still be blocking the main thread if the callback that is being invoked is synchronous in the nature (the promise just ends up being a convenience to the consumer, but adds no benefit to them). To @kaicataldo's point, if the promise wraps a call that is non blocking on the main thread (because it too is promise based and async), then that's the best. The thread is a layer I see added on top of an already async system. To be 💯 open, I've not done much work with workers/threads, so I don't know how effective it would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for clarifying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this statement is confusing. The choice isn't really between threads (probably you mean processes in this context) or promises, as these accomplish two different things. At some point in the API there will need be promise support, which allows the JS engine to continue operating while waiting for some result to be returned.
I think it's safe to assume that there will need to be processes that somehow communicate back to the main process, likely via promises.
|
||
Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points. | ||
|
||
1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I don't think putting parallelism behind a flag should be a requirement provided that compatibility is preserved -- it seems like an implementation detail that users shouldn't have to worry about. (However, based on previous discussions there may be others on the team who disagree with me about this.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this. I don't really see any value in keeping the ability to do things sync still. That being said, I do think this should go out in a major release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with it being in a major release. Plus, anyone who needs sync functionality can peg to an older version and move on their own time. I don't see a value in keeping things synchronous either unless there is a really good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see this put behind some kind of flag for at least one release. My concern is that without a way to opt-in/opt-out, it will be difficult to determine if a certain class of bugs is in any way related to the parallelism or not. I wouldn't be comfortable shipping this as the default behavior until we are clear that we've worked out the bugs and gotten feedback from the community. At that point, I'd still want an opt-out flag to help with debugging.
|
||
1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). | ||
2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood. | ||
3. The experience should be opt in so as not to break existing users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on the need to not break existing users, but I'm not sure being opt-in is a requirement for this (see my comment on the first list item).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I disagree for the reasons stated in my previous comment. :)
|
||
## Detailed Design | ||
|
||
Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good list of points to consider when designing a solution -- I've left some feedback below on the points. However, the purpose of this section (and the RFC process more generally) is to describe a proposed design in significantly more detail, rather than just giving a summary of the problem and the points being considered. (See the design section of this work-in-progress RFC for an example.)
Some questions I have after reading this include:
- What specific changes to the
CLIEngine
API are being proposed? If there are new methods being added, what do they do? - Does this design change the rule API? Before reading this RFC, my expectation for how parallelism would work was that the rule API would remain the same, and parallelism would happen on a per-file basis independently from rules. However, point 4 below mentions that some rules might need to identify themselves as a particular type, and I'm confused about why this would be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the referral. I'll make adjustments. This is my first RFC so I appreciate the feedback 😅 !
1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). | ||
2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood. | ||
3. The experience should be opt in so as not to break existing users. | ||
4. There should be some sort of mechanism to handle eslint rules that expect `sync` reading. Rules may need to identify themselves as a "type". This way, if a rule is being loaded that requires `sync`, the code can handle it accordingly. Any rule that doesn't identify as a `sync` or `async` would be defaulted to `sync`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any examples that demonstrate why this is necessary? I'm not sure I follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was pulling this idea from this comment eslint/eslint#10606 (comment)
I honestly don't know the inner workings of a rule set, so I don't know that rules can necessarily be sync
or async
by nature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Currently, those visitor callbacks are synchronous (see eslint/eslint#10606 (comment) and eslint/eslint#10606 (comment)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So for now I would say this part of the RFC is either out of scope or unnecessary for the first pass at this idea. Ideally, rules should be independent of execution anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. There is no such thing as an async rule, and even if there was, that should be handled by something lower in the API stack. I think it's safe to remove this reference.
|
||
## Documentation | ||
|
||
1. There should be JSDoc code for sure. In code documentation is always the best. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently enforce this in the codebase, so this will happen whether it's in this proposal or not!
I also think we'll want to document any changes that occur to any public APIs as well as possibly do a little write-up of how ESLint does async processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spots that I believe would have to be updated would be:
- Node.js API
- Command Line Interface (assuming a flag)
|
||
1. More to maintain for the maintainers | ||
2. Threading and async problems are hard, and can require considerable effort (even with other projects out there doing it). | ||
3. This may expose unintended edge cases for eslint rules that expect synchronous scanning. There will need to be a shim mechanism for sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up on my question above, I can't think of any rules that do this off the top of my head. If they do exist, perhaps we should alter them to not be affected by this (and make that a requirement for rules going forward).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Each rule is very self-contained so I don't foresee any drawbacks along those lines.
Some other drawbacks I can see:
- How will this interact with the
--debug
flag? Right now we recommend people use that to help figure out problems, and presumably we would lose some or all of that ability. - Testing the multi-process functionality seems like it would be difficult. Travis uses a 2-core CPU for each VM, so we do have some coverage. It still seems like it would be difficult to determine if this is working automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting this started! As @not-an-aardvark mentioned, I think what we're missing here is a design of how this could work. What I'm looking for, specifically, is:
- Which parts of the API will change and in what way?
- What is the mechanism by which the main process will communicate with subprocesses?
- What functionality happens before spinning up subprocesses?
- What is each subprocess responsible for?
- How are results from the various subprocesses combined and presented?
- What happens if a subprocess dies before completing its work?
|
||
## Summary | ||
|
||
`CLIEngine` currently doesn't support any async functions. As such, large codebases pay a toll since it requires users to wait for each file ( time O(n) ) to be processed by ESLint. There's a large gain to be made if ESLint can offer parallelized tasks so that users can more quickly statically analyze their code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the goal of the RFC is broader than this describes. The goal is really to allow ESLint as a whole to lint files in parallel. The changes to CLIEngine
are an implementation detail, and I would expect changes to cli
to also be included. I'd suggest reframing the summary of this RFC to make the high-level goal clear while explaining some of the implementation changes you're envisioning.
|
||
Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points. | ||
|
||
1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see this put behind some kind of flag for at least one release. My concern is that without a way to opt-in/opt-out, it will be difficult to determine if a certain class of bugs is in any way related to the parallelism or not. I wouldn't be comfortable shipping this as the default behavior until we are clear that we've worked out the bugs and gotten feedback from the community. At that point, I'd still want an opt-out flag to help with debugging.
Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points. | ||
|
||
1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). | ||
2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this statement is confusing. The choice isn't really between threads (probably you mean processes in this context) or promises, as these accomplish two different things. At some point in the API there will need be promise support, which allows the JS engine to continue operating while waiting for some result to be returned.
I think it's safe to assume that there will need to be processes that somehow communicate back to the main process, likely via promises.
|
||
1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). | ||
2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood. | ||
3. The experience should be opt in so as not to break existing users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I disagree for the reasons stated in my previous comment. :)
1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). | ||
2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood. | ||
3. The experience should be opt in so as not to break existing users. | ||
4. There should be some sort of mechanism to handle eslint rules that expect `sync` reading. Rules may need to identify themselves as a "type". This way, if a rule is being loaded that requires `sync`, the code can handle it accordingly. Any rule that doesn't identify as a `sync` or `async` would be defaulted to `sync`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. There is no such thing as an async rule, and even if there was, that should be handled by something lower in the API stack. I think it's safe to remove this reference.
|
||
## Documentation | ||
|
||
1. There should be JSDoc code for sure. In code documentation is always the best. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spots that I believe would have to be updated would be:
- Node.js API
- Command Line Interface (assuming a flag)
|
||
1. More to maintain for the maintainers | ||
2. Threading and async problems are hard, and can require considerable effort (even with other projects out there doing it). | ||
3. This may expose unintended edge cases for eslint rules that expect synchronous scanning. There will need to be a shim mechanism for sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Each rule is very self-contained so I don't foresee any drawbacks along those lines.
Some other drawbacks I can see:
- How will this interact with the
--debug
flag? Right now we recommend people use that to help figure out problems, and presumably we would lose some or all of that ability. - Testing the multi-process functionality seems like it would be difficult. Travis uses a 2-core CPU for each VM, so we do have some coverage. It still seems like it would be difficult to determine if this is working automatically.
Are you able to implement this RFC on your own? If not, what kind | ||
of help would you need from the team? | ||
--> | ||
I would like help from the core team with this effort as I do work full time and even this has been a few months of work in my free time outside of work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What type of help are you talking about? Editing the RFC? Implementing the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementing. I tend to only have weekends as time when I can work on side projects. Otherwise, I don't mind taking a few months on weekends working on this 😄. It will be a lot of code reviews for sure is all.
I'm going to push some changes I've made, but I've not flushed out all the details yet. Please bear with me while I make those changes 😄 . |
talked to outputFixes and the constructor
outlined use of executeOnFiles
I believe this can be reviewed again. I don't expect this review round to be the end, I'm just looking for feedback to understand if I am approaching the RFC correctly (as previously stated, this is my first). Please let me know if there is anything I can expand on or clarify. |
Thanks David, we will take another look.
And no worries, this process is new for us, as well. You're helping us
figure out if we're doing things well just by volunteering this RFC.
…On Sat, Dec 8, 2018 at 9:16 AM David ***@***.***> wrote:
I believe this can be reviewed again. I don't expect this review round to
be the end, I'm just looking for feedback to understand if I am approaching
the RFC correctly (as previously stated, this is my first). Please let me
know if there is anything I can expand on or clarify.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AACWkhOuxurVZIh9-cWLWdTebv1O7r9Vks5u2_P6gaJpZM4Y9u4y>
.
--
______________________________
Nicholas C. Zakas
@SlickNet
Author, Principles of Object-Oriented JavaScript <http://amzn.to/29Pmfrm>
Author, Understanding ECMAScript 6 <http://amzn.to/29K1mIy>
|
Thanks for the updated content. A few thoughts/questions: I'm not sure if it makes sense to start going through methods and creating asynchronous versions of all synchronous operations. It seems like figuring out which synchronous methods should be converted would be something that comes after a higher-level concept of this would work. By higher-level concept, what I mean is we should first be looking at what happens when someone runs ESLint on the command line before we worry about which APIs to change. For example, this is what currently happens when you run ESLint on the command line:
What I'm interested in is where in this process do we fork off another process to continue work? On the surface, it seems like generating the list of files to lint should work as it does today, and then we should be forking the process such that the forks are responsible for what happens to each file (maybe completely, maybe not). jscodeshift, for example, does something similar. It figures out the list of files to operate on and then forks the process. It then sends batches of 50 files to each process until the entire group of files has been operated on. I think figuring out a high-level description of how this would work will then inform what kind of API changes should be made. |
@nzakas Thanks. I now understand what you mean. Let me do some digging on how others, like |
@nzakas I've done some more reading on the jscodeshitft implementation. Comments are sparse, so I don't know how they picked their
In general, I like the flow of how they do it. I'm not familiar with how something like So with regard to the example you gave, I would make these suggestions:
|
I think that seems like a good high-level description.
The batch size is, most likely, something we'd need to tinker with. I'm
guessing that there is some overhead involved with passing messages between
the parent and child process, maybe just in the form of a slight delay,
such that sending one file at a time to a worker might result in longer
execution time than sending batches. So, I'd put in the RFC that the final
batch size recommendation would have to come as a result of doing some
testing with the implementation.
…On Sat, Dec 22, 2018 at 4:13 PM David ***@***.***> wrote:
@nzakas <https://github.com/nzakas> I've done some more reading on the jscodeshitft
implementation
<https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L224-L227>.
Comments are sparse, so I don't know how they picked their CHUNK_SIZE
variable. However, the general flow is:
1. Get all the files to be processed (via a promise
<https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L216>
)
2. Determine the threshold of processes
<https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L224>
they can have running at once. This is done by looking at the CPU count on
the given system, and taking Math.min of the files relative to the CPU
(which means in most cases this will come out to be 4 assuming a quad core
processor). If someone passes runInBand, it is single threaded by
default.
3. Determine how many files can go to each process
<https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L225>.
Take some arbitrary CHUNK_SIZE (50 in this case) and apply it relative
to how efficiently you can split the number of files across the processes
provided. If we have 200 files for example and 4 processors, the math works
out to ((200/4)/50) = 1.
4. Send the batched files off to a worker that has a predefined set of
functions (in the case of ESLint, this would be the bit you specified).
Based on the messages the workers send
<https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L270-L285>
act accordingly. In the case of ESLint, this gets to the point we mentioned
earlier about keeping track of failures. jscodeshift reports issues to
the console
<https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L66>.
One thing we need to determine is whether or not a failure to lint, or even
write/fix, is grounds for ending the process. I'd prefer if ESLint failed
gracefully and just said "unable to lint/fix files:" and provided output.
5. Once the workers are done, provide output to the user
<https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L292>
.
In general, I like the flow of how they do it. I'm not familiar with how
something like jest or test runners do this (because they too enumerate
files to use). While I did some digging, I gotta say finding well
documented code in some of these repos is a real shot in the dark 😅 .
So with regard to the example you gave, I would make these suggestions:
- Determine list of files to lint
- *Determine processors available*
- *Determine batch size* <-- I'm debating here if it should be a
batch, or if by default the number of CPUs available is the number of files
we can handle simultaneously. Does it make sense to have 50 files go to a
worker only to have those 50 be process synchronously in a thread (assuming
4 workers, this is 200 files divided over 4 processes)? Or does it make
sense to have 4 files go to 4 workers, each processed simultaneously and
keep feeding single files to workers as they become free?
- *Send each batch to a worker*
- For each file *batch sent to a worker*:
- Check the cache to see if the file has changed
- Determine the correct configuration for the file
- Lint the file
- Optionally apply fixes to the file
- Optionally write fixes to the disk
- *Send a message back to the parent process*
- Gather messages
- Output lint results to console
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AACWkvQiD96TAElJWk9z360W_ZY4XhgWks5u7sqngaJpZM4Y9u4y>
.
--
______________________________
Nicholas C. Zakas
@SlickNet
Author, Principles of Object-Oriented JavaScript <http://amzn.to/29Pmfrm>
Author, Understanding ECMAScript 6 <http://amzn.to/29K1mIy>
|
Add detailed outline.
@nzakas Thanks! I've updated the RFC with what I said in that comment, and noted the bit about determining chunk size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, adding the batch information is helpful.
I think we still need more detail on how all of this will work. The goal of an RFC is to provide enough detail such that someone other than the RFC author would be able to reasonably implement the RFC even if the RFC author was unavailable to provide help any context. (It might be helpful to review #3 to see the process and detail we used to merge our first RFC.)
Some open questions in my mind about this proposal are:
- How does the top-down architecture of the CLI app change?
bin/eslint.js
is the entrypoint and then that callslib/cli.js
. It seems likelib/cli.js
'sexecute()
method would need to change to return a promise so that it can call some async function onCLIEngine
. - After the files to lint are gathered, how is that list managed? I'm assuming some sort of a queue, but it would be good to be specific.
- Can this be reasonably implemented in Node.js 6 (without JavaScript
async
functions) or would we need to drop Node.js 6 support becauseasync
functions are the only sane way to do this? - The previously mentioned handling of configuration files and configuration caches needs to be thought out.
- How would all of this work with the
--cache
option? Maybe it would just work without changes, but I think that's something we need to be explicit about.
Thanks again for all of your work and the thought you've put into this. I think this process is really helping to tease out some of the questions we hadn't thought too much about prior.
- _Send each batch to a worker_ | ||
- For each _batch sent to a worker_: | ||
- Check the cache to see if the file has changed | ||
- Determine the correct configuration for the file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be one of those hidden tricky steps. Right now, we cache configuration files for faster lookup later. In the case of a worker, does that mean each worker would have its own configuration cache? (I think that would be fine, just something we should think about.)
Also, in determining the batches being sent to workers, should there be any consideration to batching together files in the same directory so each worker doesn't have to recalculate configuration info?
I haven't forgotten about this, I've been a little busy. I'm going to be taking a look at it this weekend hopefully. Just wanted to put that out there so it didn't seem like I ghosted this. |
Thanks for the heads up! And no worries, the RFC process tends to take a
while anyway, so there is no time pressure.
…On Sat, Jan 26, 2019 at 8:22 AM David ***@***.***> wrote:
I haven't forgotten about this, I've been a little busy. I'm going to be
taking a look at it this weekend hopefully. Just wanted to put that out
there so it didn't seem like I ghosted this.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AACWkvmIHQGu7CbawP4DtvDTBFsmK8Yhks5vHICxgaJpZM4Y9u4y>
.
--
______________________________
Nicholas C. Zakas
@SlickNet
Author, Principles of Object-Oriented JavaScript <http://amzn.to/29Pmfrm>
Author, Understanding ECMAScript 6 <http://amzn.to/29K1mIy>
|
Just for some discussion before I put it into the RFC (just getting around to this now, I know it's been a while). How does the top-down architecture of the CLI app change? bin/eslint.js is the entrypoint and then that calls lib/cli.js. It seems like lib/cli.js's execute() method would need to change to return a promise so that it can call some async function on CLIEngine.
After the files to lint are gathered, how is that list managed? I'm assuming some sort of a queue, but it would be good to be specific.I believe a FIFO queue is fine for this. As far as I can tell, order should not matter. Can this be reasonably implemented in Node.js 6 (without JavaScript async functions) or would we need to drop Node.js 6 support because async functions are the only sane way to do this?I think if we are looking for backwards compatibility, something like Alternatively, we could publish to a beta channel with the intent to just deprecate Node 6 all together and avoid the interim period. I leave that decision to the maintainers. I would prefer to write things with The previously mentioned handling of configuration files and configuration caches needs to be thought out.I think loading of configuration files and caches, for a first iteration, can remain synchronous. Most of the speed benefit will come from the batch processing of all the files etc as mentioned in the spec. Thus, we can leverage the existing code that exists today. Warm up times would still be a little slow, but that can be improved once the processing changes are solid. How would all of this work with the --cache option? Maybe it would just work without changes, but I think that's something we need to be explicit about.As far as I know, it should remain the same since I believe @nzakas If the above is acceptable, I can update the RFC with it. If you have more questions, we can hash them out in the comments and then I will update the RFC once we have consensus 😄 . EDIT: Do we wish to carry on with this if #11 is in progress? |
async/await is just promises; there's zero reason for eslint to drop anything to use it. Consumers can always choose to use async functions as long as eslint wraps return values in |
@Aghassi Thanks for continuing to work on this. A few notes:
|
@nzakas @not-an-aardvark anything else you think I should add or expand on based on the last push? |
@mysticatea Do you have any feedback? It's been a while and I haven't heard back from the others on what else may be needed for this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. Thank you very much for your work on this.
Well, because I had not tracked the discussion about parallel processing, I'm sorry if my suggestion was mentioned in the past.
I feel that this RFC lacks the concrete design for user-facing changes. It's important how this change affects public API.
- What is a new option?
- What is a new API?
- How do those work?
- How do those affects the existing components (rules, plugins, custom parsers, and shareable configs)?
|
||
Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points. | ||
|
||
1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or option when using CLIEngine). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we need a more concrete design about user-facing change. This is pretty vagueness to implement.
I think that this RFC intends to:
- add
--parallel
option (or something like). And we need to describe how it works. - add
CLIEngine#executeOnFilesInParallel(patterns)
. And we need to describe how it works.
About user-facing changes (public API changes) is very important. The purpose of RFC processes is to clear that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, how does the --parallel
option work with other CLI options? Especially, --stdin
, --fix
, --cache
, --debug
.
Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points. | ||
|
||
1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or option when using CLIEngine). | ||
2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. As a first iteration, the API should support `async` calls backed by `Promise`s. This will allow the main thread to continue operating while ESLint is doing its thing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that this sentence is ... confusing. Why does this compare Promises and Threads? Promises (and async
) are the API to receive the result of asynchronous processing. Threads are the mechanism to run arbitrary processes concurrently. Those are things at different layers that we cannot compare. (we can use promises to receive the result of the processes that threads ran. It's the thing that other languages such as C# do.)
I guess that you intended to talk about what the way made parallel. One made IO parallel, another one made verification parallel. Then the former was not efficient.
I'm not sure if this sentence is needed.
|
||
1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or option when using CLIEngine). | ||
2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. As a first iteration, the API should support `async` calls backed by `Promise`s. This will allow the main thread to continue operating while ESLint is doing its thing. | ||
3. The experience should not break existing users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this sentence is needed.
It should be in the "Backwards Compatibility Analysis" section as:
- How it's possible to break the current behavior or not?
|
||
For this change, we will not need to modify the logic for `.eslintcache`, nor `cli.execute` as they are not exposed. We do need to keep the functionality of `CLIEngine.executeOnFiles` unchanged. | ||
|
||
Since Node 6 is intended to be deprecated in ESLint 6, it is ok for us to rely on and use `async/await` syntax to support this change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this sentence is needed.
- Gather messages | ||
- Output lint results to console | ||
|
||
For this change, we will not need to modify the logic for `.eslintcache`, nor `cli.execute` as they are not exposed. We do need to keep the functionality of `CLIEngine.executeOnFiles` unchanged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this sentence is needed.
Instead, we need to describe how it realizes the cache functionality in parallel processing. Especially, the cache is one file. How does it read/write the cache without race condition in parallel?
|
||
## Documentation | ||
|
||
1. There should be JSDoc code for sure. In code documentation is always the best. In particular, the NodeJS API and the CLI Documentation would need to reflect this change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of this section is to clarify how we describe this change to users. So it should describe how we update our documentation (https://eslint.org/) rather than JSDoc. For example, I think the following notes are needed.
- The Command Line Interface page should describe how new
--parallel
option works. - The CLIEngine section of the "Node.js API" page should describe how new
executeOnFilesInParallel(patterns)
method works. - The Working with Plugins page should describe parallelism for plugins that parse the whole project such as
@typescript-eslint/eslint-plugin
. - The Working with Custom Parsers page should describe parallelism. Especially, if the parser author wants to provide functionality that shares arbitrary state via
parserServices
API, it may not support--parallel
.
- Optionally write fixes to the disk | ||
- _Send a message back to the parent process_ | ||
- Gather messages | ||
- Output lint results to console |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since rulesMeta
, formatters require all rules that the verification process used. How does the formatter know/get the rules that were used in workers?
Updating RFC PR Co-Authored-By: Toru Nagashima <public@mysticatea.dev>
Thanks @mysticatea . Let me look and address your comments. Appreciate the feedback 😄 |
I made a quick and dirty PoC here eslint/eslint#12191 The main idea is to split the files into X chunks that can be linted sperately on X workers, I used Please have a look at the PR and let me know what you think of the general idea UPDATE: I found workaround by passing the ConfigArray instance, then rehydrating the non serializable members in the workers https://github.com/eslint/eslint/pull/12191/files#diff-660ea0590a55a93f96e9f6979144e554R445 |
Thank you all for the discussion in this thread and particularly @Aghassi for putting the RFC together! A lot has changed in the core ESLint APIs as we worked toward making parallel linting possible. The TSC decided today to consolidate the parallel linting work in #42, which @mysticatea is updating to reflect a prototype implementation, so please follow along there. |
Summary
CLIEngine
currently doesn't support any async functions. As such, large codebases pay a toll since it requires users to wait for each file ( time O(n) ) to be processed by ESLint. There's a large gain to be made if ESLint can offer parallelized tasks so that users can more quickly statically analyze their code.Related Issues
PR - eslint/eslint#10606
Google Group RFC Discussion - https://groups.google.com/forum/m/#!topic/eslint/di7l6__w2jk
Bounty Discussion - https://www.bountysource.com/issues/26284182-lint-multiple-files-in-parallel