-
-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial version of jest-worker #4497
Conversation
@aaronabramov Tests failed on Node 4 because |
@mjesun We transform rest/spread only in function declaration params ( |
packages/jest-parallel/src/index.js
Outdated
import type {Readable} from 'stream'; | ||
|
||
/** | ||
* The Metro Bundler farm is a class that allows you to queue methods across |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Metro Bundler farm/JestFarm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
packages/jest-parallel/src/worker.js
Outdated
}); | ||
|
||
this._child = child; | ||
this._busy = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this needed to be here? AFAICT this could go into the constructor directly (if creating a process was asynchronous then it would make sense, but then you would also have to set busy = true
at the beginning of the initialization and then call _process()
at the end).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initialize
method can be called multiple times; not only when instantiating JestWorker
, but also if the child process unexpectedly dies. Once the process is restarted, it cannot be busy (whatever was the state of the previous process), so that's why it's set here.
packages/jest-parallel/src/index.js
Outdated
}; | ||
|
||
for (let i = 0; i < options.workers; i++) { | ||
const metroWorker = new JestWorker(workerOptions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metro... 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Awesome job! I love how easy is to understand the internal logic! One question: could a user of this farm have sticky workers without having to decide explicitly in which worker gets initially assigned to each call? Let me explain: As a user, I would prefer not to force AFAICT this would not be possible with the current API: This use case would work really well for Metro Bundler, since in general hundreds/thousands of different tasks are processed concurrently on the first load (when building the initial bundle), but only a small amount of concurrent tasks are processed afterwards (incremental builds), which we want to be sticky. |
@rafeca What you described is exactly how Once executed, the string will be tied to the worker that executed it, so the next time you call the task and you return that same string, the task will be executed by the worker that processed your first call. This allows for optimal parallelization of tasks when there is no worker preference, but it also allows to get a sticked worker when needed. I changed the |
Codecov Report
@@ Coverage Diff @@
## master #4497 +/- ##
==========================================
+ Coverage 55.68% 57.31% +1.62%
==========================================
Files 186 194 +8
Lines 6348 6494 +146
Branches 3 3
==========================================
+ Hits 3535 3722 +187
+ Misses 2812 2771 -41
Partials 1 1
Continue to review full report at Codecov.
|
packages/jest-parallel/README.md
Outdated
The only exposed method is a constructor (`JestFarm`) that is initialized by passing an options object. This object can have the following properties: | ||
|
||
|
||
### `exposedMethods: $ReadOnlyArray<string>` (required) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you want to call a module.exports = function() {}
? This is done internally in jest now
I came here to ask about using the babel interop to make transpiled export default
work, but if a default export is not supported, I guess we can just do exposedMethods: ['default']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the way default
exports are exported by Babel is by adding a default
key to the object, so it is technically possible to call them with the current implementation.
That said, standard exports can't be called, so this is a fair point. I will change the default exported object of getApi
from an Object.create(null)
to a bound _makeCall
that calls with default
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example of where jest needs the CJS default export (can be changed of course):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the code, so api
is now also a wrapped call :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Any chance of using the babel interop instead of plain require
, or do you think that'd be confusing behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And to clarify, is the way you invoke the main function by passing exposedMethods: ['']
or exposedMethods: []
? And call it by getApi()()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm fine implementing the Babel logic to be able to treat Babel-ified modules as if they were a standard one. Regarding the ['']
, it is not necessary to provide it, the default export is always exposed (i.e. the result of getApi()
is always a method). This makes me think that exposedMethods
should then become optional (there is no sense in making []
a required option).
I will also adjust README.md
to add more information about default exports and potentially provide an example on how to use it.
packages/jest-parallel/README.md
Outdated
|
||
### `getWorker: (method: string, ...args: Array<any>) => ?string` (optional) | ||
|
||
Every time a method exposed via the API is called, `getWorker` is also called in order to stick the call into a worker. This is useful for workers that are able to cache the result or part of it. You stick calls to worker by making `getWorker` return the same identifier for all of them. If you do not want to stick the call to any worker, return `null`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!!! could we have a simple example of usage with sticky workers in the usage section?
Ohhh nice!! 😃 I misread the Readme, thanks for making it more clear. Apart from that, I would suggest changing the |
packages/jest-parallel/README.md
Outdated
Allow customizing all options passed to `childProcess.fork`. By default, some values are set (`cwd` and `env`), but you can override them and customize the rest. For a list of valid values, check [the Node documentation](https://nodejs.org/api/child_process.html#child_process_child_process_fork_modulepath_args_options). | ||
|
||
|
||
### `getWorker: (method: string, ...args: Array<any>) => ?string` (optional) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getWorker
sounds as if this would return a worker, which it does not.
What about getWorkerKey
, or computeWorkerKey
, or computeKey
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, from all given names the one I like the most is getWorkerKey
. It is a getter, and tells stuff about the workers. I will adjust all references to have that new name.
* LICENSE file in the root directory of this source tree. An additional grant | ||
* of patent rights can be found in the PATENTS file in the same directory. | ||
* | ||
* @format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these are needed in this code base
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 true! These ones come from a copy of an internal codebase. I will remove them 🙂
@SimenB Updated |
The test for Node 4 broke with a |
there's a "rebuild" button in the top right corner. |
packages/jest-parallel/README.md
Outdated
@@ -0,0 +1,173 @@ | |||
# jest-parallel | |||
|
|||
Module for executing heavy tasks under forked processes in parallel, by providing a modern interface (`Promise` based), minimum overhead, and sticky workers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just say "a Promise
based interface" rather than modern? The word modern only works at one point in time, and will sound dated in a year. Make it timeless.
packages/jest-parallel/README.md
Outdated
|
||
The module works by providing an absolute path of the module to be loaded in all forked processes, as well as the list of methods that can be remotely called. All methods are exposed on the parent process as promises, so they can be `await`'ed. Child (worker) methods can either be synchronous or asynchronous. | ||
|
||
The way sticky workers work is by using the returned string of the `getWorkerKey` method. If the string was used before, the call will be queued to the related worker; if not, it will be executed by the first available worker, then sticked to the worker that executed it; so the next time it will be processed by the same worker. If you have no preference on the worker executing the task, but you have defined a `getWorkerKey` method because you want _some_ of the tasks to be sticked, you can return `null` from it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain at first what a sticky worker is meant to be before you explain how to use it? Instead of saying "sticked", can you say "bound"? "it will be executed by the first available worker, then bound to the that same worker for subsequent calls"
packages/jest-parallel/README.md
Outdated
## Install | ||
|
||
```sh | ||
$ npm install --save jest-parallel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just... no!
packages/jest-parallel/README.md
Outdated
Allow customizing all options passed to `childProcess.fork`. By default, some values are set (`cwd` and `env`), but you can override them and customize the rest. For a list of valid values, check [the Node documentation](https://nodejs.org/api/child_process.html#child_process_child_process_fork_modulepath_args_options). | ||
|
||
|
||
### `getWorkerKey: (method: string, ...args: Array<any>) => ?string` (optional) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would go with @davidaurelio's suggestion and use computeWorkerKey
because this function is potentially expensive, and it would be good to denote that in the name.
packages/jest-parallel/README.md
Outdated
// Wait a bit. | ||
await sleep(10000); | ||
|
||
// Transform again the same file. Will immediately return because the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Transform the same file again"
@@ -0,0 +1 @@ | |||
node_modules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add this to the top level gitignore file instead? Thanks!
packages/jest-parallel/package.json
Outdated
@@ -0,0 +1,13 @@ | |||
{ | |||
"name": "jest-parallel", | |||
"version": "20.0.4", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update to latest jest version pls.
const pi = require('./pi'); | ||
|
||
module.exports.loadTest = function() { | ||
pi(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you want to test perf accurately, you need to return the value here back to the parent process.
packages/jest-parallel/src/types.js
Outdated
|
||
/* eslint-disable no-unclear-flowtypes */ | ||
|
||
export type JestApi = {[string]: (...any) => Promise<any>}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you drop "Jest" from the types everywhere? I don't really think that adds much.
packages/jest-parallel/src/index.js
Outdated
const aggregatedStderr = mergeStream(); | ||
const workers = []; | ||
|
||
if (!path.isAbsolute(options.workerPath)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we instead just run "require.resolve" here ourselves? It will throw if the module isn't found, and it allows people to pass my-worker-package
which may be a form of a "plugin API" for workers some day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't. require.resolve
is relative to the file calling it. If we call it ourselves, all paths will always be relative to jest-parallel
itself, which is undesirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call require.resolve anyway as it'll work with packages (like 'my-worker-package') and throw otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would https://github.com/sindresorhus/resolve-from help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, I think that's going too far. It should either be a pre-resolved module or a node_module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolve-from
would work, but you need to go one level up the stack to check who called into jest-parallel
. I really dislike that approach.
packages/jest-parallel/src/index.js
Outdated
return this._api; | ||
} | ||
|
||
getAggregatedStdout(): Readable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just getStdout
is sufficient as a name for this, no? It's implied that it is the aggregated data.
packages/jest-parallel/src/index.js
Outdated
_aggregagedStderr: Readable; | ||
_api: JestApi; | ||
_ending: boolean; | ||
_hashes: {[string]: JestWorker}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename to "_keys" or "_cacheKeys"
packages/jest-worker/src/worker.js
Outdated
} | ||
} | ||
|
||
module.exports = Worker; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
export default
packages/jest-worker/package.json
Outdated
"type": "git", | ||
"url": "https://github.com/facebook/jest.git" | ||
}, | ||
"license": "BSD-3-Clause", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MIT. All headers as well
Oh yeah, we switched from BSD+Patents to MIT license since your PTO. |
When a worker fails (OOM, the process exits, throws...), a new worker is spawned to cover the empty gap left by that worker that died. Calls are not removed from the queue (and not marked as processed) up until we get a result back, so as soon as the new worker is up, the last call will be passed into it again. |
What happens if the same call to the worker throws every time? |
A call to the worker throwing does not mean the process exits; so nothing happens. If the process is consistently dying, it is re-spawned all the time; but that's a failure on the |
Sounds like it should only retry a set amount of times before throwing, so we avoid it retrying forever and making the process seemingly hang |
Exactly, |
I'm not very sure about that. If we have workers OOMing and we don't care about that, after three OOMs we would kill the worker without further notice, and you're trapped. Adding an option as well to allow the amount of times this can happen would complicate the API. |
@mjesun I'm not sure I follow. With the code as you explained it, it'll cause an infinite loop of the child crashing and the parent retrying the same thing. Keep in mind we don't have control over the code that is run in the worker (for example with Jest we run user code in tests in child processes). Right now, if one test crashes the process, we retry twice (could be an intermittent failure) and then worker-farm bails and says it cannot retry. What will the behavior be with |
What I mean is that when a child crashes is not because a method called threw; it is because something, either on the |
Let’s say you have a Jest Test that accidentally calls “process.exit()” or dies in some other way. With jest-worker, how would you handle such a test? Worker farm will retry and eventually crash Jest.
…________________________________
From: Miguel Jiménez Esún <notifications@github.com>
Sent: Tuesday, October 3, 2017 5:50:05 PM
To: facebook/jest
Cc: Christoph Nakazawa; Mention
Subject: Re: [facebook/jest] Initial version of jest-worker (#4497)
What I mean is that when a child crashes is not because a method called threw; it is because something, either on the jest-worker side, or on the Node internals, made it crash. In that case, I think it's ok to re-spawn the thread. If the thread keeps dying, it's almost certainly because of a bug on jest-worker and not on the child module itself.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#4497 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AAA0KEgWrmuUf33NFjN7KisfU_N36QtOks5somW9gaJpZM4PaMqp>.
|
packages/jest-worker/src/child.js
Outdated
process.send([ | ||
PARENT_MESSAGE_ERROR, | ||
error.constructor.name, | ||
error.constructor && error.constructor.name, | ||
error.message, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when a string is thrown? throw 'Foo'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just added a commit to allow this.
Let's ship it. |
Awesome work! For your info, after FAIL packages/jest-worker/src/__tests__/child.test.js
● lazily requires the file
TypeError: Cannot read property 'concat' of undefined
at handle (node_modules/worker-farm/lib/child/index.js:44:24)
at emitOne (events.js:120:20)
at process.emit (events.js:210:7)
at Promise (<anonymous>) MacOS 10.12.6 However |
Interesting. It might be related to the fact that I also wonder why CI environments, which are always starting from a clean environment, did not catch it. I will try to reproduce locally and find a fix if it reproduces. Thanks for noticing! |
There is vague memory of earlier discussion about number of threads testing locally versus CI? EDIT: |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
This PR introduces a new module,
jest-worker
, intended to allow heavy task parallelization over multiple workers.The module has a few advantages over the currently one used both in
jest
andmetro-bundler
:100%
flow
-ified.100% test coverage on it, all statements, methods and branches.
Slightly faster than the currenly used one.
Natively provides a
Promise
based interface, which allow us to avoid the extra wrapping layer in order to be used withasync
/`await.It only has one single dependency (
merge-stream
), which we could also remove.Lazily instantiated code in worker, meaning no code is loaded on child processes until the first call is done (lazy require). This allows to spawn a farm with minimal RAM consumption, and only load them when needed.
Sticky workers: tasks will be processed by the first available worker if no stickyness is needed, or by a particular one if the task is forced to do so. Specially useful for workers implementing caches.
Performance test
It can be run by doing
node --expose-gc test.js
under__performance_tests__
. Note that the percentage improvement shown (~ 10%) applies to 10,000 calls, meaning the performance improvement per single call is negligible. The test implements aPromise
wrapper over the current implementation, so we can equivalently test both implementations as we use them in real scenarios.Coverage