-
-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TestSequencer heuristic using dependency tree file sizes #7553
Conversation
8e2a26a
to
1e0f863
Compare
Wow this is awesome. I’ll take a closer look after the holidays. Thank you for building this, and nice wins! |
1e0f863
to
9e26ade
Compare
fixed tests on Windows (hopefully) |
Codecov Report
@@ Coverage Diff @@
## master #7553 +/- ##
==========================================
+ Coverage 68.27% 68.33% +0.06%
==========================================
Files 252 252
Lines 9682 9703 +21
Branches 6 5 -1
==========================================
+ Hits 6610 6631 +21
Misses 3070 3070
Partials 2 2
Continue to review full report at Codecov.
|
I just realized that the metric I used to measure the gains is garbage. |
9e26ade
to
8ea7d91
Compare
Alright, so the new metric in the gist uses the sum of the difference between the index of each test file in the cold vs warm sequence.
|
What's up with React? Do you know if any test(s) in particular fools the heuristic? |
c2817d8
to
9adb791
Compare
// If a test depends on one of these core modules, | ||
// we assume that the test may be slower because it is an "integration test" | ||
// that spawn child processes, accesses the file system, etc. | ||
const coreModuleWeights = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas on other core modules we could include here?
I just picked some arbitrary modules that came to my mind and they caused quite an improvement, unfortunately running the benchmark on a few projects takes quite some time so I can't try out all the modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https
if http
is worth it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extend this comment with the information on how's 10000
different than 100000
? It's not obvious if the weight is size in bytes or maybe something else?
I'll take a look |
Looks like for the most part the new heuristic overestimates the duration of the I think it's just that ReactDOM is huge (almost all the ReactDOM tests have size >800000 in recursive dependencies) and most of its tests depend on the whole thing, but of course don't use all of it. |
I just took a look at this and I love that it brings real speed ups for some projects, that's awesome. I'm a little bit worried about projects with large amounts of tests and gigantic dependency trees as we'd stat for the file size of the entire repo every single time Jest is invoked. What if we cap the file size calls at one or two levels in? What if we apply some weight to tests with large dependency trees (if a file has more dependencies than 90% (?) of all tests, it is probably slower, etc.)? I want us to be conscious about the time Jest optimizing a test run as it can be counter productive on really large repos. |
Thoughts on my suggestion to stick it in hastefs? We already |
I’m in favor of that as long as it doesn’t slow down the watchman call, but watchman should be able to give us the file size.
…________________________________
From: Simen Bekkhus <notifications@github.com>
Sent: Wednesday, January 2, 2019 17:38
To: facebook/jest
Cc: Christoph Nakazawa; Mention
Subject: Re: [facebook/jest] TestSequencer heuristic using dependency tree file sizes (#7553)
I'm a little bit worried about projects with large amounts of tests and gigantic dependency trees as we'd stat for the file size of the entire repo every single time Jest is invoked.
Thoughts on my suggestion to stick it in hastefs? We already stat as part of jest-haste-map's crawl. Or would that use too much memory? A number is just 64 bits so maybe not
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#7553 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AAA0KE7BF8kdAq6LzEXjPbY9-HmN6p4Tks5u_HARgaJpZM4ZiHlz>.
|
Yeah, watchman includes the file size, while the node crawler (and watch) includes |
Alright, I can try to implement @SimenB's suggestion to eliminate the I/O overhead. The easy one: Cache The hard one: Implement a proper algorithm to calculate all the recursive sizes, something like:
|
Let's see where we land with file sizes stored in the haste map first, and then we can optimize things in a separate PR further :) |
opened #7580 for storing file sizes in haste map file metadata |
9adb791
to
95d2c7a
Compare
Rebased on master, integrating the hasteFS file sizes. |
95d2c7a
to
2d93ff4
Compare
@SimenB implemented both of your suggestions, thanks! |
|
||
const fileSize = (path): number => { | ||
const cachedSize = sizes.get(path); | ||
if (cachedSize != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (cachedSize != null) { | |
if (sizes.has(path)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with that is the Flow types - Flow will think the size can be null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with that is the Flow types - Flow will think the size can be null
Not sure why you need cachedSize
at all, sizes
is always defined and you should be good using @SimenB's code (plus removing line 79), no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I mean. (linked issue is for TS, but applies to Flow as well).
The return type of Map<Path, number>.get
is number | void
, not number
, even if you've done a has
check previously.
I'll release a new alpha now and test it internally to check the impact of the size addition to the haste map. I'll come back to you once it's done. |
_fileSizeRecurseDependencies(test: Test, sizes: Map<Path, number>): number { | ||
const {resolver, hasteFS, config} = test.context; | ||
|
||
const fileSize = (path): number => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by passing sizes
to this fn, you could extract it outside of _fileSizeRecurseDependencies
, so it doesn't have to be recreated multiple times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it also uses resolver
and hasteFS
, so fileSize
would need the whole test instead of just the path as its argument as well. Not sure if that's worth it?
Edit: Actually the resolver
and hasteFS
themselves as arguments, because fileSize
is also called on non-test paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, so it can be a class method, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converted it to a class method, although IMHO it's less readable now.
packages/jest-resolve-dependencies/src/__tests__/dependency_resolver.test.js
Show resolved
Hide resolved
|
||
const fileSize = (path): number => { | ||
const cachedSize = sizes.get(path); | ||
if (cachedSize != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with that is the Flow types - Flow will think the size can be null
Not sure why you need cachedSize
at all, sizes
is always defined and you should be good using @SimenB's code (plus removing line 79), no?
fe7a4cd
to
350e12a
Compare
@rubennorte do you have any news on this yet? :) |
@jeysal sorry it took me so long. The change in the haste map is good perf-wise. I'd wait for the release of 24 before merging this. We can release it as an alpha to test it and then as a minor if everything looks good. Thanks for working on this! |
Agreed, best to include this in a minor, no reason for it to be in 24.0 👍 |
350e12a
to
07cbcc8
Compare
Rebased on master. |
To avoid this stalling much longer: Given that there were no significant performance issues with the large open source projects using Jest that I tested, shall we merge it and address the potential perf improvements suggested if someone reports a problem with some project? (I'd give it a quick rebase) @SimenB |
Seems like this needs another rebase. I would advise against merging this for now. I trust that perf is good, but we should verify this on the largest codebase using Jest in the world first, otherwise we'll stall Jest releases later. I'd prefer to stall a PR until we get around to testing it instead of stalling a release of Jest when we realize something is causing problems. Hope that makes sense to you. |
07cbcc8
to
2920417
Compare
rebased |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Summary
In https://www.youtube.com/watch?v=3YDiloj8_d0, @cpojer mentioned that the
TestSequencer
heuristics aside from failures/durations from previous runs could use some improvements.This PR improves the existing "file size" heuristic that is used as a fallback when there is no information from previous runs available by calculating the size of the whole dependency tree of a test in order to schedule more complex tests first.
Note that this PR does not touch on the idea of "test priority" based on changed files that is also mentioned in the video.
Test plan
Edit: See comments below for more up-to-date information.
I used this script on a few small & large open source projects that came to my mind that use Jest.
The script diffs the test order without a cache (which is determined by the file size heuristic) against the test order after one full Jest run (which is determined by the test durations) and counts the number of lines that
diff
prints, which serves as a rough approximation of how close we got to the best test order (lower is better)project: master (median of 4 runs) / this PR (median of 4 runs)
redux: 14 / 12
downshift: 27 / 25
recompose: 75.5 / 75
gatsby: 225.5 / 225
react: 370 / 363.5
jest: 598 / 585.5
prettier: 1816 / 1812.5
The difference might not seem like a lot (and tbh I initially hoped for more), but in hindsight I actually think this is pretty good. The larger projects showed quite consistently improved numbers. I hacked together the alternative approaches "counting test/it occurrences" and "number of files in dependency graph" and they showed no significant improvements over
master
at all, so I think this is pretty much as good as you can get with a reasonably simple heuristic.