Refactor slow running tests #20128

Closed
BridgeAR opened this issue Apr 18, 2018 · 21 comments
Labels
test Issues and PRs related to the tests.

Comments

@BridgeAR (Member) commented Apr 18, 2018

Our test suite currently takes quite a while to run, and we are constantly adding new tests. Some of them run really slowly and block other tests from running in the meantime. So I checked which ones currently take longer than one second on my machine, and this is the list:

Path: parallel/test-async-wrap-pop-id-during-load
Path: parallel/test-buffer-constructor-node-modules-paths
Path: parallel/test-buffer-indexof
Path: parallel/test-child-process-exec-encoding
Path: parallel/test-child-process-exec-timeout
Path: parallel/test-child-process-spawnsync-input
Path: parallel/test-cli-eval
Path: parallel/test-cli-eval-event
Path: parallel/test-cli-node-options
Path: parallel/test-cli-node-options-disallowed
Path: parallel/test-cli-node-print-help
Path: parallel/test-cli-syntax
Path: parallel/test-cluster-basic
Path: parallel/test-cluster-bind-privileged-port
Path: parallel/test-cluster-bind-twice
Path: parallel/test-cluster-disconnect
Path: parallel/test-cluster-master-kill
Path: parallel/test-cluster-worker-no-exit
Path: parallel/test-crypto-fips
Path: parallel/test-domain-abort-on-uncaught
Path: parallel/test-domain-with-abort-on-uncaught-exception
Path: parallel/test-env-var-no-warnings
Path: parallel/test-error-reporting
Path: parallel/test-errors-systemerror
Path: parallel/test-eslint-alphabetize-errors
Path: parallel/test-eslint-buffer-constructor
Path: parallel/test-eslint-documented-errors
Path: parallel/test-fs-read-stream-fd-leak
Path: parallel/test-fs-readdir-stack-overflow
Path: parallel/test-fs-readfile
Path: parallel/test-http-full-response
Path: parallel/test-http-pipeline-requests-connection-leak
Path: parallel/test-http-set-timeout-server
Path: parallel/test-http2-server-startup
Path: parallel/test-https-close
Path: parallel/test-internal-module-wrap
Path: parallel/test-module-loading-globalpaths
Path: parallel/test-module-main-fail
Path: parallel/test-module-main-preserve-symlinks-fail
Path: parallel/test-npm-install
Path: parallel/test-pipe-file-to-http
Path: parallel/test-postmortem-metadata
Path: parallel/test-preload
Path: parallel/test-process-argv-0
Path: parallel/test-readline-keys
Path: parallel/test-stdio-pipe-access
Path: parallel/test-timers-unref-active
Path: parallel/test-tls-securepair-leak
Path: parallel/test-trace-events-fs-sync
Path: addons/async-hello-world/test
Path: addons/async-hello-world/test-makecallback
Path: addons/async-hello-world/test-makecallback-uncaught
Path: addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max
Path: addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-base64
Path: addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-binary
Path: addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-hex
Path: addons-napi/test_async/test
Path: addons-napi/test_async/test-async-hooks
Path: addons-napi/test_async/test-uncaught
Path: sequential/test-child-process-pass-fd
Path: sequential/test-debugger-debug-brk
Path: sequential/test-fs-readfile-tostring-fail
Path: sequential/test-inspector-async-hook-setup-at-signal
Path: sequential/test-inspector-enabled
Path: sequential/test-inspector-not-blocked-on-idle
Path: sequential/test-inspector-port-cluster
Path: sequential/test-inspector-port-zero
Path: sequential/test-net-response-size
Path: sequential/test-performance

The sequential ones are particularly bad. A couple of these were just improved by @apapirovski in #20125 ❤️.

I believe we should go through this list and try to refactor a couple of these to reduce the overall load.

Update: I removed all tests that we already know should not be changed, e.g. all benchmark tests after #20125.

@BridgeAR BridgeAR added test Issues and PRs related to the tests. meta Issues and PRs related to the general management of the project. labels Apr 18, 2018
@refack (Contributor) commented Apr 18, 2018

Made a checklist in case people want to "claim" some.

  • parallel/test-async-wrap-pop-id-during-load
  • parallel/test-buffer-constructor-node-modules-paths
  • parallel/test-buffer-indexof (CPU bound)
  • parallel/test-child-process-exec-encoding
  • parallel/test-child-process-exec-timeout (Needs a one second timeout)
  • parallel/test-child-process-spawnsync-input
  • parallel/test-cli-eval
  • parallel/test-cli-eval-event
  • parallel/test-cli-node-options
  • parallel/test-cli-node-options-disallowed
  • parallel/test-cli-node-print-help
  • parallel/test-cli-syntax
  • parallel/test-cluster-basic
  • parallel/test-cluster-bind-privileged-port
  • parallel/test-cluster-bind-twice
  • parallel/test-cluster-disconnect
  • parallel/test-cluster-master-kill
  • parallel/test-cluster-worker-no-exit
  • parallel/test-crypto-fips
  • parallel/test-domain-abort-on-uncaught
  • parallel/test-domain-with-abort-on-uncaught-exception
  • parallel/test-env-var-no-warnings
  • parallel/test-error-reporting
  • parallel/test-errors-systemerror
  • parallel/test-eslint-alphabetize-errors
  • parallel/test-eslint-buffer-constructor
  • parallel/test-eslint-documented-errors
  • parallel/test-fs-read-stream-fd-leak
  • parallel/test-fs-readdir-stack-overflow
  • parallel/test-fs-readfile
  • parallel/test-http-full-response
  • parallel/test-http-pipeline-requests-connection-leak
  • parallel/test-http-set-timeout-server
  • parallel/test-http2-server-startup
  • parallel/test-https-close
  • parallel/test-internal-module-wrap
  • parallel/test-module-loading-globalpaths
  • parallel/test-module-main-fail
  • parallel/test-module-main-preserve-symlinks-fail
  • parallel/test-npm-install
  • parallel/test-pipe-file-to-http
  • parallel/test-postmortem-metadata
  • parallel/test-preload
  • parallel/test-process-argv-0
  • parallel/test-readline-keys
  • parallel/test-stdio-pipe-access
  • parallel/test-timers-unref-active
  • parallel/test-tls-securepair-leak
  • parallel/test-trace-events-fs-sync
  • addons/async-hello-world/test
  • addons/async-hello-world/test-makecallback
  • addons/async-hello-world/test-makecallback-uncaught
  • addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max
  • addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-base64
  • addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-binary
  • addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-hex
  • addons-napi/test_async/test
  • addons-napi/test_async/test-async-hooks
  • addons-napi/test_async/test-uncaught
  • sequential/test-child-process-pass-fd
  • sequential/test-debugger-debug-brk
  • sequential/test-fs-readfile-tostring-fail
  • sequential/test-inspector-async-hook-setup-at-signal
  • sequential/test-inspector-enabled
  • sequential/test-inspector-not-blocked-on-idle
  • sequential/test-inspector-port-cluster
  • sequential/test-inspector-port-zero
  • sequential/test-net-response-size
  • sequential/test-performance

@apapirovski (Member)

I can confidently say that

  • sequential/test-http2-timeout-large-write
  • sequential/test-http2-timeout-large-write-file

should not be changed. They test important edge case behaviour.

Also, these can't really be changed any further after my PR from today lands:

  • parallel/test-benchmark-arrays
  • parallel/test-benchmark-assert
  • parallel/test-benchmark-crypto
  • parallel/test-benchmark-dgram
  • parallel/test-benchmark-dns
  • parallel/test-benchmark-domain
  • parallel/test-benchmark-es
  • parallel/test-benchmark-events
  • parallel/test-benchmark-fs
  • parallel/test-benchmark-misc
  • parallel/test-benchmark-module
  • parallel/test-benchmark-os
  • parallel/test-benchmark-path
  • parallel/test-benchmark-process
  • parallel/test-benchmark-querystring
  • parallel/test-benchmark-string_decoder
  • parallel/test-benchmark-timers
  • parallel/test-benchmark-url
  • parallel/test-benchmark-util
  • parallel/test-benchmark-zlib
  • sequential/test-benchmark-buffer
  • sequential/test-benchmark-child-process
  • sequential/test-benchmark-http
  • sequential/test-benchmark-net
  • sequential/test-benchmark-tls

They each run many different benchmark files, so there's a Node.js startup cost involved for each additional benchmark that exists.

@BridgeAR (Member, Author)

@apapirovski I removed those from the lists above.

@apapirovski (Member) commented Apr 18, 2018

The addons tests might also be hard to fix. There's a boot-up cost involved, and some of them don't do that much.

(Also... sorry that I started immediately poking at it... I really appreciate the fact that you put this together! ❤️)

@BridgeAR BridgeAR added help wanted Issues that need assistance from volunteers or PRs that need help to proceed. good first issue Issues that are suitable for first-time contributors. and removed meta Issues and PRs related to the general management of the project. labels Apr 27, 2018
@shobhitchittora (Contributor)

@BridgeAR I was looking at some of the tests. Can you share some general things to look for when optimizing? How do we benchmark a test and see where the bottlenecks are?

Thanks! 😇

@tigercosmos

Hi, this is my first time here. If I want to improve one of these tests, is there any documentation or a wiki that I can follow?

@BridgeAR (Member, Author)

@shobhitchittora I did not look at the tests beforehand; how to improve a test is always individual to each test case. One thing you could look for is timeouts: those should likely be minimized, while making sure the test still covers the same parts as before and stays reliable.

Besides that, it is good to check what takes the most time in an individual test. That can be done with e.g. console.time('foo'). Be aware that this does not log anything if the test passes, so insert a throw new Error() after console.timeEnd('foo') (or better: at the end of the test, keeping in mind that the end of the test might be an async operation). You will then see the output of console.time (and, of course, the error). However, console.time is a very rudimentary tool, and sometimes it might be better to use the V8 profiler or other tools.
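
For illustration, a minimal sketch of that trick (the label, the delay, and the timed work are placeholders, not taken from an actual test):

```js
// Hypothetical sketch: measure one section of a test with console.time().
console.time('foo');

setTimeout(() => {
  // ... the test work being measured would run here ...
  console.timeEnd('foo');

  // The test runner only shows output for failing tests, so fail on
  // purpose while investigating to make the timing output visible.
  throw new Error('intentional failure to surface the console.time output');
}, 100);
```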

If the test is heavily CPU bound and that is the reason it is slow, there is probably little that can be done about it; if that is your finding, please report it back as well. Sometimes a test can be rewritten in a less CPU-bound way while still covering the same parts, but if that is not the case, we should just accept that the specific test takes the time it takes.

@tigercosmos we have a guide on how to write tests: https://github.com/nodejs/node/tree/master/doc/guides/writing-tests.md. Besides that, there is no further documentation. The tests are written just like any regular application code and should be mostly straightforward, since it is only Node.js core. There are some helper functions that come from the common test file; these have their own documentation as well: https://github.com/nodejs/node/tree/master/test/common/README.md

@shobhitchittora (Contributor)

Thanks as always @BridgeAR. 👍🏻

@BridgeAR (Member, Author)

One of the biggest issues in our tests is that we often use sync code instead of running things in parallel.
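
As a hypothetical before/after sketch of what that means in practice (the file names and payload are made up, not from an actual test):

```js
'use strict';
const fs = require('fs');

const data = 'x'.repeat(1024);

// Slow pattern: each call blocks the event loop until it finishes.
// fs.writeFileSync('/tmp/a.txt', data);
// fs.writeFileSync('/tmp/b.txt', data);

// Faster pattern: both operations are in flight at the same time.
let pending = 2;
for (const file of ['/tmp/a.txt', '/tmp/b.txt']) {
  fs.writeFile(file, data, (err) => {
    if (err) throw err;
    if (--pending === 0) console.log('all writes done');
  });
}
```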

@BridgeAR BridgeAR mentioned this issue May 12, 2018
BridgeAR added a commit to BridgeAR/node that referenced this issue May 14, 2018
This refactors some tests to reduce the runtime of those.

Refs: nodejs#20128
@Trott (Member) commented May 17, 2018

I'm probably in the minority here, but I'm wary of this effort. We seem to be encouraging fairly significant churn just to shave milliseconds here and there off the test suite. Doesn't seem worth it to me. (If the changes have other benefits, like making the code more maintainable, then great. But that doesn't seem to be what happens.)

@BridgeAR (Member, Author)

@Trott there is another reason to do this: it makes sure our tests produce fewer timeout errors on slow machines, as these errors still turn up from time to time.

@Trott (Member) commented May 17, 2018

@Trott there is another reason to do this: it makes sure our tests produce fewer timeout errors on slow machines, as these errors still turn up from time to time.

@BridgeAR In theory, perhaps that's true. In practice, it seems to me that the proposed changes thus far often introduce complexity and decrease consistency. In turn, that makes tests less maintainable and also risks introducing unintentional unreliability. See #20756.

It's likely that a more robust solution for people with slow machines is to run the tests with the --timeout option.

I'm all for test changes that improve reliability of tests and reduce timeouts. This approach (EDIT: specifically, encouraging micro-optimizations) isn't passing the smell test for me. I sense a lot of unintended consequences.

@BridgeAR (Member, Author)

@Trott that highly depends on what test it is and I would not generalize that at all. So far only a few tests were touched at all.

@Trott (Member) commented May 17, 2018

@nodejs/testing

@Trott (Member) commented May 17, 2018

@Trott that highly depends on what test it is and I would not generalize that at all. So far only a few tests were touched at all.

@BridgeAR I've looked at speeding up the test suite in the past, making a list of tests that are slow on my machine as you've done here. It's possible the general results will be different this time. But it's also hard for me to disregard my own past experiences.

I'm not suggesting that you abandon the effort to make the tests run faster. I just want the pitfalls to be known and for people to watch out for them. And perhaps for a modified approach, specifically a higher threshold than 1 second for listing a test as needing work. Shaving a few milliseconds off a test that takes less than two seconds to run isn't going to help anyone if it comes at the expense of consistency and maintainability. People on slow machines who experience timeouts aren't having problems with those tests; their problems are with the really long-running tests and/or with tests that don't handle system load well.

(I would encourage anyone who makes significant changes to any test in parallel to run tools/test.py -j 96 --repeat 192 test/parallel/test-whatever-the-name-is.js before and after the change to check that reliability isn't accidentally reduced.)

There are approaches I would support such as:

  • Getting everything except the global leak checking out of common and into common/* so that not every test has to load everything. I'm not actually sure how much time that would save on the test suite runs but it would be a code improvement regardless, so 👍.

  • The approach here but with a threshold considerably higher than 1 second (as I mention above).

  • It may be worth moving some of the very long-running tests to pummel but I would consider that a last resort as that basically means the test will almost never get run. That would be my concern with moving the benchmark tests out of the main suite, but that's less of a deal-breaker than some other things because it's not the end of the world if a benchmark breaks once in a while and has to be fixed. It's not as if the current benchmark tests are exhaustive anyway. They minimally exercise each benchmark file.

Anyway, those are the things I'd consider. Take it into consideration or not. I'm not trying to stop the effort here. Just trying to get it more focused on places where the cost/benefit ratio is higher in my personal estimation.

shisama pushed a commit to shisama/node that referenced this issue May 18, 2018
this refactors tests for module to reduce runtime

Refs: nodejs#20128
@Trott (Member) commented May 19, 2018

Another data point that refactoring tests to shave off 150 ms may create more problems than it solves: #20688 (comment)

BridgeAR added a commit to BridgeAR/node that referenced this issue May 21, 2018
This refactors some tests to reduce the runtime of those.

PR-URL: nodejs#20688
Refs: nodejs#20128
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Yang Guo <yangguo@chromium.org>
@Trott (Member) commented May 21, 2018

When "best practices" are not actually the best practices: Replacing *Sync() variants of child_process functions with their asynchronous counterparts is especially dangerous/problematic in parallel tests because it quickly means resource exhaustion can cause tests to fail (which we've seen at least twice now so far in PRs related to this issue).

MylesBorins pushed a commit that referenced this issue May 22, 2018
This refactors some tests to reduce the runtime of those.

PR-URL: #20688
Refs: #20128
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Yang Guo <yangguo@chromium.org>
@Trott (Member) commented May 24, 2018

The changes to test/parallel/test-async-wrap-pop-id-during-load.js in #20688 appear to have made the test both unreliable and significantly slower on AIX. Some details at the bottom of that pull request. I'm not sure if this is the test change exposing a bug in core (or AIX) that was there all along or if it's a bug in the test.

I'm removing the good first contribution and help wanted labels from this. These types of refactors are fraught with issues that are hard to foresee if you're not familiar with the test suite. Feel free to re-add them if you disagree with that assessment.

@Trott Trott removed good first issue Issues that are suitable for first-time contributors. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. labels May 24, 2018
MylesBorins pushed a commit that referenced this issue May 29, 2018
This refactors some tests to reduce the runtime of those.

PR-URL: #20688
Refs: #20128
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Yang Guo <yangguo@chromium.org>
jeromecovington added a commit to jeromecovington/node that referenced this issue Jul 14, 2018
First steps towards not needing to require test/common/index.js
everywhere.

Refs: nodejs#20128 (comment)
@fesebuv commented Aug 2, 2018

Hello folks,
I just opened a PR that updates the test/common/benchmark.js file in order to reduce the time the test-benchmark-buffer test takes to run:
#22090

@Trott (Member) commented Aug 2, 2018

Refactoring individual tests is unlikely to have much of an effect on our overall test runtime. I'm not bothered by how long it takes to run our tests, but if others are, here's what I would see as a productive way forward:

  • Make sure all the tests in pummel pass.
  • Work with the Build WG to get pummel tests added to the nightly job that also runs the internet tests.
  • Once pummel tests are run every night in CI, move selected slooooowww tests to pummel.

I'd prefer this issue be closed as I think it seems to attract well-meaning-but-misguided work. Those efforts could be put to better use.

@BridgeAR (Member, Author) commented Aug 2, 2018

I thought I already closed this.
