Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Severe performance regression when reporter with terminal output is enabled #2602

Closed
6 tasks done
adam-te opened this issue Jan 3, 2023 · 14 comments · Fixed by #3006
Closed
6 tasks done

Severe performance regression when reporter with terminal output is enabled #2602

adam-te opened this issue Jan 3, 2023 · 14 comments · Fixed by #3006
Labels
help wanted Extra attention is needed

Comments

@adam-te
Copy link

adam-te commented Jan 3, 2023

Describe the bug

Problem:
Full test suite performance is 10x slower when enabling a reporter with terminal output (e.g. default, verbose, dot).

Details:
In converting our unit tests (~8000) from karma to vitest I noticed performance of vitest was significantly worse than karma. After a lot of debugging, I came to find that the wrapAnsi( function call, used in the vitest log-update dependency, was the culprit. This call alone takes ~95% of test execution time.

https://github.com/sindresorhus/log-update/blob/main/index.js#L54

image

Workaround:
Tests run 10-20x faster if the wrapAnsi line in the vitest/dist/chunk-snapshot-manager.js file is commented out. Console output continues to look normal and functional with ansi wrapping disabled

Vitest entrypoint into log-update dependency:
https://github.com/vitest-dev/vitest/blob/main/packages/vitest/src/node/logger.ts#L13

image

Thoughts:
Its bizarre that terminal output logic is reducing test performance by an order of magnitude. Ideally the performance would be optimized here, but perhaps terminal logic can be moved to a worker/sub-process?

Reproduction

TODO: Will add soon. Should be able to reproduce on any large test suite via enabling/disabling default or dot reporter though.

System Info

System:
    OS: macOS 13.0.1
    CPU: (12) x64 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
    Memory: 72.25 MB / 16.00 GB
    Shell: 5.8.1 - /bin/zsh
  Binaries:
    Node: 18.7.0 - ~/.nvm/versions/node/v18.7.0/bin/node
    npm: 8.15.0 - ~/.nvm/versions/node/v18.7.0/bin/npm
  Browsers:
    Brave Browser: 108.1.46.140
    Chrome: 108.0.5359.124
    Firefox: 102.3.0
    Safari: 16.1
  npmPackages:
    vite: ^4.0.0 => 4.0.0 
    vitest: ^0.26.3 => 0.26.3

Used Package Manager

npm

Validations

@adam-arthur
Copy link

adam-arthur commented Jan 3, 2023

Reproduction:
https://github.com/adam-arthur/vitest-perf-regression

I'm able to reproduce this on a different computer (M1) with a barebones project setup. Enabling console reporters seems to introduce some sort of O(n^2) or worse performance profile scaling with the number of tests. Surprised this hasn't been noted/filed before, the degradation is quite stark and seems to be a showstopper above a few thousand tests.

Will edit the initial issue on my other account (adam-te) later today.

@adam-arthur
Copy link

adam-arthur commented Jan 4, 2023

Looked into this a bit more.

  • All console renderers are affected
  • DotRenderer updates on a 200ms setInterval (link)
  • DotRenderer will always iterate over all test cases for each render tick. So moreTests = moreRenderTime (link)
  • The more tests you have, the longer the console render takes, which triggers something of a "doom loop" where the test runtime becomes dominated by the console render time. e.g. if console render takes 190ms, there are only 10ms remaining for tests to run before the console re-renders.
  • The chalk/wrap-ansi code appears to be extremely performance intensive, doing a lot of transforms and iterations in JS

My take:

  • Do we need to fully iterate all tests and re-render the console output on each tick? (vs batching as tests execute)
  • Ideally a different and more performant off the shelf rendering library or approach could be swapped in
  • Alternatively, console positioning logic should be done in a worker or separate execution context to avoid affecting the test execution thread. This is probably the most obvious thing to do
  • Renderer setInterval invocation should be delayed by the render time, e.g. variable rendering when rendering is sufficiently slow
  • Browsing the wrap-ansi code, looks like there's a lot of low hanging fruit for optimization. The code does not appear to be written with performance in mind.

@sheremet-va
Copy link
Member

Alternatively, console positioning logic should be done in a worker or separate execution context to avoid affecting the test execution thread. This is probably the most obvious thing to do

Reporters are running in the main thread, while tests are running in separate workers.

@adam-arthur
Copy link

adam-arthur commented Jan 4, 2023

Alternatively, console positioning logic should be done in a worker or separate execution context to avoid affecting the test execution thread. This is probably the most obvious thing to do

Reporters are running in the main thread, while tests are running in separate workers.

Well that's odd, because the tests do not appear to continue to run unimpeded when console output is sufficiently slow. Test execution slows down commensurately with console rendering time.

You can replicate the problem easily with my repo above and observe for yourself (at least on macOS with default shell)

Any comment or opinion on the issue generally?

@sheremet-va
Copy link
Member

sheremet-va commented Jan 4, 2023

Any comment or opinion on the issue generally?

Does test execution actually slows down, or does reporters shows them like they are slowed down? It's possible RPC call is awaited somewhere, when console log is happening, so tests are slowed down, but I'm not sure if this is actually the problem.

@adam-arthur
Copy link

adam-arthur commented Jan 4, 2023

Well you can clone my repo above and replicate it within 5m. Test execution slows down, yes, and pretty much completely broken performance wise after 5000 tests or so.

Running 1000 tests with expect(true).toBe(true) is close to instant. Running 5000 tests with the same takes 30s or more. It's not a linear scaling at all, there appears to be some O(n^2) or worse slowdown corresponding with the number of tests

This is not a minor performance issue, vitest is unusable beyond a few thousand tests with these reporters.

You can tweak the number of generated tests in the repo by changing a variable to check the scaling behavior

@sheremet-va sheremet-va added bug help wanted Extra attention is needed labels Jan 5, 2023
@sheremet-va
Copy link
Member

with these reporters.

With your reproduction I don't see significant performance regression in verbose and default reporter, only dot is unbearably slow. Can you confirm this?

@adam-te
Copy link
Author

adam-te commented Jan 14, 2023

It affects all of the reporters that act to clear the terminal/contain layout logic. I'll get numbers for it soon, but surprised you didn't observe this.

The extent to which performance degrades depends on the specific reporter. As stated in other comments, the CI reporter which seems to only append/not clear is order of magnitudes faster for large number of tests.

If I had the time to focus on this I would:

  • Heavily optimize this code chalk/wrap-ansi. At a glance it appears to be written extremely inefficiently. Tons of unnecessary array re-allocations and looping
  • Find whatever is blocking test execution on console logic. These two do not appear to be decoupled despite tests running in a worker. Probably an await or similar somewhere as you mentioned

@adam-arthur
Copy link

adam-arthur commented Jan 15, 2023

Pushed an update to: https://github.com/adam-arthur/vitest-perf-regression

You can now observe that verbose reporter slows down significantly too. The exact slowness of the chosen reporter depends on the structure of the log messages, as slowness is related to layout logic that inspects terminal lines as strings.

So just tweaked the expects to trigger slowness in verbose as well. The other reporters seem fine, though I recall even vitest UI being slow on my work testcases... may have been running it along with a slower console reporter though. Will investigate that later to see if can reproduce

@adam-arthur
Copy link

@adam-arthur
Copy link

See: chalk/wrap-ansi#51 for fix on that side.

@AriPerkkio
Copy link
Member

In addition to chalk/wrap-ansi#51 some optimization on Vitest side is needed. When there are lots of test cases the renderTree will become slow:

function update() {
log(renderTree(tasks, options))
}

The renderTree is called constantly for task updates and each call can take most of the process execution time:

Video of slow renderTree calls
render-tree-slow.mov

By limiting the renderTree to process only the tasks that can fit the current screen the run will execute quickly:

- renderTree(tasks, options)
+ const rows = process.stdout.rows || 80
+ renderTree(tasks.slice(tasks.length - rows), options)
Video of fast reduced renderTree calls
render-tree-limit-tasks.mov

But if we still need to print all tasks to terminal on each log, we can limit the amount of work done by renderTree by figuring out what tasks have changed. Maybe some kind of lastChanged: number property on each task which can be used when deciding whether a cached text for given task could be printed instead of processing it in renderTree again? That would prevent duplicate work.

Also maybe the update() should not be called here and instead let the setInterval call it on every 200ms?

update(_tasks: Task[]) {
tasks = _tasks
update()
return this
},

Changes for showing the execution time of renderTree in listRenderer.ts
export const createListRenderer = (_tasks: Task[], options: ListRendererOptions) => {
  let tasks = _tasks
  let timer: any

  const log = options.logger.logUpdate
  let timeSpentOnUpdate = 0n
  let lastLog = 0n
  let lastRenderTree = 0n

  function update() {
    const rows = process.stdout.rows || 80
    const start = process.hrtime.bigint()

    const tree = renderTree(
      // tasks,
      tasks.slice(tasks.length - rows),
      options,
    )
    lastRenderTree = (process.hrtime.bigint() - start) / 1_000_000n

    log(
      tree,
      '\nTotal time spent on update() calls:', timeSpentOnUpdate.toString(),
      'ms\nLast update() took:', lastLog.toString(),
      'ms\nLast renderTree() took:', lastRenderTree.toString(),
      'ms\nCount of tasks:', tasks.length.toString(),
    )

    lastLog = (process.hrtime.bigint() - start) / 1_000_000n
    timeSpentOnUpdate += lastLog
  }

  return {

@sheremet-va
Copy link
Member

By limiting the renderTree to process only the tasks that can fit the current screen the run will execute quickly:

Not sure if this is the right solution, but the idea is nice. I don't think we can slice it before passing it down, because it's possible to have more than 1 line for a task. But maybe an early return from a for would be enough, when we reach the limit?

@AriPerkkio
Copy link
Member

because it's possible to have more than 1 line for a task. But maybe an early return from a for would be enough, when we reach the limit?

Using early break from the for loop looks like more appropriate solution here. After quick testing it seems to preserve output correctly and executes test run quickly - in 6.5 seconds which seems nice compared to the 70 second reference used in #579 (comment).

I'll do some more manual testing and see if the renderTree itself could be optimized as well. With 5000 test cases the final test result printing in stop() still takes ~1200ms as there we will want all results to be printed no matter what process.stdout.rows is.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants