Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout Loading Test Suite in Chrome 75+ on DDC in Docker Container #1043

Closed
todbachman-wf opened this issue Jun 21, 2019 · 22 comments · Fixed by #1069
Closed

Timeout Loading Test Suite in Chrome 75+ on DDC in Docker Container #1043

todbachman-wf opened this issue Jun 21, 2019 · 22 comments · Fixed by #1069
Assignees

Comments

@todbachman-wf
Copy link

Some of our larger internal Dart projects started hitting timeouts recently when running unit tests in CI. It seemed to start happening following the Chrome 75 release. None of the tests run, instead we get a timeout error before the first test suite loads, which looks similar to:

00:00 +0: compiling test/big_test.dart
00:00 +0 -1: compiling test/big_test.dart [E]
  TimeoutException after 0:12:00.000000: Test timed out after 12 minutes.
  package:test_api/src/backend/invoker.dart 332:28              Invoker._handleError.<fn>
  dart:async/zone.dart 1120:38                                  _rootRun
  dart:async/zone.dart 1021:19                                  _CustomZone.run
  package:test_api/src/backend/invoker.dart 330:10              Invoker._handleError
  package:test_api/src/backend/invoker.dart 283:9               Invoker.heartbeat.<fn>.<fn>
  dart:async/zone.dart 1120:38                                  _rootRun
  dart:async/zone.dart 1021:19                                  _CustomZone.run
  package:test_api/src/backend/invoker.dart 281:38              Invoker.heartbeat.<fn>
  package:stack_trace/src/stack_zone_specification.dart 209:15  StackZoneSpecification._run
  package:stack_trace/src/stack_zone_specification.dart 119:48  StackZoneSpecification._registerCallback.<fn>
  dart:async/zone.dart 1124:13                                  _rootRun
  dart:async/zone.dart 1021:19                                  _CustomZone.run
  dart:async/zone.dart 947:23                                   _CustomZone.bindCallback.<fn>
  dart:async-patch/timer_patch.dart 21:15                       Timer._createTimer.<fn>
  dart:isolate-patch/timer_impl.dart 382:19                     _Timer._runTimers
  dart:isolate-patch/timer_impl.dart 416:5                      _Timer._handleMessage
  dart:isolate-patch/isolate_patch.dart 171:12                  _RawReceivePortImpl._handleMessage
  ===== asynchronous gap ===========================
  dart:async/zone.dart 1045:19                                  _CustomZone.registerCallback
  dart:async/zone.dart 946:22                                   _CustomZone.bindCallback
  dart:async/zone.dart 1191:21                                  _rootCreateTimer
  dart:async/zone.dart 1088:19                                  _CustomZone.createTimer
  package:test_api/src/backend/invoker.dart 280:34              Invoker.heartbeat
  package:test_api/src/backend/invoker.dart 215:5               Invoker.removeOutstandingCallback
  package:test_api/src/backend/invoker.dart 402:13              Invoker._onRun.<fn>.<fn>.<fn>.<fn>
  ===== asynchronous gap ===========================
  dart:async/zone.dart 1053:19                                  _CustomZone.registerUnaryCallback
  dart:async-patch/async_patch.dart 77:23                       _asyncThenWrapperHelper
  package:test_api/src/backend/invoker.dart                     Invoker._onRun.<fn>.<fn>.<fn>.<fn>
  dart:async/future.dart 176:37                                 new Future.<fn>
  package:stack_trace/src/stack_zone_specification.dart 209:15  StackZoneSpecification._run
  package:stack_trace/src/stack_zone_specification.dart 119:48  StackZoneSpecification._registerCallback.<fn>
  dart:async/zone.dart 1120:38                                  _rootRun
  dart:async/zone.dart 1021:19                                  _CustomZone.run
  dart:async/zone.dart 923:7                                    _CustomZone.runGuarded
  dart:async/zone.dart 963:23                                   _CustomZone.bindCallbackGuarded.<fn>
  package:stack_trace/src/stack_zone_specification.dart 209:15  StackZoneSpecification._run
  package:stack_trace/src/stack_zone_specification.dart 119:48  StackZoneSpecification._registerCallback.<fn>
  dart:async/zone.dart 1124:13                                  _rootRun
  dart:async/zone.dart 1021:19                                  _CustomZone.run
  dart:async/zone.dart 947:23                                   _CustomZone.bindCallback.<fn>
  dart:async-patch/timer_patch.dart 21:15                       Timer._createTimer.<fn>
  dart:isolate-patch/timer_impl.dart 382:19                     _Timer._runTimers
  dart:isolate-patch/timer_impl.dart 416:5                      _Timer._handleMessage
  dart:isolate-patch/isolate_patch.dart 171:12                  _RawReceivePortImpl._handleMessage
  ===== asynchronous gap ===========================
  dart:async/zone.dart 1045:19                                  _CustomZone.registerCallback
  dart:async/zone.dart 962:22                                   _CustomZone.bindCallbackGuarded
  dart:async/timer.dart 52:45                                   new Timer
  dart:async/timer.dart 87:9                                    Timer.run
  dart:async/future.dart 174:11                                 new Future
  package:test_api/src/backend/invoker.dart 399:21              Invoker._onRun.<fn>.<fn>.<fn>

00:00 +0 -1: Some tests failed.

We don't see this when running the tests locally (on a Macbook Pro with Dart 2.3.2). I've run the tests for one of our failing projects using a local Ubuntu VM with Chrome 75 and don't see the issue there either.

Some things we tried initially, none of which resolved the issue:

  • Running the tests on Chrome 76
  • Running the tests on Chrome 77
  • Downgrading the Dart SDK to 2.3.0
  • Downgrading the Dart SDK to 2.2.0
  • Using an Ubuntu base image for our Docker container

We eventually stumbled upon 2 workarounds:

  • Run the tests on dart2js
  • Run the tests on Chromium (version 73)

Since we couldn't reproduce this issue locally, only in CI, or as it turns out in a docker container, it was a bit tricky to debug. Eventually we attached remote Chrome dev tools to the Chrome instance running inside a docker container and watched the network as Chrome was downloading files. When we encountered the timeout, Chrome had clearly just stopped downloading files. So, we were fortunately able to come up with steps to reproduce the issue.

This project will reproduce the issue. It includes a library with 125 entry points (the code in them is identical) and a single unit test that imports and uses all of those entry points. I've included a Dockerfile that uses the google/dart:2.3.2 image as a base and runs the unit test first on Chromium (to demonstrate that it succeeds) and then on Chrome (on which it will fail).

When we run our docker containers in CI they are given 16GB of RAM and up to 50GB with swap memory included. I've been running the docker container in that repro on my machine and have been giving Docker 16GB as well.

Additionally, when I run the tests in that repo on an Ubuntu VM (in VirtualBox) with 16GB of RAM, the tests run just fine.

For what it is worth, initially my repro project had 1000 entry points, all of which were used by the unit test. With those 1000 entry points Chromium 73 was still able to download all the DDC-compiled modules and run the test; Chrome of course still failed. I started reducing the number of entry points until they started passing on Chrome. They started passing somewhere between 100 and 125. So it seems Chromium is able to handle loading at least an order of magnitude more files. I haven't tried to determine Chromium's breaking point.

When I was investigating the threshold for Chrome, I did run the tests in my Ubuntu VM as well. Ubuntu was failing with 1000 entry points. When I reduced the number of entry points to 500 the tests started passing.

At this point my suspicion is there is a bug in Chrome on Linux that has degraded its performance from Chrome 74 to Chrome 75. But I'm not 100% certain of that, particularly since the behavior is quite a bit different between a Linux VM and a Linux docker container. I'm hoping y'all might be able to shed more light!

@robbecker-wf
Copy link

@kevmoo It's worth point out that this is only with Chrome 75 on Linux & DDC. dart2js works. earlier versions of chrome work. Chrome + DDC on Mac work. We're stumped.

@kevmoo
Copy link
Member

kevmoo commented Jun 24, 2019

@kevmoo It's worth point out that this is only with Chrome 75 on Linux & DDC. dart2js works. earlier versions of chrome work. Chrome + DDC on Mac work. We're stumped.

Ugh. Good to know. CC @grouma @vsmenon

@greglittlefield-wf
Copy link
Contributor

greglittlefield-wf commented Jul 18, 2019

Related: #390

@todbachman-wf
Copy link
Author

I don't know if this is related at all or just a coincidence, but it appears we are running into this problem on more packages as we update them to run on Dart 2.4 in CI. 🤷‍♂

@grouma
Copy link
Member

grouma commented Jul 18, 2019

Can you provide the Chrome process logs? I don't remember the exact details but I believe the compiling message is a bit of a misnomer as we also are waiting for the browser to connect during this time. If for some reason there is an issue with loading your application in Chrome the connect signal will never complete and thus we will time out. Often in that case there is a parsing / JS error that can be seen in the Chrome logs.

@todbachman-wf
Copy link
Author

@grouma I assume you're interested in the debugging log generated by passing --enable-logging --v=1 to chrome when it starts up. If so, I'm having trouble generating it. It is created if pub run test ... completes successfully, but it doesn't get created in the case when the test run fails with the 12 minute timeout. Do you know if there's some way I can force it to get written? Or is it just not getting moved/copied to the final log location?

This is how I've tried to go about generating it. I created a new dart_test.yaml in my project with:

override_platforms:
  chrome:
    settings:
      arguments: --enable-logging --v=1

Then in my Dockerfile I added ENV CHROME_LOG_FILE=/build/chrome.log before running the tests. When the tests run to completion, /build/chrome.log exists; when they hit the 12 minute timeout the file doesn't exist.

@grouma
Copy link
Member

grouma commented Jul 23, 2019

Hmm that's odd. I'm not familiar with this configuration as we spin up and capture Chrome logs quite differently internally. One thing you might want to try doing is modifying the test runner to print the stdout and stderr of the underlying browser process:

drainOutput(process.stdout);

@todbachman-wf
Copy link
Author

@grouma I'm curious, have you been able to repro the problem?

@grouma
Copy link
Member

grouma commented Jul 24, 2019

No I have not.

@todbachman-wf
Copy link
Author

@grouma I printed out the stderr and stdout for the Chrome process. Interestingly, there is nothing in the stdout. 🤷‍♂ Attached you'll find a file with the contents of stderr.

chrome_std_err.txt

@grouma
Copy link
Member

grouma commented Aug 5, 2019

Nothing in that file is helpful to me unfortunately. When I've seen load errors in the past, the log would contain a critical error.

@todbachman-wf
Copy link
Author

Is there anything else we can collect to help debug this?

@bunopus
Copy link

bunopus commented Aug 16, 2019

We're experiencing same problem, and actually it's blocker for us to move to latest Dart. How can we help with debug? @grouma

@vsmenon
Copy link
Member

vsmenon commented Aug 16, 2019

Hmm, is there any way to get at the Chrome browser console logs?

About a year ago, some of our larger internal customers would see "net::ERR_INSUFFICIENT_RESOURCES" messages.

@grouma
Copy link
Member

grouma commented Aug 16, 2019

@todbachman-wf Can you try capturing the logs by printing stderr and also passing --enable-logging=stderr --v=1 to Chrome? Hopefully the captured logs will contain more details.

@vsmenon
Copy link
Member

vsmenon commented Aug 16, 2019

If this is a network timeout issue, see:

https://requirejs.org/docs/api.html#config-waitSeconds

E.g., if the following appears before the require script, I think it allows you to override.

<script>var requirejs = { waitSeconds: 0 };</script>

@todbachman-wf
Copy link
Author

@grouma That's what I did when I produced the stderr file previously.

@todbachman-wf
Copy link
Author

I've added an html file next to my dart test file to try to disable the requirejs timeout, but that doesn't seem to have helped. When we were debugging this in a Linux VM previously, Chrome wasn't reporting network timeouts, so this doesn't really surprise me.

https://github.com/todbachman-wf/test_timeout_repro/blob/master/test/big_test.html

@sinegovsky-ivan
Copy link

@todbachman-wf it is a chrome >= 75 issue, can be fixed by adding '--disable-dev-shm-usage' to the chrome args.

related issue: karma-runner/karma-chrome-launcher#198

@todbachman-wf
Copy link
Author

OMG. I confirmed that fixed the issue for us @sinegovsky-ivan . Thanks for identifying the fix.

@natebosch
Copy link
Member

@grouma - should we use this flag by default when we launch headless? I skimmed some of the linked issues but I don't fully understand the implications

@grouma
Copy link
Member

grouma commented Sep 5, 2019

I wasn't familiar with the flag but did a bit of reading. I don't think there are really any consequences of using it. Chrome will basically be writing to /tmp instead of /dev/shm. Some of Google's internal (non-dart) infrastructure uses it. I'll make the change later today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants