Failed tests executed with REv2 hang indefinitely with Bazel 0.25 #8320
Comments
@ulfjack are you aware of this? Has this been fixed in master?
A stack trace would be useful.
It's odd that this only seems to happen on buildbarn. We don't have any other reports, AFAIK.
I can't get buildbarn to work:
That's literally all the error message I get.
While I'm here: it's not a good idea to attempt a network connection in BlazeModule.beforeCommand.
That's with TLS disabled, AFAICT, using all TLS-disabling options I can find.
Ah, I got it. It was picking up options from my global bazelrc. However, it still doesn't work:
I had to upgrade to a newer version of the bazel_toolchains, as 0.25.2 didn't work with the old one.
I can reproduce with a shell test. Here's a partial stack trace:
Ok, I found it. Sorry about that.
The fix for this will need a patch release for 0.25 and 0.26.
The problem is right here:
As of 4a5e1b7, it was using getErrorPath twice, which could cause it to loop indefinitely, trying to append the error file to itself. This happened rarely, as the test runner script redirects stderr to stdout. However, it could happen if the SpawnRunner wrote any extra output to stderr, which the RemoteSpawnRunner does in some cases. I have manually checked that this fixes the issue, and also added a regression test. Fixes #8320. PiperOrigin-RevId: 249258656
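To illustrate why appending a file to itself may never finish, here is a minimal Python sketch. It is not Bazel's code: `append_file`, the file names, the chunk size and the safety cap are all made up for the illustration. The copy loop only stops at end-of-file on the source, and when source and destination are the same file, every write moves the end-of-file further away from the reader.

```python
import os
import tempfile


def append_file(src_path, dst_path, chunk_size=4, max_chunks=100):
    """Append src to dst chunk by chunk (hypothetical helper, not Bazel's API).

    Returns (chunks_copied, hit_cap). The cap exists only so that this
    demonstration terminates; a loop without it would spin forever
    when src_path and dst_path name the same file.
    """
    chunks = 0
    # Unbuffered handles, so each write is immediately visible to the next read.
    with open(src_path, "rb", buffering=0) as src, \
            open(dst_path, "ab", buffering=0) as dst:
        while chunks < max_chunks:
            chunk = src.read(chunk_size)
            if not chunk:  # EOF on the source: normal termination
                return chunks, False
            dst.write(chunk)
            chunks += 1
    return chunks, True  # cap reached before EOF was ever seen


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "test.log")
        err = os.path.join(tmp, "test.err")
        with open(out, "wb") as f:
            f.write(b"stdout\n")
        with open(err, "wb") as f:
            f.write(b"extra stderr\n")  # 13 bytes
        ok = append_file(err, out)   # distinct files: stops at EOF
        bad = append_file(err, err)  # same file twice: never reaches EOF
        return ok, bad, os.path.getsize(err)


if __name__ == "__main__":
    print(demo())
```

With distinct paths the copy ends after the source has been read once. With the same path passed twice, the reader always trails the end of the file by the original file size, so only the artificial cap stops the loop.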
@ulfjack The added test fails on buildkite and locally when cherrypicked into the 0.25 release: https://buildkite.com/bazel/bazel-bazel/builds/8388#b6c12c75-4480-4fe8-93c4-f5ec9a99e4b7
@dkelmer Sorry about that. The test relies on the directory being created in the test strategy, which is only true after fb04c5b. You can either ignore the failure, disable the test in the release branch or cherrypick fb04c5b as well. It applies cleanly and also fixes the test failure, but I did not try running all the tests. Technically, we could also patch the test to create the directory before writing to the test.err, but I think disabling the test is fine, too.
Description of the problem / feature request:
Bazel 0.25 has introduced a regression, causing the reporting of failed
tests executed using REv2 to hang indefinitely.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Launch a recent version of Buildbarn.
On a Linux-based system, this can for example be done by using
the Docker Compose based setup.
Create a simple Bazel workspace that contains the following files:
WORKSPACE: May remain empty.
BUILD.bazel:
cc_failure_test.c:
py_failure_test.py:
.bazelrc:
Now run a
bazel test ...
to execute both unit tests. Buildbarn will immediately execute all build actions, returning that the execution of
both tests has failed. Logging/tracing on Buildbarn's side confirms
this. Still, Bazel will remain stuck in a state where it won't
process the results it receives from the build cluster:
The invocation of Bazel never terminates, regardless of any timeouts
configured. Bazel will not voluntarily shut down when pressing Ctrl+C.
It only terminates after pressing Ctrl+C three times.
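Because the invocation never returns by itself, anyone scripting the reproduction may want an external deadline. The following is a rough sketch, assuming a plain wall-clock limit is good enough; `run_with_deadline` and the timeout value are hypothetical and not part of Bazel or Buildbarn. Killing the Bazel client this way may leave its server process running, so treat it as a way to detect the hang rather than a clean shutdown.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it was killed after timeout_s.

    Hypothetical helper for detecting a hanging command from a script.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()   # hard kill of the started process only
        proc.wait()   # reap it so no zombie is left behind
        return None


if __name__ == "__main__":
    # Assumed usage: a None result means the command exceeded the deadline.
    result = run_with_deadline(["bazel", "test", "..."], timeout_s=600)
    print("hung" if result is None else f"exit code {result}")
```

A `None` result on an affected Bazel version, combined with Buildbarn logs showing that both actions completed, would match the behaviour described above.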
This regression has been introduced by commit
4a5e1b7. It only affects the execution
of failing tests; tests that succeed do complete properly.
Below is a list of Bazel versions tested.
doesn't work
895c43d
3bd9a10
0ebc034
4a5e1b7 reverted: works
What operating system are you running Bazel on?
macOS 10.14.x and Ubuntu 18.04.
Have you found anything relevant by searching the web?
Apart from the issues linked to from within the messages of the commits
above (e.g., #6394), nothing relevant has been found.