bazel coverage hangs #21154

Closed
Olsworn opened this issue Jan 31, 2024 · 4 comments
Labels: P2 (We'll consider working on this in future. Assignee optional), team-Remote-Exec (Issues and PRs for the Execution (Remote) team), type: bug

Comments

@Olsworn

Olsworn commented Jan 31, 2024

Description of the bug:

When running `bazel coverage`, the Bazel process occasionally hangs.
We first noticed the issue in our CI, which runs with remote execution, but managed to reproduce the problem locally.
The process hangs until it is stopped manually.
The issue seems to stop happening when we remove the following flag: --combined_report=lcov
The last output on stdout almost always seems to be "checking cached actions".

The issue was happening prior to version 7, but we did not attempt to make a reproduction until today.

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Here is a minimal reproduction of the issue: https://github.com/Olsworn/bazel-coverage-hang

Which operating system are you running Bazel on?

Ubuntu 22.04

What is the output of bazel info release?

release 7.0.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@iancha1992 iancha1992 added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Jan 31, 2024
@oquenchil oquenchil added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Feb 6, 2024
@tjgq
Contributor

tjgq commented Feb 8, 2024

@bazel-io fork 7.1.0

bazel-io pushed a commit to bazel-io/bazel that referenced this issue Feb 9, 2024
With Skymeld, the combined report is done right before the end of the build. This works fine except for a buggy case:
- `--nokeep_going`
- when there's an error that the Coverage action transitively depends on
- The error would try to interrupt the build, and since we evaluate coverage actions uninterruptibly (consistent with the noskymeld behavior), we're stuck in an infinite loop [1].

This happened because we did not fail fast like we should have. This CL fixes the issue by checking, before generating the combined report, whether we should fail fast.

Fixes bazelbuild#21154

[1] https://github.com/bazelbuild/bazel/blob/026f493a5a403fa5d770cae0b3a3baf8dcf33488/src/main/java/com/google/devtools/build/lib/concurrent/Uninterruptibles.java#L33-L39

PiperOrigin-RevId: 605572100
Change-Id: I8d57dacca58358771799161352819ec65bef6ac2
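
The hang described in the commit message above can be modeled with a small, self-contained Java sketch. It is not Bazel's code: the class, the method names, and the retry cap are all hypothetical and exist only for illustration. The sketch assumes a retry loop that swallows `InterruptedException` and tries again, plus a task that is interrupted on every attempt once the build has failed. The cap is there only so the demo terminates; without it, the loop would never exit.

```java
import java.util.concurrent.Callable;

/** Illustrative model only: every name here is hypothetical, not Bazel's API. */
public final class CombinedReportHangSketch {

  /**
   * Runs the task, retrying whenever it is interrupted. The attempt cap exists
   * only so this demo terminates; without it the loop never exits when the
   * task is interrupted on every attempt.
   *
   * @return true if the task completed, false if the cap was reached
   */
  static boolean callUninterruptibly(Callable<Void> task, int maxAttempts) throws Exception {
    boolean interrupted = false;
    try {
      for (int attempt = 0; attempt < maxAttempts; attempt++) {
        try {
          task.call();
          return true;
        } catch (InterruptedException e) {
          interrupted = true; // Swallow the interrupt and try again.
        }
      }
      return false;
    } finally {
      if (interrupted) {
        Thread.currentThread().interrupt(); // Restore the flag for the caller.
      }
    }
  }

  /** Models the report step; failFastCheck toggles the guard described above. */
  static String combinedReport(boolean buildFailed, boolean failFastCheck) throws Exception {
    if (failFastCheck && buildFailed) {
      return "skipped"; // Fail fast: never enter the retry loop.
    }
    Callable<Void> coverageAction = () -> {
      if (buildFailed) {
        throw new InterruptedException("a dependency failed");
      }
      return null;
    };
    boolean done = callUninterruptibly(coverageAction, 1_000);
    Thread.interrupted(); // Clear the restored flag so the demo leaves the thread clean.
    return done ? "generated" : "stuck";
  }

  public static void main(String[] args) throws Exception {
    System.out.println(combinedReport(false, false)); // generated
    System.out.println(combinedReport(true, false));  // stuck
    System.out.println(combinedReport(true, true));   // skipped
  }
}
```

In this model, the "stuck" result stands in for the hang. The guard in the third call corresponds to the fix's idea of checking for fail-fast before generating the combined report.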
@joeleba
Member

joeleba commented Feb 9, 2024

@Olsworn thanks for the bug report and the reproduction!

@Olsworn
Author

Olsworn commented Feb 9, 2024

@joeleba Thanks for having a look at it!

github-merge-queue bot pushed a commit that referenced this issue Feb 12, 2024
…21271)

With Skymeld, the combined report is done right before the end of the
build. This works fine except for a buggy case:
- `--nokeep_going`
- when there's an error that the Coverage action transitively depends on
- The error would try to interrupt the build, and since we evaluate
coverage actions uninterruptibly (consistent with the noskymeld
behavior), we're stuck in an infinite loop [1].

This happened because we did not fail fast like we should have. This CL
fixes the issue by checking, before generating the combined report,
whether we should fail fast.

Fixes #21154

[1]
https://github.com/bazelbuild/bazel/blob/026f493a5a403fa5d770cae0b3a3baf8dcf33488/src/main/java/com/google/devtools/build/lib/concurrent/Uninterruptibles.java#L33-L39

Commit
37247d5

PiperOrigin-RevId: 605572100
Change-Id: I8d57dacca58358771799161352819ec65bef6ac2

Co-authored-by: Googler <leba@google.com>
@iancha1992
Member

A fix for this issue has been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!
