Build hangs in com.google.devtools.build.lib.remote.RemoteExecutionCache ensureInputsPresent #16445

Closed
dpoluyanov opened this issue Oct 10, 2022 · 3 comments
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug untriaged

Comments

@dpoluyanov

Description of the bug:

We've found that some builds hang indefinitely on "Compiling Java headers" under remote execution. After sending a few kill -3 signals to the Bazel server in our runner, I found two skyframe-evaluator threads sitting in blockingAwait() in com.google.devtools.build.lib.remote.RemoteExecutionCache.ensureInputsPresent(RemoteExecutionCache.java:115) (the full thread dump is attached below).

It is always "Compiling Java headers" that gets stuck, and there are always two skyframe-evaluator threads sitting on this line (e.g. skyframe-evaluator 357 and skyframe-evaluator 490 in the attached thread dump).
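For anyone checking their own thread dump for the same pattern, here is a minimal sketch in Python. It assumes the usual HotSpot thread-dump layout (a header line starting with the quoted thread name, followed by indented `at ...` frames); `threads_with_frame` is a hypothetical helper written for this illustration, not part of Bazel.

```python
import re

# Header line of a HotSpot thread-dump entry, e.g.: "skyframe-evaluator 357" #400 ...
# (assumed layout; adjust the pattern if your JVM prints something different)
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def threads_with_frame(dump_text, frame_substring):
    """Return the names of threads whose stack contains frame_substring."""
    matches = []
    current_name = None
    current_hit = False
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            # A new thread entry starts; record the previous one if it matched.
            if current_name is not None and current_hit:
                matches.append(current_name)
            current_name = header.group("name")
            current_hit = False
        elif (
            current_name is not None
            and line.strip().startswith("at ")
            and frame_substring in line
        ):
            current_hit = True
    # Record the last thread entry in the dump.
    if current_name is not None and current_hit:
        matches.append(current_name)
    return matches
```

Calling `threads_with_frame(text, "RemoteExecutionCache.ensureInputsPresent")` on the contents of a saved dump should list the threads parked in that method; in the hang described here, that would be the two skyframe-evaluator threads.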

I'm still not sure whether it is caused by an infrastructure failure, by a problem with the remote executor (tested on Buildbarn), or by one of the many flags we use in our build configuration.

I've tried removing those flags, and both enabling and disabling the build profile, but still no luck.

I can't say for certain that it is a race condition in RemoteExecutionCache, but it looks like one.

We are using 6.0.0-pre.20220922.1, and for now we cannot downgrade to test whether this behaviour is present in earlier versions.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

linux

What is the output of bazel info release?

release 6.0.0-pre.20220922.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

There are no similar issues on these resources

Any other information, logs, or outputs that you want to share?

bazel-server.jvm.threaddump.txt

@sgowroji sgowroji added type: bug more data needed untriaged team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Oct 11, 2022
@sgowroji
Member

Hello @dpoluyanov, could you please provide minimal steps to reproduce the issue, along with a sample code repo? Thanks!

@dpoluyanov
Author

That's probably not as easy as it seems, since the issue is sporadic. I'll try to reproduce it with a minimal subset of the steps we run, but I'm not sure I'll succeed.
Is there any debug information, or any additional flags or outputs, that I can turn on or capture to help find the root cause?

@coeuvre
Member

coeuvre commented Oct 12, 2022

Probably related to #16423.

coeuvre added a commit to coeuvre/bazel that referenced this issue Nov 22, 2022
Fixes bazelbuild#16422.

Closes bazelbuild#16423.
Closes bazelbuild#16445.

Closes bazelbuild#16464.

PiperOrigin-RevId: 480896881
Change-Id: I33019dbe8a088410280759465100a512a0f61bc1
ShreeM01 pushed a commit that referenced this issue Nov 22, 2022
Fixes #16422.

Closes #16423.
Closes #16445.

Closes #16464.

PiperOrigin-RevId: 480896881
Change-Id: I33019dbe8a088410280759465100a512a0f61bc1