Remote: Fix crashes by InterruptedException when dynamic execution is enabled. #15001

Closed
wants to merge 9 commits into from

coeuvre
Member

@coeuvre coeuvre commented Mar 8, 2022

Fixes #14433.

The root cause is that, inside `RemoteExecutionCache`, the result of the `FindMissingDigests` call is shared with other threads without considering error handling. For example, if two or more threads are uploading the same input and one of them is interrupted while waiting for the result of the `FindMissingDigests` call, the call is cancelled, and the other threads still waiting for the upload incorrectly receive an upload error caused by that cancellation.

This PR fixes this by effectively applying reference counting to the result of the `FindMissingDigests` call: if one thread is interrupted, the call is not cancelled as long as other threads still depend on the result, so the upload can continue.
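
To illustrate the idea only, here is a minimal sketch of reference counting on a shared in-flight call. It is not the code from this PR: `RefCountedCall` and `acquire` are hypothetical names, and plain `java.util.concurrent.CompletableFuture` stands in for whatever the real `RemoteExecutionCache` uses internally. Each waiter gets its own view of the shared result, and the underlying call is cancelled only when the last remaining waiter cancels.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not from the PR): shares one in-flight call among several waiters. */
final class RefCountedCall<T> {
  private final CompletableFuture<T> underlying;
  // Number of waiters that still depend on the shared result.
  private final AtomicInteger waiters = new AtomicInteger();

  RefCountedCall(CompletableFuture<T> underlying) {
    this.underlying = underlying;
  }

  /** Registers a waiter and returns a view that this waiter may cancel on its own. */
  CompletableFuture<T> acquire() {
    waiters.incrementAndGet();
    CompletableFuture<T> view = new CompletableFuture<>();

    // Forward the shared outcome to this waiter (a no-op if the view is already cancelled).
    underlying.whenComplete((value, error) -> {
      if (error != null) {
        view.completeExceptionally(error);
      } else {
        view.complete(value);
      }
    });

    // A cancelled view drops one reference; only the last one cancels the shared call.
    view.whenComplete((value, error) -> {
      if (view.isCancelled() && waiters.decrementAndGet() == 0) {
        underlying.cancel(true);
      }
    });
    return view;
  }
}
```

With two waiters, cancelling one view leaves the shared call running and the other waiter still receives the result; cancelling both cancels the shared call. The sketch deliberately ignores cases a real implementation would have to handle, such as a new waiter arriving after the count has already reached zero.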

@coeuvre coeuvre requested a review from a team as a code owner March 8, 2022 17:34
@brentleyjones
Contributor

@coeuvre There are conflicts when I try to cherry-pick 702df84 for #14709. Could you please submit a cherry-pick PR? 🙏

coeuvre added a commit to coeuvre/bazel that referenced this pull request Mar 21, 2022
… enabled.

Fixes bazelbuild#14433.

The root cause is, inside `RemoteExecutionCache`, the result of `FindMissingDigests` is shared with other threads without considering error handling. For example, if there are two or more threads uploading the same input and one thread got interrupted when waiting for the result of `FindMissingDigests` call, the call is cancelled and other threads still waiting for the upload will receive an upload error due to the cancellation, which is wrong.

This PR fixes this by effectively applying reference count to the result of `FindMissingDigests` call so that if one thread got interrupted, as long as there are other threads depending on the result, the call won't be cancelled and the upload can continue.

Closes bazelbuild#15001.

PiperOrigin-RevId: 436180205
@coeuvre coeuvre deleted the fix-14433 branch March 21, 2022 16:04
Wyverald pushed a commit that referenced this pull request Mar 21, 2022
… enabled. (#15091)

Fixes #14433.

The root cause is, inside `RemoteExecutionCache`, the result of `FindMissingDigests` is shared with other threads without considering error handling. For example, if there are two or more threads uploading the same input and one thread got interrupted when waiting for the result of `FindMissingDigests` call, the call is cancelled and other threads still waiting for the upload will receive an upload error due to the cancellation, which is wrong.

This PR fixes this by effectively applying reference count to the result of `FindMissingDigests` call so that if one thread got interrupted, as long as there are other threads depending on the result, the call won't be cancelled and the upload can continue.

Closes #15001.

PiperOrigin-RevId: 436180205
Development

Successfully merging this pull request may close these issues.

bazel 5.0.0rc3 crashes with local and remote actions being done
2 participants