BES with --bes_upload_mode=fully_async uses cancelled Context when uploading to CAS and errors #11392
Comments
I'm guessing this is actually a bug with the remote worker, since it requires the remote_executor option to trigger it. Passing over to remote-exec team.
Did you guys figure out the issue? I am having the same problem when I enable
while bazel gets this
I guess the problem is that the remote cache client and the build event uploader are somehow sharing the same context. See https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java#L498
Trace I see in
Note: the trace appears in the log file as soon as I invoke the next bazel command (
The original repro case is fixed in 4.1.0+. I only get the following warning on 4.0.0 and 3.x:
Closing since I think this is fixed by recent improvements.
This was helpful in debugging bazelbuild/bazel#11392 Closes #11396. PiperOrigin-RevId: 312077991
Description of the problem / feature request:
Using both --bes_upload_mode=fully_async and --remote_executor reliably causes a failed RPC (for all but the first invocation), which gets logged as:
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
What operating system are you running Bazel on?
OSX
What's the output of bazel info release?
Tried with both release 3.1.0 and
Any other information, logs, or outputs that you want to share?
This appears to be because something is cancelling the gRPC Context used here for a FindMissingBlobs for the BES upload.
Note that this bug also manifests if you're writing to a remote BES, not just a local JSON file - the local JSON file just seemed like a more minimal case to investigate.
This RPC adheres to the --remote_retries flag, so if that value is set to more than a small number, the upload is retried in a loop until a 5 second timeout is hit, and this is logged:
The fully_async upload mode is presumably intended to speed things up, so blocking the following command for a full 5 seconds while it tries to finish up the previous build is counter-productive.
It looks like something needs to change around either how Context lifetimes are managed (to not cancel them until background requests are complete), or which Context these background requests happen in, but I'm not sure which approach would be better (or where the code that should manage these lifetimes should live). I'd appreciate pointers (or fixes!) :)
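To make the second option (changing which Context the background requests run in) concrete, here is a minimal sketch in Python. It is a toy model only, not Bazel or gRPC code: the Context class, find_missing_blobs, and run_invocation are all hypothetical names invented for illustration. It assumes cancellation cascades from a parent context to its children, and that a detached context is unaffected, which is roughly the distinction io.grpc.Context draws between a child context and one created with fork().

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Context:
    """Toy cancellable context (hypothetical; NOT io.grpc.Context)."""

    def __init__(self, parent=None):
        self._cancelled = threading.Event()
        self._parent = parent

    def cancel(self):
        self._cancelled.set()

    def is_cancelled(self):
        # Cancellation cascades from parent to child.
        if self._cancelled.is_set():
            return True
        return self._parent is not None and self._parent.is_cancelled()

    def child(self):
        # Shares the parent's lifetime.
        return Context(parent=self)

    def detached(self):
        # Independent lifetime: parent cancellation does not reach it.
        return Context()


def find_missing_blobs(ctx, release):
    """Stand-in for the background RPC; fails if its context was cancelled."""
    release.wait(timeout=5)  # Wait until the invocation has ended.
    if ctx.is_cancelled():
        raise RuntimeError("CANCELLED")
    return "OK"


def run_invocation(make_upload_ctx):
    """Simulates one command whose async upload outlives the command itself."""
    invocation_ctx = Context()
    upload_ctx = make_upload_ctx(invocation_ctx)
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(find_missing_blobs, upload_ctx, release)
        invocation_ctx.cancel()  # The command finishes; its context is cancelled.
        release.set()            # Only now does the background upload proceed.
        try:
            return future.result(timeout=5)
        except RuntimeError as err:
            return str(err)


if __name__ == "__main__":
    print(run_invocation(lambda ctx: ctx.child()))     # prints "CANCELLED"
    print(run_invocation(lambda ctx: ctx.detached()))  # prints "OK"
```

In this model the upload that shares the invocation's context fails once the command ends, while the detached one completes. Whether detaching is preferable to extending the invocation context's lifetime in Bazel itself is exactly the open question raised above; the sketch does not settle it.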