Remote-cache not work in 5.2.0 #15682

Closed
Smile-Autra opened this issue Jun 15, 2022 · 12 comments
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug untriaged

Comments

@Smile-Autra

Smile-Autra commented Jun 15, 2022

Description of the bug:

After upgrading to the latest version of Bazel, the remote cache no longer seems to work: Bazel can only put but cannot get. With Bazel 5.1.1, the remote cache works again. Stranger still, Bazel 5.2.0 can use cache entries written by 5.1.1, so the problem seems to lie in the artifacts that Bazel 5.2.0 puts.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Use Bazel 5.2.0 with https://github.com/buchgr/bazel-remote as the remote cache, for example: bazel build --remote_cache=http://localhost:9090

Which operating system are you running Bazel on?

Ubuntu 20.04

What is the output of bazel info release?

release 5.2.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@sgowroji sgowroji added type: bug untriaged team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Jun 15, 2022
@chancila
Contributor

@brentleyjones
Contributor

@coeuvre

@coeuvre
Member

coeuvre commented Jun 22, 2022

What does the error message look like? Is this also the case at HEAD?

@chancila
Contributor

chancila commented Jun 22, 2022

I am not sure. It seems artifacts are not being uploaded to the cache in certain scenarios, and there are no diagnostic messages. I can't tell whether everyone reporting the issue is using only HTTP caching, but the person in the Slack thread and this report are both using an HTTP cache.

@mistic

mistic commented Jun 24, 2022

I can also confirm this is a problem. We didn't notice right after the upgrade because we were still using the previously populated remote cache, but after a couple of days we were seeing mostly cache misses and fewer and fewer cache hits. The problem happens both with the BuildBuddy gRPC cache and with a simple GCS bucket. Reverting to 5.1.1 makes everything work again.

@coeuvre
Member

coeuvre commented Jun 27, 2022

I am not able to reproduce this locally. Do you know which actions are not uploading artifacts?

@mistic

mistic commented Jun 27, 2022

@coeuvre it actually happens with rules from rules_nodejs that we are using, such as TsProject: https://github.com/bazelbuild/rules_nodejs/blob/stable/packages/typescript/index.bzl#L109

@nickbreen

We reverted to 5.1.1 because of this; however, we do not use rules_nodejs (mainly java_library and genrule).

We noticed because the requests in the bazel-remote logs were all GET 404.

@Smile-Autra
Author

Same error.

@fmeum
Collaborator

fmeum commented Jul 5, 2022

@nickbreen Are you using directory inputs/outputs? rules_nodejs does and 5.2 includes 26f8783, but if you are facing the same issues without relying on TreeArtifacts, then that commit is less likely to be the culprit (which would make it rather likely it's actually 4d900ce).

@nickbreen

We have various rules using TreeArtifacts (ctx.actions.declare_directory), rules that produce a directory output using ctx.outputs, and genrules doing the same.

@adam-singer
Contributor

adam-singer commented Jul 6, 2022

We have also noticed a similar bug where caching seems to have broken in 5.2.0. When I traced the request logs, something seemed odd about the action cache GET miss and the content-addressable PUT. I could be missing a detail here, but maybe this will help narrow down what other folks are seeing.

Before the build, there is a miss on the action cache: GET 404 127.0.0.1 /ac/877dcafd4dc610e6cc734aa39f91aaf9a8e9a7f730e5bd1021c557c5c74bfda5
https://gist.github.com/adam-singer/0c73500e47d1a2410924dc5be4da0a3a#file-gistfile1-txt-L1

Bazel and the rules then go about their business and build/calculate results. I would assume that the next time 877dcafd4dc610e6cc734aa39f91aaf9a8e9a7f730e5bd1021c557c5c74bfda5 appears in the request logs, it would be a PUT to /ac/877dcafd4dc610e6cc734aa39f91aaf9a8e9a7f730e5bd1021c557c5c74bfda5. What I see in our request logs instead is the remote uploader making a request to the content-addressable cache: PUT 200 127.0.0.1 /cas/877dcafd4dc610e6cc734aa39f91aaf9a8e9a7f730e5bd1021c557c5c74bfda5 https://gist.github.com/adam-singer/0c73500e47d1a2410924dc5be4da0a3a#file-gistfile1-txt-L62

Could someone confirm or verify seeing similar behavior and if that is expected?
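
The pattern described above (an action cache miss that is never followed by an action cache upload) can be checked mechanically against a request log. The following is a minimal Java sketch, not part of Bazel or bazel-remote. The class `AcUploadCheck`, its method name, and the assumed line shape `<METHOD> <STATUS> <CLIENT> /<ac|cas>/<digest>` are illustrative assumptions based only on the two log excerpts quoted in this comment; real logs may carry extra fields.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading a cache request log; not a Bazel or bazel-remote API.
class AcUploadCheck {
    // Assumed line shape, inferred from the excerpts in this thread:
    //   <METHOD> <STATUS> <CLIENT> /<ac|cas>/<hex digest>
    private static final Pattern REQUEST = Pattern.compile(
        "(GET|PUT)\\s+(\\d{3})\\s+\\S+\\s+/(ac|cas)/([0-9a-f]+)");

    /** Returns digests that missed on GET /ac/ and were never successfully PUT to /ac/. */
    static Set<String> missedButNeverUploaded(List<String> logLines) {
        Set<String> missed = new LinkedHashSet<>();
        Set<String> uploaded = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = REQUEST.matcher(line);
            if (!m.find() || !m.group(3).equals("ac")) {
                continue; // skip CAS traffic and lines that do not match the assumed shape
            }
            String method = m.group(1);
            String status = m.group(2);
            String digest = m.group(4);
            if (method.equals("GET") && status.equals("404")) {
                missed.add(digest);
            } else if (method.equals("PUT") && status.startsWith("2")) {
                uploaded.add(digest);
            }
        }
        missed.removeAll(uploaded);
        return missed;
    }

    public static void main(String[] args) {
        String digest = "877dcafd4dc610e6cc734aa39f91aaf9a8e9a7f730e5bd1021c557c5c74bfda5";
        List<String> log = List.of(
            "GET 404 127.0.0.1 /ac/" + digest,
            "PUT 200 127.0.0.1 /cas/" + digest);
        // The only PUT went to /cas/, so the digest is reported as never uploaded to /ac/.
        System.out.println(missedButNeverUploaded(log));
    }
}
```

Run against the two excerpts above, this reports the digest as missed and never uploaded to /ac/, which matches the symptom described in this thread.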

Edit:

In 4d900ce#diff-5556e28152f91eac5403663774979225b86e64133da164366b56f91d1daa641cR536, the step is set with action.getRemoteActionExecutionContext().setStep(Step.CHECK_ACTION_CACHE);, but when it comes time to upload via DiskAndRemoteCacheClient.uploadActionResult (4d900ce#diff-f826b1e1a71d4a2d9f9524894df3069dcb54964998277721021093ff7d55d194R62), the check is for Step.UPLOAD_OUTPUTS. That seems to be the reason uploads to the action cache aren't happening.


This patch fixed uploading to the action cache: https://gist.github.com/adam-singer/6500f268885f1c2c604fd4f9f1fec439. I would prefer that the owners verify/confirm the bug.
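
To make the mismatch concrete, here is a simplified Java model of the gate described above. It is a sketch under assumptions, not Bazel's code: only the names `Step.CHECK_ACTION_CACHE`, `Step.UPLOAD_OUTPUTS`, `setStep`, and `uploadActionResult` come from this thread. `StepGateSketch`, `Context`, `GatedCacheClient`, and `uploadsToRemote` are hypothetical, and the model assumes the fix amounts to setting the step to `UPLOAD_OUTPUTS` before uploading; the linked patch may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the step gate; bodies are assumptions, not Bazel source.
class StepGateSketch {
    // Only the two steps named in the thread are modelled here.
    enum Step { CHECK_ACTION_CACHE, UPLOAD_OUTPUTS }

    /** Stand-in for the per-action remote execution context. */
    static final class Context {
        private Step step = Step.CHECK_ACTION_CACHE;
        void setStep(Step step) { this.step = step; }
        Step getStep() { return step; }
    }

    /** Stand-in for a cache client that uploads only during UPLOAD_OUTPUTS. */
    static final class GatedCacheClient {
        final List<String> remoteUploads = new ArrayList<>();

        void uploadActionResult(Context context, String actionDigest) {
            // The gate: any other step silently skips the upload, with no diagnostic.
            if (context.getStep() == Step.UPLOAD_OUTPUTS) {
                remoteUploads.add(actionDigest);
            }
        }
    }

    /** Simulates one cache-miss build and reports whether the result reached the remote cache. */
    static boolean uploadsToRemote(boolean setStepBeforeUpload) {
        Context context = new Context();
        context.setStep(Step.CHECK_ACTION_CACHE); // cache lookup, assumed to miss
        // ... the action would execute locally here ...
        if (setStepBeforeUpload) {
            context.setStep(Step.UPLOAD_OUTPUTS); // assumed shape of the fix
        }
        GatedCacheClient cache = new GatedCacheClient();
        cache.uploadActionResult(context, "example-digest");
        return !cache.remoteUploads.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("step left at CHECK_ACTION_CACHE: " + uploadsToRemote(false)); // false
        System.out.println("step set to UPLOAD_OUTPUTS:      " + uploadsToRemote(true));  // true
    }
}
```

In this model the skipped upload produces no error, which is consistent with the reports in this thread of cache misses without any diagnostic messages.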

sluongng added a commit to sluongng/bazel that referenced this issue Jul 6, 2022
In 4d900ce we introduced validation in
DiskAndRemoteCacheClient.uploadActionResult() where context's step must
be UPLOAD_OUTPUTS to trigger the upload.

However, this value was never set in RemoteExecutionService beforehand,
which led to outputs not being uploaded and caused remote cache misses.

Fix bazelbuild#15682
sgowroji pushed a commit that referenced this issue Jul 14, 2022
In 4d900ce we introduced validation in
DiskAndRemoteCacheClient.uploadActionResult() where context's step must
be UPLOAD_OUTPUTS to trigger the upload.

However, this value was never set in RemoteExecutionService beforehand,
which led to outputs not being uploaded and caused remote cache misses.

Fix #15682

Thanks @adam-singer for doing the investigation 🙏

Closes #15823.

PiperOrigin-RevId: 459519852
Change-Id: Ib004403d7893fe135adcc4b181b607d8cb33f3af

Co-authored-by: Son Luong Ngoc <sluongng@gmail.com>
aranguyen pushed a commit to aranguyen/bazel that referenced this issue Jul 20, 2022
In 4d900ce we introduced validation in
DiskAndRemoteCacheClient.uploadActionResult() where context's step must
be UPLOAD_OUTPUTS to trigger the upload.

However, this value was never set in RemoteExecutionService beforehand,
which led to outputs not being uploaded and caused remote cache misses.

Fix bazelbuild#15682

Thanks @adam-singer for doing the investigation 🙏

Closes bazelbuild#15823.

PiperOrigin-RevId: 459519852
Change-Id: Ib004403d7893fe135adcc4b181b607d8cb33f3af

9 participants