Bazel remote cache cannot be shared between users? #5632

Closed
zz-pony opened this issue Jul 19, 2018 · 6 comments

@zz-pony

zz-pony commented Jul 19, 2018

Description of the problem / feature request:

I created a very simple cc_binary and published it here: https://github.com/zz-pony/bazel-test. Then I compiled this code on my computer multiple times, and the result looked really weird:

  1. Compile under my account for the first time:
bazel build -c opt --remote_rest_cache=http://192.168.87.21:30080/ :bazel_test
... ...
INFO: 2 processes: 2 linux-sandbox.
... ...

There was no cache hit, which was as expected.

  2. Clean the results and compile under my account for the second time:
bazel clean
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
bazel build -c opt --remote_rest_cache=http://192.168.87.21:30080/ :bazel_test
... ...
INFO: 2 processes: 2 remote cache hit.
... ...

Cache hit, looking good.

  3. Clone the repo and compile under a different account for the first time:
bazel build -c opt --remote_rest_cache=http://192.168.87.21:30080/ :bazel_test
... ...
INFO: 2 processes: 2 linux-sandbox.
... ...

Why was there no cache hit this time? The source files should be identical.

I checked the nginx log from my cache server. It looks like the /ac/${fp} values changed between users. Is this expected behavior?

192.168.87.21 - - [18/Jul/2018:23:23:17 -0700] "GET /ac/db60bcea5bf7fe6c546e2a686048740f7ce5ad8082d1d4b589ed1a98e19e64ca HTTP/1.1" 404 342 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:17 -0700] "PUT /cas/868e338520344925769e70d89c73813a32e2774e09fb9c0053456d42f92cc458 HTTP/1.1" 201 242 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:17 -0700] "PUT /cas/448cf1a245c9ac1adf5dcf8d9ec83c19eda744b262879208d1f3f06afc0cbc6c HTTP/1.1" 201 242 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:17 -0700] "PUT /ac/db60bcea5bf7fe6c546e2a686048740f7ce5ad8082d1d4b589ed1a98e19e64ca HTTP/1.1" 201 241 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:17 -0700] "GET /ac/1139ba60a836b67bcc7f282462a92a8568df5ecd68cb731701d32c5c3144e4a8 HTTP/1.1" 404 342 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:17 -0700] "PUT /cas/3096e6cbe454634fbb28f733ebed2da103e031c98015aa97bf2ce1ab60b19068 HTTP/1.1" 201 242 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:17 -0700] "PUT /ac/1139ba60a836b67bcc7f282462a92a8568df5ecd68cb731701d32c5c3144e4a8 HTTP/1.1" 201 241 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:30 -0700] "GET /ac/db60bcea5bf7fe6c546e2a686048740f7ce5ad8082d1d4b589ed1a98e19e64ca HTTP/1.1" 200 511 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:30 -0700] "GET /cas/448cf1a245c9ac1adf5dcf8d9ec83c19eda744b262879208d1f3f06afc0cbc6c HTTP/1.1" 200 3391 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:30 -0700] "GET /cas/868e338520344925769e70d89c73813a32e2774e09fb9c0053456d42f92cc458 HTTP/1.1" 200 5921 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:30 -0700] "GET /ac/1139ba60a836b67bcc7f282462a92a8568df5ecd68cb731701d32c5c3144e4a8 HTTP/1.1" 200 369 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:30 -0700] "GET /cas/3096e6cbe454634fbb28f733ebed2da103e031c98015aa97bf2ce1ab60b19068 HTTP/1.1" 200 8600 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:29:16 -0700] "GET /ac/f8947379d034e4c51aae8bc16db1105926b80ce9c2be0ec75ce0e807bc178025 HTTP/1.1" 404 342 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:29:16 -0700] "PUT /cas/868e338520344925769e70d89c73813a32e2774e09fb9c0053456d42f92cc458 HTTP/1.1" 204 119 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:29:16 -0700] "PUT /cas/448cf1a245c9ac1adf5dcf8d9ec83c19eda744b262879208d1f3f06afc0cbc6c HTTP/1.1" 204 119 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:29:16 -0700] "PUT /ac/f8947379d034e4c51aae8bc16db1105926b80ce9c2be0ec75ce0e807bc178025 HTTP/1.1" 201 241 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:29:16 -0700] "GET /ac/ce010bf702e1168eadc75c1b7eb9d05a12e4447860dbc760d7c406ecea7268ac HTTP/1.1" 404 342 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:29:16 -0700] "PUT /cas/3096e6cbe454634fbb28f733ebed2da103e031c98015aa97bf2ce1ab60b19068 HTTP/1.1" 204 119 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:29:16 -0700] "PUT /ac/ce010bf702e1168eadc75c1b7eb9d05a12e4447860dbc760d7c406ecea7268ac HTTP/1.1" 201 241 "-" "-" "-"
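
The pattern in this log can be checked mechanically. Below is a minimal Python sketch, not part of Bazel, that groups the GET /ac/ requests by action key and lists the keys that were never answered with a 200. The function names and the regular expression are my own assumptions, written against the nginx line format shown above; the three sample lines are copied from that log.

```python
import re
from collections import defaultdict

# Matches the request and status fields of the nginx lines quoted above.
LINE_RE = re.compile(r'"(GET|PUT) /(ac|cas)/([0-9a-f]+) HTTP/1\.1" (\d{3})')

SAMPLE_LOG = """\
192.168.87.21 - - [18/Jul/2018:23:23:17 -0700] "GET /ac/db60bcea5bf7fe6c546e2a686048740f7ce5ad8082d1d4b589ed1a98e19e64ca HTTP/1.1" 404 342 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:23:30 -0700] "GET /ac/db60bcea5bf7fe6c546e2a686048740f7ce5ad8082d1d4b589ed1a98e19e64ca HTTP/1.1" 200 511 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:29:16 -0700] "GET /ac/f8947379d034e4c51aae8bc16db1105926b80ce9c2be0ec75ce0e807bc178025 HTTP/1.1" 404 342 "-" "-" "-"
"""


def summarize_ac_lookups(log_text):
    """Return {action_key: [status, ...]} for GET /ac/ requests, in log order."""
    lookups = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and match.group(1) == "GET" and match.group(2) == "ac":
            lookups[match.group(3)].append(int(match.group(4)))
    return dict(lookups)


def never_hit(lookups):
    """Action keys that were looked up but never returned 200."""
    return sorted(key for key, statuses in lookups.items() if 200 not in statuses)


if __name__ == "__main__":
    for key in never_hit(summarize_ac_lookups(SAMPLE_LOG)):
        print("never hit:", key)
```

Run against the full log, this should flag the f894... and ce01... keys from the 23:29:16 build: the second user asked for action keys that the first user never uploaded, even though the /cas/ blobs being stored are the same.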

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Refer to my above description.

What operating system are you running Bazel on?

Ubuntu 16.04

What's the output of bazel info release?

release 0.15.0

Have you found anything relevant by searching the web?

Not yet.

@zz-pony
Author

zz-pony commented Jul 19, 2018

This problem did not exist in Bazel release 0.13.1. I downgraded Bazel to 0.13.1, cleaned my remote cache, and reran the above steps. The outputs were:

  1. Compile under my account for the first time:
bazel clean
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
bazel build -c opt --experimental_remote_spawn_cache --remote_rest_cache=http://192.168.87.21:30080/ :bazel_test
... ...
INFO: 3 processes: 2 linux-sandbox, 1 local.
... ...
  2. Clean and compile under my account for the second time:
bazel clean
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
bazel build -c opt --experimental_remote_spawn_cache --remote_rest_cache=http://192.168.87.21:30080/ :bazel_test
... ...
INFO: 3 processes: 2 remote cache hit, 1 local.
... ...
  3. Clean and compile under a different account for the first time:
bazel clean
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
bazel build -c opt --experimental_remote_spawn_cache --remote_rest_cache=http://192.168.87.21:30080/ :bazel_test
... ...
INFO: 3 processes: 2 remote cache hit, 1 local.
... ...

Nginx log:

192.168.87.21 - - [18/Jul/2018:23:56:27 -0700] "GET /ac/10a04fab8684d46f83656e745ccc1fed46a83267c52b62a6b6881b0d53c077ad HTTP/1.1" 404 342 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:27 -0700] "PUT /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 HTTP/1.1" 201 242 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:27 -0700] "PUT /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 HTTP/1.1" 204 119 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:28 -0700] "PUT /cas/868e338520344925769e70d89c73813a32e2774e09fb9c0053456d42f92cc458 HTTP/1.1" 201 242 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:28 -0700] "PUT /cas/448cf1a245c9ac1adf5dcf8d9ec83c19eda744b262879208d1f3f06afc0cbc6c HTTP/1.1" 201 242 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:28 -0700] "PUT /ac/10a04fab8684d46f83656e745ccc1fed46a83267c52b62a6b6881b0d53c077ad HTTP/1.1" 201 241 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:28 -0700] "GET /ac/0013861f171f9bb43b599bc5239745fc9533232eed7ebb8a1a29b41cd74f6a3c HTTP/1.1" 404 342 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:28 -0700] "PUT /cas/3096e6cbe454634fbb28f733ebed2da103e031c98015aa97bf2ce1ab60b19068 HTTP/1.1" 201 242 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:28 -0700] "PUT /ac/0013861f171f9bb43b599bc5239745fc9533232eed7ebb8a1a29b41cd74f6a3c HTTP/1.1" 201 241 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:32 -0700] "GET /ac/10a04fab8684d46f83656e745ccc1fed46a83267c52b62a6b6881b0d53c077ad HTTP/1.1" 200 511 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:32 -0700] "GET /cas/448cf1a245c9ac1adf5dcf8d9ec83c19eda744b262879208d1f3f06afc0cbc6c HTTP/1.1" 200 3391 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:32 -0700] "GET /cas/868e338520344925769e70d89c73813a32e2774e09fb9c0053456d42f92cc458 HTTP/1.1" 200 5921 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:32 -0700] "GET /ac/0013861f171f9bb43b599bc5239745fc9533232eed7ebb8a1a29b41cd74f6a3c HTTP/1.1" 200 369 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:32 -0700] "PUT /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 HTTP/1.1" 204 119 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:32 -0700] "GET /cas/3096e6cbe454634fbb28f733ebed2da103e031c98015aa97bf2ce1ab60b19068 HTTP/1.1" 200 8600 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:56:32 -0700] "PUT /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 HTTP/1.1" 204 119 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:59:44 -0700] "GET /ac/10a04fab8684d46f83656e745ccc1fed46a83267c52b62a6b6881b0d53c077ad HTTP/1.1" 200 511 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:59:44 -0700] "GET /cas/448cf1a245c9ac1adf5dcf8d9ec83c19eda744b262879208d1f3f06afc0cbc6c HTTP/1.1" 200 3391 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:59:44 -0700] "GET /cas/868e338520344925769e70d89c73813a32e2774e09fb9c0053456d42f92cc458 HTTP/1.1" 200 5921 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:59:44 -0700] "PUT /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 HTTP/1.1" 204 119 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:59:44 -0700] "PUT /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 HTTP/1.1" 204 119 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:59:44 -0700] "GET /ac/0013861f171f9bb43b599bc5239745fc9533232eed7ebb8a1a29b41cd74f6a3c HTTP/1.1" 200 369 "-" "-" "-"
192.168.87.21 - - [18/Jul/2018:23:59:44 -0700] "GET /cas/3096e6cbe454634fbb28f733ebed2da103e031c98015aa97bf2ce1ab60b19068 HTTP/1.1" 200 8600 "-" "-" "-"
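
A side note on the repeated PUT /cas/e3b0c442... requests in this log: that key is the SHA-256 digest of zero bytes, so those requests appear to store an empty blob. The check below is a small sketch that assumes the /cas/ keys are SHA-256 hex digests of the blob content (Bazel's default digest function); the helper name cas_key is hypothetical. Why 0.13.1 uploads the empty blob is not explained in this thread.

```python
import hashlib

# Key copied from the PUT /cas/ lines in the log above.
EMPTY_BLOB_KEY = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"


def cas_key(content: bytes) -> str:
    """Hypothetical helper: content-addressed key as a SHA-256 hex digest."""
    return hashlib.sha256(content).hexdigest()


if __name__ == "__main__":
    print(cas_key(b"") == EMPTY_BLOB_KEY)
```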

@buchgr
Contributor

buchgr commented Jul 19, 2018

Can you try building with --experimental_strict_action_env?

By the way, instead of --experimental_remote_spawn_cache --remote_rest_cache, you can just use --remote_http_cache.

@buchgr buchgr self-assigned this Jul 19, 2018
@zz-pony
Author

zz-pony commented Jul 19, 2018

Now it works! Thanks for your quick reply.

I believe --experimental_strict_action_env should be turned on by default for backward compatibility purposes.

@zz-pony
Author

zz-pony commented Jul 19, 2018

I found that when running bazel build on a more complex project, the cache hit rate is still low.

How can I check whether Bazel cache fingerprints for larger targets are identical across users? Is there a command that outputs the relationship between targets and fingerprints, and ideally how each fingerprint is computed?

@buchgr
Contributor

buchgr commented Jul 19, 2018

> I believe --experimental_strict_action_env should be turned on by default for backward compatibility purposes.

We want to turn it on eventually, but can't right now because of backwards compatibility concerns.

> I found that when running bazel build on a more complex project, the cache hit rate is still low.

When running a build twice from the same environment, or when switching between users?

> How can I check whether Bazel cache fingerprints for larger targets are identical across users?

That's currently a bit difficult to debug, as we don't have good tools (yet). Here's the code that builds the action: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnRunner.java#L344

You could add breakpoints to this code or add log statements. Generally, what goes into an action protobuf is the command line, the hashes of the input files, the environment variables (not with strict action env), the names of the output files, and the platform (currently disabled by default). The fingerprint is just the hash of the serialized action protobuf.
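
To make the description above concrete, here is a toy Python model of such a fingerprint. It is only a sketch: Bazel hashes a serialized protobuf, not JSON, and the function name toy_action_key, the command line, the PATH values, and the input content are all made up for illustration. It also assumes that the strict action env flag works by giving every user the same fixed environment, which this thread implies but does not spell out.

```python
import hashlib
import json


def toy_action_key(argv, input_digests, env, outputs):
    """Hash a canonical serialization of the action's ingredients.

    Illustrative stand-in for "hash of the serialized action protobuf".
    """
    payload = json.dumps(
        {"argv": argv, "inputs": input_digests, "env": env, "outputs": outputs},
        sort_keys=True,  # stable key order, so equal inputs give equal bytes
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical action: same command line, inputs, and outputs for both users.
ARGV = ["gcc", "-c", "main.cc", "-o", "main.o"]
INPUTS = {"main.cc": hashlib.sha256(b"int main() { return 0; }").hexdigest()}
OUTPUTS = ["main.o"]

# Hypothetical per-user environments: only PATH differs.
ALICE_ENV = {"PATH": "/home/alice/bin:/usr/bin:/bin"}
BOB_ENV = {"PATH": "/home/bob/bin:/usr/bin:/bin"}

# Placeholder fixed environment; the real value Bazel uses is not given here.
FIXED_ENV = {"PATH": "/bin:/usr/bin"}

if __name__ == "__main__":
    alice = toy_action_key(ARGV, INPUTS, ALICE_ENV, OUTPUTS)
    bob = toy_action_key(ARGV, INPUTS, BOB_ENV, OUTPUTS)
    print("inherited env, keys equal:", alice == bob)
    fixed = toy_action_key(ARGV, INPUTS, FIXED_ENV, OUTPUTS)
    print("fixed env, keys equal:", fixed == toy_action_key(ARGV, INPUTS, FIXED_ENV, OUTPUTS))
```

In this model, identical sources still produce different keys as soon as one environment variable differs between accounts, which matches the /ac/ misses reported at the top of this issue.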

What kind of build are you running? Which language rules, setup, package manager, and platform?

@buchgr
Contributor

buchgr commented Jan 16, 2019

It's been turned on in Bazel 0.22

@buchgr buchgr closed this as completed Jan 16, 2019
snazy added a commit to snazy/cel-java that referenced this issue Apr 8, 2022
The cel-spec build produces some Java artifacts, so the build input takes the Java version.
The Bazel option `--experimental_strict_action_env` _seems_ to fix this, see
bazelbuild/bazel#5632