-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Building iOS resources causes bazel to cache a file for the null sha #6260
Comments
We saw the same issue, and we work around with our remote cache to handle empty sha as well |
If you look in --experimental_execution_log_file output (after converting it using execlog parser), you will see lots of entries like:
This seems to be quite reasonable (tools may want to read from /dev/null under some conditions), so the question is, why would it cause breakage with remote caches for you? Can you please expand on "with remote caches this results in failed builds if you accept this file in your cache" - what exactly you are doing, how it breaks, how you determined that null hash is at fault? |
When I try to
Just based on the difference that if I stop saving keys for that sha from the cache, the builds no longer fail |
Which remote cache server, which version? Is there anything interesting in the logs when the failure happens? (Assigning to team-Execution since this seems to be a general remote caching issue.) |
It's a private implementation. Nothing interesting or different in the logs besides the change between a 404 or 200. |
The empty file is supported as both input and output of remotely cached actions. Most builds I know of would generate an empty file at some point. Remote caching servers are expected to handle an empty file as any other file. From this discussion I don't see the need to special case the empty file on the remote server and I also don't understand the build errors it's causing you. Can you elaborate please? |
Ok so I can reproduce this with the example I provided above like this (with our remote cache running locally):
In this case I get this error from the iOS toolchain:
Here's the entire execution log from the failed build (click to expand)
|
Is there any plan for fixing this issue? |
Thanks for the awesome bug report @keith. @xinzhengzhang I don't have the time to fix it right now. I would be happy to review a patch though! |
@buchgr do you have an understanding of what the issue is? We're seeing this again (I believe) with another similar action. I'd be happy to attempt a patch but I'm not sure where to start |
Spent some more time debugging this with the help of @sergiocampama. We've come to the conclusion that getting a null aha is expected and your remote cache implementation should handle it. We also believe there's a bug in Bazel that in the case the remote cache doesn't return a response for this key, if the action was to create a directory, that doesn't happen. Where it should likely fall back to running locally and successfully create the directory. |
Ok so the second case we found of this is still the same case. The problem seems to be that the rule has a directory defined in its |
The expected behavior is that if there is a problem downloading output artifacts from the remote cache that bazel deletes all outputs that have been downloaded and executes the action locally. Is that not what you are seeing? |
If I'm correct that Bazel is creating this directory which is my current understanding, then yes. Since that would mean to me that it should re-create it. |
So what's the problem then? I fail to understand. |
The documentation for @buchgr can you confirm that tree artifact outputs, as created by Now, as to what pertains to this issue, the problem that @keith is seeing is that when using remote cache, in some cases, bazel will fail to create this output tree artifact during an action execution. So, what we'd like to know is whether the fact that this output directory is missing in this circumstance is a bug with bazel & remote cache, or something that rule authors should be aware of and support in their rules. |
cc @laurentlb |
Bazel creates all declared output directories before executing an action. It's best to not rely on this behavior because it can cause trouble with remote execution. I am saying this fully realizing that many rules do rely on this behavior (which is fair) :). See also #6393. The issue here probably is that Bazel creates the empty output directories, runs the action, gets a remote cache hit and when the download fails deletes the output directories. It then falls back to local execution which doesn't actually create the empty directory because it relies on Bazel to create it. So that then seems to be a bug in the remote cache cleanup logic. We should only delete contents of output directories but not the directories themselves? |
Perhaps? I'm not familiar with how remote cache is expected to work. Who would be the right person to assign this to? |
Me :-). It should be simple enough to fix here: https://source.bazel.build/bazel/+/master:src/main/java/com/google/devtools/build/lib/remote/AbstractRemoteActionCache.java;l=268?q=AbstractRemoteActionCAche @keith may I take you up on your offer to send a PR? :-) |
Thanks for all the context here! I've submitted #6851 which I've verified locally fixes this issue. |
Previously if a remote cache request failed, bazel would cleanup all directories that were marked as outputs for the action. These directories wouldn't end up being re-created by bazel, and could lead to failures if the actions didn't create the directories themselves. With this patch bazel deletes all output files, and the child directories of all output directories. Fixes bazelbuild/bazel#6260 Closes #6851. PiperOrigin-RevId: 224781649
The directories that are to contain spawn outputs have already been created before the spawn executes, so it's unnecessary for RemoteExecutionService to create them (however, tests must be amended to establish that precondition). The comment incorrectly suggests this is a fix for #6260. The actual fix was 4392ba4. In addition, fix a bug whereby materializing an output directory as a symlink would fail because the directory already exists. PiperOrigin-RevId: 541623490 Change-Id: I25abd3b7b5fc068a84f41c86b0050c110443218f
Currently if you build an iOS target that has resources (storyboards or asset catalogs in my testing) bazel will end up caching a file for the null sha
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
. With remote caches this results in failed builds if you accept this file in your cache.You can reproduce this with one of the sample projects in rules_apple:
Here you should see a match. If you remove all the resources portions of this BUILD file, this is no longer the case. After applying this diff:
And running:
You should no longer see any matches. I see a few other references to
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
in other bazel issues, but none specifically talking about whether or not it is a problem. It seems like a few remote cache implementations work around this by not caching files that are empty (which seem to only happen for us when the key is also this sha). I can't find the exact rule that causes this withaquery
, but I'd love to know if this is considered an issue, or if we should work around this in our remote cache.What operating system are you running Bazel on?
macOS
What's the output of
bazel info release
?0.17.1
I migrated this issue from here bazelbuild/rules_apple#228
The text was updated successfully, but these errors were encountered: