--incompatible_remote_build_event_upload_respect_no_cache still uploads some no-cache outputs #16151

Closed
brentleyjones opened this issue Aug 23, 2022 · 17 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request

Comments

@brentleyjones
Contributor

brentleyjones commented Aug 23, 2022

Description of the bug:

Some test target outputs are still uploaded to the cache even when --incompatible_remote_build_event_upload_respect_no_cache and --modify_execution_info='.*=+no-remote' are set. rules_apple tests upload both the binary and the zip, while swift_test uploads the runner (which it shouldn't) but not the binary.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

The following will repro the bug:

git clone https://github.com/buildbuddy-io/rules_xcodeproj.git

cd rules_xcodeproj

# Modify `examples/command_line/Tests/SwiftGreetingsTests.swift` adding something like
# `public let cache = "123"` and changing as needed to produce a new end output

bazel build --config=cache --modify_execution_info='.*=+no-remote' //examples/command_line/Tests:LibSwiftTests

Resulting in bytestream/upload for the test outputs:

        "importantOutput": [
            {
                "name": "examples/command_line/Tests/LibSwiftTests",
                "uri": "bytestream://remote.buildbuddy.io/blobs/593de6d72a4f278017c499cffe5cee734144740cffdc7a4cbea2b69eb2d712de/7289",
                "pathPrefix": [
                    "bazel-out",
                    "darwin_arm64-fastbuild",
                    "bin"
                ]
            },
            {
                "name": "examples/command_line/Tests/LibSwiftTests.zip",
                "uri": "bytestream://remote.buildbuddy.io/blobs/a1f751591b0fb98ddd985dbf5bd957e59dd76b5c5ce61ccb7bd6030a400da02d/119243",
                "pathPrefix": [
                    "bazel-out",
                    "applebin_macos-darwin_arm64-fastbuild-ST-3a19c795fefb",
                    "bin"
                ]
            }
        ]

The following will somewhat repro the bug:

# Modify `tools/generator/test/AddTargetsTests.swift` adding something like
# `public let cache = "123"` and changing as needed to produce a new end output

bazel build --config=cache --modify_execution_info='.*=+no-remote' //tools/generator/test:tests

Resulting in bytestream/upload for the test runner, but not the binary:

        "importantOutput": [
            {
                "name": "tools/generator/test/tests.test-runner.sh",
                "uri": "bytestream://remote.buildbuddy.io/blobs/c4af21acfc33711fe544144629937d18dbd29aff5add8f946907fcf04c72ee21/2278",
                "pathPrefix": [
                    "bazel-out",
                    "darwin_arm64-fastbuild",
                    "bin"
                ]
            },
            {
                "name": "tools/generator/test/tests.xctest/Contents/MacOS/tests",
                "uri": "file:///Users/brentley/Developer/rules_xcodeproj/bazel-output-base/execroot/com_github_buildbuddy_io_rules_xcodeproj/bazel-out/darwin_arm64-fastbuild/bin/tools/generator/test/tests.xctest/Contents/MacOS/tests",
                "pathPrefix": [
                    "bazel-out",
                    "darwin_arm64-fastbuild",
                    "bin"
                ]
            }
        ]
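
A quick way to check which outputs were uploaded is to look at the URI scheme of each entry: `bytestream://` means the blob went to the remote cache, `file://` means it stayed local. The following is a minimal sketch, assuming the entries have already been parsed into dicts with the `name` and `uri` keys shown above; the helper name `partition_by_scheme` is hypothetical, and the sample data is copied from the second output.

```python
from urllib.parse import urlparse


def partition_by_scheme(important_outputs):
    """Split file entries into (uploaded, local) names by URI scheme."""
    uploaded, local = [], []
    for entry in important_outputs:
        scheme = urlparse(entry["uri"]).scheme
        # bytestream:// URIs point at the remote cache; anything else is local.
        (uploaded if scheme == "bytestream" else local).append(entry["name"])
    return uploaded, local


# Entries copied from the `importantOutput` list in the report above.
sample = [
    {
        "name": "tools/generator/test/tests.test-runner.sh",
        "uri": "bytestream://remote.buildbuddy.io/blobs/"
               "c4af21acfc33711fe544144629937d18dbd29aff5add8f946907fcf04c72ee21/2278",
    },
    {
        "name": "tools/generator/test/tests.xctest/Contents/MacOS/tests",
        "uri": "file:///Users/brentley/Developer/rules_xcodeproj/bazel-output-base/"
               "execroot/com_github_buildbuddy_io_rules_xcodeproj/bazel-out/"
               "darwin_arm64-fastbuild/bin/tools/generator/test/tests.xctest/"
               "Contents/MacOS/tests",
    },
]

uploaded, local = partition_by_scheme(sample)
print("uploaded:", uploaded)
print("local:", local)
```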

Which operating system are you running Bazel on?

macOS 12.4

What is the output of bazel info release?

release 5.3.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@brentleyjones
Contributor Author

cc: @coeuvre

@sgowroji sgowroji added type: bug untriaged team-Remote-Exec Issues and PRs for the Execution (Remote) team more data needed labels Aug 23, 2022
@sgowroji
Member

Hello @brentleyjones, following the steps shared above gives an access error. Could you please advise whether anything is required on our end to test this locally?

(base) sgowroji-macbookpro:bazelworkspace sgowroji$ git clone git@github.com:buildbuddy-io/rules_xcodeproj.git
Cloning into 'rules_xcodeproj'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

@brentleyjones
Contributor Author

It's a git clone. I've changed the command to use https, which is easier for people who don't have ssh keys set up on GitHub.

@coeuvre coeuvre self-assigned this Aug 31, 2022
@coeuvre coeuvre added P1 I'll work on this now. (Assignee required) and removed untriaged labels Aug 31, 2022
@coeuvre
Member

coeuvre commented Aug 31, 2022

Thanks for the repro!

Outputs are still uploaded even when you set --modify_execution_info='.*=+no-remote' because modify_execution_info only applies to actions that support execution_info.

In your repro, some actions created by the rule macos_unit_test (e.g. SymlinkAction and TemplateExpansionAction) don't support execution_info, so their outputs are still uploaded. Hence, I wouldn't consider this a bug in the flag --incompatible_remote_build_event_upload_respect_no_cache.

However, I do understand that users sometimes want to run all actions of a target locally and prevent that target's outputs from being uploaded by BEP. --modify_execution_info, which works at action granularity, might not be the best option for this purpose. I don't have other options in mind right now, but we might need something that works at rule granularity.
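
To illustrate the point about action granularity, here is a minimal sketch of a simplified model, not Bazel's actual implementation. It assumes a single `<regex>=[+-]<key>` spec matched against the whole mnemonic, and it represents an action type without execution_info as `None`. The function name `apply_modify_execution_info` and the `SwiftCompile` mnemonic are illustrative assumptions; the other two names are taken from the comment above.

```python
import re


def apply_modify_execution_info(spec, actions):
    """Apply one '<regex>=[+-]<key>' spec to a simplified action model.

    `actions` maps a mnemonic to its execution-info dict, or to None when
    that action type carries no execution info at all (an assumption of
    this model).
    """
    pattern, _, op_key = spec.partition("=")
    op, key = op_key[:1], op_key[1:]
    if op not in ("+", "-") or not key:
        raise ValueError(f"malformed spec: {spec!r}")

    result = {}
    for mnemonic, info in actions.items():
        # Actions without execution info cannot be tagged, whatever the regex.
        if info is None or not re.fullmatch(pattern, mnemonic):
            result[mnemonic] = info
            continue
        updated = dict(info)
        if op == "+":
            updated[key] = ""
        else:
            updated.pop(key, None)
        result[mnemonic] = updated
    return result


actions = {
    "SwiftCompile": {},               # hypothetical spawn-backed action
    "SymlinkAction": None,            # no execution info in this model
    "TemplateExpansionAction": None,  # no execution info in this model
}
print(apply_modify_execution_info(".*=+no-remote", actions))
```

In this model the regex matches every mnemonic, yet only the action that carries execution info gains the `no-remote` key, which is why the outputs of the other two would still be eligible for upload.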

@coeuvre coeuvre added type: feature request P2 We'll consider working on this in future. (Assignee optional) and removed type: bug P1 I'll work on this now. (Assignee required) labels Aug 31, 2022
@brentleyjones
Contributor Author

brentleyjones commented Aug 31, 2022

Yes. This is preventing people from using --incompatible_remote_build_event_upload_respect_no_cache. Those test binaries can be large, there can be hundreds of them, and people don't want them uploaded. Instead they disable all BES-related uploads, which also prevents the timing profile from being uploaded, and that is something they really do want uploaded.

@keith
Member

keith commented Sep 12, 2022

@coeuvre any other advice on solutions for this case? This is a hard blocker for us to fully use BEP services in general: we have to force some tests to run locally and don't want to spend many GB of uploads on throwaway data, but avoiding that currently also means giving up uploads of useful things like the full test logs.

@coeuvre
Member

coeuvre commented Sep 15, 2022

I understand the problem is to prevent (some) outputs of (some) local actions from being uploaded to BES. What I am missing is:

  • Apart from some test targets, are there other cases where you want to avoid uploading local outputs?
  • Is the decision whether to upload an output to BES based only on its size? Are there other criteria?

I think the solution might differ depending on the answers to the questions above.

@brentleyjones
Contributor Author

I think for all of these, they shouldn't upload if it's only BES that is trying to upload them. If they would upload normally, then they will be uploaded normally. Basically, BES shouldn't cause these blobs, if they end up in "important outputs", to be uploaded to BES. If they were previously uploaded, then they can have the bytestream URL.

@coeuvre
Member

coeuvre commented Sep 15, 2022

Do you mean something similar to --nobuild_event_json_file_path_conversion but for the BES?

@brentleyjones
Contributor Author

Sorta. That already exists as --experimental_build_event_upload_strategy=local right? My request is a little different. Paths can/should be converted for all blobs that are uploaded, but I don't want something uploaded simply because it's converting the path (which is what I believe is happening here in the BES uploader). Also, I want to keep the uploading of the timing profile and test logs.

@coeuvre
Member

coeuvre commented Sep 15, 2022

Okay, then it's essentially an uploader that:

  • only converts paths for blobs that are already uploaded.
  • also uploads and converts paths for blobs that are in an allowlist.

Do you think using regex to match the path of outputs is a good enough allowlist mechanism?

@keith Does this solution work for you?

@brentleyjones
Contributor Author

> Do you think using regex to match the path of outputs is a good enough allowlist mechanism?

If we can reliably describe the paths of test logs and the timing profile, I believe so. Though most users of BES would want this to be the default behavior, and it would be a little annoying to have to tell every user to set this flag.

@coeuvre
Member

coeuvre commented Sep 16, 2022

Thanks for the clarifications! Now it's clear to me.

Here is the solution: we introduce a new flag --experimental_remote_build_event_upload_outputs whose value can be all | minimal and defaults to all. When set to minimal, the uploader only converts paths for blobs that are already uploaded to the remote cache. It can still upload blobs and convert their paths if it thinks they are important to consumers of BEP (i.e. test logs and the timing profile, but we can define more later). For other blobs, it just uses the file URI.

WDYT?
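
The all | minimal behaviour proposed above can be sketched as a single decision function. This is an illustrative model under stated assumptions, not Bazel's uploader: `LocalFile`, `resolve_uri`, the `upload` callback, the instance name `remote.example.com`, and the regexes used to recognise test logs and the timing profile are all hypothetical, since the thread does not specify how "important" files are identified.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for "important" files (test logs, timing profile).
IMPORTANT_PATTERNS = [
    re.compile(r".*/test\.log$"),
    re.compile(r".*\.profile(\.gz)?$"),
]


@dataclass(frozen=True)
class LocalFile:
    path: str    # absolute local path
    digest: str  # "<hash>/<size>", as in the bytestream URIs above


def resolve_uri(f, mode, in_remote_cache, upload, instance="remote.example.com"):
    """Return the URI to report for `f` under mode 'all' or 'minimal'."""
    if mode not in ("all", "minimal"):
        raise ValueError(f"unknown mode: {mode!r}")

    remote_uri = f"bytestream://{instance}/blobs/{f.digest}"
    if f.digest in in_remote_cache:
        return remote_uri  # already uploaded: only convert the path

    important = any(p.match(f.path) for p in IMPORTANT_PATTERNS)
    if mode == "all" or important:
        upload(f)  # caller-supplied upload hook
        in_remote_cache.add(f.digest)
        return remote_uri

    return f"file://{f.path}"  # minimal mode: leave the blob local


uploads = []
cache = {"aaa/1"}
print(resolve_uri(LocalFile("/out/bin/tests", "bbb/2"), "minimal", cache, uploads.append))
print(resolve_uri(LocalFile("/out/testlogs/t/test.log", "ccc/3"), "minimal", cache, uploads.append))
```

Keeping the "already in the remote cache" check first means path conversion never triggers an upload on its own, which is the property requested earlier in the thread.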

@brentleyjones
Contributor Author

Yeah, I think that is nice. Ideally, when it leaves experimental state, the default flips to minimal. I don't think all is expected, given my experience with lots of BES customers.

@coeuvre
Member

coeuvre commented Sep 16, 2022

SGTM. We can try to get this into 6.0, and if everything works well we can remove the experimental_ prefix and flip the default to minimal in the next major release.

aiuto pushed a commit to aiuto/bazel that referenced this issue Oct 12, 2022
Add flag `--experimental_remote_build_event_upload` which controls the way Bazel uploads files referenced in BEP to remote cache.

It defaults to `all`, which maintains current behaviour: Bazel uploads all local files referenced by BEP to the remote cache and converts their paths to `bytestream://...`. Additionally, `--incompatible_remote_build_event_upload_respect_no_cache` can be set to avoid uploading outputs that are generated by "non-remote-cachable" spawns.

If set to `minimal`, local outputs are not uploaded to the remote cache, except for files that are **important** to the consumers of BES (e.g. test logs and timing profile). Paths for files that are already uploaded to the remote cache are converted.

`--incompatible_remote_build_event_upload_respect_no_cache` is deprecated in favour of this new flag.

Fixes bazelbuild#16151.

Closes bazelbuild#16299.

PiperOrigin-RevId: 476865036
Change-Id: I4c506f7447a41e8e64a4ed0785e7f20a40ea3b84
@exoson
Contributor

exoson commented Dec 8, 2022

It seems that when an action fails, its stderr ends up under a path like bazel-out/_tmp/actions/stderr-5. I think it would make sense to upload those too.

@exoson
Contributor

exoson commented Jan 2, 2023

Made a PR for uploading stdout and stderr for actions as well: #17110


5 participants