Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows feature generate_pdb_file should be applicable to opt builds #94

Closed
sunjayBhatia opened this issue Dec 7, 2020 · 23 comments
Closed

Comments

@sunjayBhatia
Copy link
Contributor

Description of the problem / feature request:

We would like to generate .pdb files for release builds of Envoy for Windows for use debugging potential production issues. This works well with Bazel building locally as all intermediate/compiler/linker generated files live on the host. We use opt mode, the MSVC cl /Z7 option, and linker options /DEBUG:FULL. However, building Envoy with RBE in CI, the expected .pdb file output is not accessible. It is not an explicit output of the cc_binary rule on Windows and is not downloaded to the host running the build (or presumably any rules that were to depend on the binary target, not sure if a genrule workaround could work to grab the relevant file). This is in contrast to Linux where cc_binary rules create an implicit <target>.dwp target that can be used to access debug info. Using --remote_download_outputs=all (the default) or --features=generate_pdb_file does not work as the feature is only enabled for dbg and fastbuild compilation modes.

It would be great if the restriction of the feature to dbg and fastbuild modes was relaxed

Feature requests: what underlying problem are you trying to solve with this feature?

See above, generate .pdb files as when building Envoy with RBE in opt compilation mode.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Build a cc_binary rule in opt compilation mode and link with MSVC link /DEBUG options to generate a .pdb file in RBE on Windows. See that no .pdb file is downloaded even when using --features=generate_pdb_file or --remote_download_outputs=all.

What operating system are you running Bazel on?

Windows (2019)

What's the output of bazel info release?

release 3.6.0

What version of rules_cc do you use? Can you paste the workspace rule used to fetch rules_cc? What other relevant dependencies does your project have?

https://github.com/envoyproxy/envoy/blob/d382fa60412b1bf3c8b1d82883119c15216d69fc/bazel/repository_locations.bzl#L535-L546

What Bazel options do you use to trigger the issue? What C++ toolchain do you use?

C++ toolchain: https://github.com/envoyproxy/envoy-build-tools/tree/master/toolchains/configs/windows/msvc-cl/bazel_3.6.0 (generated via bazel-toolchains)

Have you found anything relevant by searching the web?

Not as of yet

Any other information, logs, or outputs that you want to share?

Not as of yet

@wrowe
Copy link
Contributor

wrowe commented Dec 8, 2020

While this is describing what should be emitted as a result of linking, there is a parallel PR concerned with emitting the .pdb results corresponding to each .obj file, and then compressed into a library-scope .pdb using fission at lib construction time.

It would be nice if the underlying bazel rules_cc mechanics follow similar toggles. #78

@meteorcloudy
Copy link
Member

meteorcloudy commented Apr 15, 2021

Update: the following about pdb file and linking output is not entirely correct, please see #94 (comment)

The pdb file is generated by the link action but not in the default output group of cc_binary (therefore not retrieved in RBE build). You can access it by specifying the output_group in the following ways:

  1. In command line
    bazel build //:hello-world --output_groups=pdb_file (--output_groups=pdf_file,default if you want both the binary and the pdb file)
  2. In BUILD file
filegroup(
    name = "my_pdb_file",
    srcs = ["//:hello-world"],
    output_group = "pdf_file",
)

Can you try if this fixes the issue with RBE?

Related code: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/rules/cpp/CcBinary.java;l=696?q=%22pdb_file%22&ss=bazel

@wrowe
Copy link
Contributor

wrowe commented Apr 19, 2021

Using the option 2. above, we were successful at envoy in adding an envoy-static-exe target that captures both the default and the pdf_file group. [Option 2 was only an illusion, I'd presumed the absence of an .exe indicated that the .pdb was emitted, which you note later it was not.] Thank you for the guidance! Documenting this for future reference could be helpful?

@meteorcloudy
Copy link
Member

Documenting this for future reference could be helpful?

Yes, we should definitely improving the documentation on this!

@wrowe
Copy link
Contributor

wrowe commented Apr 19, 2021

@meteorcloudy I feel like I'm missing something obvious. In this workaround...

envoy_cc_binary(
    name = "envoy-static",
    stamped = True,
    deps = [":envoy_main_entry_lib"],
)

filegroup(
    name = "envoy-static-pdb",
    srcs = [":envoy-static"],
    output_group = "pdb_file",
)

filegroup(
    name = "envoy-static-exe",
    srcs = [
        ":envoy-static",
        ":envoy-static-pdb",
    ],
)

The envoy-static-pdb does not capture the .exe... but grouping them together in envoy-static-exe does not capture the .pdb.

What I think I'm reaching for is

    output_group = ["default", "pdb_file",],

but there is obviously no such construct.

@wrowe
Copy link
Contributor

wrowe commented Apr 19, 2021

@meteorcloudy although we wanted this structed at the BUILD file level, I have experimented with your top-line solution.

https://dev.azure.com/cncf/envoy/_build/results?buildId=72503&view=logs&jobId=b840a642-5ff3-5357-2e4b-e06e40b0cffd&j=b840a642-5ff3-5357-2e4b-e06e40b0cffd&t=67965174-5100-5631-9dc0-68b9f0aacb53

Alternative 1. listing both file groups for the target produced no .pdb result from our invocation of

bazel [...] build [...] //bazel/... //source/exe:envoy-static --output_groups=pdb_file,default --build_tag_filters=-skip_on_windows

where envoy-static is the cc_binary mentioned above, and which does not generate a .pdb file (I've tried both pdf_file and what i think you meant based on the tooling sources, pdb_file.)

https://github.com/envoyproxy/envoy/pull/16043/files

So the first solution may not work under distributed RBE either?

@meteorcloudy
Copy link
Member

Hmmm, I tested locally with a hello-world cc_binary, it should be working. Can you also confirm if it works for you in a local build?

pcloudy@PCLOUDY2-W MSYS ~/workspace/my_tests/bazel
$ git diff
diff --git a/examples/cpp/BUILD b/examples/cpp/BUILD
index f450f67242..ad7e7cf70c 100644
--- a/examples/cpp/BUILD
+++ b/examples/cpp/BUILD
@@ -14,6 +14,20 @@ cc_binary(
     deps = [":hello-lib"],
 )

+filegroup(
+    name = "hello-world-pdb",
+    srcs = [":hello-world"],
+    output_group = "pdb_file",
+)
+
+filegroup(
+    name = "hello-world-all",
+    srcs = [
+       ":hello-world",
+       ":hello-world-pdb",
+    ],
+)
+
 cc_test(
     name = "hello-success_test",
     srcs = ["hello-world.cc"],

pcloudy@PCLOUDY2-W MSYS ~/workspace/my_tests/bazel
$ bazel build examples/cpp:hello-world-all
...
INFO: Analyzed target //examples/cpp:hello-world-all (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples/cpp:hello-world-all up-to-date:
  bazel-bin/examples/cpp/hello-world.exe
  bazel-bin/examples/cpp/hello-world.pdb
INFO: Elapsed time: 0.273s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action

So the first solution may not work under distributed RBE either?

It's possible the RBE code doesn't respect the output groups, I'm not sure if this is a bug.
/cc @coeuvre Do you happen to know what files Bazel choose to retrieve from RBE?

@wrowe
Copy link
Contributor

wrowe commented Apr 21, 2021

Confirming, local build of all varieties we've discussed, as well as our existing logic, all emit the pdb file to bazel-out//x64_windows-opt/bin/source/exe/envoy-static.pdb - but that isn't helping in our RBE CI pipeline where unrecognized nuggets are discarded. What we must avoid is any situation where the .pdb is triggered and invokes the linker, and the .exe is triggered and invokes the linker a second time. MSVC is basically incapable of performing a deterministic build (clang-cl is a little closer to that goal.)

@wrowe
Copy link
Contributor

wrowe commented Apr 27, 2021

This commit sort of makes me want to cry, @meteorcloudy - I don't understand the delta, why the previous case fails in a GCP RBE environment (including the correct spelling of the file group tag), but the second wins. So we have a workaround, not quite one I expected;

envoyproxy/envoy@e1f7b79

Any thoughts?

@meteorcloudy
Copy link
Member

meteorcloudy commented Apr 27, 2021

@wrowe I set up a remote build and did some debugging on this.

It turned out what I said before was not entirely true. The pdb file is always an output of the linking action in dbg and fastbuild modes. No matter what output groups you specific, as long as it triggers the linking action it should always be retrieved from remote as part of the linking action outputs.

The reason it didn't work for you is because you have optimized mode (-c opt) enabled for your remote build, in that case, the pdb file is not generated at all and is of course not an output of the linking action.

I believe if you simply remove the -c opt option the pdb file will be retrieved.

See generated_pdb_file in windows cc toolchain.

@wrowe
Copy link
Contributor

wrowe commented Apr 27, 2021

The pdb file is always an output of the linking action in dbg and fastbuild modes. No matter what output groups you specific, as long as it triggers the linking action it should always be retrieved from remote as part of the linking action outputs.

This has long been our observation, agreed.

The reason it didn't work for you is because you have optimized mode (-c opt) enabled for your remote build, in that case, the pdb file is not generated at all

This is not correct at all. As a default behavior no pdb is generated, but in a release binary, all vendors capture .pdb for windows just as they build linux/bsd release binaries with -g, and then either strip and retain the un-stripped binary, or use objcopy to split symbolic information into a distinct .debug file, pointing the release executable at that parallel .debug symbol file (which is exactly what a .pdb file on windows is).

In this case envoy and most projects add /Z7 symbols to each .obj file although these are still compiled optimized, and we compile release as "/debug /opt:ref /opt:icf" for envoy this logic lives here and here for compilation and linking;
https://github.com/envoyproxy/envoy/blob/ca4ca983195f337a840618b45b9b33cda82c1224/bazel/envoy_internal.bzl#L50
https://github.com/envoyproxy/envoy/blob/ca4ca983195f337a840618b45b9b33cda82c1224/bazel/envoy_binary.bzl#L63

[There is a related ask, which would also need to support .pdb output from /Zi rather than /Z7 for each .obj compilation even in -c opt mode, see https://github.com//pull/78]

and is of course not an output of the linking action.

Yes, this issue describes the fact that a pdb file itself needs to be captured in the linking action even in -c opt build.

I believe if you simply remove the -c opt option the pdb file will be retrieved.

Yes, that might be the case, but that is not our bazel use case. Our use case is for bazel to generate release binaries optimized for the end user, capturing the un-stripped symbolic information for the released binary for forensic crash dump analysis. No responsible vendor ships an executable without this; in proprietary source this is either withheld for internal use only, perhaps further stripped for public consumption with only public symbols, while in open source there is no reason to withhold or conceal this data. It's not part of the "binary distribution" but typically a distinct "-dbg", "-debug", "-debuginfo" corresponding package.

Back to our ask, we are capturing the un-stripped linux executable we distribute from envoy opt builds. The same functionality is required on windows. Any help you can offer would be greatly appreciated.

There is the totally distinct question of whether this should be the default for all -c opt output. A stripped linux binary is equivalent to "just the .exe" on Windows. And we've elected not to emit symbolic info for the hundreds and hundreds of test .exe files, since they are not production code shipping to the public, their inputs are predictable, and they can be rebuilt with symbolic info on an individual basis for deeper troubleshooting of a test case failure.

@meteorcloudy
Copy link
Member

@wrowe Thanks for the explanation! I think now I understand your use case.

IIUC, you want to build with opt mode but still generate the pdb file by passing flags like /DEBUG:FULL by yourself. But the problem is that Bazel only recognizes the pdb file as an output of the linking action when the feature generate_pdb_file is enabled, see here.

However, currently generate_pdb_file is only triggered in debug and fastbuild modes, and it is also restricted to be used in those two modes, check the feature definition:

        generate_pdb_file_feature = feature(
            name = "generate_pdb_file",
            requires = [
                feature_set(features = ["dbg"]),
                feature_set(features = ["fastbuild"]),
            ],
        )

Fortunately, I see you use your own generated msvc toolchain in the remote build:
https://github.com/envoyproxy/envoy-build-tools/blob/main/toolchains/configs/windows/msvc-cl

Then how about this solution:

  1. Remove those lines that restricts the generated_pdb_file feature to dbg or fastbuild
  2. Enable the generated_pdb_file feature. You can do it by:
    • Either in command line with --features=generated_pdb_file, this will enable the feature for all cc_binaries.
    • Or add features = ["generate_pdb_file"] to the cc_binary you want the pdb file generated, maybe this fits your use case better?

I manually tested this solution should work as long as you ensure /DEBUG flag is passed, otherwise Bazel will complain the pdb file is not generated.

@wrowe
Copy link
Contributor

wrowe commented May 5, 2021

Let's make it so, drop the restrictions, as quickly as 4.1.0-rc5 if at all possible, and leave all defaults. Note I find the same logic to be corrected in the rules_cc repository, cc/private/toolchain/windows_cc_toolchain_config.bzl and in the bazel repository itself in tools/cpp/windows_cc_toolchain_config.bzl

This makes it possible for us to avoid deploying a forked bazel build to our GCP RBE environment, and proceed with testing various combinations of opt with pdb and fastbuild omitting pdb.

Thanks for your continued research @meteorcloudy

@wrowe
Copy link
Contributor

wrowe commented May 5, 2021

I have limited flexibility to test these changes in our GCP RBE environment, it would be helpful if you could do so. Once in bazel, testing the various combinations in our toolchain should be straightforward here.

@meteorcloudy
Copy link
Member

I have verified the solution in a GCP RBE environment. I'm sending a change to both Bazel and rules_cc.

I don't think the fix will get into a Bazel release very soon since it's likely we won't have a rc5 for 4.1.0.

Shouldn't you have full control over https://github.com/envoyproxy/envoy-build-tools? You'll need to update this even when you have the fix in Bazel, right?

@yanavlasov
Copy link

yanavlasov commented May 5, 2021

@meteorcloudy are you suggesting we put in custom bazel build into Envoy's build docker image? This will complicate things quite a bit.
When is the next bazel release?

@wrowe
Copy link
Contributor

wrowe commented May 5, 2021

Correct, we regenerate envoy-build-tools on every bazel or dependency bump that we want to capture.

However, we drive bazel from bazelisk, so it has to be something, even by-tag, I can have bazelisk grab. I didn't see any interesting flags to have bazelisk point to a a flavor of bazel from a non-bazelbuild repo.

@meteorcloudy
Copy link
Member

@meteorcloudy are you suggesting we put in custom bazel build into Envoy's build docker image? This will complicate things quite a bit.

Nope, I thought you are using the toolchains checked in under https://github.com/envoyproxy/envoy-build-tools/tree/main/toolchains/configs/windows/msvc-cl/bazel_4.0.0/cc, so you can just fix the toolchain here. But looks like you're actually using the generated toolchain directly, then my suggestion probably won't work.

When is the next bazel release?

4.1.0 will be released soon, after that we'll start preparing for 4.2.0, we don't have a fixed timeline yet.
See bazelbuild/bazel#13099 (comment)

bazel-io pushed a commit that referenced this issue May 6, 2021
…stbuild mode

Users may want to build in opt mode and still want the pdb file to be a linking action output.
See #94 (comment)

RELNOTES: None
PiperOrigin-RevId: 372351216
Change-Id: I1b8a8709014deb4abdb13db387e185cd08f408d7
bazel-io pushed a commit to bazelbuild/bazel that referenced this issue May 6, 2021
…stbuild mode

Users may want to build in opt mode and still want the pdb file to be a linking action output.
See bazelbuild/rules_cc#94 (comment)

RELNOTES: None
PiperOrigin-RevId: 372351216
@wrowe
Copy link
Contributor

wrowe commented May 7, 2021

So, the converse of this problem, removing pdb objects from fastbuild, appears to be equally problematic...

The rules_cc msvc/clang-cl options are added after all linkflags, an antipattern to the linux implementation. This means that

/MACHINE:X64
/DEFAULTLIB:libcmt.lib
/DEBUG:FASTLINK
/INCREMENTAL:NO

will always override any linkopts specified in the cc_binary (or cc_library) and cannot be overridden, for example, with linkopts = ["/DEBUG:NONE"]

Since nocopts has been removed from bazel, it is also impossible for fastbuild to remove the msvc/clang-cl hack for the /Z7 flag. Unlike linux -g0, there is no corresponding /Z0 override. Perhaps a custom feature recognizing an imaginary /Z0 (-Z0) flag that would cut all previously given /Z7 and /Zi flags might be a solution?

We should be able to toggle the feature off to avoid capturing .pdb output of the fastbuild binary link step, but this isn't sufficient to avoid the wasted build disk and cpu utilization when compiling fastbuild.

Thoughts?

@meteorcloudy
Copy link
Member

Hmm,, ideally users should be able to override certain features with their own desired flags, but this isn't currently possible in our auto detected cc toolchain setup. /cc @oquenchil This is another case that our cc toolchain isn't still flexible enough.

The only solution I can think of is to use your own customized cc toolchain, which can be derived from the auto detected one. An example is here: https://github.com/meteorcloudy/my_tests/tree/master/preconfigure_msvc_toolchain
The downside is that the compiler path will be fixed absolute path in this preconfigure toolchain, so it only works for environment where compiler path is determined and matches, like RBE.

@wrowe
Copy link
Contributor

wrowe commented May 10, 2021

So I can report that bumping only bazelbuild/rules_cc in envoyproxy/envoy#16043 - including the capture of the bazel --output_groups=pdb_file,default - emitted no .pdb file to be copied. I'm going to presume the other half of this solution is to fix bazel itself, which I can't practically test until later in the week.

katre pushed a commit to bazelbuild/bazel that referenced this issue Jul 12, 2021
…stbuild mode

Users may want to build in opt mode and still want the pdb file to be a linking action output.
See bazelbuild/rules_cc#94 (comment)

RELNOTES: None
PiperOrigin-RevId: 372351216
katre pushed a commit to katre/bazel that referenced this issue Jul 13, 2021
…stbuild mode

Users may want to build in opt mode and still want the pdb file to be a linking action output.
See bazelbuild/rules_cc#94 (comment)

RELNOTES: None
PiperOrigin-RevId: 372351216
katre pushed a commit to katre/bazel that referenced this issue Jul 13, 2021
…stbuild mode

Users may want to build in opt mode and still want the pdb file to be a linking action output.
See bazelbuild/rules_cc#94 (comment)

RELNOTES: None
PiperOrigin-RevId: 372351216
wrowe added a commit to wrowe/envoy that referenced this issue Oct 4, 2021
…tput

See bazelbuild/rules_cc#94 for details

Use generate_pdb_file feature to capture pdb from envoy-static.exe

- Introduces features list arg to envoy_cc_binary
- Toggles the feature only for the windows opt case in the envoy-static target
- Capture the resulting envoy-static.pdb output

Signed-off-by: William A Rowe Jr <wrowe@vmware.com>
@wrowe
Copy link
Contributor

wrowe commented Apr 26, 2022

As observed at envoyproxy/envoy#16043 this may be largely resolved although the solution could be simpler.

@meteorcloudy
Copy link
Member

Thanks for the notice, seems this issues can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants