-
Notifications
You must be signed in to change notification settings - Fork 433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Persistent worker in C++ #667
base: main
Are you sure you want to change the base?
Conversation
going to try this in C++ and have it all work nicely. This commit just sets up a build setting, so that users can customize whether to use the worker by just setting the flag on the command line or in a local bazelrc file. This allows easily switching between worker and non-worker. Use as `bazel build <rust target> --@rules_rust//rust:use-worker`. Obviously this will fail right now.
next up is to actually execute the command.
…ting stderr correctly.
…an input via tools.
A CodedOutputStream preserves its contents in some sense, possibly because of the use of EpsOutputStream and FileOutputStream beneath, which seem to keep buffers around. This was causing the original WorkResponse to be repeatedly written to stdout on future invocations of the worker, leading to Bazel not waiting around for the command to actually run. Specifically, I believe what was happening was, the internal buffer had the initial WorkResponse. Every time Bazel sent a request, it tried to read the response and protobuf would just send the already available response right away.
Consider the layout documented in https://github.com/rust-lang/cargo/blob/58a961314437258065e23cb6316dfc121d96fb71/src/cargo/core/compiler/layout.rs#L50. It relies on the incremental dir being in `target/<triple>/{debug,release}/incremental`. With this change, Bazel is creating a very similar structure.
Wow great work, and I'm sorry for not catching up sooner. I have a couple of questions (these are not review comments, just a high level ramblings :)
|
nikhilm's original implementation was in Rust in #421 :-) It originally relied on having to compile the worker separately or depend on a third-party binary. I hacked together some changes in #517 to build the Rust worker with the cargo binary provided by rules_rust, and haven't run into any issues with the bootstrapping over the last few months. We have been running into occasional issues with linking errors and rustc crashes however. @nikhilm did you notice the env changes I posted yesterday on #517 (comment)? They should fix the new worker being spawned for every request mentioned on #517 (comment), which looks like may still happen in the C++ port. I wonder if the target and compilation mode will be sufficient to fully resolve the ICEs? Some very brief digging through the sources seems to point to the "disambiguator" that rustc is using only depending on the 'metadata' arg. If it is possible to change the output products with different flags without altering the metadata and target flags, maybe that could still lead to issues? Having only just updated my branch, I'm keen to give that a go for a while to see if any more ICEs crop up, and I'm also rather fond of the regex-based limit on crates to incrementally compile. But if a C++ worker means it becomes viable to include the worker in rules_rust instead of it sitting around in a PR that has to be maintained separately, then that sounds like a net win - and with support for it in tree, I imagine it probably wouldn't be hard to override the C++ worker with a custom Rust one anyway. |
Thanks @dae, I really haven't caught up with workers enough. Bootstrapping with cargo is a problem for our use case, ideally we build everything with Bazel. So if we have to use a worker (see question (1)) I'd still prefer a variant that we can build with a trimmed down rust_binary without extra dependencies. |
Just to clarify, #517 uses the cargo binary that rules_rust downloads when it downloads rustc - it shouldn't require any separate Rust install on the system. Unfortunately we can't pass incremental=... directly to rustc, as we can't write outside of the sandbox dir which changes each time - we're using the separate worker process not to decrease startup time, but just to be able to write to a persistent location. We could pass that location with worker-extra-flag instead of a custom defined flag, but I'm not sure if that extra flag would be known to the routines in rustc.bzl, and we need some way to know whether the worker code path should be taken or not. |
Yup that's still a problem. We build our own rust toolchain from "scratch". We could build cargo too, but cargo doesn't respect some of the restrictions that our build has. So using cargo is a non-starter.
I believe you can write to /tmp (and to /dev/shm if it matters), does that help?
I see, if there's a logic in rustc.bzl that depends on whether we use worker or not, I guess we need to have a |
Ah, gotcha.
If that's the case, then I think you might be right - we might be able to skip the worker completely. @nikhilm do you foresee any problems with just passing incremental directly to a normal rustc invocation? Perhaps we could use ctx.actions.declare_directory() and pass that in to the incremental arg? |
Why should we do that? |
Sorry, thinko on my part. I gave the following proof of concept a try, but found after making a trivial change and compiling again, I reliably run into an internal compiler error. Using --strategy=Rustc=local seems to avoid the ICE, so my guess is the sandboxed paths are creeping into the incremental build products, which using a worker indirectly avoids. diff --git a/rust/private/rustc.bzl b/rust/private/rustc.bzl
index 1636fdf..01ed62c 100644
--- a/rust/private/rustc.bzl
+++ b/rust/private/rustc.bzl
@@ -573,6 +573,12 @@ def rustc_compile_action(
else:
formatted_version = ""
+ incremental_dir = "/tmp/bazel_incremental_{}_{}".format(
+ ctx.var["COMPILATION_MODE"],
+ toolchain.target_triple,
+ )
+ args.add("--codegen", "incremental={}".format(incremental_dir))
+
ctx.actions.run(
executable = ctx.executable._process_wrapper,
inputs = compile_inputs,
|
Are you suggesting using this with the existing process_wrapper? We still need the "worker-ness" to escape the sandbox (otherwise the incremental dir is not shared).
Ah I see what you mean. Because the worker is now specified as a dep of the rust rules, will it be built even when not used? In that case, it would be a breaking change.
That was #421. proto/json doesn't matter as there is the bootstrap problem. If there is a reasonable solution to the bootstrap problem that lands in main, then I can rebase 421 on top to use JSON. |
The only reason the current worker implementation is a separate binary instead of being a part of process_wrapper.cc is for ease of implementation. I didn't want to dirty process_wrapper until I had something that works. The resulting changes are small enough that process_wrapper itself could act as both worker/non-worker and would save one extra I am not so sure about always escaping the sandbox via using /tmp. Isn't the entire goal of Bazel to preserve build isolation unless explicitly opted-out by the user? However, I agree that in rustc's compilation model, the "worker-ness" (a long lived process) has no benefit. So I guess one could use a user-controlled flag just to decide whether to use incremental builds (breaking build isolation) or not. That would actually be a trivial change to process_wrapper. How does that sound? |
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. It also requires targets to be tagged "incremental" - any without the tag will be compiled normally, even if a base folder is provided. Cargo's behaviour appears to be to only store incremental products for the product being built (eg, all the dependencies you pull in from crates.io do not have incremental build products stored), so manual tagging seems preferable to an "include everything" approach.
AFAIK workers are not sandboxed by default, so I don't think a direct approach is actually losing anything over what we already had. And sandboxing seems to break rustc in any case, as can be seen above. I spend a fair bit of time with the worker approach, and you've spent even more, so it's a shame to abandon it, but based on hlopko's observations it does seem unnecessary. I've just pushed a new PR that does away with the worker in favour of changing the rustc args directly, and no changes are needed to process_wrapper: #684 |
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. It also requires targets to be tagged "incremental" - any without the tag will be compiled normally, even if a base folder is provided. Cargo's behaviour appears to be to only store incremental products for the product being built (eg, all the dependencies you pull in from crates.io do not have incremental build products stored), so manual tagging seems preferable to an "include everything" approach.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. It also requires targets to be tagged "incremental" - any without the tag will be compiled normally, even if a base folder is provided. Cargo's behaviour appears to be to only store incremental products for the product being built (eg, all the dependencies you pull in from crates.io do not have incremental build products stored), so manual tagging seems preferable to an "include everything" approach.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
This supersedes #421. I finally had the motivation to just switch to using C++ and fortunately it wasn't as bad as I expected.
Improvements
Currently unsupported:
@dae would you mind trying this out?