Persistent worker in C++ #667

nikhilm · 2021-03-31T00:43:23Z

This supersedes #421. I finally had the motivation to just switch to using C++ and fortunately it wasn't as bad as I expected.

Improvements

Written in C++ so it can be built from source on all platforms
Don't have to worry about the bootstrap problem
Much easier to use, including on a per-command basis, just by using a build settings flag.
Incorporates target triple as well as compilation mode in the incremental compilation directory path to match Cargo's directory structure. See the request in Persistent worker without binary dependency #517 (comment).

Currently unsupported:

If this even builds on macOS/Windows, as I don't have access to those platforms.
No CI test that exercises the worker

@dae would you mind trying this out?

going to try this in C++ and have it all work nicely. This commit just sets up a build setting, so that users can customize whether to use the worker by just setting the flag on the command line or in a local bazelrc file. This allows easily switching between worker and non-worker. Use as `bazel build <rust target> --@rules_rust//rust:use-worker`. Obviously this will fail right now.

next up is to actually execute the command.

…ting stderr correctly.

…an input via tools.

A CodedOutputStream preserves its contents in some sense, possibly because of the use of EpsOutputStream and FileOutputStream beneath, which seem to keep buffers around. This was causing the original WorkResponse to be repeatedly written to stdout on future invocations of the worker, leading to Bazel not waiting around for the command to actually run. Specifically, I believe what was happening was, the internal buffer had the initial WorkResponse. Every time Bazel sent a request, it tried to read the response and protobuf would just send the already available response right away.

Consider the layout documented in https://github.com/rust-lang/cargo/blob/58a961314437258065e23cb6316dfc121d96fb71/src/cargo/core/compiler/layout.rs#L50. It relies on the incremental dir being in `target/<triple>/{debug,release}/incremental`. With this change, Bazel is creating a very similar structure.

hlopko · 2021-03-31T08:58:19Z

Wow great work, and I'm sorry for not catching up sooner. I have a couple of questions (these are not review comments, just a high level ramblings :)

If my brief skimming through docs it seems we only need to pass -C incremental=/a/directory to rustc, maybe that can be achieved by https://docs.bazel.build/versions/master/command-line-reference.html#flag--worker_extra_flag?
Does a user need to depend on protobuf when they don't use the worker? I'd think yes, because cc_proto_library needs protobuf, and we currently don't depend on it by default in our repositories.bzl file. So if I'm not mistaken, this is a breaking change.
If we need a wrapper, what do you think about writing it in Rust, with no dependencies, using json instead of proto, and inlining the parser code in it. I think we could make the bootstrapping work by adding a separate rule that doesn't depend on the worker binary and therefore cannot use workers. I had something like that in mind for our process wrapper already and I'm happy to work on it if you want a helping hand.

dae · 2021-03-31T11:12:46Z

nikhilm's original implementation was in Rust in #421 :-) It originally relied on having to compile the worker separately or depend on a third-party binary. I hacked together some changes in #517 to build the Rust worker with the cargo binary provided by rules_rust, and haven't run into any issues with the bootstrapping over the last few months. We have been running into occasional issues with linking errors and rustc crashes however.

@nikhilm did you notice the env changes I posted yesterday on #517 (comment)? They should fix the new worker being spawned for every request mentioned on #517 (comment), which looks like may still happen in the C++ port.

I wonder if the target and compilation mode will be sufficient to fully resolve the ICEs? Some very brief digging through the sources seems to point to the "disambiguator" that rustc is using only depending on the 'metadata' arg. If it is possible to change the output products with different flags without altering the metadata and target flags, maybe that could still lead to issues?

Having only just updated my branch, I'm keen to give that a go for a while to see if any more ICEs crop up, and I'm also rather fond of the regex-based limit on crates to incrementally compile. But if a C++ worker means it becomes viable to include the worker in rules_rust instead of it sitting around in a PR that has to be maintained separately, then that sounds like a net win - and with support for it in tree, I imagine it probably wouldn't be hard to override the C++ worker with a custom Rust one anyway.

hlopko · 2021-04-01T09:15:46Z

Thanks @dae, I really haven't caught up with workers enough.

Bootstrapping with cargo is a problem for our use case, ideally we build everything with Bazel. So if we have to use a worker (see question (1)) I'd still prefer a variant that we can build with a trimmed down rust_binary without extra dependencies.

dae · 2021-04-01T10:00:38Z

Just to clarify, #517 uses the cargo binary that rules_rust downloads when it downloads rustc - it shouldn't require any separate Rust install on the system.

Unfortunately we can't pass incremental=... directly to rustc, as we can't write outside of the sandbox dir which changes each time - we're using the separate worker process not to decrease startup time, but just to be able to write to a persistent location. We could pass that location with worker-extra-flag instead of a custom defined flag, but I'm not sure if that extra flag would be known to the routines in rustc.bzl, and we need some way to know whether the worker code path should be taken or not.

hlopko · 2021-04-01T10:25:43Z

Just to clarify, #517 uses the cargo binary that rules_rust downloads when it downloads rustc - it shouldn't require any separate Rust install on the system.

Yup that's still a problem. We build our own rust toolchain from "scratch". We could build cargo too, but cargo doesn't respect some of the restrictions that our build has. So using cargo is a non-starter.

as we can't write outside of the sandbox dir

I believe you can write to /tmp (and to /dev/shm if it matters), does that help?

We could pass that location with worker-extra-flag instead of a custom defined flag, but I'm not sure if that extra flag would be known to the routines in rustc.bzl, and we need some way to know whether the worker code path should be taken or not.

I see, if there's a logic in rustc.bzl that depends on whether we use worker or not, I guess we need to have a bool_flag such as the experimental-use-worker in this PR. But maybe we still don't need a wrapper?

dae · 2021-04-01T10:59:29Z

Yup that's still a problem. We build our own rust toolchain from "scratch". We could build cargo too, but cargo doesn't respect some of the restrictions that our build has. So using cargo is a non-starter.

Ah, gotcha.

I believe you can write to /tmp

If that's the case, then I think you might be right - we might be able to skip the worker completely. @nikhilm do you foresee any problems with just passing incremental directly to a normal rustc invocation? Perhaps we could use ctx.actions.declare_directory() and pass that in to the incremental arg?

hlopko · 2021-04-01T11:19:21Z

Perhaps we could use ctx.actions.declare_directory() and pass that in to the incremental arg?

Why should we do that? ctx.actions.declare_directory is for an unknown number of action outputs, but what rustc needs is just a directory where it can store it's caches? That directory will not be an output that bazel will track and propagate. If it was so, the directory would be created in the sandbox. But there is no way of telling an action to get the directory from the previous execution, which is what we would need for to make incremental work. So we will use an out of band /tmp directory that bazel will know nothing about.

dae · 2021-04-01T11:48:18Z

Sorry, thinko on my part.

I gave the following proof of concept a try, but found after making a trivial change and compiling again, I reliably run into an internal compiler error. Using --strategy=Rustc=local seems to avoid the ICE, so my guess is the sandboxed paths are creeping into the incremental build products, which using a worker indirectly avoids.

diff --git a/rust/private/rustc.bzl b/rust/private/rustc.bzl
index 1636fdf..01ed62c 100644
--- a/rust/private/rustc.bzl
+++ b/rust/private/rustc.bzl
@@ -573,6 +573,12 @@ def rustc_compile_action(
     else:
         formatted_version = ""

+    incremental_dir = "/tmp/bazel_incremental_{}_{}".format(
+        ctx.var["COMPILATION_MODE"],
+        toolchain.target_triple,
+    )
+    args.add("--codegen", "incremental={}".format(incremental_dir))
+
     ctx.actions.run(
         executable = ctx.executable._process_wrapper,
         inputs = compile_inputs,

thread 'rustc' panicked at 'failed to lookup `SourceFile` in new context', compiler/rustc_middle/src/ty/query/on_disk_cache.rs:710:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.51.0 (2fd73fabe 2021-03-23) running on x86_64-apple-darwin

note: compiler flags: -C opt-level=0 -C debuginfo=0 -C linker=external/local_config_cc/cc_wrapper.sh -C link-args=-lc++ -fobjc-link-runtime -headerpad_max_install_names -no-canonical-prefixes -mmacosx-version-min=11.1 -C incremental --crate-type rlib

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
#0 [generics_of] computing generics of `backend_proto::scheduling_service::Service::run_method`
#1 [crate_variances] computing the variances for items in this crate
end of query stack

nikhilm · 2021-04-01T16:05:34Z

Wow great work, and I'm sorry for not catching up sooner. I have a couple of questions (these are not review comments, just a high level ramblings :)

If my brief skimming through docs it seems we only need to pass -C incremental=/a/directory to rustc, maybe that can be achieved by https://docs.bazel.build/versions/master/command-line-reference.html#flag--worker_extra_flag?

Are you suggesting using this with the existing process_wrapper? We still need the "worker-ness" to escape the sandbox (otherwise the incremental dir is not shared).

Does a user need to depend on protobuf when they don't use the worker? I'd think yes, because cc_proto_library needs protobuf, and we currently don't depend on it by default in our repositories.bzl file. So if I'm not mistaken, this is a breaking change.

Ah I see what you mean. Because the worker is now specified as a dep of the rust rules, will it be built even when not used? In that case, it would be a breaking change.

If we need a wrapper, what do you think about writing it in Rust, with no dependencies, using json instead of proto, and inlining the parser code in it. I think we could make the bootstrapping work by adding a separate rule that doesn't depend on the worker binary and therefore cannot use workers. I had something like that in mind for our process wrapper already and I'm happy to work on it if you want a helping hand.

That was #421. proto/json doesn't matter as there is the bootstrap problem. If there is a reasonable solution to the bootstrap problem that lands in main, then I can rebase 421 on top to use JSON.

nikhilm · 2021-04-01T16:11:40Z

The only reason the current worker implementation is a separate binary instead of being a part of process_wrapper.cc is for ease of implementation. I didn't want to dirty process_wrapper until I had something that works. The resulting changes are small enough that process_wrapper itself could act as both worker/non-worker and would save one extra exec() per invocation. I'm happy to merge that.

I am not so sure about always escaping the sandbox via using /tmp. Isn't the entire goal of Bazel to preserve build isolation unless explicitly opted-out by the user?

However, I agree that in rustc's compilation model, the "worker-ness" (a long lived process) has no benefit. So I guess one could use a user-controlled flag just to decide whether to use incremental builds (breaking build isolation) or not.

That would actually be a trivial change to process_wrapper. How does that sound?

@hlopko

This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. It also requires targets to be tagged "incremental" - any without the tag will be compiled normally, even if a base folder is provided. Cargo's behaviour appears to be to only store incremental products for the product being built (eg, all the dependencies you pull in from crates.io do not have incremental build products stored), so manual tagging seems preferable to an "include everything" approach.

dae · 2021-04-08T13:09:14Z

AFAIK workers are not sandboxed by default, so I don't think a direct approach is actually losing anything over what we already had. And sandboxing seems to break rustc in any case, as can be seen above. I spend a fair bit of time with the worker approach, and you've spent even more, so it's a shame to abandon it, but based on hlopko's observations it does seem unnecessary. I've just pushed a new PR that does away with the worker in favour of changing the rustc args directly, and no changes are needed to process_wrapper: #684

@hlopko

This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. It also requires targets to be tagged "incremental" - any without the tag will be compiled normally, even if a base folder is provided. Cargo's behaviour appears to be to only store incremental products for the product being built (eg, all the dependencies you pull in from crates.io do not have incremental build products stored), so manual tagging seems preferable to an "include everything" approach.

@hlopko

This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. It also requires targets to be tagged "incremental" - any without the tag will be compiled normally, even if a base folder is provided. Cargo's behaviour appears to be to only store incremental products for the product being built (eg, all the dependencies you pull in from crates.io do not have incremental build products stored), so manual tagging seems preferable to an "include everything" approach.

@hlopko

This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.

@hlopko

This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.

@hlopko

This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.

@hlopko

This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.

@hlopko

This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.

@hlopko

This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.

nikhilm added 12 commits March 30, 2021 17:38

Now capable of parsing the WorkRequest

d0cd026

write fake worker responses.

d8e95ec

next up is to actually execute the command.

Rename flag to indicate experimental nature

122be7f

Worker runs command, but without incremental mode and without redirec…

45da97c

…ting stderr correctly.

Sending command stderr to worker response output. works on Linux.

9540dc6

Use compilation mode as part of incremental dir path

3018ff9

When running the worker, the original executable should be marked as …

ed6a0c3

…an input via tools.

Readability fixes and cleanup

a8da854

Add documentation on how to use the C++ worker, which is a lot easier

41f8049

google-cla bot added the cla: yes label Mar 31, 2021

nikhilm added 5 commits March 30, 2021 17:53

refactor to move worker mode into function

7981f84

support for running standalone

83de9fe

clean up indentation

c8165c7

default to false if attr is not present on rule

cfddfb1

buildifier fixes

66623d7

nikhilm force-pushed the cppworker branch from 0f70082 to 3c9c0d9 Compare March 31, 2021 01:41

try adding CI job

8b73681

nikhilm force-pushed the cppworker branch from 3c9c0d9 to 8b73681 Compare March 31, 2021 01:54

hlopko self-requested a review March 31, 2021 08:06

dae mentioned this pull request Apr 8, 2021

incremental compilation without a worker #684

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Persistent worker in C++ #667

Persistent worker in C++ #667

nikhilm commented Mar 31, 2021 •

edited

Loading

hlopko commented Mar 31, 2021

dae commented Mar 31, 2021

hlopko commented Apr 1, 2021

dae commented Apr 1, 2021

hlopko commented Apr 1, 2021

dae commented Apr 1, 2021 •

edited

Loading

hlopko commented Apr 1, 2021

dae commented Apr 1, 2021

nikhilm commented Apr 1, 2021

nikhilm commented Apr 1, 2021

dae commented Apr 8, 2021

Persistent worker in C++ #667

Are you sure you want to change the base?

Persistent worker in C++ #667

Conversation

nikhilm commented Mar 31, 2021 • edited Loading

hlopko commented Mar 31, 2021

dae commented Mar 31, 2021

hlopko commented Apr 1, 2021

dae commented Apr 1, 2021

hlopko commented Apr 1, 2021

dae commented Apr 1, 2021 • edited Loading

hlopko commented Apr 1, 2021

dae commented Apr 1, 2021

nikhilm commented Apr 1, 2021

nikhilm commented Apr 1, 2021

dae commented Apr 8, 2021

nikhilm commented Mar 31, 2021 •

edited

Loading

dae commented Apr 1, 2021 •

edited

Loading