Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistent worker in C++ #667

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

Persistent worker in C++ #667

wants to merge 18 commits into from

Conversation

nikhilm
Copy link

@nikhilm nikhilm commented Mar 31, 2021

This supersedes #421. I finally had the motivation to just switch to using C++ and fortunately it wasn't as bad as I expected.

Improvements

Currently unsupported:

  • If this even builds on macOS/Windows, as I don't have access to those platforms.
  • No CI test that exercises the worker

@dae would you mind trying this out?

going to try this in C++ and have it all work nicely.
This commit just sets up a build setting, so that users can customize
whether to use the worker by just setting the flag on the command line
or in a local bazelrc file. This allows easily switching between
worker and non-worker.

Use as `bazel build <rust target> --@rules_rust//rust:use-worker`.
Obviously this will fail right now.
next up is to actually execute the command.
A CodedOutputStream preserves its contents in some sense, possibly because of the use of EpsOutputStream and FileOutputStream beneath, which seem to keep buffers around. This was causing the original WorkResponse to be repeatedly written to stdout on future invocations of the worker, leading to Bazel not waiting around for the command to actually run.
Specifically, I believe what was happening was, the internal buffer had the initial WorkResponse. Every time Bazel sent a request, it tried to read the response and protobuf would just send the already available response right away.
Consider the layout documented in
https://github.com/rust-lang/cargo/blob/58a961314437258065e23cb6316dfc121d96fb71/src/cargo/core/compiler/layout.rs#L50.
It relies on the incremental dir being in
`target/<triple>/{debug,release}/incremental`. With this change, Bazel
is creating a very similar structure.
@google-cla google-cla bot added the cla: yes label Mar 31, 2021
@hlopko
Copy link
Member

hlopko commented Mar 31, 2021

Wow great work, and I'm sorry for not catching up sooner. I have a couple of questions (these are not review comments, just a high level ramblings :)

  1. If my brief skimming through docs it seems we only need to pass -C incremental=/a/directory to rustc, maybe that can be achieved by https://docs.bazel.build/versions/master/command-line-reference.html#flag--worker_extra_flag?

  2. Does a user need to depend on protobuf when they don't use the worker? I'd think yes, because cc_proto_library needs protobuf, and we currently don't depend on it by default in our repositories.bzl file. So if I'm not mistaken, this is a breaking change.

  3. If we need a wrapper, what do you think about writing it in Rust, with no dependencies, using json instead of proto, and inlining the parser code in it. I think we could make the bootstrapping work by adding a separate rule that doesn't depend on the worker binary and therefore cannot use workers. I had something like that in mind for our process wrapper already and I'm happy to work on it if you want a helping hand.

@dae
Copy link
Contributor

dae commented Mar 31, 2021

nikhilm's original implementation was in Rust in #421 :-) It originally relied on having to compile the worker separately or depend on a third-party binary. I hacked together some changes in #517 to build the Rust worker with the cargo binary provided by rules_rust, and haven't run into any issues with the bootstrapping over the last few months. We have been running into occasional issues with linking errors and rustc crashes however.

@nikhilm did you notice the env changes I posted yesterday on #517 (comment)? They should fix the new worker being spawned for every request mentioned on #517 (comment), which looks like may still happen in the C++ port.

I wonder if the target and compilation mode will be sufficient to fully resolve the ICEs? Some very brief digging through the sources seems to point to the "disambiguator" that rustc is using only depending on the 'metadata' arg. If it is possible to change the output products with different flags without altering the metadata and target flags, maybe that could still lead to issues?

Having only just updated my branch, I'm keen to give that a go for a while to see if any more ICEs crop up, and I'm also rather fond of the regex-based limit on crates to incrementally compile. But if a C++ worker means it becomes viable to include the worker in rules_rust instead of it sitting around in a PR that has to be maintained separately, then that sounds like a net win - and with support for it in tree, I imagine it probably wouldn't be hard to override the C++ worker with a custom Rust one anyway.

@hlopko
Copy link
Member

hlopko commented Apr 1, 2021

Thanks @dae, I really haven't caught up with workers enough.

Bootstrapping with cargo is a problem for our use case, ideally we build everything with Bazel. So if we have to use a worker (see question (1)) I'd still prefer a variant that we can build with a trimmed down rust_binary without extra dependencies.

@dae
Copy link
Contributor

dae commented Apr 1, 2021

Just to clarify, #517 uses the cargo binary that rules_rust downloads when it downloads rustc - it shouldn't require any separate Rust install on the system.

Unfortunately we can't pass incremental=... directly to rustc, as we can't write outside of the sandbox dir which changes each time - we're using the separate worker process not to decrease startup time, but just to be able to write to a persistent location. We could pass that location with worker-extra-flag instead of a custom defined flag, but I'm not sure if that extra flag would be known to the routines in rustc.bzl, and we need some way to know whether the worker code path should be taken or not.

@hlopko
Copy link
Member

hlopko commented Apr 1, 2021

Just to clarify, #517 uses the cargo binary that rules_rust downloads when it downloads rustc - it shouldn't require any separate Rust install on the system.

Yup that's still a problem. We build our own rust toolchain from "scratch". We could build cargo too, but cargo doesn't respect some of the restrictions that our build has. So using cargo is a non-starter.

as we can't write outside of the sandbox dir

I believe you can write to /tmp (and to /dev/shm if it matters), does that help?

We could pass that location with worker-extra-flag instead of a custom defined flag, but I'm not sure if that extra flag would be known to the routines in rustc.bzl, and we need some way to know whether the worker code path should be taken or not.

I see, if there's a logic in rustc.bzl that depends on whether we use worker or not, I guess we need to have a bool_flag such as the experimental-use-worker in this PR. But maybe we still don't need a wrapper?

@dae
Copy link
Contributor

dae commented Apr 1, 2021

Yup that's still a problem. We build our own rust toolchain from "scratch". We could build cargo too, but cargo doesn't respect some of the restrictions that our build has. So using cargo is a non-starter.

Ah, gotcha.

I believe you can write to /tmp

If that's the case, then I think you might be right - we might be able to skip the worker completely. @nikhilm do you foresee any problems with just passing incremental directly to a normal rustc invocation? Perhaps we could use ctx.actions.declare_directory() and pass that in to the incremental arg?

@hlopko
Copy link
Member

hlopko commented Apr 1, 2021

Perhaps we could use ctx.actions.declare_directory() and pass that in to the incremental arg?

Why should we do that? ctx.actions.declare_directory is for an unknown number of action outputs, but what rustc needs is just a directory where it can store it's caches? That directory will not be an output that bazel will track and propagate. If it was so, the directory would be created in the sandbox. But there is no way of telling an action to get the directory from the previous execution, which is what we would need for to make incremental work. So we will use an out of band /tmp directory that bazel will know nothing about.

@dae
Copy link
Contributor

dae commented Apr 1, 2021

Sorry, thinko on my part.

I gave the following proof of concept a try, but found after making a trivial change and compiling again, I reliably run into an internal compiler error. Using --strategy=Rustc=local seems to avoid the ICE, so my guess is the sandboxed paths are creeping into the incremental build products, which using a worker indirectly avoids.

diff --git a/rust/private/rustc.bzl b/rust/private/rustc.bzl
index 1636fdf..01ed62c 100644
--- a/rust/private/rustc.bzl
+++ b/rust/private/rustc.bzl
@@ -573,6 +573,12 @@ def rustc_compile_action(
     else:
         formatted_version = ""

+    incremental_dir = "/tmp/bazel_incremental_{}_{}".format(
+        ctx.var["COMPILATION_MODE"],
+        toolchain.target_triple,
+    )
+    args.add("--codegen", "incremental={}".format(incremental_dir))
+
     ctx.actions.run(
         executable = ctx.executable._process_wrapper,
         inputs = compile_inputs,
thread 'rustc' panicked at 'failed to lookup `SourceFile` in new context', compiler/rustc_middle/src/ty/query/on_disk_cache.rs:710:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.51.0 (2fd73fabe 2021-03-23) running on x86_64-apple-darwin

note: compiler flags: -C opt-level=0 -C debuginfo=0 -C linker=external/local_config_cc/cc_wrapper.sh -C link-args=-lc++ -fobjc-link-runtime -headerpad_max_install_names -no-canonical-prefixes -mmacosx-version-min=11.1 -C incremental --crate-type rlib

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
#0 [generics_of] computing generics of `backend_proto::scheduling_service::Service::run_method`
#1 [crate_variances] computing the variances for items in this crate
end of query stack

@nikhilm
Copy link
Author

nikhilm commented Apr 1, 2021

Wow great work, and I'm sorry for not catching up sooner. I have a couple of questions (these are not review comments, just a high level ramblings :)

  1. If my brief skimming through docs it seems we only need to pass -C incremental=/a/directory to rustc, maybe that can be achieved by https://docs.bazel.build/versions/master/command-line-reference.html#flag--worker_extra_flag?

Are you suggesting using this with the existing process_wrapper? We still need the "worker-ness" to escape the sandbox (otherwise the incremental dir is not shared).

  1. Does a user need to depend on protobuf when they don't use the worker? I'd think yes, because cc_proto_library needs protobuf, and we currently don't depend on it by default in our repositories.bzl file. So if I'm not mistaken, this is a breaking change.

Ah I see what you mean. Because the worker is now specified as a dep of the rust rules, will it be built even when not used? In that case, it would be a breaking change.

  1. If we need a wrapper, what do you think about writing it in Rust, with no dependencies, using json instead of proto, and inlining the parser code in it. I think we could make the bootstrapping work by adding a separate rule that doesn't depend on the worker binary and therefore cannot use workers. I had something like that in mind for our process wrapper already and I'm happy to work on it if you want a helping hand.

That was #421. proto/json doesn't matter as there is the bootstrap problem. If there is a reasonable solution to the bootstrap problem that lands in main, then I can rebase 421 on top to use JSON.

@nikhilm
Copy link
Author

nikhilm commented Apr 1, 2021

The only reason the current worker implementation is a separate binary instead of being a part of process_wrapper.cc is for ease of implementation. I didn't want to dirty process_wrapper until I had something that works. The resulting changes are small enough that process_wrapper itself could act as both worker/non-worker and would save one extra exec() per invocation. I'm happy to merge that.

I am not so sure about always escaping the sandbox via using /tmp. Isn't the entire goal of Bazel to preserve build isolation unless explicitly opted-out by the user?

However, I agree that in rustc's compilation model, the "worker-ness" (a long lived process) has no benefit. So I guess one could use a user-controlled flag just to decide whether to use incremental builds (breaking build isolation) or not.

That would actually be a trivial change to process_wrapper. How does that sound?

dae added a commit to ankitects/rules_rust that referenced this pull request Apr 8, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

It also requires targets to be tagged "incremental" - any without the
tag will be compiled normally, even if a base folder is provided. Cargo's
behaviour appears to be to only store incremental products for the
product being built (eg, all the dependencies you pull in from crates.io
do not have incremental build products stored), so manual tagging seems
preferable to an "include everything" approach.
@dae
Copy link
Contributor

dae commented Apr 8, 2021

AFAIK workers are not sandboxed by default, so I don't think a direct approach is actually losing anything over what we already had. And sandboxing seems to break rustc in any case, as can be seen above. I spend a fair bit of time with the worker approach, and you've spent even more, so it's a shame to abandon it, but based on hlopko's observations it does seem unnecessary. I've just pushed a new PR that does away with the worker in favour of changing the rustc args directly, and no changes are needed to process_wrapper: #684

dae added a commit to ankitects/rules_rust that referenced this pull request Apr 8, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

It also requires targets to be tagged "incremental" - any without the
tag will be compiled normally, even if a base folder is provided. Cargo's
behaviour appears to be to only store incremental products for the
product being built (eg, all the dependencies you pull in from crates.io
do not have incremental build products stored), so manual tagging seems
preferable to an "include everything" approach.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 8, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

It also requires targets to be tagged "incremental" - any without the
tag will be compiled normally, even if a base folder is provided. Cargo's
behaviour appears to be to only store incremental products for the
product being built (eg, all the dependencies you pull in from crates.io
do not have incremental build products stored), so manual tagging seems
preferable to an "include everything" approach.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
vmax pushed a commit to typedb/rules_rust that referenced this pull request Aug 25, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
vmax pushed a commit to typedb/rules_rust that referenced this pull request Aug 25, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants