Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistent worker without binary dependency #517

Open
wants to merge 39 commits into
base: main
Choose a base branch
from

Conversation

dae
Copy link
Contributor

@dae dae commented Dec 2, 2020

This is an experimental change to #421 that uses cargo to build the worker, instead of relying on an external binary. It is working for me on a Linux, Mac and Windows system here, but I have only given it cursory testing so far. It probably is not suitable for merging as-is due to the hacky way I've added Windows support, but people might like to have a play with it and see how it works in practice.

@google-cla
Copy link

google-cla bot commented Dec 2, 2020

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added the cla: no label Dec 2, 2020
@dae dae mentioned this pull request Dec 2, 2020
@dae
Copy link
Contributor Author

dae commented Dec 2, 2020

RBE is likely failing because use_worker is enabled in examples_deps.bzl - presumably any worker testing code needs to be gated to not run on RBE.

The Windows issue may be due to a locked file - some brief Googling seems to throw up Windows Defender or a flag change as possible culprits, and I notice --enable_runfiles changed on the second run, a bit like
bazelbuild/bazel#9104

@abdnh
Copy link

abdnh commented Dec 8, 2020

@dae
I think the hack to locate the MSVC toolchain fails in Anki and locates c:\msys64\usr\bin (prepended to PATH in bazel.bat) instead.
The build scripts then end up calling the coreutils link utility instead of the MSVC linker (!), which obviously throws a confusing error message: /usr/bin/link: missing operand after '\377\376"'. This completely breaks building on Windows.

@dae
Copy link
Contributor Author

dae commented Dec 8, 2020

The get_linker_env part of 119c478 is supposed to be handling this - it should be extracting the MSVC path and adding it at the top of the path, so that it's found before the msys tools are. It seems to work on two machines here; perhaps adding print(path) before the if path: line will reveal something? Maybe your MSVC path is not at the top of the path as the code is assuming?

You can try it out locally by checking out this repo/branch into c:\rules_rust and then uncommenting this section: https://github.com/ankitects/anki/blob/c505894b88221e6d31ed1b7fdb3a5d835c4de83e/repos.bzl#L41

(if you don't want to debug this, just set use_worker=False in defs.bzl for now)

@abdnh
Copy link

abdnh commented Dec 8, 2020

Maybe your MSVC path is not at the top of the path as the code is assuming?

Exactly. The top of the path contains the IntelliCode path (C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\\Extensions\Microsoft\IntelliCode\CLI), then the MSVC path. This is because I have a typical Visual Studio installation. I just uninstalled IntelliCode (I don't need it anyway) and this solved the issue.

@djmarcin
Copy link
Contributor

djmarcin commented Dec 8, 2020

I've been using this for a few days on Linux and it's working well but I have a few bits of feedback:

  • Is there a way to opt in/out of this on the command line so we can put something in .bazelrc? The workers probably don't make sense for our CI stack, but devs will want to use it.
  • The files in /tmp/rustc-worker-* are not cleaned up when the workers exit. Should they be?

@nikhilm
Copy link

nikhilm commented Dec 11, 2020

I'm not certain, but I suspect there might be a correctness issue with the workers. I'm getting weird compilation failures citing missing symbols for libraries I have removed from the deps portion of certain rules. The emitted gcc line also contains dozens of files like bazel-out/k8-opt-exec-2B5CBBC6/bin/common/build_script_script_.17y6ii2vvcbg5inz.rcgu.o with various alphanumeric strings before .rcgu.o.

Does the worker detect incompatible changes to the inputs and reset its state? Is there a way to force it to drop it's state so I can confirm it's the worker?

Hmm... this is relying on Rust's incremental compilation model to handle correctness. Do you have a specific crate and configuration that someone else can try?

You could try some flags from https://docs.bazel.build/versions/master/persistent-workers.html#modifying-persistent-workers, like worker_sandboxing, worker_verbose and worker_quit_after_build. None of them make it drop state, but the sandboxing + quit should accomplish the same thing.

@djmarcin
Copy link
Contributor

I fixed my immediate issue but I’m not certain what did it. I’m fairly sure it was purging the /temp/rustc-worker* directories and doing a bazel shutdown.

I don’t have exact repro steps, I’ll try to figure out a consistent trigger, but I think it was a combination of removing deps (so they were no longer passed as linker params) and switching git branches (I know this is a problem for cargo because it uses timestamps for change tracking, I don’t know if rustc does the same).

@dae
Copy link
Contributor Author

dae commented Dec 11, 2020

I've updated the docs to mention this code is experimental, and added a warning about possibly needing to purge the files in /tmp. Despite the possible correctness issues it may harbor, I still think this might be a nice thing to have available as an experimental/subject-to-change-or-removal feature - it's opt-in, so should not affect users unless they explicitly enable it, and it can make the edit/compile/test cycle during development somewhat nicer - it brings a 15s compile down to about 6s here.

@dae
Copy link
Contributor Author

dae commented Dec 11, 2020

@djmarcin would be interested to hear if worker_sandboxing makes any difference - since it seems to only alter which input files are available, I wonder if that would help or not. Was your error similar to this one? rust-lang/rust#59535

@djmarcin
Copy link
Contributor

djmarcin commented Dec 11, 2020

@djmarcin would be interested to hear if worker_sandboxing makes any difference - since it seems to only alter which input files are available, I wonder if that would help or not. Was your error similar to this one? rust-lang/rust#59535

It does look like it may be similar to that one. I'll try updating to the latest nightly.

--edit: Confirmed this was caused by bugs in rustc itself

@dae dae marked this pull request as ready for review December 14, 2020 01:38
@dae
Copy link
Contributor Author

dae commented Dec 14, 2020

One thought - maybe instead of placing the rustc-worker files in $TEMP, it might make more sense for the config option to define the path they should be stored, similar to how Bazel's --disk_cache and --repository_cache work - that would make it more obvious that files are being persisted across runs, and make it easier to clean them up as required.

@dae
Copy link
Contributor Author

dae commented Jan 14, 2021

Just an update: I've been running this for the last month and have not run into any issues with fastbuilds, but have seen ICEs and other issues like djmarcin described when doing optimized builds after bumping dependencies, and the last time I hit this, purging the worker folder did not solve the problem - I suspect a bazel clean would have been required. Since incremental compilation is something I only want during development, I've just got my local repo set up to only use the worker in development builds, and that has been running well.

Base automatically changed from master to main January 28, 2021 20:47
@dae
Copy link
Contributor Author

dae commented Jan 31, 2021

I ran into an ICE again while running in fastbuild mode, because a build script was being compiled with optimizations. I use --disk-cache with Bazel, and found I had to purge it to fix the issue - removing the rustc-worker folders and using 'bazel clean -expunge' did not help.

@djmarcin
Copy link
Contributor

djmarcin commented Feb 3, 2021

I ran into an ICE again while running in fastbuild mode, because a build script was being compiled with optimizations. I use --disk-cache with Bazel, and found I had to purge it to fix the issue - removing the rustc-worker folders and using 'bazel clean -expunge' did not help.

I think this would happen with any ICE though, because bazel will cache the failure for those inputs, so I don't think it's a blocker to merging this PR. We've been using this since December and it has been very stable for us.

@djmarcin
Copy link
Contributor

djmarcin commented Mar 22, 2021

We recently switched our toolchain to use lld as the linker and started getting errors periodically again. They have a similar form as mentioned up thread:

INFO: From Compiling Rust bin build_script_script_ (1 files):
--
  | error: linking with `external/llvm_toolchain/bin/clang` failed: exit code: 1
  | \|
  | = note: "external/llvm_toolchain/bin/clang" "-Wl,--as-needed" "-Wl,-z,noexecstack" "-m64" "-Wl,--eh-frame-hdr" "-L" "/var/lib/buildkite-agent/.cache/bazel/buildkite-6/external/rust_linux_x86_64/lib/rustlib/x86_64-unknown-linux-gnu/lib" "bazel-out/k8-opt-exec-2B5CBBC6/bin/common/build_script_script_.112ggnxtq4t8qxbg.rcgu.o" "bazel-out/k8-opt-exec-2B5CBBC6/bin/common/build_script_script_.119jd99epnwj8gj8.rcgu.o" "bazel-out/k8-opt-exec-2B5CBBC6/bin/common/build_script_script_.124njg78r3iwtus6.rcgu.o" "bazel-out/k8-opt-exec-2B5CBBC6/bin/common/build_script_script_.1e4vy0ssuqw1a7kn.rcgu.o" "bazel-out/k8-opt-exec-2B5CBBC6/bin/common/build_script_script_.1fv6sp7w6ldlj1l.rcgu.o" "bazel-out/k8-opt-exec-2B5CBBC6/bin/common/build_script_script_.1gdn5tne3pgg1dku.rcgu.o" "bazel-out/k8-opt-exec-2B5CBBC6/bin/common/build_script_script_.1h6we3wd1ff3fhe6.rcgu.o" "bazel-out/k8-opt-exec-
<snip dozens more lines like this>
"/var/lib/buildkite-agent/.cache/bazel/buildkite-6/execroot/repo/bazel-out/k8-opt-exec-2B5CBBC6/bin/external/raze__heck__0_3_2/libheck--998291120.rlib" 
<snip many other lines like this>
= note: ld.lld: error: undefined hidden symbol: alloc::sync::Arc$LT$T$GT$::drop_slow::h9b54a0d3c02b9ef9
  | >>> referenced by 119jd99epnwj8gj8
  | >>>               bazel-out/k8-opt-exec-2B5CBBC6/bin/common/build_script_script_.119jd99epnwj8gj8.rcgu.o:(core::ptr::drop_in_place$LT$core..option..Option$LT$alloc..sync..Arc$LT$tokio..sync..oneshot..Inner$LT$$LP$$RP$$GT$$GT$$GT$$GT$::he3876c1fe267b6f5)
  | >>> referenced by 119jd99epnwj8gj8
  | >>>               bazel-out/k8-opt-exec-2B5CBBC6/bin/common/build_script_script_.119jd99epnwj8gj8.rcgu.o:(core::ptr::drop_in_place$LT$tokio..runtime..blocking..pool..BlockingPool$GT$::hda2ad28086fdca0a)
  | >>> referenced by 119jd99epnwj8gj8
  | >>>               bazel-out/k8-opt-exec-2B5CBBC6/bin/common/build_script_script_.119jd99epnwj8gj8.rcgu.o:(core::ptr::drop_in_place$LT$tokio..runtime..blocking..shutdown..Receiver$GT$::hc2867cb9e355470f)
 <snip more errors like this>

This seems like some kind of stale object file issue, and we are already using --worker_quit_after_build on CI so I decided to try adding --worker_sandboxing as mentioned above. However, that led to two issues

  1. On dev machines, which do not set --worker_quit_after_build, we now have 450 processes like $HOME/.cache/bazel/_bazel_user/71ed060a30795ab0736b520a305b19cd/bazel-workers/worker-898-Rustc/repo/bazel-out/host/bin/external/rules_rust/worker/rustc-worker bazel-out/k8-opt-exec-2B5CBBC6/bin/external/rules_rust/util/process_wrapper/process_wrapper external/rust_linux_x86_64/bin/rustc fastbuild --persistent_worker. I'm not sure if this is sandboxing working as intended, or an interaction with the rustc worker in particular that leaves hundreds of running processes around. I'm also not sure if this is really harming anything.

  2. An ICE that appears related to some state inside the worker around the location of files:

thread 'rustc' panicked at 'failed to lookup `SourceFile` in new context', compiler/rustc_middle/src/ty/query/on_disk_cache.rs:710:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

The reason this seems worker related to me is that repeated invocations of bazel build continue to fail, but if I rm -rf /tmp/rustc-worker* then the next build succeeds. This is also deterministic -- after cleaning and rebuilding, there will be an ICE at the same place. So it seems like maybe the rustc worker is persisting some (unnecessary?) state that causes it to be incompatible with --worker_sandboxing? It seems like the worker is mostly just delegating to rustc -- is this an error that needs to be dealt with upstream?

@nikhilm
Copy link

nikhilm commented Mar 23, 2021

For the ICE in the compiler, it may be worth finding a reduced test case that fails outside your infrastructure and filing it as a rustc bug.

As for having over 400 processes, it is possible there is some bug where the worker is not quitting, or perhaps Bazel determines your machines have enough capacity to run these processes? I'm not sure what heuristics Bazel uses.

@dae
Copy link
Contributor Author

dae commented Mar 27, 2021

Minor changes were required to get this going with the latest rules_rust - they're available here (see parents) if anyone needs them.

@dae
Copy link
Contributor Author

dae commented Mar 30, 2021

I ran into more linking issues, this time in debug build, and it motivated me to dig into this a bit more. Since rustc doesn't seem to be properly splitting the output based on things like architecture, I've changed the worker code to generate a hash for each request, and include things like target architecture in the hash, so we no longer need to rely on rustc doing the correct thing. Hopefully that will solve the ICEs.

I've also added an experimental option to limit the incremental output to specific crates with a regex, so that disk space is not used up on infrequently changing third-party crates.

dae added a commit to ankitects/rules_rust that referenced this pull request Apr 8, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

It also requires targets to be tagged "incremental" - any without the
tag will be compiled normally, even if a base folder is provided. Cargo's
behaviour appears to be to only store incremental products for the
product being built (eg, all the dependencies you pull in from crates.io
do not have incremental build products stored), so manual tagging seems
preferable to an "include everything" approach.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 8, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

It also requires targets to be tagged "incremental" - any without the
tag will be compiled normally, even if a base folder is provided. Cargo's
behaviour appears to be to only store incremental products for the
product being built (eg, all the dependencies you pull in from crates.io
do not have incremental build products stored), so manual tagging seems
preferable to an "include everything" approach.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 8, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

It also requires targets to be tagged "incremental" - any without the
tag will be compiled normally, even if a base folder is provided. Cargo's
behaviour appears to be to only store incremental products for the
product being built (eg, all the dependencies you pull in from crates.io
do not have incremental build products stored), so manual tagging seems
preferable to an "include everything" approach.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
vmax pushed a commit to typedb/rules_rust that referenced this pull request Aug 25, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
vmax pushed a commit to typedb/rules_rust that referenced this pull request Aug 25, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants