
Persistent worker support #421

Closed · wants to merge 25 commits

Conversation

@nikhilm commented Oct 3, 2020

@mfarrugi
This is the outline of what it would look like to plug in the worker. There are a few thorny issues that need solutions:

  1. Raze needs to emit _no_worker_ versions of the rules for all dependencies of the worker. This will require changes to raze so it can emit custom rule names and imports.

  2. I tried this with a project that uses Raze+rules_rust itself and quickly ran into the shared-deps problem. Not only does the user of these rules currently need to run the (documented) extra steps to merge their own deps and the protobuf deps, but they also need to pull in the protobuf patch (could be mitigated by running the build.rs?) and patch their own copies of the generated BUILD files to use the _no_worker_ rules. This is a significant barrier to usage. Of course, if raze were modified, there is probably a way to annotate it instead, so that it generates no-worker variants for certain crates and their transitive deps. Ideally it should generate both worker and no-worker variants and then use the appropriate ones, so that any deps shared between the worker and user code (e.g. protobuf) can still use the with-worker variant to build the protobuf crate for the user code.

  3. The worker actions will need a new mnemonic; otherwise, changing the strategy for `Rustc` will make the worker's own compilation fail, since it uses the same actions but does not support the worker strategy. This is an easy fix (see the sketch below).
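
For point 3, a minimal sketch of what a separately-named worker action could look like in Starlark; the `RustcWorker` mnemonic, the variable names, and the attribute values are illustrative assumptions, not what this PR actually implements:

```python
# Illustrative only: give the worker-enabled compile action its own
# mnemonic so that `--strategy=RustcWorker=worker` can be set without
# forcing the worker strategy onto actions that do not support it.
ctx.actions.run(
    outputs = [output],
    inputs = compile_inputs,
    executable = worker_binary,  # hypothetical worker executable
    arguments = [args],          # rustc flags; workers typically read them from a params file
    mnemonic = "RustcWorker",    # distinct from the existing "Rustc" mnemonic
    execution_requirements = {"supports-workers": "1"},
)
```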

At this point, another alternative may be to build the worker with Cargo, release binaries, and then have external repositories for the worker, the way we do for rustc itself. What do you think?
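
As a rough sketch of that alternative, the worker could be fetched as a prebuilt binary via an external repository; the repository name, URL, and checksum below are placeholders:

```python
# Hypothetical WORKSPACE entry: download a prebuilt worker binary,
# similar to how the rust toolchains are fetched. All values are placeholders.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

http_file(
    name = "rustc_worker_linux_x86_64",
    urls = ["https://example.com/rustc-worker-x86_64-unknown-linux-gnu"],  # placeholder URL
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",  # placeholder
    executable = True,
)
```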

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.

@nikhilm (Author) commented Oct 5, 2020

Related to #412

@mfarrugi (Collaborator) commented Oct 7, 2020

> At this point, another alternative may be to build the worker with Cargo, release binaries, and then have external repositories for the worker, the way we do for rustc itself. What do you think?

That sounds fine; cargo-raze is a much closer analog. We're not currently fetching raze as part of rules_rust, but we probably should.

@google-cla bot commented Oct 7, 2020

CLAs look good, thanks!

@google-cla bot added cla: yes and removed cla: no labels Oct 7, 2020

@dae (Contributor) commented Oct 20, 2020

@nikhilm unfortunately I'm fairly new to Bazel and thus do not have anything to add on how the problems you listed may be resolved, but I just wanted to say thank you for working on this - waiting ~5 secs for my crate to compile in the trivial change case instead of ~15 secs is a lot nicer. In case it's useful, here is your branch rebased over the latest master: https://github.com/ankitects/rules_rust/tree/persistentworker

Edit: a commit that adds macOS support is available here: https://github.com/ankitects/rules_rust/tree/macos-worker

dae added a commit to ankitects/rules_rust that referenced this pull request Oct 20, 2020
I have a crate that uses pyo3 to create a Python extension module.
On macOS, special link flags are required:

    rustc_flags = selects.with_or({
        (
            "@io_bazel_rules_rust//rust/platform:x86_64-apple-darwin",
        ): [
            "-Clink-arg=-undefined",
            "-Clink-arg=dynamic_lookup",
        ],
        "//conditions:default": [],
    }),

Without them, the linker on macOS fails, as the Python API is not
available until runtime.

rules_rust's clippy implementation was passing --emit=dep-info,link
to the clippy invocation, causing a clippy run to fail with linking
errors. This patch changes the invocation to use
--emit=dep-info,metadata instead, which is what cargo uses. It also
shaves a bit of time off the check, as linking no longer needs to happen.

Tangentially related to bazelbuild#428 and bazelbuild#421 - currently the clippy aspect
seems to be falling back on a non-worker compile, so it's still
noticeably slower than running cargo clippy directly when minor
changes have been made.
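
In other words, the change amounts to swapping the emit list passed when driving clippy; roughly (the variable name is illustrative, not the exact rules_rust internals):

```python
# Before: the clippy action forced a link step.
#   args.add("--emit=dep-info,link")
# After: emit metadata only, matching what cargo clippy does and skipping linking.
args.add("--emit=dep-info,metadata")
```
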
dae added a commit to ankitects/rules_rust that referenced this pull request Oct 21, 2020
dae added a commit to ankitects/rules_rust that referenced this pull request Oct 22, 2020

dae added a commit to ankitects/rules_rust that referenced this pull request Nov 13, 2020

@nikhilm (Author) commented Nov 24, 2020

@mfarrugi when you have some time could you merge this? Thanks!


Iterating on Rust code may benefit from [incremental compilation](https://doc.rust-lang.org/edition-guide/rust-2018/the-compiler/incremental-compilation-for-faster-compiles.html). This is supported by using a [Bazel Persistent Worker](https://docs.bazel.build/versions/master/persistent-workers.html). While Bazel can determine what needs to be rebuilt at a crate level, the compiler can speed up building a single crate by sharing information across runs. It does this by storing intermediate information in a directory across invocations. This is enabled by default in Cargo. Persistent workers bring this feature to Bazel.

The Rust rules have preliminary support for workers. Pass `use_worker = True` to enable this when available.

Collaborator:

This is linux only, right?

Contributor:

I posted a Mac version above - but each platform would need its own binary at the moment.

Author:

rustc-worker also has a mac build available now. I can update this PR with an entry for that.

The Rust rules have preliminary support for workers. Pass `use_worker = True` to enable this when available.

```python
rust_repositories(use_worker = True)
```

Collaborator:

I don't see why this shouldn't be the default; was there a reason to be conservative about this?

Contributor:

It's pulling in and running a binary hosted on a third-party repo, so off by default is probably the best option.

Author:

Yes, what @dae said. Plus given that only 2(?) people have actually used it, it seemed very alpha.

1. Since the worker has dependencies on various crates, it uses cargo-raze, which generates the relevant rules. This means "don't use the worker to build this target" is transitive, and that information needs to be propagated down the tree in a way that works with Bazel's restrictions. Initial experiments repeatedly encountered failures due to Bazel treating the rule's dependency on the worker executable target as a cycle, even when building in non-worker mode. This may be user error or a restriction in Bazel. Until that is addressed, the easiest fix is to change cargo-raze to customize which rules are used, and to provide `rust_no_worker_binary` and `rust_no_worker_library` rules that do not have this cycle (see the sketch after this list).
2. Figuring out a good strategy for dependencies. Since Bazel doesn't really have transitive dependencies, attempts to build this worker from source necessarily require users of these rules to register all the external repositories for the worker in their `WORKSPACE`. This could cause collisions with other dependencies. In addition, if the `_no_worker_` approach above is adopted, users will lose the benefits of workers for those dependencies (like `protobuf`) shared between the worker and their code. There is no satisfactory solution for this right now.
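
As a rough illustration of point 1, a raze-generated BUILD entry under that approach might look like the following; the load path and attributes are assumed to mirror the existing `rust_library` rule, and the crate name is a placeholder:

```python
# Hypothetical output from a modified cargo-raze using the proposed
# no-worker variant of the library rule.
load("@io_bazel_rules_rust//rust:rust.bzl", "rust_no_worker_library")

rust_no_worker_library(
    name = "some_worker_dep",  # placeholder crate name
    srcs = glob(["src/**/*.rs"]),
    edition = "2018",
    deps = [
        # Transitive dependencies of the worker would also be generated
        # as rust_no_worker_library targets, as described in point 1.
    ],
)
```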

## How about rewriting the worker in C++?

Collaborator:

Does the worker benefit from being written in a non-GC'd language? I'm curious whether Python is a reasonable choice, given that other utilities in rules_rust have been written in Python.

Author:

No, nothing prevents someone with the interest from writing this in Python. I did not find any references to Python rules in this repo. I suppose that, as a bootstrap language (i.e. one that already has Bazel rules for it), it is a better alternative than C++.

@mfarrugi (Collaborator)

I was really hoping for a second set of eyes to at least take a high-level look. In particular, I'm not a huge fan of depending on a Linux-only binary download; while ultimately I think that is fine, I can imagine other opinions.

Does anyone care to opine on whether this should be the default, or on any of the other high-level aspects of this addition? cc @dfreese @illicitonion @UebelAndre

@dae (Contributor) commented Nov 30, 2020

Avoiding the need for a binary would be ideal. Python could work, though I don't see any Python in rules_rust at the moment?

Alternatively perhaps the cyclic dependency issue could be solved by having a separate rust_binary_noworker() rule that does not take the worker toolchain as an argument.

Protobuf dependencies could be avoided by using JSON: https://blog.bazel.build/2020/11/11/json-workers.html
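
For reference, the JSON variant of the worker protocol described in the linked post exchanges newline-delimited messages shaped roughly like this (shown here as Python literals; the flags, path, and digest are placeholders):

```python
# One WorkRequest per line on the worker's stdin.
work_request = {
    "arguments": ["--edition=2018", "src/lib.rs"],           # placeholder compiler flags
    "inputs": [{"path": "src/lib.rs", "digest": "abc123"}],  # placeholder digest
    "requestId": 0,
}

# One WorkResponse per line on the worker's stdout.
work_response = {
    "exitCode": 0,
    "output": "",  # compiler diagnostics, if any
    "requestId": 0,
}
```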

dae added a commit to ankitects/rules_rust that referenced this pull request Dec 1, 2020
@nikhilm (Author) commented Dec 2, 2020

> Alternatively perhaps the cyclic dependency issue could be solved by having a separate rust_binary_noworker() rule that does not take the worker toolchain as an argument.

This did not work out, because of the requirement that pops up: the worker cannot have any cargo-raze-based dependencies without significant changes to cargo-raze itself.

Note that JSON workers are a very new thing at this point, which would necessarily tie rules_rust to Bazel 3.7 or newer. I'm not sure if that is what we want for rules_rust. Perhaps the use_worker flag should still be preserved in such a case.

Re: writing this in Python, or modifying raze - I'm not interested in making those changes, due to incidental complexity (Bazel transition and cycle-checker limitations, modifying another dependency, etc.) or the choice of language. I appreciate the feedback and encouragement from everyone here. If this is not the best way to proceed, I am OK with closing this PR without merging, but I'm not going to pursue these alternative approaches.

@dae (Contributor) commented Dec 2, 2020

I've posted an experiment using cargo on #517

Base automatically changed from master to main January 28, 2021 20:47
@nikhilm closed this Mar 31, 2021
@UebelAndre (Collaborator)

I actually feel this might be worth keeping around. I'm investigating some ways Rust binaries can be bootstrapped as part of #650 and would like to get something basic set up quickly. Maybe the same infrastructure could be applied here?

But I'm also not against a C++ implementation; I'm just more excited about a Rust implementation 😄

dae added a commit to ankitects/rules_rust that referenced this pull request Apr 8, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

It also requires targets to be tagged "incremental" - any without the
tag will be compiled normally, even if a base folder is provided. Cargo's
behaviour appears to be to only store incremental products for the
product being built (eg, all the dependencies you pull in from crates.io
do not have incremental build products stored), so manual tagging seems
preferable to an "include everything" approach.
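
A hedged sketch of how the flag and tag described above might be used; the flag name is taken from the commit message, while the cache path and target below are made-up examples:

```python
# .bazelrc (path is user-chosen):
#   build --@rules_rust//:experimental_incremental_base=/home/user/.cache/rust-incremental

# BUILD file: opt a local target into incremental compilation.
rust_library(
    name = "my_crate",
    srcs = ["src/lib.rs"],
    tags = ["incremental"],  # only tagged targets keep incremental build products
)
```
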
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 8, 2021
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 8, 2021
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there
is no extra code to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

The default behaviour will incrementally compile crates in the local
workspace, like cargo does. The behaviour can be adjusted with another
flag, which is covered in docs/index.md.
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
dae added a commit to ankitects/rules_rust that referenced this pull request Apr 9, 2021
vmax pushed a commit to typedb/rules_rust that referenced this pull request Aug 25, 2021
vmax pushed a commit to typedb/rules_rust that referenced this pull request Aug 25, 2021