Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Set CARGO_CHECK environment variable when type checking #3748

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
107 changes: 107 additions & 0 deletions text/3748-cargo_check_environment_variable.md
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@akiselev akiselev Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the links! I searched Rust issues and RFCs but forgot to search the cargo repo ☹️

It doesn't look like the discussion has moved much in years, is there a reason against forcing the issue via RFC at this point?

The Rust landscape has changed significantly since the last major activity in the discussion and rust-analyzer seems to be the de-facto LSP implementation at this point. What are the downsides of introducing an LSP specific check command whose only difference (for now) is setting that environment variable? Or the alternative, allowing callers to set a CARGO_NO_BUILD environment variable themselves , one that is officially sanctioned and supported by Rust/Cargo (if not set automatically)?

I understand and appreciate the hesitance to stabilize contracts without ironing out all of the generalities but the discussion about cargo modes and the cargo check && cargo build feels very ivory tower. Rust is increasingly being used to integrate with C++ code beyond *-sys crates as cxx and other tools mature, and I feel like a way to notify build scripts not to run extraneous steps is very much needed regardless of the aforementioned issues. Personally I always configure rust-analyzer to use a subdirectory so that it doesn't block cargo build anyway, so build caching between cargo check and cargo build wouldn't apply (and I'm curious what fraction of the community does too)

That said, I'm biased as I feel acute pain with cxx/cxx-qt, where sccache doesn't seem to help. Worst case scenario I can set the environment variables myself and use a custom fork.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like the discussion has moved much in years, is there a reason against forcing the issue via RFC at this point?

No. While RFC doesn't really force anything until accepted, it is good to open a discussion. Sometimes people hang out in https://internals.rust-lang.org/ first for pre-RFC, before preparing a more formal proposal here.

Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
- Feature Name: `cargo_check_environment_variable`
- Start Date: 2024-12-20
- RFC PR: [rust-lang/rfcs#3748](https://github.com/rust-lang/rfcs/pull/3748)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Add a new environment variable `CARGO_CHECK` that is set to `true` when running `cargo check` or similar type-checking operations so build scripts can skip expensive compilation steps that are unnecessary for Rust type checking, such as compiling external C++ code in cxx based projects.

# Motivation
[motivation]: #motivation

Rust development heavily relies on IDE tooling like rust-analyzer, which frequently invokes `cargo check` to provide real-time type information and diagnostics. Many projects use build scripts (`build.rs`) to generate Rust code and compile external dependencies. For example:

- cxx-rs generates Rust bindings for C++ code and compiles C++ source files
- cxx-qt generates Rust bindings for Qt code and runs the Qt Meta-Object Compiler (MOC)
- Projects using Protocol Buffers generate Rust code from .proto files
- bindgen generates Rust bindings from C/C++ headers

Currently, every time rust-analyzer runs `cargo check`, all build scripts must execute their full build process, including steps like compiling C++ code that are only needed for linking but not for type checking. This significantly impacts IDE responsiveness, especially in projects with complex build scripts.
Copy link
Contributor

@kpreid kpreid Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something seems missing from this description. In my understanding of how to use Cargo, a well-behaved build script should produce an appropriate set of rerun-if-changed directives, meaning that the build script will not be run “every time” unless its dependencies are actually changed.

There are of course specific scenarios where it will be rerun regardless, such as if you are editing one of the build-dependencies, but this paragraph suggests that build scripts predictably run on every cargo check in every project that has one, not just the first cargo check or cargo build. In my mind, if that’s happening, that indicates a bug in the build script. (Or at least an incompleteness — rerunning every time is the default behavior of Cargo given zero rerun-if-* directives, but for any build script that runs long enough to notice, that should be addressed by adding those directives.)

Can you clarify this? Is the build script in this scenario unable to use rerun-if directives for some reason?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, I miswrote that paragraph. I changed it to Currently, every time rust-analyzer runs cargo check, the build script in the changed crate must execute its full build process, ...

Copy link
Contributor

@kpreid kpreid Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that’s still not true in general. If a build script reruns, then one of the following situations applies:

  • The build script does not have any rerun-if-* dependencies declared; that can and should be fixed.
  • You modified one of its [build-dependencies].
  • You modified a file that is a rerun-if-changed dependency of the build script, so it should rerun.
  • You (implicitly or explicitly) changed a environment variable the build script depends on with rerun-if-env-changed, e.g. by running cargo check from two different environments (IDE and separate terminal). Again, this is an actual change in a dependency, though possibly an unintended one.

Changing the Rust code of a crate does not guarantee that its build script will rerun. Build scripts are rerun only if one of their dependencies changes, or if they failed to declare dependencies.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what you mean now.

The cxx crate parses Rust source files to generate bindings from cxx::bridge macros and the Rust source files get added via cargo::rerun-if-changed. Normally this wouldn't be a problem since C/C++ bindings are rarely changed, but in cxx-qt projects the bridge code is the interface between QML/JS GUI code and the Rust backend so it receives a lot more attention while incurring additional QT build tooling overhead.

I'll add a paragraph explaining that detail. I didn't realize the issue I was experiencing is so specific to cxx though


This is particularly important for projects using cxx-qt and similar frameworks where the build scripts perform extensive code generation and compilation.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

When writing a build script (`build.rs`), you can now check the `CARGO_CHECK` environment variable to determine if the build is being performed for type checking purposes:

```rust
fn main() {
generate_rust_bindings();

// Only compile external code when not type checking
if std::env::var("CARGO_CHECK").unwrap() != "true" {
Copy link
Contributor

@kpreid kpreid Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text below says “The environment variable will not be set…” but this code panics if the environment variable is not set.

Suggested change
if std::env::var("CARGO_CHECK").unwrap() != "true" {
if std::env::var("CARGO_CHECK").is_ok() {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thank you

compile_cpp_code();
}
}
```

This allows build scripts to optimize their behavior based on the build context. When rust-analyzer or a developer runs `cargo check`, the build script can skip time-consuming steps that aren't necessary for type checking.

This feature primarily benefits library authors who maintain build scripts, especially those working with external code generation and compilation. Regular Rust developers using these libraries will automatically benefit from improved IDE performance without needing to modify their code.

# Reference-level explanation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should address things like caching and output generation. Quoting @alexcrichton in rust-lang/cargo#4001 (comment):

I'm not 100% certain this still fits in Cargo myself. The major downside of implementing a feature like this is that cargo check && cargo build gets slower than it currently is. Cargo currently caches build script invocations for those two commands, which means that all the work done by cargo check is reused by cargo build and isn't redone.

[...]

As an author of crates like cc and cmake it's also somewhat ambiguous to me about what "check mode" would do for libraries like that. Should they do nothing? Type-check the C code? Make sure it all compiles? Basically I don't think that C/C++ have any real meaningful distinction like Rust does for cargo check and cargo build, so there's not really an obvious choice of what these library crates would do, which would require even more opt-in or configuration on behalf of all users.

High level proposal: for a build script, CHECK_ONLY should mean:

  1. Do any relevant checks that don't include codegen, if possible
  2. Do what is necessary to check the Rust code
  3. This can always be disregarded in favor of running the full build script

So for C code, CHECK_ONLY would mean doing the C/++ configure (but not build) step and possibly invoking bindgen. Usually this is ./configure or cmake without --build to verify dependencies and environment, and prepare final headers and source files. bindgen can then be run on these final headers. !CHECK_ONLY then means that make, cmake --build, or cc should be run.

OUT_DIR should be the same between check and build so configuration output can be reused. The build script will probably always run the congigure step, cmake knows how to skip a lot if files are already up to date 1.

For crates, cc should totally ignore this environment variable and not change anything - the build script author will just be have the option to skip invoking cc if CHECK_ONLY is set. cmake's Config should add new methods like only_configure (always called) and only_build (called if CHECK_ONLY is unset), since Config::build currently does both these things. cmake::build and Config::build could optionally listen to CHECK_ONLY to determine whether to run the only_build step.

Footnotes

  1. Cmake's configure caching can sometimes be pretty slow for bigger projects. It would be possible for build scripts to thumbprint relevant environment and skip invoking cmake if OUT_DIR has an updated thumbprint, but this would be advanced usage.

[reference-level-explanation]: #reference-level-explanation

Cargo will set the `CARGO_CHECK` environment variable to `true` when running `cargo check`
Copy link
Contributor

@kpreid kpreid Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is unwise to name and describe this as for cargo check in particular, for two reasons:

  • A future-compatibility hazard, especially in context of extending this to other modes as noted in the future possibilities section — are build scripts going to have to check for CARGO_CHECK || CARGO_DOC || CARGO_CLIPPY to be efficient? I think we’d see future arguments over whether commands are “check-like enough to count” rather than introducing more, which would spread confusion and create a tradeoff we can just avoid from the start.
  • Build script authors might decide to intentionally do non-additive things based on the command in use (like generating different bindings code that’s more development-friendly in some way), leading to incorrect caching.

I think it would be better to name and describe as something like:

Cargo will set the CARGO_WILL_NOT_LINK environment variable if Cargo’s build plan is known not to include any linking (and therefore is known not to depend on any foreign object files the build script might produce, or any of its cargo::rustc-link-* directives). Whenever CARGO_WILL_NOT_LINK is present, you may skip producing object files. All generated Rust code or other inputs to the Rust compilation (such as files for include_str! or cargo::rustc-cfg directives) must be unaffected.

The value of CARGO_WILL_NOT_LINK is reserved for making finer distinctions in the future; currently, it is always set to "1", but build scripts should check only for its presence.

This does change the behavior because the definition includes all of cargo check, cargo fix, cargo doc, and cargo clippy, but I think that’s the right thing to do. It’s also very specific about what the build script may do and should not do, to ensure that the results of the build script can still validly be cached.

(I specified the value to be "1" and not "true" because that's what Cargo uses for feature flags, the other existing example of “boolean” environment variables.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great points!

I think your alternative is a better approach, especially not to step on the toes of the parallel cargo modes discussion. I'll wait for more input and then probably update the RFC to the alternative method after the holidays

Copy link
Contributor

@tgross35 tgross35 Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a handful of environment variables that don't have the CARGO_ prefix, e.g. OUT_DIR TARGET NUM_JOBS OPT_LEVEL. My read is that these unprefixed variables are loosely meant to be portable to other build systems that might want to reuse build scripts. Since the proposed feature is certainly something that other build systems could make use of, maybe it should also be unprefixed?

A name like CHECK_ONLY would be reasonably intuitive. WILL_NOT_LINK or SKIP_CODEGEN are a bit more technical but maybe more accurate with e.g. rustdoc. NO_CODEGEN is consistent with the unstable rustc -Zno-codegen. I think Cargo currently gets the check-only functionality by passing --emit=dep-info,metadata.

List of the existing environment variables for reference https://doc.rust-lang.org/cargo/reference/environment-variables.html


The environment variable will not be set (or will be set to `false`) for commands that require full compilation:
- `cargo build`
- `cargo run`
- `cargo test`

# Drawbacks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to #3748 (comment): as proposed, this is only beneficial in cases where whatever the build script does is more significant than compiling the build script + deps itself. IME this is indeed the common case since often a crate like cmake or bindgen is needed to generate the headers, actually compiling only adds cmake --build.

[drawbacks]: #drawbacks

1. **Potential for Inconsistencies**: Build scripts might behave differently during type checking vs. full compilation, which could theoretically lead to different type checking results compared to the final build.

2. **Increased Complexity**: Build script authors need to consider an additional factor when determining their behavior, which adds some complexity to the build system. On the other hand, they can ignore the feature entirely and just run all build steps regardless.

3. **Maintenance Burden**: The Rust and Cargo teams will need to maintain this feature and ensure it remains consistent across different commands and contexts.

# Rationale and alternatives
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rust-lang/cargo#10126 proposed CARGO_MODE that adds doc, test, doctest, and a handful of others. Personally I would rather have a single environment variable that says whether or not codegen should happen (as is proposed here) rather than needing to know which modes do and don't require codegen, which isn't robust against adding new modes. Somehow indicating bench/test mode could be useful but this seems better as an independent feature.

[rationale-and-alternatives]: #rationale-alternatives

Alternative designs considered:

1. Define a standard environment variable that isn't set by `cargo check` but is officially encouraged by Rust for RLS and other IDE tooling. This would avoid any unexpected behavior from build scripts with other `cargo check` consumers but still provide a standard way for build scripts to skip unnecessary steps.

2. Do Nothing: If we do nothing, build scripts will continue to run all build steps even when it's not necessary, significantly impacting Rust ergonomics when interfacing with exernal languages.

# Prior art
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind providing link to each prior art so we can find reference easier?

[prior-art]: #prior-art

1. **Go Build Tags**: Go allows conditional compilation using build tags, which can be used to skip certain build steps based on the build context.

2. **Bazel's Configuration Transitions**: Bazel provides mechanisms to modify build behavior based on the target being built.

3. **Cargo Features**: The existing feature flag system in Cargo demonstrates the value of conditional build behavior.

4. **Other Cargo Environment Variables**: Cargo already sets several environment variables during builds:
- `CARGO_CFG_TARGET_OS`
- `CARGO_MANIFEST_DIR`
- `OUT_DIR`

This proposal follows the established pattern of using environment variables to communicate build context to scripts.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

1. Should the environment variable be set for other commands that don't require full compilation?
- `cargo doc`
- `cargo clippy`

2. How should this interact with parallel builds where some targets need full compilation and others only need type checking? (Is this even a thing?)

3. Should we provide additional variables to distinguish between different types of type-checking operations (IDE, clippy, etc.)?

4. How do we ensure build scripts don't diverge too much between type checking and full compilation modes?

# Future possibilities
[future-possibilities]: #future-possibilities

1. **Extended Build Contexts**: Introduce additional environment variables for other build contexts:
- `CARGO_DOC` for documentation generation
- `CARGO_IDE` specifically for IDE tooling