Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Ensure the right cargo subcommands are picked #71

Merged
merged 1 commit into from
Sep 17, 2024

Conversation

javierhonduco
Copy link
Owner

@javierhonduco javierhonduco commented Sep 17, 2024

CI started to fail with errors while running cargo clippy with --ignore-environment. Using this flag clears most of the environment variables, such as the system's $PATH, helping create more isolated builds. We want to avoid using system binaries, which might differ. For this reason, this failure was puzzling to me, most of the environment is under our precise control:

  • The toolchain, such as cargo, rustc and others is defined in our nix flake and we commit the lockfile. No upgrades happened between the last good and the first bad CI runs.
  • Cargo deps are also locked via its lockfile, and there were no upgrades either.
  • We could not reproduce this issue in the 2 boxes we tried.

There are two potential sources of non isolation left, (1) the base image that GitHub Actions runners use and (2) the nix version we run as we don't currently pin them (this will be fixed later on). Unfortunately, even pinning them to a version released before the breakage was introduced didn't fix the problem at hand.

We were not convinced that (1) was the issue was we liberally use --ignore-environment which will prevent the environment getting tainted, but after some tests we saw this:

with a clean environment

$ nix develop --ignore-environment
$ type cargo-clippy
cargo-clippy is /nix/store/vw1g8fjbapmwgsfd5cz3h7gjzjim6xrz-rust-default-1.81.0-nightly-2024-06-20/bin/cargo-clippy
$ type cargo
cargo is /nix/store/vw1g8fjbapmwgsfd5cz3h7gjzjim6xrz-rust-default-1.81.0-nightly-2024-06-20/bin/cargo
$ strace -f cargo clippy --help |& grep cargo-clippy
statx(AT_FDCWD, "/home/javierhonduco/.cargo/bin/cargo-clippy", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0755, stx_size=14881096, ...}) = 0
execve("/home/javierhonduco/.cargo/bin/cargo-clippy", ["/home/javierhonduco/.cargo/bin/c"..., "clippy", "--help"], 0x5555575c7180 /* 113 vars */) = 0
[pid 284009] statx(AT_FDCWD, "/home/javierhonduco/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-clippy", AT_STATX_SYNC_AS_STAT, STATX_ALL, 0x7fffffff7d90) = -1 ENOENT (No such file or directory)
[pid 284009] execve("/home/javierhonduco/.cargo/bin/cargo-clippy", ["cargo-clippy", "clippy", "--help"], 0x555556080300 /* 117 vars */ <unfinished ...>
[pid 284009] statx(AT_FDCWD, "/home/javierhonduco/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-clippy", AT_STATX_SYNC_AS_STAT, STATX_ALL, 0x7fffffff7c90) = -1 ENOENT (No such file or directory)

note the wrong cargo-clippy binary (/home/javierhonduco/.cargo/bin/cargo-clippy), provided by the system rather than by nix.

without a clean environment

$ nix develop
$ strace -f cargo clippy --help |& grep cargo-clippy
[...]
statx(AT_FDCWD, "/nix/store/vw1g8fjbapmwgsfd5cz3h7gjzjim6xrz-rust-default-1.81.0-nightly-2024-06-20/bin/cargo-clippy", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0555, stx_size=1453128, ...}) = 0
[...]

which shows the expected path for cargo-clippy.

After some more debugging it was clear that the main difference was in $PATH, as in the non isolated environment both the nix provided binaries were added first, and then there was an entry for /home/javierhonduco/.cargo/bin, which includes the system-provided binaries that are not used by cargo subcommands as the order of the search path is respected.

This didn't seem to make a lot of sense until we found rust-lang/cargo#11023, which has this comment:

If $CARGO_HOME/bin is not in a path, prefer it over anything in $PATH.

So, in summary, since we were cleaning $PATH and $CARGO_HOME/bin was not present anymore, even if $PATH contains all the right cargo subcommand binaries, cargo will fall back to executing binaries in ~/.cargo/bin.

This debugging was complicated by the fact that we could not easily reproduce this issue locally, and that we expected no leakeage from the environment, but we need to be mindful that nix doesn't have a 'real' filesystem sandbox such as buck2 or bazel.

Hence the initial issue was caused due to a different cargo-clippy binary that somehow caused this issue, which is not totally understood at the time:

554 | pub use rstest_macros::fixture;
    |         ^^^^^^^^^^^^^ can't find crate

error[E0463]: can't find crate for `rstest_macros`
    --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstest-0.21.0/src/lib.rs:1357:9
     |
1357 | pub use rstest_macros::rstest;
     |         ^^^^^^^^^^^^^ can't find crate

I don't have to dig deeper, but perhaps an upgrade of our Rust toolchain will show issues. Will keep an eye for this in case there are any bugs lying around.

I verified that there was a base image upgrade around the time the issue started to happen in
actions/runner-images@a68ad81.

Test Plan

CI

CI started to fail with errors while running `cargo clippy` with `--ignore-environment`.
Using this flag clears most of the environment variables, such as the
system's $PATH, helping create more isolated builds. We want to avoid
using system binaries, which might differ. For this reason, this failure
was puzzling to me, most of the environment is under our precise
control:
* The toolchain, such as `cargo`, `rustc` and others is defined in our
  `nix` flake and we commit the lockfile. No upgrades happened between
the last good and the first bad CI runs.
* Cargo deps are also locked via its lockfile, and there were no
  upgrades either.
* We could not reproduce this issue in the 2 boxes we tried.

There are two potential sources of non isolation left, (1) the base
image that GitHub Actions runners use and (2) the `nix` version we run
as we don't currently pin them (this will be fixed later on).
Unfortunately, even pinning them to a version released before the
breakage was introduced didn't fix the problem at hand.

We were not convinced that (1) was the issue was we liberally use
`--ignore-environment` which will prevent the environment getting
tainted, but after some tests we saw this:

**with a clean environment**

```
$ nix develop --ignore-environment
$ type cargo-clippy
cargo-clippy is /nix/store/vw1g8fjbapmwgsfd5cz3h7gjzjim6xrz-rust-default-1.81.0-nightly-2024-06-20/bin/cargo-clippy
$ type cargo
cargo is /nix/store/vw1g8fjbapmwgsfd5cz3h7gjzjim6xrz-rust-default-1.81.0-nightly-2024-06-20/bin/cargo
$ strace -f cargo clippy --help |& grep cargo-clippy
statx(AT_FDCWD, "/home/javierhonduco/.cargo/bin/cargo-clippy", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0755, stx_size=14881096, ...}) = 0
execve("/home/javierhonduco/.cargo/bin/cargo-clippy", ["/home/javierhonduco/.cargo/bin/c"..., "clippy", "--help"], 0x5555575c7180 /* 113 vars */) = 0
[pid 284009] statx(AT_FDCWD, "/home/javierhonduco/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-clippy", AT_STATX_SYNC_AS_STAT, STATX_ALL, 0x7fffffff7d90) = -1 ENOENT (No such file or directory)
[pid 284009] execve("/home/javierhonduco/.cargo/bin/cargo-clippy", ["cargo-clippy", "clippy", "--help"], 0x555556080300 /* 117 vars */ <unfinished ...>
[pid 284009] statx(AT_FDCWD, "/home/javierhonduco/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-clippy", AT_STATX_SYNC_AS_STAT, STATX_ALL, 0x7fffffff7c90) = -1 ENOENT (No such file or directory)
```

note the wrong `cargo-clippy` binary (/home/javierhonduco/.cargo/bin/cargo-clippy), provided by the system rather
than by `nix`.

**without a clean environment**

```
$ nix develop
$ strace -f cargo clippy --help |& grep cargo-clippy
[...]
statx(AT_FDCWD, "/nix/store/vw1g8fjbapmwgsfd5cz3h7gjzjim6xrz-rust-default-1.81.0-nightly-2024-06-20/bin/cargo-clippy", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0555, stx_size=1453128, ...}) = 0
[...]
```

which shows the expected path for `cargo-clippy`.

After some more debugging it was clear that the main difference was in
`$PATH`, as in the non isolated environment both the `nix` provided
binaries were added first, and then there was an entry for
`/home/javierhonduco/.cargo/bin`, which includes the system-provided
binaries that _are not_ used by cargo subcommands as the order of the
search path is respected.

This didn't seem to make a lot of sense until we found
https://github.com/rust-lang/cargo/pull/11023/files, which has this
comment:

> If `$CARGO_HOME/bin` is not in a path, prefer it over anything in `$PATH`.

So, in summary, since we were cleaning $PATH and $CARGO_HOME/bin was not
present anymore, even if $PATH contains all the right cargo subcommand
binaries, `cargo` will fall back to executing binaries in ~/.cargo/bin.

This debugging was complicated by the fact that we could not easily
reproduce this issue locally, and that we expected no leakeage from the
environment, but we need to be mindful that `nix` doesn't have a 'real'
filesystem sandbox such as buck2 or bazel.

Hence the initial issue was caused due to a different `cargo-clippy`
binary that somehow caused this issue, which is not totally understood
at the time:

```
554 | pub use rstest_macros::fixture;
    |         ^^^^^^^^^^^^^ can't find crate

error[E0463]: can't find crate for `rstest_macros`
    --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstest-0.21.0/src/lib.rs:1357:9
     |
1357 | pub use rstest_macros::rstest;
     |         ^^^^^^^^^^^^^ can't find crate
```

I don't have to dig deeper, but perhaps an upgrade of our Rust toolchain
will show issues. Will keep an eye for this in case there are any bugs
lying around.

I verified that there was a base image upgrade around the time the issue
started to happen in
actions/runner-images@a68ad81.

Test Plan
=========

CI
@javierhonduco javierhonduco merged commit cb47a2a into main Sep 17, 2024
4 checks passed
@javierhonduco javierhonduco deleted the cargo-subcommand-use-nix-paths branch September 17, 2024 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant