Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Migrate all Rust packages to importCargoLock #217084

Closed
wants to merge 1 commit into from

Conversation

winterqt
Copy link
Member

@winterqt winterqt commented Feb 19, 2023

This PR proposes converting all Rust packages in Nixpkgs to using importCargoLock instead of fetchCargoTarball whenever possible, which parses the lock file to generate independent FODs, per dependency, using the hashes from it.

Why now?

Alternatively: "why isn't this a proper RFC?"

On March 9th, 2023, Rust 1.68.0 will be released. With it, Cargo will introduce a change that will break our FODs for packages that use Git dependencies.

We just don't have time to go through the RFC process, unless we want to fix all affected hashes.

@alyssais thought that this oppertunity was perfect to just vendor the lockfiles, and after some discussion, I agree.

Benefits

  1. We don't have to download the ~170 MiB registry each time we want to fetch a package's dependencies.
  2. We can share crates between packages in the Nix store, as the resulting vendor directories just contain symlinks.
  3. We can eventually do fancy things such as automatically include required libraries for crates into package's build inputs, or per-crate build artifact caching, which -- until now -- would require IFD.

Drawbacks

  1. Repo size inflation. The PoC conversion adds ~32.6 MiB to the repo's size. I believe this is justifiable because of the above space-saving changes.
  2. Eval performance will naturally decrease. However, not by a lot:
stat before after Δ Δ%
cpuTime 73.86 74.78 ↗ 0.92 1.24%
envs-bytes 2,999,282,224 3,040,232,472 ↗ 40,950,248 1.37%
envs-elements 162,064,778 164,196,713 ↗ 2,131,935 1.32%
envs-number 106,422,750 107,916,173 ↗ 1,493,423 1.40%
gc-heapSize 9,681,682,432 9,883,009,024 ↗ 201,326,592 2.08%
gc-totalBytes 17,666,947,056 17,862,112,352 ↗ 195,165,296 1.10%
list-bytes 429,334,344 435,001,192 ↗ 5,666,848 1.32%
list-concats 2,021,476 2,186,700 ↗ 165,224 8.17%
list-elements 53,666,793 54,375,149 ↗ 708,356 1.32%
nrAvoided 148,568,027 150,413,380 ↗ 1,845,353 1.24%
nrFunctionCalls 88,721,432 90,120,887 ↗ 1,399,455 1.58%
nrLookups 45,592,738 46,857,868 ↗ 1,265,130 2.77%
nrOpUpdateValuesCopied 248,764,456 250,934,019 ↗ 2,169,563 0.87%
nrOpUpdates 10,805,375 10,967,351 ↗ 161,976 1.50%
nrPrimOpCalls 70,154,155 70,923,773 ↗ 769,618 1.10%
nrThunks 149,433,291 151,489,250 ↗ 2,055,959 1.38%
sets-bytes 5,943,511,488 5,992,796,432 ↗ 49,284,944 0.83%
sets-elements 341,296,918 344,142,502 ↗ 2,845,584 0.83%
sets-number 30,172,550 30,407,275 ↗ 234,725 0.78%
sizes-Attr 16 16 0
sizes-Bindings 16 16 0
sizes-Env 16 16 0
sizes-Value 24 24 0
symbols-bytes 1,828,318 1,842,339 ↗ 14,021 0.77%
symbols-number 146,044 146,738 ↗ 694 0.48%
values-bytes 4,585,180,872 4,639,916,904 ↗ 54,736,032 1.19%
values-number 191,049,203 193,329,871 ↗ 2,280,668 1.19%
# pre-conversion:
________________________________________________________
Executed in   86.59 secs    fish           external
   usr time   73.89 secs    0.27 millis   73.89 secs
   sys time    7.61 secs    2.88 millis    7.61 secs

# post-conversion:
________________________________________________________
Executed in   87.57 secs    fish           external
   usr time   74.85 secs   55.14 millis   74.80 secs
   sys time    7.38 secs    7.21 millis    7.37 secs

What about other languages?

Alternatively: "you maintain a FOD-based builder, are you going to push vendoring their lockfiles?"

Rust's lockfile format is uniquely suitable to this type of vendoring because they turn out to be pretty small (unlike npm or Yarn's), and we can use the hashes from them directly (unlike Go's).

All in all, this PR merely exists as a stopgap until computed derivations are in a state where we can use them. Once they are, they will obviously be superior to anything vendoring can do.

Assuming the consensus is that we want to move forward with this, I will update this PR to include my PoC branch, which is where you can take a look at what this will end up looking like. This branch converts 879 packages, which were all done with the script I've added in this PR. When the actual conversion happens, I'll go through and do the rest manually. I also need to update documentation.

I appreciate any and all thoughts on this, thanks!

@winterqt
Copy link
Member Author

cc @NixOS/rust

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/migrate-all-rust-packages-to-importcargolock/25582/1

@alyssais
Copy link
Member

Another benefit of doing this is that it makes it easier for us to do things like upgrade a particular Rust dependency across the board in Nixpkgs if a vulnerability is found. (This would be a lot more feasible to do for Rust than for other language ecosystems because so much is checked at compile time with Rust, and generally Rust libraries are good at not making other breaking changes in security updates.) We could technically do this currently too, but it's too hard. It requires some fiddly scripting, and actually updating a dependency for a package is quite annoying — we'd have to either commit the lockfile anyway, or generate a patch and update the cargoSha256. In practice this doesn't happen.

We don't have to download the ~170 MiB registry each time we want to fetch a package's dependencies.

I also want to highlight how important this one is. If you're on a plane/train, or more importantly if you're in a part of the world where internet is slow or expensive, this can make it infeasible to build Rust packages from Nixpkgs for no real reason — each individual crate is probably <1MB to download, but the Cargo FOD fetcher as currently implemented has to download the whole huge registry every time it's used. (It would possibly be possible for us to fix this, by having a separate derivation for the crate registry, that could be shared between invocations of the FOD fetcher, but it would be yet another layer on the Jenga tower of the FOD system, and it wouldn't be simple to implement — we'd need to update it extremely frequently so that it was likely to have whatever crates somebody might be using indexed at the time they wanted to use it.)

With it, Cargo will introduce a change that will break our FODs for packages that use Git dependencies.

This isn't an isolated incident — it has happened repeatedly over the last few years. Every time it requires a lot of work to fix (we're lucky to even have advance notice this time), and there's nothing we can do for out-of-tree users. For other abuses of the FOD mechanism, we don't really have good alternatives (at least for now), but as Winter points out, Cargo is designed basically perfectly so that we don't need to.

@sternenseemann
Copy link
Member

Another benefit of doing this is that it makes it easier for us to do things like upgrade a particular Rust dependency across the board in Nixpkgs if a vulnerability is found. (This would be a lot more feasible to do for Rust than for other language ecosystems because so much is checked at compile time with Rust, and generally Rust libraries are good at not making other breaking changes in security updates.) We could technically do this currently too, but it's too hard. It requires some fiddly scripting, and actually updating a dependency for a package is quite annoying — we'd have to either commit the lockfile anyway, or generate a patch and update the cargoSha256. In practice this doesn't happen.

Note that I've also written CI tooling for checking in tree lock files which could easily be adapted for nixpkgs. The benefit is that it'll no longer require downloading the amount of sources it currently does, but work just on the local checkout of nixpkgs. Maybe we could then make this into a proper reporting tool of sorts (maybe even in CI?).

String::from_utf8(
Command::new("nix-build")
.arg("-A")
.arg(format!("{attr}.src"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this doesn't respect cargoPatches and can break builds due to mismatch in Cargo.lock (example), but this can probably be fixed manually and not be included in the script

$ rg 'cargoPatches =' | wc -l
50

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, will make it skip those and rerun. I naively assumed/forgot that my check of "is there a vendored Cargo.lock already" would(n't) be sufficient.

@figsoda
Copy link
Member

figsoda commented Feb 19, 2023

I am not going to repeat the benefits of this change, overall this is a 👍 for me. I will just link a few related PRs

Cargo will introduce a change that will break our FODs for packages that use Git dependencies.

Unrelated to this proposal, but we should probably backport this change to importCargoLock, as this is currently blocking updates of ruff and probably more packages in the future

Copy link
Contributor

@asymmetric asymmetric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to be a user-visible change in policy? If so, we should document it in the manual.

@@ -0,0 +1,4 @@
#!/usr/bin/env nix-shell
#!nix-shell -I nixpkgs=. -i bash -p "import ./maintainers/scripts/convert-to-import-cargo-lock" nix-prefetch-git
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could make sense to add a link to this PR here or in the Rust source, IMO

@winterqt
Copy link
Member Author

Is this going to be a user-visible change in policy? If so, we should document it in the manual.

Yes, and I noted this in the proposal:

I also need to update documentation.

(Admittedly, it was buried at the end, so I don't blame you for missing it.)

Comment on lines +116 to +123
if let Some(hash) = hashes.get(original_url) {
git_dependencies.push((
format!("{}-{}", package.name, package.version),
hash.clone(),
));

continue;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to deduplicate at the local level as well, as importCargoLock instructs users to use the first dependency per shared repo -- we currently add duplicate outputHashes entries for no good reason.

@Mic92
Copy link
Member

Mic92 commented Feb 19, 2023

cc @ryantm as this will also break automatic updates in ryantm-r...

@Mic92
Copy link
Member

Mic92 commented Feb 19, 2023

Will the upgrade break our current checksums or will it render the old vendor mechanism useless?

@winterqt
Copy link
Member Author

cc @ryantm as this will also break automatic updates in ryantm-r...

Was gonna poke him if the overall opinion was largely positive.

Will just break our current checksums or will it render the old vendor mechanism useless?

The Cargo change? It'll only invalidate some hashes, it won't break the mechanism entirely.

@winterqt
Copy link
Member Author

Speaking of, we should determine some sort of size threshold that warrants not vendoring the lock file... 1 MiB? 5 MiB?

@alyssais
Copy link
Member

Speaking of, we should determine some sort of size threshold that warrants not vendoring the lock file... 1 MiB? 5 MiB?

Are there any packages we know of for which this might be an issue? (Also compressed size is probably what matters — not sure how far we want to do down ease-of-check vs accuracy.)

@winterqt
Copy link
Member Author

Cargo will introduce a change that will break our FODs for packages that use Git dependencies.

Unrelated to this proposal, but we should probably backport this change to importCargoLock, as this is currently blocking updates of ruff and probably more packages in the future

Even better, let's do it before them 😉

Speaking of, we should determine some sort of size threshold that warrants not vendoring the lock file... 1 MiB? 5 MiB?

Are there any packages we know of for which this might be an issue?

I'm unsure off the top of my head, though I'm sure we can run the numbers on the PoC branch.

(Also compressed size is probably what matters — not sure how far we want to do down ease-of-check vs accuracy.)

What do you mean by the latter, considering everything is losslessly compressed?

@alyssais
Copy link
Member

(Also compressed size is probably what matters — not sure how far we want to do down ease-of-check vs accuracy.)

What do you mean by the latter, considering everything is losslessly compressed?

When considering whether a lockfile is too big, the most important thing to configure is how much size it permanently adds to the repo, which would be the size after being gzipped and maybe included in a packfile, rather than the size it temporarily adds to a checkout. (Hopefully Cargo.lock files will compress extremely well when packed together, as there'll be a lot of duplication between them.)

@Mic92
Copy link
Member

Mic92 commented Feb 20, 2023

When considering whether a lockfile is too big, the most important thing to configure is how much size it permanently adds to the repo, which would be the size after being gzipped and maybe included in a packfile, rather than the size it temporarily adds to a checkout. (Hopefully Cargo.lock files will compress extremely well when packed together, as there'll be a lot of duplication between them.)

Still git will considerable becomes significant slower on every action if you add large text files to it, so we should still think about a limit even if its just for yarn2nix/npm2nix lock files where large dependency trees are common.

@figsoda
Copy link
Member

figsoda commented Mar 6, 2023

I skimmed through the recording, seems like the size increase is the biggest drawback that was brought up. A few things I want to explore/clear up:

The 32.6MiB shouldn't be as big of impact on the tarball size, as it was mentioned since this number is uncompressed, and there are quite a lot of repetition like source = "registry+https://github.com/rust-lang/crates.io-index" that should be quite efficiently compressed.

Another thing we can look into is minifying the lock files, in addition to the aforementioned source = ...s, there are things like dependencies that can be removed from these lock files
for example, an entry like this

[[package]]
name = "aes"
version = "0.7.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9e8b47f52ea9bae42228d07ec09eb676433d7c4ed1ebdf0f1d1c29ed446f1ab8"
dependencies = [
 "cfg-if",
 "cipher",
 "cpufeatures",
 "opaque-debug",
]

can ideally be minified to something like this

[[p]]
n="aes"
v="0.7.5"
h="9e8b47f52ea9bae42228d07ec09eb676433d7c4ed1ebdf0f1d1c29ed446f1ab8"

which is a >60% decrease in size

Another thing to consider is that the vendor tarballs fetchTarball creates are large compared to just symlinks which importCargoLock does, and the process of obtaining the hash is rather time consuming.

@ghost
Copy link

ghost commented Mar 7, 2023

I'm very much in favor of this PR.

as it was mentioned since this number is uncompressed

Indeed. The git history (which is what really matters) grows by less than 19MB.
$ git clone -b import-cargo-lock-migration-poc https://github.com/winterqt/nixpkgs
$ git -C nixpkgs reset --hard a3c24f953573af4e6964f2f0099a34a9dc4ba236
$ mkdir clean-clone
$ cd clean-clone
$ git -C ../nixpkgs rev-parse HEAD
a3c24f953573af4e6964f2f0099a34a9dc4ba236
$ git clone --bare --single-branch --no-local ../nixpkgs
$ cd nixpkgs.git/
$ git gc
$ du -sk .
2715740 .
$ git -C ../nixpkgs rev-parse import-cargo-lock-migration-poc
b2045f8adb5b836eaee7740d310e6c15a00eda24
$ git fetch ../nixpkgs import-cargo-lock-migration-poc
$ git gc
$ du -sk .
2734548 .
$ dc
10k
2734548 2715740-f
18808

Cargo.lock encodes the same information as haskell-packages.nix (12MB!). Can we not do for Rust what we already do for Haskell? Lua (rocks) does the same thing, but for a much smaller ecosystem. I'm sure there are other examples. If TOML offends people, vendor nix eval 'builtins.parseTOML Cargo.lock' instead.

Not having IFD-free eval-time access to the contents of Cargo.lock is a huge source of pain. Right now we can't override buildRustPackage dependencies without resorting to ugly hacks like cargo update --offline.

Nixpkgs is supposed to contain copies of the dependency graphs of its packages, in eval-time-manipulatable form. Usually that means hand-transcription into Nix source code. For buildRustPackage it means Cargo.tomlCargo.lock.

@figsoda
Copy link
Member

figsoda commented Mar 9, 2023

Rust 1.68.0 is released with the aforementioned change to Cargo

@RaitoBezarius RaitoBezarius mentioned this pull request Mar 9, 2023
12 tasks
@Profpatsch
Copy link
Member

This is a great improvement!

@NickCao NickCao mentioned this pull request Mar 10, 2023
12 tasks
@oxalica
Copy link
Contributor

oxalica commented Mar 10, 2023

Could we merge all lock files into one, like nodePackages do currently? It can save many duplicated dependencies, but making it hard to update individual packages (conflicts!).

Merging lock files also enables the possibility to keep only semver incompatible versions of the same package, though it is against the original purpose of the lock file.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-03-13-nixpkgs-architecture-team-meeting-32/26287/1

@FRidh
Copy link
Member

FRidh commented Mar 18, 2023

We should absolutely go for FOD's, however, this is a big size increase.

I think what we need is a store for "lock files". Essentially, we break apart these lock files in tiny chunks so we deduplicate parts. It would be interesting to see how much could actually be gained if done here, just to see if it is worth it at all.

@figsoda
Copy link
Member

figsoda commented Mar 19, 2023

@yu-re-ka is working on an alternative in #221716

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-03-20-nixpkgs-architecture-team-meeting-33/26547/1

@nagy
Copy link
Member

nagy commented Mar 26, 2023

Since this PR will cause a lot of Cargo.lock files to be put in the tree, it would be appropriate, to add a line to https://github.com/NixOS/nixpkgs/blob/master/.gitattributes like this:

**/Cargo.lock linguist-generated

This tells GitHub not to render those files in diffs to begin with.
EDIT:
It seems like GitHub already does recognize that filename and that change is not needed after all.

@alyssais alyssais added the 2.status: merge conflict This PR has merge conflicts with the target branch label Mar 27, 2023
@infinisil infinisil added the significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Apr 19, 2023
@Ma27 Ma27 mentioned this pull request Jun 14, 2023
12 tasks
@kira-bruneau kira-bruneau mentioned this pull request Aug 22, 2023
12 tasks
@nyabinary
Copy link
Contributor

Shouldn't this be closed as the alternative got merged? #221716

@linsui linsui mentioned this pull request Nov 12, 2023
13 tasks
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 19, 2024
@winterqt
Copy link
Member Author

winterqt commented Oct 3, 2024

Superseded by #221716 and #333702.

@winterqt winterqt closed this Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: policy discussion 6.topic: rust 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.