RFC: Migrate all Rust packages to importCargoLock
#217084
cc @NixOS/rust |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/migrate-all-rust-packages-to-importcargolock/25582/1 |
Another benefit of doing this is that it makes it easier for us to do things like upgrade a particular Rust dependency across the board in Nixpkgs if a vulnerability is found. (This would be a lot more feasible to do for Rust than for other language ecosystems because so much is checked at compile time with Rust, and generally Rust libraries are good at not making other breaking changes in security updates.) We could technically do this currently too, but it's too hard. It requires some fiddly scripting, and actually updating a dependency for a package is quite annoying — we'd have to either commit the lockfile anyway, or generate a patch and update the
I also want to highlight how important this one is. If you're on a plane/train, or more importantly if you're in a part of the world where internet is slow or expensive, this can make it infeasible to build Rust packages from Nixpkgs for no real reason: each individual crate is probably <1MB to download, but the Cargo FOD fetcher as currently implemented has to download the whole huge registry every time it's used. (We might be able to fix this by having a separate derivation for the crate registry that could be shared between invocations of the FOD fetcher, but it would be yet another layer on the Jenga tower of the FOD system, and it wouldn't be simple to implement: we'd need to update it extremely frequently so that it was likely to have whatever crates somebody might be using indexed at the time they wanted to use it.)
This isn't an isolated incident — it has happened repeatedly over the last few years. Every time it requires a lot of work to fix (we're lucky to even have advance notice this time), and there's nothing we can do for out-of-tree users. For other abuses of the FOD mechanism, we don't really have good alternatives (at least for now), but as Winter points out, Cargo is designed basically perfectly so that we don't need to. |
Note that I've also written CI tooling for checking in-tree lock files, which could easily be adapted for nixpkgs. The benefit is that it'll no longer require downloading the amount of sources it currently does, but will work on just the local checkout of nixpkgs. Maybe we could then make this into a proper reporting tool of sorts (maybe even in CI?). |
String::from_utf8(
    Command::new("nix-build")
        .arg("-A")
        .arg(format!("{attr}.src"))
Note that this doesn't respect cargoPatches and can break builds due to mismatch in Cargo.lock (example), but this can probably be fixed manually and not be included in the script
$ rg 'cargoPatches =' | wc -l
50
Ah, will make it skip those and rerun. I naively assumed/forgot that my check of "is there a vendored Cargo.lock already" would(n't) be sufficient.
I am not going to repeat the benefits of this change, overall this is a 👍 for me. I will just link a few related PRs
Unrelated to this proposal, but we should probably backport this change to |
Is this going to be a user-visible change in policy? If so, we should document it in the manual.
@@ -0,0 +1,4 @@
#!/usr/bin/env nix-shell
#!nix-shell -I nixpkgs=. -i bash -p "import ./maintainers/scripts/convert-to-import-cargo-lock" nix-prefetch-git
It could make sense to add a link to this PR here or in the Rust source, IMO
Yes, and I noted this in the proposal:
(Admittedly, it was buried at the end, so I don't blame you for missing it.) |
if let Some(hash) = hashes.get(original_url) {
    git_dependencies.push((
        format!("{}-{}", package.name, package.version),
        hash.clone(),
    ));

    continue;
}
This needs to deduplicate at the local level as well, as importCargoLock instructs users to use the first dependency per shared repo -- we currently add duplicate outputHashes entries for no good reason.
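As an illustration of the deduplication asked for above, here is a minimal sketch, not the script's actual code. It assumes each Git package carries a source URL that identifies the shared repository, plus an already-prefetched hash. The `GitPackage` struct and the `dedup_output_hashes` function are hypothetical names introduced only for this example. It keeps the first package seen per source URL, so a shared repository yields a single outputHashes entry.

```rust
use std::collections::HashSet;

/// A Git-sourced package from a lock file.
/// Hypothetical shape for illustration; the real script's types may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct GitPackage {
    pub name: String,
    pub version: String,
    /// Assumed to identify the shared repository (URL plus pinned revision).
    pub source_url: String,
    /// Assumed to be the hash already prefetched for that source.
    pub hash: String,
}

/// Returns one `("name-version", hash)` pair per distinct source URL,
/// keeping the first package seen for each URL and preserving input order.
pub fn dedup_output_hashes(packages: &[GitPackage]) -> Vec<(String, String)> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();

    for package in packages {
        // `HashSet::insert` returns false when the URL was already present,
        // so later packages from the same repository are skipped.
        if seen.insert(package.source_url.as_str()) {
            out.push((
                format!("{}-{}", package.name, package.version),
                package.hash.clone(),
            ));
        }
    }

    out
}
```

Given two packages from one repository and a third from another, this would produce two entries, named after the first package of each repository.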
cc @ryantm as this will also break automatic updates in ryantm-r... |
Will the upgrade break our current checksums or will it render the old vendor mechanism useless? |
Was gonna poke him if the overall opinion was largely positive.
The Cargo change? It'll only invalidate some hashes, it won't break the mechanism entirely. |
Speaking of, we should determine some sort of size threshold that warrants not vendoring the lock file... 1 MiB? 5 MiB? |
Are there any packages we know of for which this might be an issue? (Also, compressed size is probably what matters; not sure how far we want to go down ease-of-check vs accuracy.) |
Even better, let's do it before them 😉
I'm unsure off the top of my head, though I'm sure we can run the numbers on the PoC branch.
What do you mean by the latter, considering everything is losslessly compressed? |
When deciding whether a lockfile is too big, the most important thing to consider is how much size it permanently adds to the repo, which would be the size after being gzipped and maybe included in a packfile, rather than the size it temporarily adds to a checkout. (Hopefully Cargo.lock files will compress extremely well when packed together, as there'll be a lot of duplication between them.) |
Still, Git becomes considerably slower on every action if you add large text files to it, so we should still think about a limit, even if it's just for yarn2nix/npm2nix lock files, where large dependency trees are common. |
I skimmed through the recording; it seems like the size increase is the biggest drawback that was brought up. A few things I want to explore/clear up:
The 32.6MiB shouldn't have as big an impact on the tarball size as was mentioned, since this number is uncompressed, and there is quite a lot of repetition like
Another thing we can look into is minifying the lock files, in addition to the aforementioned
[[package]]
name = "aes"
version = "0.7.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9e8b47f52ea9bae42228d07ec09eb676433d7c4ed1ebdf0f1d1c29ed446f1ab8"
dependencies = [
 "cfg-if",
 "cipher",
 "cpufeatures",
 "opaque-debug",
]
can ideally be minified to something like this
[[p]]
n="aes"
v="0.7.5"
h="9e8b47f52ea9bae42228d07ec09eb676433d7c4ed1ebdf0f1d1c29ed446f1ab8"
which is a >60% decrease in size
Another thing to consider is that the vendor tarballs |
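To make the size estimate concrete, the following is a rough sketch of the transformation shown in the example. It assumes the short key names `p`, `n`, `v` and `h` from that example, and it assumes that crates.io `source` lines and `dependencies` arrays can simply be dropped. `minify_lock` is a hypothetical helper introduced here. The result is lossy and no longer a valid Cargo.lock, and nothing in this thread indicates that importCargoLock can read such a format.

```rust
/// Minify Cargo.lock-style text along the lines of the example in this thread.
/// Hypothetical helper: the short key names and the decision to drop
/// `dependencies` and crates.io `source` lines are assumptions, and the
/// transformation is lossy.
pub fn minify_lock(input: &str) -> String {
    const CRATES_IO_SOURCE: &str =
        "source = \"registry+https://github.com/rust-lang/crates.io-index\"";

    let mut out = String::new();
    let mut in_dependencies = false;

    for raw_line in input.lines() {
        let line = raw_line.trim();

        if in_dependencies {
            // Skip dependency entries up to and including the closing bracket.
            if line == "]" {
                in_dependencies = false;
            }
            continue;
        }
        if line.is_empty() || line == CRATES_IO_SOURCE {
            continue;
        }
        if line == "dependencies = [" {
            in_dependencies = true;
            continue;
        }

        // Shorten the known keys; anything else (for example a Git source
        // line) is kept unchanged.
        let short = if line == "[[package]]" {
            "[[p]]".to_string()
        } else if let Some(value) = line.strip_prefix("name = ") {
            format!("n={value}")
        } else if let Some(value) = line.strip_prefix("version = ") {
            format!("v={value}")
        } else if let Some(value) = line.strip_prefix("checksum = ") {
            format!("h={value}")
        } else {
            line.to_string()
        };

        out.push_str(&short);
        out.push('\n');
    }

    out
}
```

Comparing `minify_lock(text).len()` against `text.len()` for a real lock file would give the uncompressed saving. As noted elsewhere in the thread, the compressed size is probably the more relevant number.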
I'm very much in favor of this PR.
Indeed. The git history (which is what really matters) grows by less than 19 MiB (18808 KiB).
$ git clone -b import-cargo-lock-migration-poc https://github.com/winterqt/nixpkgs
$ git -C nixpkgs reset --hard a3c24f953573af4e6964f2f0099a34a9dc4ba236
$ mkdir clean-clone
$ cd clean-clone
$ git -C ../nixpkgs rev-parse HEAD
a3c24f953573af4e6964f2f0099a34a9dc4ba236
$ git clone --bare --single-branch --no-local ../nixpkgs
$ cd nixpkgs.git/
$ git gc
$ du -sk .
2715740 .
$ git -C ../nixpkgs rev-parse import-cargo-lock-migration-poc
b2045f8adb5b836eaee7740d310e6c15a00eda24
$ git fetch ../nixpkgs import-cargo-lock-migration-poc
$ git gc
$ du -sk .
2734548 .
$ dc
10k
2734548 2715740-f
18808
Not having IFD-free eval-time access to the contents of Nixpkgs is supposed to contain copies of the dependency graphs of its packages, in eval-time-manipulatable form. Usually that means hand-transcription into Nix source code. For |
This is a great improvement! |
Could we merge all lock files into one, like Merging lock files also enables the possibility to keep only semver incompatible versions of the same package, though it is against the original purpose of the lock file. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-03-13-nixpkgs-architecture-team-meeting-32/26287/1 |
We should absolutely go for FOD's, however, this is a big size increase. I think what we need is a store for "lock files". Essentially, we break apart these lock files in tiny chunks so we deduplicate parts. It would be interesting to see how much could actually be gained if done here, just to see if it is worth it at all. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-03-20-nixpkgs-architecture-team-meeting-33/26547/1 |
Shouldn't this be closed as the alternative got merged? #221716 |
This PR proposes converting all Rust packages in Nixpkgs to use importCargoLock instead of fetchCargoTarball whenever possible. importCargoLock parses the lock file to generate independent FODs, one per dependency, using the hashes from it.
Why now?
Alternatively: "why isn't this a proper RFC?"
On March 9th, 2023, Rust 1.68.0 will be released. With it, Cargo will introduce a change that will break our FODs for packages that use Git dependencies.
We just don't have time to go through the RFC process, unless we want to fix all affected hashes.
@alyssais thought that this was the perfect opportunity to just vendor the lockfiles, and after some discussion, I agree.
Benefits
Drawbacks
What about other languages?
Alternatively: "you maintain a FOD-based builder, are you going to push vendoring their lockfiles?"
Rust's lockfile format is uniquely suited to this type of vendoring because the lockfiles turn out to be pretty small (unlike npm's or Yarn's), and we can use the hashes from them directly (unlike Go's).
All in all, this PR merely exists as a stopgap until computed derivations are in a state where we can use them. Once they are, they will obviously be superior to anything vendoring can do.
Assuming the consensus is that we want to move forward with this, I will update this PR to include my PoC branch, which is where you can take a look at what this will end up looking like. This branch converts 879 packages, which were all done with the script I've added in this PR. When the actual conversion happens, I'll go through and do the rest manually. I also need to update documentation.
I appreciate any and all thoughts on this, thanks!