-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite Cargo.toml when packaging crates #4030
Conversation
r? @matklad |
r? @brson (rust_highfive has picked a reviewer for you, use r? to override) |
I'll need some time to digest it, but one thought I have right now is that perhaps it would be prudent to add original manifest to archive as well? I think having unmodified manifests could help us if something goes terribly wrong. |
That's probably not a bad idea, yeah. Something like |
Any name will do as long as it is mentioned in the comment of the rewritten file. However, mechanically rewriting a file which was created manually by the user does not look super great to me. An alternative would be to store modifications in a separate file, |
Unfortunately the key feature here is that the Cargo.toml is overridden. in the sense that a "normal" |
Am I correct that this rewriting should not affect anything besides the literal "cargo build downloaded crate" use case, because we are rewriting only the summary of the manifest, and we ignore summary from On the one hand, this is good, because it means that rewriting Cargo.toml is unlikely to break anything significant. On the other hand, this means that this feature will be rarely tested in the wild, and some bugs may remain uncovered.
I think because publish is a rare operation, we can make it much noisier than usual. Like, we can copy all the files we are going to publish to a temp directory, together with rewritten toml, show the listing of files we are going to publish (were there some complaints that cargo accidently published stuff that wasn't supposed to be published and vice verse?), print some information about rewriting path deps to usual deps, and ask a conformation? |
One more alternative is to add a single very specific boolean option to Cargo.toml (like, appending |
Ah, I missed the part that now we verify unmodified package! That should help a lot. Overall, I think we should definitely do this! The only thing that makes me nervous is implementaion: there's a lot of code there, and most of the code is identity function. Perhaps we can operate on untyped toml |
d6aba3d
to
99e39f0
Compare
Not 100%. We do indeed ignore the
I definitely agree with this, we've got a lot of leeway here. I'm wary to add interactivity which might break scripts, but I think a saving grace of ours is the sanity check of "cargo build" just before you publish. The modifications here make this almost a Do you think that's enough? Or do you think we should add something else? I'm wary to say "please look at this file and make sure it's right" because it's a pretty ugly file (not a pretty toml, a fully explicit toml) so it will look alien to most users.
Hm that's an interesting idea yeah. Something that basically makes it as uninvasive as possible. I'm still a little wary of doing this though as it's a new public-facing feature unlike this implementation which is entirely an internal detail (in theory). One thought that's also been rolling around in the back of my head is that there may not always be an obvious and/or canonical desugaring for a manifest. Right now we drop Basically I'm thinking that with multi-registry support Cargo may not know via simply a boolean flag where a
My hope here was to use the literall 100% same exact structures for serialization and deserialization. That at least provides the guarantee that we won't forget to implement any key we add support for or something like that. It definitely means it's a bit trickier, however, because a lot of the deserialization business is done mostly for ergonomics (lots of options, multi variants, etc). The downside is definitely that it's a relatively large wad of code, and some of the translation is non-obvious. Primarily the inference from a list of |
Oh I've also pushed an update which writes in |
Yeah. Why |
Oh I don't mind either way, we need to go from a |
99e39f0
to
780ee2a
Compare
Could you run into cases where the |
Yes that's true I think. Basically if you use an older Cargo to publish a newer crate, it'd drop fields it didn't know about. The alternative, though, I see as sort of just as bad. If we just trawl through the TOML representation deleting, for example, Does that make sense? I think I'd slightly prefer the strategy with compiler errors and basically recommend you publish w/ recent versions of Cargo, which typically is what you're doing anyway as you likely just ran |
7eb5aaa
to
7989b15
Compare
Can we reparse toml file into TomlManifest? It seems to me that What do you think about adding a test that rewrites a Cargo.toml and checks the resulting file literally? I understand that this should already be integration-tested by our packaging tests, but having a before-after style test would probably help with checking some edge cases, should they become apparent. |
@matklad hm I'm not 100% sure what you're asking for actually, mind elaborating? I'm not sure what you're thinking is transitioning to what? I wouldn't mind though, yeah, adding a byte-for-byte check somewhere of the manifests that are generated. |
Sure! We have an original Looks like its I think we can trivialize
Hm, now that I've spelled this out I see one more way to get
Anyways, this is only an implementation detail and it does not matter much if tested thoroughly :) If you think the current implementation is sufficiently best, let's add a byte-for-byte check and merge this. |
Ok I think I like that strategy more. It still doesn't handle @cuviper's concerns about how it drops unknown keys from the manifest (e.g. ones added in future Cargos) but it makes me more comfortable by preserving the original keys and such. |
7989b15
to
19c478b
Compare
@matklad ok I've updated with said strategy |
src/cargo/util/toml.rs
Outdated
bench: self.bench.clone(), | ||
dependencies: map_deps(&self.dependencies), | ||
dev_dependencies: map_deps(&self.dev_dependencies), | ||
dev_dependencies2: map_deps(&self.dev_dependencies2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we may normalize dev_dependencies2
into dev_dependencies
at this step?
Even if we don't plan to drop support for the second variant, having only one form may be useful if, for example, someone greps ~/.cargo
or otherwise interacts with crates without Cargo itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure yeah, makes sense to me, pushed up a variant of that!
19c478b
to
0a96484
Compare
tests/package.rs
Outdated
exclude = ["*.txt"] | ||
description = "foo" | ||
license = "MIT" | ||
"#)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this a little bit less trivial and add [workspace]
key and a path dependency, just to make sure that identity function won't pass this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh good point, looks like I forgot to delete package.workspace
as well, handled now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, does that mean that we don't test publising workspaced crates? The old code should have failed on publishing, shouldn't it?
This commit is an implementation of rewriting TOML manifests when we publish them to the registry. The rationale for doing this is to provide a guarantee that downloaded tarballs from crates.io can be built with `cargo build` (literally). This in turn eases a number of other possible consumers of crates from crates.io * Vendored sources can now be more easily modified/checked as cargo build should work and they're standalone crates that suffice for `path` dependencies * Tools like cargobomb/crater no longer need to edit the manifest and can instead perform regression testing on the literal tarballs they download * Other systems such as packaging Rust code may be able to take advantage of this, but this is a less clear benefit. Overall I'm hesitatnt about this, unfortunately. This is a silent translation happening on *publish*, a rare operation, that's difficult to inspect before it flies up to crates.io. I wrote a script to run this transformation over all crates.io crates and found a surprisingly large number of discrepancies. The transformation basically just downloaded all crates at all versions, regenerated the manifest, and then tested if the two manifests were (in memory) the same. Unfortunately historical Cargo had a critical bug which I think made this exercise not too useful. Cargo used to *not* recreate tarballs if one already existed, which I believe led to situations such as: 1. `cargo publish` 2. Cargo generates an error about a dependency. This could be that there's a `version` not present in a `path` dependency, there could be a `git` dependency, etc. 3. Errors are fixed. 4. `cargo publish` 5. Publish is successful In step 4 above historical Cargo *would not recreate the tarball*. This means that the contents of the index (what was published) aren't guaranteed to match with the tarball's `Cargo.toml`. When building from crates.io this is ok as the index is the source of truth for dependency information, but it means that *any* transformation to rewrite Cargo.toml is impossible to verify against all crates on crates.io (due to historical bugs like these). I strove to read as many errors as possible regardless, attempting to suss out bugs in the implementation here. To further guard against surprises I've updated the verification step of packaging to work "normally" in these sense that it's not rewriting dependencies itself or changing summaries. I'm hoping that this serves as a good last-ditch effort that what we're about to publish will indeed build as expected when uploaded to crates.io Overall I'm probably 70% confident in this change. I think it's necessary to make progress, but I think there are going to be very painful bugs that arise from this feature. I'm open to ideas to help weed out these bugs ahead of time! I've done what I can but I fear it may not be entirely enough. Closes rust-lang#4027
0a96484
to
57db8bd
Compare
@alexcrichton looks great now! @bors r+ |
📌 Commit 57db8bd has been approved by |
Rewrite Cargo.toml when packaging crates This commit is an implementation of rewriting TOML manifests when we publish them to the registry. The rationale for doing this is to provide a guarantee that downloaded tarballs from crates.io can be built with `cargo build` (literally). This in turn eases a number of other possible consumers of crates from crates.io * Vendored sources can now be more easily modified/checked as cargo build should work and they're standalone crates that suffice for `path` dependencies * Tools like cargobomb/crater no longer need to edit the manifest and can instead perform regression testing on the literal tarballs they download * Other systems such as packaging Rust code may be able to take advantage of this, but this is a less clear benefit. Overall I'm hesitatnt about this, unfortunately. This is a silent translation happening on *publish*, a rare operation, that's difficult to inspect before it flies up to crates.io. I wrote a script to run this transformation over all crates.io crates and found a surprisingly large number of discrepancies. The transformation basically just downloaded all crates at all versions, regenerated the manifest, and then tested if the two manifests were (in memory) the same. Unfortunately historical Cargo had a critical bug which I think made this exercise not too useful. Cargo used to *not* recreate tarballs if one already existed, which I believe led to situations such as: 1. `cargo publish` 2. Cargo generates an error about a dependency. This could be that there's a `version` not present in a `path` dependency, there could be a `git` dependency, etc. 3. Errors are fixed. 4. `cargo publish` 5. Publish is successful In step 4 above historical Cargo *would not recreate the tarball*. This means that the contents of the index (what was published) aren't guaranteed to match with the tarball's `Cargo.toml`. When building from crates.io this is ok as the index is the source of truth for dependency information, but it means that *any* transformation to rewrite Cargo.toml is impossible to verify against all crates on crates.io (due to historical bugs like these). I strove to read as many errors as possible regardless, attempting to suss out bugs in the implementation here. To further guard against surprises I've updated the verification step of packaging to work "normally" in these sense that it's not rewriting dependencies itself or changing summaries. I'm hoping that this serves as a good last-ditch effort that what we're about to publish will indeed build as expected when uploaded to crates.io Overall I'm probably 70% confident in this change. I think it's necessary to make progress, but I think there are going to be very painful bugs that arise from this feature. I'm open to ideas to help weed out these bugs ahead of time! I've done what I can but I fear it may not be entirely enough. Closes #4027
☀️ Test successful - status-appveyor, status-travis |
This commit is an implementation of rewriting TOML manifests when we publish
them to the registry. The rationale for doing this is to provide a guarantee
that downloaded tarballs from crates.io can be built with
cargo build
(literally). This in turn eases a number of other possible consumers of crates
from crates.io
work and they're standalone crates that suffice for
path
dependenciesinstead perform regression testing on the literal tarballs they download
this, but this is a less clear benefit.
Overall I'm hesitatnt about this, unfortunately. This is a silent translation
happening on publish, a rare operation, that's difficult to inspect before it
flies up to crates.io. I wrote a script to run this transformation over all
crates.io crates and found a surprisingly large number of discrepancies. The
transformation basically just downloaded all crates at all versions,
regenerated the manifest, and then tested if the two manifests were (in memory)
the same.
Unfortunately historical Cargo had a critical bug which I think made this
exercise not too useful. Cargo used to not recreate tarballs if one already
existed, which I believe led to situations such as:
cargo publish
version
not present in apath
dependency, there could be agit
dependency, etc.
cargo publish
In step 4 above historical Cargo would not recreate the tarball. This means
that the contents of the index (what was published) aren't guaranteed to match
with the tarball's
Cargo.toml
. When building from crates.io this is ok as theindex is the source of truth for dependency information, but it means that any
transformation to rewrite Cargo.toml is impossible to verify against all crates
on crates.io (due to historical bugs like these).
I strove to read as many errors as possible regardless, attempting to suss out
bugs in the implementation here. To further guard against surprises I've updated
the verification step of packaging to work "normally" in these sense that it's
not rewriting dependencies itself or changing summaries. I'm hoping that this
serves as a good last-ditch effort that what we're about to publish will indeed
build as expected when uploaded to crates.io
Overall I'm probably 70% confident in this change. I think it's necessary to
make progress, but I think there are going to be very painful bugs that arise
from this feature. I'm open to ideas to help weed out these bugs ahead of time!
I've done what I can but I fear it may not be entirely enough.
Closes #4027