Rewrite Cargo.toml when packaging crates #4030

alexcrichton · 2017-05-11T16:25:51Z

This commit is an implementation of rewriting TOML manifests when we publish
them to the registry. The rationale for doing this is to provide a guarantee
that downloaded tarballs from crates.io can be built with cargo build
(literally). This in turn eases a number of other possible consumers of crates
from crates.io

Vendored sources can now be more easily modified/checked as cargo build should
work and they're standalone crates that suffice for path dependencies
Tools like cargobomb/crater no longer need to edit the manifest and can
instead perform regression testing on the literal tarballs they download
Other systems such as packaging Rust code may be able to take advantage of
this, but this is a less clear benefit.

Overall I'm hesitatnt about this, unfortunately. This is a silent translation
happening on publish, a rare operation, that's difficult to inspect before it
flies up to crates.io. I wrote a script to run this transformation over all
crates.io crates and found a surprisingly large number of discrepancies. The
transformation basically just downloaded all crates at all versions,
regenerated the manifest, and then tested if the two manifests were (in memory)
the same.

Unfortunately historical Cargo had a critical bug which I think made this
exercise not too useful. Cargo used to not recreate tarballs if one already
existed, which I believe led to situations such as:

cargo publish
Cargo generates an error about a dependency. This could be that there's a
version not present in a path dependency, there could be a git
dependency, etc.
Errors are fixed.
cargo publish
Publish is successful

In step 4 above historical Cargo would not recreate the tarball. This means
that the contents of the index (what was published) aren't guaranteed to match
with the tarball's Cargo.toml. When building from crates.io this is ok as the
index is the source of truth for dependency information, but it means that any
transformation to rewrite Cargo.toml is impossible to verify against all crates
on crates.io (due to historical bugs like these).

I strove to read as many errors as possible regardless, attempting to suss out
bugs in the implementation here. To further guard against surprises I've updated
the verification step of packaging to work "normally" in these sense that it's
not rewriting dependencies itself or changing summaries. I'm hoping that this
serves as a good last-ditch effort that what we're about to publish will indeed
build as expected when uploaded to crates.io

Overall I'm probably 70% confident in this change. I think it's necessary to
make progress, but I think there are going to be very painful bugs that arise
from this feature. I'm open to ideas to help weed out these bugs ahead of time!
I've done what I can but I fear it may not be entirely enough.

Closes #4027

alexcrichton · 2017-05-11T16:26:01Z

r? @matklad

rust-highfive · 2017-05-11T16:26:02Z

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

matklad · 2017-05-11T16:39:17Z

I'll need some time to digest it, but one thought I have right now is that perhaps it would be prudent to add original manifest to archive as well? I think having unmodified manifests could help us if something goes terribly wrong.

alexcrichton · 2017-05-11T19:01:28Z

That's probably not a bad idea, yeah. Something like Cargo.toml.orig perhaps?

matklad · 2017-05-11T19:28:53Z

That's probably not a bad idea, yeah. Something like Cargo.toml.orig perhaps?

Any name will do as long as it is mentioned in the comment of the rewritten file.

However, mechanically rewriting a file which was created manually by the user does not look super great to me. An alternative would be to store modifications in a separate file, Cargo.patch, analogous to Cargo.lock, and then merge Cargo.toml and Cargo.patch during workspace loading. This second solution seems more clean to me, but I don't like it anyway because it's more complicated and introduces additional logic :)

alexcrichton · 2017-05-11T20:17:20Z

Unfortunately the key feature here is that the Cargo.toml is overridden. in the sense that a "normal" cargo build will work in the crate, and the crate otherwise operates as a normal path dependency when the source is downloaded. Now that doesn't meant that we couldn't support a "patch" format of some kind (maybe not literally a patch file), but it means that it'd become a first-class feature of Cargo that everyone could use everywhere, which I'm not sure we want to do just yet...

matklad · 2017-05-12T07:58:39Z

Am I correct that this rewriting should not affect anything besides the literal "cargo build downloaded crate" use case, because we are rewriting only the summary of the manifest, and we ignore summary from Cargo.toml and use the one from the index anyway?

On the one hand, this is good, because it means that rewriting Cargo.toml is unlikely to break anything significant. On the other hand, this means that this feature will be rarely tested in the wild, and some bugs may remain uncovered.

This is a silent translation happening on publish, a rare operation, that's difficult to inspect before it
flies up to crates.io

I think because publish is a rare operation, we can make it much noisier than usual. Like, we can copy all the files we are going to publish to a temp directory, together with rewritten toml, show the listing of files we are going to publish (were there some complaints that cargo accidently published stuff that wasn't supposed to be published and vice verse?), print some information about rewriting path deps to usual deps, and ask a conformation?

matklad · 2017-05-12T08:12:07Z

Now that doesn't meant that we couldn't support a "patch" format of some kind (maybe not literally a patch file), but it means that it'd become a first-class feature of Cargo that everyone could use everywhere, which I'm not sure we want to do just yet...

One more alternative is to add a single very specific boolean option to Cargo.toml (like, appending published = true to the end of the file), which basically says: "ignore path deps when reading this file".

matklad · 2017-05-12T08:33:11Z

On the other hand, this means that this feature will be rarely tested in the wild, and some bugs may remain uncovered.

Ah, I missed the part that now we verify unmodified package! That should help a lot.

Overall, I think we should definitely do this! The only thing that makes me nervous is implementaion: there's a lot of code there, and most of the code is identity function. Perhaps we can operate on untyped toml Table directly and modify only the path we are interested in? That way, we wouldn't need to change TOML frobbing code when adding new keys to the manifest.

alexcrichton · 2017-05-12T16:44:07Z

Am I correct that this rewriting should not affect anything besides the literal "cargo build downloaded crate" use case

Not 100%. We do indeed ignore the Summary portion of Cargo.toml, but we still parse Cargo.toml from the registry downloads (for normal crates.io builds) to learn about targets. Stuff like "is this a proc macro?" or "where is the library entry point file?" In that sense the targets are critical to preserve, the dependencies less so for normal operation.

I think because publish is a rare operation, we can make it much noisier than usual

I definitely agree with this, we've got a lot of leeway here. I'm wary to add interactivity which might break scripts, but I think a saving grace of ours is the sanity check of "cargo build" just before you publish. The modifications here make this almost a cargo build modulo how the workspace is handled, but it should in theory be a good check that the crate at least builds.

Do you think that's enough? Or do you think we should add something else? I'm wary to say "please look at this file and make sure it's right" because it's a pretty ugly file (not a pretty toml, a fully explicit toml) so it will look alien to most users.

One more alternative is to add a single very specific boolean option to Cargo.toml (like, appending published = true to the end of the file), which basically says: "ignore path deps when reading this file".

Hm that's an interesting idea yeah. Something that basically makes it as uninvasive as possible. I'm still a little wary of doing this though as it's a new public-facing feature unlike this implementation which is entirely an internal detail (in theory).

One thought that's also been rolling around in the back of my head is that there may not always be an obvious and/or canonical desugaring for a manifest. Right now we drop [workspace] and path dependencies, but the path dependencies are specifically rewritten to crates.io and more generally the registry being published to.

Basically I'm thinking that with multi-registry support Cargo may not know via simply a boolean flag where a path dependency is supposed to point, it may or may not be crates.io.

only thing that makes me nervous is implementaion: there's a lot of code there, and most of the code is identity function. Perhaps we can operate on untyped toml Table directly and modify only the path we are interested in?

My hope here was to use the literall 100% same exact structures for serialization and deserialization. That at least provides the guarantee that we won't forget to implement any key we add support for or something like that. It definitely means it's a bit trickier, however, because a lot of the deserialization business is done mostly for ergonomics (lots of options, multi variants, etc).

The downside is definitely that it's a relatively large wad of code, and some of the translation is non-obvious. Primarily the inference from a list of Target structures back to the [lib] etc sections is pretty scary...

alexcrichton · 2017-05-12T16:44:25Z

Oh I've also pushed an update which writes in Cargo.toml.orig with the exact contents of the original manifest.

matklad · 2017-05-15T09:17:44Z

Primarily the inference from a list of Target structures back to the [lib] etc sections is pretty scary...

Yeah. Why from_package_for_registry has a signature Package -> TomlManifest. Can we make it TomlManifest -> TomlManifest?

alexcrichton · 2017-05-15T17:10:46Z

Oh I don't mind either way, we need to go from a Package to TomlManifest somewhere, and there's not really a lot of usage in having an ephemeral TomlManifest, so I figured we'd short-circuit.

cuviper · 2017-05-15T17:20:04Z

Could you run into cases where the Package is less complete than the raw toml? e.g. missing a field for some forward-compatible new feature, which probably shouldn't just be dropped when an older cargo does the rewriting.

alexcrichton · 2017-05-15T19:36:28Z

Yes that's true I think. Basically if you use an older Cargo to publish a newer crate, it'd drop fields it didn't know about.

The alternative, though, I see as sort of just as bad. If we just trawl through the TOML representation deleting, for example, path keys and workspace keys, then we (a) may delete too much or (b) may forget to delete something crucial. For example if we add a new [foo-dependencies] section it seems very easy to forget to delete path keys from that section, whereas today it'd be a compiler error.

Does that make sense? I think I'd slightly prefer the strategy with compiler errors and basically recommend you publish w/ recent versions of Cargo, which typically is what you're doing anyway as you likely just ran cargo test before or some time before publication.

matklad · 2017-05-15T20:39:40Z

Oh I don't mind either way, we need to go from a Package to TomlManifest somewhere,

Can we reparse toml file into TomlManifest? It seems to me that Package -> Toml is actually the most laborious step here, and not the actual rewrite of stuff. And because TomlManifest is typed, you'd still get compile time errors.

What do you think about adding a test that rewrites a Cargo.toml and checks the resulting file literally? I understand that this should already be integration-tested by our packaging tests, but having a before-after style test would probably help with checking some edge cases, should they become apparent.

alexcrichton · 2017-05-15T23:41:45Z

@matklad hm I'm not 100% sure what you're asking for actually, mind elaborating? I'm not sure what you're thinking is transitioning to what?

I wouldn't mind though, yeah, adding a byte-for-byte check somewhere of the manifests that are generated.

matklad · 2017-05-16T08:44:58Z

@matklad hm I'm not 100% sure what you're asking for actually, mind elaborating?

Sure!

We have an original Package and want to get a rewritten TomlManifest. This can be represented as a composition of two functions: to_toml_manifest: Package -> TomlManifest and rewrite: TomlManifest -> TomlManifest. In the current implementation in from_package_for_registry we do both steps simultaneously.

Looks like its to_toml_manifest part that is the most difficult! And in some sense it seems not really necessary: we've got a Package from TomlManifest originally, so why do we have to write the conversion in the opposite direction, if we can reuse the exact original manifest?

I think we can trivialize to_toml_manifest, if we don't try to recreate TomlManifest from Package literally, as we do now, but instead use the same code we already have for creating TomlManifests, namely, parsing a Cargo.toml file. In pseudorust:

fn to_registry_toml(pkg: Pacakge) -> String {
    let toml_text = fs::File(pkg.manifest_path).read_to_string();
    let toml_value: toml::Value = parse(toml_text);
    let toml: TomlManifest = serde::deserialize(toml_value); // here we are getting `TomlManifest` for free
    rewrite(toml).to_string()
}

Hm, now that I've spelled this out I see one more way to get TomlManifest for free: we can add a original_manifest field to the Package:

struct Package {
  original_manifest: TomlManifest
}

Anyways, this is only an implementation detail and it does not matter much if tested thoroughly :) If you think the current implementation is sufficiently best, let's add a byte-for-byte check and merge this.

alexcrichton · 2017-05-17T14:26:37Z

Ok I think I like that strategy more. It still doesn't handle @cuviper's concerns about how it drops unknown keys from the manifest (e.g. ones added in future Cargos) but it makes me more comfortable by preserving the original keys and such.

alexcrichton · 2017-05-17T14:48:44Z

@matklad ok I've updated with said strategy

matklad · 2017-05-17T15:49:40Z

src/cargo/util/toml.rs

+            bench: self.bench.clone(),
+            dependencies: map_deps(&self.dependencies),
+            dev_dependencies: map_deps(&self.dev_dependencies),
+            dev_dependencies2: map_deps(&self.dev_dependencies2),


Perhaps we may normalize dev_dependencies2 into dev_dependencies at this step?

Even if we don't plan to drop support for the second variant, having only one form may be useful if, for example, someone greps ~/.cargo or otherwise interacts with crates without Cargo itself.

Sure yeah, makes sense to me, pushed up a variant of that!

matklad · 2017-05-17T16:07:40Z

tests/package.rs

+exclude = ["*.txt"]
+description = "foo"
+license = "MIT"
+"#));


Let's make this a little bit less trivial and add [workspace] key and a path dependency, just to make sure that identity function won't pass this test.

Heh good point, looks like I forgot to delete package.workspace as well, handled now.

Hm, does that mean that we don't test publising workspaced crates? The old code should have failed on publishing, shouldn't it?

This commit is an implementation of rewriting TOML manifests when we publish them to the registry. The rationale for doing this is to provide a guarantee that downloaded tarballs from crates.io can be built with `cargo build` (literally). This in turn eases a number of other possible consumers of crates from crates.io * Vendored sources can now be more easily modified/checked as cargo build should work and they're standalone crates that suffice for `path` dependencies * Tools like cargobomb/crater no longer need to edit the manifest and can instead perform regression testing on the literal tarballs they download * Other systems such as packaging Rust code may be able to take advantage of this, but this is a less clear benefit. Overall I'm hesitatnt about this, unfortunately. This is a silent translation happening on *publish*, a rare operation, that's difficult to inspect before it flies up to crates.io. I wrote a script to run this transformation over all crates.io crates and found a surprisingly large number of discrepancies. The transformation basically just downloaded all crates at all versions, regenerated the manifest, and then tested if the two manifests were (in memory) the same. Unfortunately historical Cargo had a critical bug which I think made this exercise not too useful. Cargo used to *not* recreate tarballs if one already existed, which I believe led to situations such as: 1. `cargo publish` 2. Cargo generates an error about a dependency. This could be that there's a `version` not present in a `path` dependency, there could be a `git` dependency, etc. 3. Errors are fixed. 4. `cargo publish` 5. Publish is successful In step 4 above historical Cargo *would not recreate the tarball*. This means that the contents of the index (what was published) aren't guaranteed to match with the tarball's `Cargo.toml`. When building from crates.io this is ok as the index is the source of truth for dependency information, but it means that *any* transformation to rewrite Cargo.toml is impossible to verify against all crates on crates.io (due to historical bugs like these). I strove to read as many errors as possible regardless, attempting to suss out bugs in the implementation here. To further guard against surprises I've updated the verification step of packaging to work "normally" in these sense that it's not rewriting dependencies itself or changing summaries. I'm hoping that this serves as a good last-ditch effort that what we're about to publish will indeed build as expected when uploaded to crates.io Overall I'm probably 70% confident in this change. I think it's necessary to make progress, but I think there are going to be very painful bugs that arise from this feature. I'm open to ideas to help weed out these bugs ahead of time! I've done what I can but I fear it may not be entirely enough. Closes rust-lang#4027

matklad · 2017-05-18T09:34:51Z

@alexcrichton looks great now!

@bors r+

bors · 2017-05-18T09:34:52Z

📌 Commit 57db8bd has been approved by matklad

bors · 2017-05-18T09:34:58Z

⌛ Testing commit 57db8bd with merge 3973598...

Rewrite Cargo.toml when packaging crates This commit is an implementation of rewriting TOML manifests when we publish them to the registry. The rationale for doing this is to provide a guarantee that downloaded tarballs from crates.io can be built with `cargo build` (literally). This in turn eases a number of other possible consumers of crates from crates.io * Vendored sources can now be more easily modified/checked as cargo build should work and they're standalone crates that suffice for `path` dependencies * Tools like cargobomb/crater no longer need to edit the manifest and can instead perform regression testing on the literal tarballs they download * Other systems such as packaging Rust code may be able to take advantage of this, but this is a less clear benefit. Overall I'm hesitatnt about this, unfortunately. This is a silent translation happening on *publish*, a rare operation, that's difficult to inspect before it flies up to crates.io. I wrote a script to run this transformation over all crates.io crates and found a surprisingly large number of discrepancies. The transformation basically just downloaded all crates at all versions, regenerated the manifest, and then tested if the two manifests were (in memory) the same. Unfortunately historical Cargo had a critical bug which I think made this exercise not too useful. Cargo used to *not* recreate tarballs if one already existed, which I believe led to situations such as: 1. `cargo publish` 2. Cargo generates an error about a dependency. This could be that there's a `version` not present in a `path` dependency, there could be a `git` dependency, etc. 3. Errors are fixed. 4. `cargo publish` 5. Publish is successful In step 4 above historical Cargo *would not recreate the tarball*. This means that the contents of the index (what was published) aren't guaranteed to match with the tarball's `Cargo.toml`. When building from crates.io this is ok as the index is the source of truth for dependency information, but it means that *any* transformation to rewrite Cargo.toml is impossible to verify against all crates on crates.io (due to historical bugs like these). I strove to read as many errors as possible regardless, attempting to suss out bugs in the implementation here. To further guard against surprises I've updated the verification step of packaging to work "normally" in these sense that it's not rewriting dependencies itself or changing summaries. I'm hoping that this serves as a good last-ditch effort that what we're about to publish will indeed build as expected when uploaded to crates.io Overall I'm probably 70% confident in this change. I think it's necessary to make progress, but I think there are going to be very painful bugs that arise from this feature. I'm open to ideas to help weed out these bugs ahead of time! I've done what I can but I fear it may not be entirely enough. Closes #4027

bors · 2017-05-18T11:17:16Z

☀️ Test successful - status-appveyor, status-travis
Approved by: matklad
Pushing 3973598 to master...

rust-highfive assigned brson May 11, 2017

rust-highfive assigned matklad and unassigned brson May 11, 2017

alexcrichton force-pushed the rewrite-cargo-toml branch from d6aba3d to 99e39f0 Compare May 12, 2017 16:35

alexcrichton mentioned this pull request May 12, 2017

Rewrite Cargo.toml on publication to crates.io #4027

Closed

alexcrichton force-pushed the rewrite-cargo-toml branch from 99e39f0 to 780ee2a Compare May 15, 2017 17:10

alexcrichton force-pushed the rewrite-cargo-toml branch 2 times, most recently from 7eb5aaa to 7989b15 Compare May 15, 2017 19:59

alexcrichton force-pushed the rewrite-cargo-toml branch from 7989b15 to 19c478b Compare May 17, 2017 14:48

matklad reviewed May 17, 2017

View reviewed changes

alexcrichton force-pushed the rewrite-cargo-toml branch from 19c478b to 0a96484 Compare May 17, 2017 15:53

matklad reviewed May 17, 2017

View reviewed changes

alexcrichton force-pushed the rewrite-cargo-toml branch from 0a96484 to 57db8bd Compare May 17, 2017 16:24

bors merged commit 57db8bd into rust-lang:master May 18, 2017

alexcrichton mentioned this pull request May 22, 2017

cargo publish fails with "ValueAfterTable" error #4081

Closed

alexcrichton deleted the rewrite-cargo-toml branch June 2, 2017 01:20

alexcrichton mentioned this pull request Aug 30, 2017

Running cargo build checks the dev-dependencies #1070

Closed

sfackler mentioned this pull request Jan 17, 2018

Alternative registry references are inconsistent between the name and index URL #4880

Closed

alexcrichton mentioned this pull request Jan 22, 2018

How to verify individual crates build (and ideally pass tests) for distro packaging #3732

Closed

sfackler mentioned this pull request Jan 31, 2018

cargo shouldn't check for dev-dependencies when running cargo build or cargo install #4988

Closed

cuviper mentioned this pull request Feb 5, 2020

published indexmap 1.3.1 has broken Cargo.toml indexmap-rs/indexmap#116

Closed

ehuss added this to the 1.19.0 milestone Feb 6, 2022

Nukesor mentioned this pull request Nov 21, 2022

[BUG] Release fails due to dependency conflict Nukesor/pueue#393

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite Cargo.toml when packaging crates #4030

Rewrite Cargo.toml when packaging crates #4030

alexcrichton commented May 11, 2017

alexcrichton commented May 11, 2017

rust-highfive commented May 11, 2017

matklad commented May 11, 2017

alexcrichton commented May 11, 2017

matklad commented May 11, 2017

alexcrichton commented May 11, 2017

matklad commented May 12, 2017

matklad commented May 12, 2017

matklad commented May 12, 2017 •

edited

Loading

alexcrichton commented May 12, 2017

alexcrichton commented May 12, 2017

matklad commented May 15, 2017

alexcrichton commented May 15, 2017

cuviper commented May 15, 2017

alexcrichton commented May 15, 2017

matklad commented May 15, 2017 •

edited

Loading

alexcrichton commented May 15, 2017

matklad commented May 16, 2017 •

edited

Loading

alexcrichton commented May 17, 2017

alexcrichton commented May 17, 2017

matklad May 17, 2017

alexcrichton May 17, 2017

matklad May 17, 2017

alexcrichton May 17, 2017

matklad May 17, 2017

matklad commented May 18, 2017

bors commented May 18, 2017

bors commented May 18, 2017

bors commented May 18, 2017

Rewrite Cargo.toml when packaging crates #4030

Rewrite Cargo.toml when packaging crates #4030

Conversation

alexcrichton commented May 11, 2017

alexcrichton commented May 11, 2017

rust-highfive commented May 11, 2017

matklad commented May 11, 2017

alexcrichton commented May 11, 2017

matklad commented May 11, 2017

alexcrichton commented May 11, 2017

matklad commented May 12, 2017

matklad commented May 12, 2017

matklad commented May 12, 2017 • edited Loading

alexcrichton commented May 12, 2017

alexcrichton commented May 12, 2017

matklad commented May 15, 2017

alexcrichton commented May 15, 2017

cuviper commented May 15, 2017

alexcrichton commented May 15, 2017

matklad commented May 15, 2017 • edited Loading

alexcrichton commented May 15, 2017

matklad commented May 16, 2017 • edited Loading

alexcrichton commented May 17, 2017

alexcrichton commented May 17, 2017

matklad May 17, 2017

Choose a reason for hiding this comment

alexcrichton May 17, 2017

Choose a reason for hiding this comment

matklad May 17, 2017

Choose a reason for hiding this comment

alexcrichton May 17, 2017

Choose a reason for hiding this comment

matklad May 17, 2017

Choose a reason for hiding this comment

matklad commented May 18, 2017

bors commented May 18, 2017

bors commented May 18, 2017

bors commented May 18, 2017

matklad commented May 12, 2017 •

edited

Loading

matklad commented May 15, 2017 •

edited

Loading

matklad commented May 16, 2017 •

edited

Loading