Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo tree doesn't handle transitive duplications properly #9599

Open
ehuss opened this issue Jun 18, 2021 · 7 comments
Open

cargo tree doesn't handle transitive duplications properly #9599

ehuss opened this issue Jun 18, 2021 · 7 comments
Labels
C-bug Category: bug Command-tree E-medium Experience: Medium S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@ehuss
Copy link
Contributor

ehuss commented Jun 18, 2021

Problem
There are some situations where cargo tree will mark a package as a duplicate (*) when it probably shouldn't (and shows the wrong features with --no-dedupe). This arises when a package is built twice with the same features, but with different dependencies.

A real-world example is when looking at the following:

[package]
name = "foo"
version = "0.1.0"
resolver = "2"

[dependencies]
diesel = { version = "=1.4.7", features = ["postgres"] }
diesel_migrations = "=1.4.0"

Running cargo tree -f '{p} {f}' results in:

~/Proj/rust/cargo/scratch/diesel-issue> cargo tree -f '{p} {f}'
diesel-issue v0.1.0 (/Users/eric/Proj/rust/cargo/scratch/diesel-issue)
├── diesel v1.4.7 32-column-tables,bitflags,default,postgres,pq-sys,with-deprecated
│   ├── bitflags v1.2.1 default
│   ├── byteorder v1.4.3 default,std
│   ├── diesel_derives v1.4.1 (proc-macro) default,postgres
│   │   ├── proc-macro2 v1.0.27 default,proc-macro
│   │   │   └── unicode-xid v0.2.2 default
│   │   ├── quote v1.0.9 default,proc-macro
│   │   │   └── proc-macro2 v1.0.27 default,proc-macro (*)
│   │   └── syn v1.0.73 clone-impls,default,derive,extra-traits,fold,full,parsing,printing,proc-macro,quote
│   │       ├── proc-macro2 v1.0.27 default,proc-macro (*)
│   │       ├── quote v1.0.9 default,proc-macro (*)
│   │       └── unicode-xid v0.2.2 default
│   └── pq-sys v0.4.6
└── diesel_migrations v1.4.0 default
    ├── migrations_internals v1.4.1 default
    │   └── diesel v1.4.7 32-column-tables,bitflags,default,postgres,pq-sys,with-deprecated (*)
    └── migrations_macros v1.4.2 (proc-macro) default
        ├── migrations_internals v1.4.1 default (*)    <-- PROBLEM IS HERE
        ├── proc-macro2 v1.0.27 default,proc-macro (*)
        ├── quote v1.0.9 default,proc-macro (*)
        └── syn v1.0.73 clone-impls,default,derive,extra-traits,fold,full,parsing,printing,proc-macro,quote (*)

The problem is the diesel-issue → diesel_migrations → migrations_macros → migrations_internals. It has a (*) to indicate that it is duplicated. However, migrations_internals is built twice, and this is deceiving because the second migrations_internals has a dependency on diesel with no features.

Running with --no-dedupe is even worse, because it shows the wrong features for diesel under migrations_internals.

Steps
The following is an example in Cargo's testsuite format:

#[cargo_test]
fn dedupe_transitive_features2() {
    Package::new("differs", "1.0.0")
        .feature("some-feat", &[])
        .publish();
    Package::new("shared", "1.0.0")
        .dep("differs", "1.0")
        .publish();

    let p = project()
        .file(
            "Cargo.toml",
            r#"
                [package]
                name = "foo"
                version = "0.1.0"
                resolver = "2"

                [dependencies]
                shared = "1.0"
                differs = {version="1.0", features=["some-feat"]}
                bar = {path="bar"}

                [build-dependencies]
            "#,
        )
        .file("src/lib.rs", "")
        .file(
            "bar/Cargo.toml",
            r#"
                [package]
                name = "bar"
                version = "0.1.0"

                [build-dependencies]
                shared = "1.0"
            "#,
        )
        .file("bar/src/lib.rs", "")
        .build();

    // ISSUE HERE: The last `shared` shouldn't be `(*)`
    p.cargo("tree -f")
        .arg("{p} [{f}]")
        .with_stdout(
            "\
foo v0.1.0 [..]
├── bar v0.1.0 [..]
│   [build-dependencies]
│   └── shared v1.0.0 []
│       └── differs v1.0.0 []
├── differs v1.0.0 [some-feat]
└── shared v1.0.0 [] (*)
",
        )
        .run();
}

    // ISSUE HERE: The last `differs` shows with no features!
    p.cargo("tree --no-dedupe -f")
        .arg("{p} [{f}]")
        .with_stdout(
            "\
foo v0.1.0 [..]
├── bar v0.1.0 [..]
│   [build-dependencies]
│   └── shared v1.0.0 []
│       └── differs v1.0.0 []
├── differs v1.0.0 [some-feat]
└── shared v1.0.0 []
    └── differs v1.0.0 [some-feat]
",
        )
        .run();

Possible Solution(s)
The issue is that cargo tree doesn't have the same logic that was added in #8701 to accommodate dependencies that are the same, but link to different dependencies.

I have toyed with the idea of changing cargo tree to use the UnitGraph computed for a normal build instead of trying to recreate how some of these things are computed. There are some complexities and downsides to that approach, but I continue to feel that trying to replicate this logic in multiple places is not a good idea.

Notes

Output of cargo version: cargo 1.54.0-nightly (4445667 2021-06-12)

@ehuss ehuss added C-bug Category: bug Command-tree labels Jun 18, 2021
@weihanglo
Copy link
Member

I also feel not right to replicate the logic. Love to know the downsides about reuse UnitGraph. Could you explain more?

@ehuss
Copy link
Contributor Author

ehuss commented Jun 28, 2021

The main problem is that UnitGraph is based on compilation units, whereas cargo tree wants the dependencies between packages. There are dependencies in UnitGraph that need to be filtered out.

For example, all the intra-package dependencies would need to be removed. Also, UnitGraph has multiple roots for a package (for example, if the package has multiple bins). Another example is build scripts, where the "run" unit depends on all transitive build scripts being run. Those are not real dependencies to be displayed. And conversely there is information missing from UnitGraph (like which dependencies are build-dependencies or dev-dependencies). That information is lost since the Resolve is discarded after the graph is computed.

Another problem is that the Graph built for cargo tree has a fundamentally different structure, and changes based on command-line options. All that code for build the graph will probably need to stay to handle all of that.

A concern is that attempting to share the code will end up needing more code to handle the impedance mismatch, and will be harder to maintain.

So, I think this is a fairly hard problem, mostly trying to balance the maintainability of the code. I don't really know what the right solution is.

@alsuren
Copy link

alsuren commented Nov 12, 2022

Hello. I would like to pick this task up. I will timebox 1 week for this, before I start my new job. If I haven't made any progress by then, I will park it and let someone else have a go.

(for context: I am currently using a copy-pasta of cargo::ops::tree::graph::Graph in my cargo-quickbuild prototype, and its proc-macro handling is causing me to calculate the wrong sets of features for some transitive dependencies. This causes fingerprint mismatches, so cargo build then recompiles large swathes of the crate graph and ruins all my fun)

What should I be aiming for here? One possible approach might be:

  • 1) a function which converts from a UnitGraph to a cargo::ops::tree::graph::Graph
  • 2) a function which converts back again for use in tests (I realise that some information is lost in each conversion process, so some extra context is needed in each direction).
  • 3) a fuzzer/proptest that drives the above functions in a round-trip, then checks for equality.
  • 4) replace cargo::ops::tree::build() with the one that goes via UnitGraph, and make sure its tests still pass.

This is quite an invasive approach, and probably isn't feasible in 1 week. I know I could just try to fix my immediate problem as a one-off, but it feels like it will be a game of whack-a-mole (see also #10651). Having a fuzzer would let me feel much more confident in the correctness of any fix I write).

I wonder if we could write the UnitGraph -> Graph conversion function from (1) (but not its inverse), and then write a fuzzer that sets up a random cargo workspace, and then checks that cargo::ops::tree::build() (with a very specific set of command-line options) matches the output from the UnitGraph -> Graph conversion function.

I haven't quite put my finger on what a random workspace generator would look like, or whether it would be able to cover all of the cases that we care about.

Does anyone have any thoughts, or should I just have a crack at it and see how far I get?

@weihanglo
Copy link
Member

Looks like a big plan! Previously I was asking using unit graph instead, but the reply from ehuss made me wonder it may be really challenging.

I would recommend experimenting on what you list in the first step, and then assert against the current output first. Fuzzer/protest can be added after we're all happy with the change.

I cannot guarantee things will get merged, though it is still great to explore any possibility to reduce code complexity and discrepancy between cargo-tree and other build commands :)

@alsuren
Copy link

alsuren commented Nov 15, 2022

Thanks for your feedback. I will be working in the open on #11379, if you want to follow along (it's not very interesting at the moment though).

I would recommend experimenting on what you list in the first step, and then assert against the current output first. Fuzzer/protest can be added after we're all happy with the change.

Yeah, it looks like there are already quite a lot of cargo tree integration tests, so I should definitely get those passing before I think about fuzz tests.

I cannot guarantee things will get merged, though it is still great to explore any possibility to reduce code complexity and discrepancy between cargo-tree and other build commands :)

That's fine - I need to build something like this for cargo quickbuild anyway, so it won't be wasted effort.

@davemilter
Copy link

I wonder is this why I got such strange output from cargo tree -d --format "{p} {f}" for such Cargo.toml:

[dependencies]
async-compression = "=0.3.15"
nmea = "=0.2.0"

[build-dependencies]
bindgen = { version = "0.64.0", default-features = false, features = ["runtime"] }
memchr v2.7.2 alloc,default,std
└── async-compression v0.3.15 default
    └── foo v0.1.0 (/private/tmp/foo) 

memchr v2.7.2 alloc,std
└── nom v7.1.3 alloc,std
    ├── cexpr v0.6.0 
    │   └── bindgen v0.64.0 runtime
    │       [build-dependencies]
    │       └── foo v0.1.0 (/private/tmp/foo) 
    └── nmea v0.2.0 default,std
        └── foo v0.1.0 (/private/tmp/foo) 

according to cargo tree each runtime dependency async-compression and nmea
will use memchr with different feature sets.

@weihanglo
Copy link
Member

/// Returns a list of nodes that are considered "duplicates" (same package
/// name, with different versions/features/source/etc.).
pub fn find_duplicates(&self) -> Vec<usize> {

This is how it caculates duplicates currently, so by the implementation a dependency with a different feature set is considered a duplicate.

@weihanglo weihanglo added S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. E-medium Experience: Medium labels Jun 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug Command-tree E-medium Experience: Medium S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants