Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement VersionReq union, intersection, and simplification #223

Closed
wants to merge 6 commits into from

Conversation

jonhoo
Copy link

@jonhoo jonhoo commented Dec 9, 2020

This PR adds VersionReq::union and VersionReq::intersection, and thus fixes #170.

Since the naive implementation tends to produce fairly verbose bounds, it also adds a method that "simplifies" a VersionReq. It does so by first reducing each Range to the smallest range that matches all the range's predicates, and then merging overlapping ranges. For example, the VersionReq:

[ [1, <1.5], [1.2] ]

which is really

[ [>=1.0.0, <2.0.0, <1.5.0], [>=1.2.0,<2.0.0] ]

is simplified to

[ [>=1.2.0, <1.5.0] ]

I did not add automatic simplification after a call to parse, but that might be worthwhile.

I would love for this to have significantly more test-cases, so if you can think of any, please propose them here.

Also, if you have an idea for how to avoid panicking when trying to combine VersionReqs with different compats, please speak up!

@jonhoo
Copy link
Author

jonhoo commented Dec 9, 2020

/cc @Laegluin

@Eh2406
Copy link
Contributor

Eh2406 commented Dec 11, 2020

versions.zip is a zip file containing a file with all 12665 versions and all 9427 requirements published on crates.io. It doesn't have all the forms that is allowed by this crate but it is a real data sample.

self
}

fn simplify(&mut self) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not pub.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think it should be actually. I think simplify should be performed internally when it makes sense to do so (like after a call to intersection). It's not clear there's value in users being able to call it randomly.

@Eh2406
Copy link
Contributor

Eh2406 commented Dec 12, 2020

I ran some smoke tests over that data set. It found too many for me to summarize. The ugly hack of code I threw together was:

Cargo.toml

crates-index={ git = "https://github.com/frewsxcv/rust-crates-index"}
semver = { git = "https://github.com/jonhoo/semver", branch = "merge-ranges" }

main.rs

fn main() {
    let index = crates_index::BareIndex::new_cargo_default();
    let index = index.open_or_clone().unwrap();

    let mut req_set = BTreeSet::new();
    let mut ver_set = BTreeSet::new();

    for crt in index.crates() {
        for ver in crt.versions() {
            if let Ok(v) = semver::Version::parse(ver.version()) {
                ver_set.insert(v);
            } else {
                dbg![ver.version()];
            }
            for dep in ver.dependencies() {
                if let Ok(r) = semver::VersionReq::parse(dep.requirement()) {
                    req_set.insert(r);
                } else {
                    dbg![dep.requirement()];
                }
            }
        }
    }

    for (i, req1) in req_set.iter().rev().enumerate() {
        dbg!((req1.to_string(), i, i as f32 / req_set.len() as f32));
    //     let mut simp = req1.clone();
    //     simp.simplify();
    //     for ver in &ver_set {
    //         if req1.matches(ver) != simp.matches(ver) {
    //             dbg![(req1.to_string(), simp.to_string(), ver.to_string())];
    //         }
    //     }
        for req2 in req_set.range(req1..) {
            let int1 = req1.clone().intersection(req2);
            let int2 = req2.clone().intersection(req1);
            let uni1 = req1.clone().union(req2);
            let uni2 = req2.clone().union(req1);
            for ver in &ver_set {
                let int = req1.matches(ver) && req2.matches(ver);
                if int != int1.matches(ver) {
                    dbg![(
                        req1.to_string(),
                        req2.to_string(),
                        ver.to_string(),
                        int1.to_string(),
                        int
                    )];
                }
                if int != int2.matches(ver) {
                    dbg![(
                        req1.to_string(),
                        req2.to_string(),
                        ver.to_string(),
                        int2.to_string(),
                        int
                    )];
                }
                let uni = req1.matches(ver) || req2.matches(ver);
                if uni != uni1.matches(ver) {
                    dbg![(
                        req1.to_string(),
                        req2.to_string(),
                        ver.to_string(),
                        uni1.to_string(),
                        uni
                    )];
                }
                if uni != uni2.matches(ver) {
                    dbg![(
                        req1.to_string(),
                        req2.to_string(),
                        ver.to_string(),
                        uni2.to_string(),
                        uni
                    )];
                }
            }
        }
    }
}

@jonhoo
Copy link
Author

jonhoo commented Dec 12, 2020

Oh, nice! Could you share some of the problem cases you came across?

@Eh2406
Copy link
Contributor

Eh2406 commented Dec 12, 2020

Here is the zip file of the first ~580k cases cleaned into a csv.

@jonhoo
Copy link
Author

jonhoo commented Dec 15, 2020

@Eh2406 Hmm, the first couple of cases seem suspicious if I'm reading them right:

type, req1, req2, ver, combined, correct
"int1", "<0.3.0-rc.17", "<0.4.0, >=0.1.0", "0.3.0-1", ">=0.1.0, <0.3.0-rc.17", false,
"int2", "<0.3.0-rc.17", "<0.4.0, >=0.1.0", "0.3.0-1", ">=0.1.0, <0.3.0-rc.17", false,
"uni1", "<0.3.0-rc.17", "<0.4.0, >=0.1.0", "0.3.0-1", "<0.4.0", true,
"uni2", "<0.3.0-rc.17", "<0.4.0, >=0.1.0", "0.3.0-1", "<0.4.0", true,

The intersection of <0.3.0-rc.17 and <0.4.0, >=0.1.0 is indeed >=0.1.0, <0.3.0-rc.17, so it seems like my simplification code is doing the right thing. Since correct=false I assume the expectation is that 0.3.0-1 does not match <0.3.0-rc.17, but in that case it also should not match >=0.1.0, <0.3.0-rc.17. Yet the failing test suggests that it does. This indicates to me that there's actually a bug in the matching code.

For the union case: <0.3.0-rc.17 union <0.4.0, >=0.1.0 does indeed simplify to <0.4.0, which 0.3.0-1 should match. Which again suggests to me that there's something wrong with the matching code.

Specifically, what seems to be happening is that <0.X (without a prelease) never matches 0.Y-pre even when Y < X. That would explain both of the oddities above. In the intersection case, the <0.4 is removed, so the case starts matching (whereas the <0.4 in the original prevented it from matching). In the union case, the <0.3-rc.17 is removed in favor of the <0.4, so the case stops matching (whereas the <0.3-rc.17 in the original did match).

I think this is essentially the same as #172, and if I understand Steve correctly there, pre-release versions should only be selected if the major-minor-patch the pre-release is for is explicitly mentioned in the version requirements. That is, <0.4 should not match any pre-release of 0.3, whereas <0.3-pre will match any pre-release of 0.3. If that is indeed the case, that certainly makes the case for unions and intersections much more complicated, since it would mean that <0.4 is not strictly more general than <0.3-pre, and thus we have to merge ranges not globally across the version space, but within each major version.

@Eh2406
Copy link
Contributor

Eh2406 commented Dec 15, 2020

Indeed your last paragraph describes it well. I think this is going to be a tricky problem. To belabor the examples:

<0.4.0, >=0.1.0 does not match 0.3.0-1 as the requirement does not include prereleases. So the intersection of <0.3.0-rc.17 and <0.4.0, >=0.1.0, is (I think) >=0.1.0, <0.3.0.
Similarly the union would have to be something like <0.3.0-rc.17 || <0.4.0.

Copy link
Owner

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

I would prefer if this logic were provided in a separate crate for now. I am not necessarily opposed to bringing intersection and simplification into the semver crate eventually, but as you found in #223 (comment) it's going to be a lot more complicated than this PR, and I don't feel that here is going to be the best place to iterate on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement intersection for two VersionReq
3 participants