Compare matching behavior against 0.10 (currently used by Cargo and crates.io) #237

Closed
dtolnay opened this issue May 25, 2021 · 8 comments

@dtolnay
Owner

dtolnay commented May 25, 2021

Request from @Eh2406:

There is a file here with all the versions and requirements from crates.io. It would be so comforting to compare that the same ones parse and match between 0.10 and 1.0. If there is time before the 1.0 release, I may try to do that, if someone does not beat me to it.

@dtolnay
Owner Author

dtolnay commented May 25, 2021

I ended up just using the most recent db dump (2021-05-25-020145) from https://static.crates.io/db-dump.tar.gz.

// [dependencies]
// csv = "1.1"
// flate2 = "1.0"
// memmap = "0.7"
// semver-new = { package = "semver", version = "1.0.0-rc.2" }
// semver-old = { package = "semver", version = "0.10" }
// serde = { version = "1.0", features = ["derive"] }
// tar = "0.4"

use csv::StringRecord;
use flate2::read::GzDecoder;
use memmap::Mmap;
use serde::Deserialize;
use std::collections::{BTreeMap as Map, BTreeSet as Set};
use std::fs::File;
use tar::Archive;

const DB_DUMP: &str = "/home/david/Downloads/db-dump.tar.gz";

type CrateId = u32;
type StringVersion = String;
type StringVersionReq = String;

#[derive(Deserialize)]
struct Version {
    crate_id: CrateId,
    num: StringVersion,
}

#[derive(Deserialize, Eq, PartialEq, Ord, PartialOrd)]
struct Dependency {
    crate_id: CrateId,
    req: StringVersionReq,
}

fn main() {
    let mut versions: Map<CrateId, Vec<StringVersion>> = Map::new();
    let mut dependencies: Set<Dependency> = Set::new();

    let file = File::open(DB_DUMP).unwrap();
    let mmap = unsafe { Mmap::map(&file) }.unwrap();
    let mut archive = Archive::new(GzDecoder::new(mmap.as_ref()));
    for entry in archive.entries().unwrap() {
        let entry = entry.unwrap();
        let path = entry.path().unwrap();
        if path.ends_with("versions.csv") {
            let mut csv = csv::Reader::from_reader(entry);
            let headers = csv.headers().unwrap().clone();
            let mut record = StringRecord::new();
            while csv.read_record(&mut record).unwrap() {
                let record: Version = record.deserialize(Some(&headers)).unwrap();
                versions
                    .entry(record.crate_id)
                    .or_insert_with(Vec::new)
                    .push(record.num);
            }
        } else if path.ends_with("dependencies.csv") {
            let mut csv = csv::Reader::from_reader(entry);
            let headers = csv.headers().unwrap().clone();
            let mut record = StringRecord::new();
            while csv.read_record(&mut record).unwrap() {
                let record: Dependency = record.deserialize(Some(&headers)).unwrap();
                dependencies.insert(record);
            }
        }
    }

    dbg!(versions.len());
    dbg!(dependencies.len());

    for dep in &dependencies {
        let req_old = semver_old::VersionReq::parse(&dep.req).unwrap();
        let req_new = compat_parse_version_req(&dep.req).unwrap();

        for version in &versions[&dep.crate_id] {
            let version_old = semver_old::Version::parse(version).unwrap();
            let version_new = compat_parse_version(version).unwrap();

            let old_matches = req_old.matches(&version_old);
            let new_matches = req_new.matches(&version_new);
            if old_matches != new_matches {
                eprintln!(
                    "\"{}\".matches(\"{}\")  old={} new={}",
                    dep.req, version, old_matches, new_matches,
                );
            }
        }
    }
}

fn compat_parse_version(string: &str) -> Result<semver_new::Version, semver_new::Error> {
    match string.parse() {
        Ok(version) => Ok(version),
        Err(err) => {
            let deprecated = match string {
                "0.0.1-001" => "0.0.1-1",
                "0.3.0-alpha.01" => "0.3.0-alpha.1",
                "0.4.0-alpha.00" => "0.4.0-alpha.0",
                "0.4.0-alpha.01" => "0.4.0-alpha.1",
                _ => return Err(err),
            };
            Ok(deprecated.parse().unwrap())
        }
    }
}

fn compat_parse_version_req(string: &str) -> Result<semver_new::VersionReq, semver_new::Error> {
    match string.parse() {
        Ok(req) => Ok(req),
        Err(err) => {
            let deprecated = match string {
                "^0-.11.0" => "^0.11.0",
                "^0.1-alpha.0" => "^0.1.0-alpha.0",
                "^0.51-oldsyn" => "^0.51.0-oldsyn",
                "~2.0-2.2" => ">=2.0, <=2.2",
                _ => return Err(err),
            };
            Ok(deprecated.parse().unwrap())
        }
    }
}

@dtolnay
Owner Author

dtolnay commented May 25, 2021

Output:

[src/main.rs:59] versions.len() = 61556
[src/main.rs:60] dependencies.len() = 141110
"~2.0-2.2".matches("2.1.1")  old=false new=true
"~2.0-2.2".matches("2.1.2")  old=false new=true
"~2.0-2.2".matches("2.1.4")  old=false new=true
"~2.0-2.2".matches("2.1.5")  old=false new=true
"~2.0-2.2".matches("2.2.1")  old=false new=true
"~2.0-2.2".matches("2.2.2")  old=false new=true
"~2.0-2.2".matches("2.2.0")  old=false new=true
"~2.0-2.2".matches("2.2.4")  old=false new=true
"~2.0-2.2".matches("2.1.0")  old=false new=true
"~2.0-2.2".matches("2.1.3")  old=false new=true
"~2.0-2.2".matches("2.2.3")  old=false new=true
"~2.0-2.2".matches("2.2.5")  old=false new=true
"^0-.11.0".matches("0.8.1")  old=true new=false
"^0-.11.0".matches("0.7.1")  old=true new=false
"^0-.11.0".matches("0.10.0")  old=true new=false
"^0-.11.0".matches("0.5.0")  old=true new=false
"^0-.11.0".matches("0.7.2")  old=true new=false
"^0-.11.0".matches("0.12.4")  old=true new=false
"^0-.11.0".matches("0.8.0")  old=true new=false
"^0-.11.0".matches("0.1.12")  old=true new=false
"^0-.11.0".matches("0.4.1")  old=true new=false
"^0-.11.0".matches("0.2.1")  old=true new=false
"^0-.11.0".matches("0.2.2")  old=true new=false
"^0-.11.0".matches("0.12.1")  old=true new=false
"^0-.11.0".matches("0.12.2")  old=true new=false
"^0-.11.0".matches("0.7.0")  old=true new=false
"^0-.11.0".matches("0.4.0")  old=true new=false
"^0-.11.0".matches("0.1.9")  old=true new=false
"^0-.11.0".matches("0.6.2")  old=true new=false
"^0-.11.0".matches("0.12.3")  old=true new=false
"^0-.11.0".matches("0.3.0")  old=true new=false
"^0-.11.0".matches("0.6.0")  old=true new=false
"^0-.11.0".matches("0.6.1")  old=true new=false
"^0-.11.0".matches("0.9.0")  old=true new=false
"^0-.11.0".matches("0.4.2")  old=true new=false
"^0-.11.0".matches("0.1.8")  old=true new=false
"^0-.11.0".matches("0.12.0")  old=true new=false

@dtolnay dtolnay closed this as completed May 25, 2021
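
The mismatch output above is easier to judge when grouped by requirement string. The following is a minimal sketch, not part of the original program: `tally_mismatches` is a hypothetical helper that assumes the exact `"REQ".matches("VER")  old=… new=…` line format printed by the `eprintln!` call in the comparison loop and ignores any other line, such as the `dbg!` output.

```rust
use std::collections::BTreeMap;

/// Count mismatch lines per requirement string.
/// Hypothetical helper: assumes lines shaped like
/// `"REQ".matches("VER")  old=false new=true`.
fn tally_mismatches(output: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for line in output.lines() {
        // Mismatch lines begin with a double quote; skip everything else.
        if let Some(rest) = line.strip_prefix('"') {
            // The requirement is everything up to the `".matches(` marker.
            if let Some((req, _)) = rest.split_once("\".matches(") {
                *counts.entry(req.to_string()).or_insert(0) += 1;
            }
        }
    }
    counts
}
```

Feeding the captured stderr text to `tally_mismatches` yields one entry per distinct requirement, which makes it clear that only the two malformed requirements `~2.0-2.2` and `^0-.11.0` account for every difference.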
@Eh2406
Contributor

Eh2406 commented May 25, 2021

Thank you so much. I am comfortable with that level of breakage!

This was referenced May 26, 2021
bors added a commit to rust-lang/cargo that referenced this issue May 27, 2021
Update to semver 1.0.0

I am working on a 1.0.0 of the `semver` crate some time this week. It would be good to confirm Cargo will be able to use it, beforehand!

It's a from-scratch rewrite, but dtolnay/semver#237 has code that compares, against 0.10.0 (currently used by Cargo), how every version requirement currently published to crates.io matches against every published crate version. The differences are all broken syntax like `^0-.11.0` previously parsing with ".11.0" as a pre-release string (which is invalid, because pre-release strings are not allowed to contain empty dot-separated identifiers) and `~2.0-2.2` previously parsing with "2.2" as a pre-release string, when the user almost certainly meant `>=2.0, <=2.2`. I'm not sure how many of those you want to add code into Cargo for in order to preserve behavior, but I would be happy to do it.
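
To make the pre-release rule concrete, here is a minimal sketch of the SemVer 2.0.0 constraints on a pre-release string: dot-separated identifiers, each non-empty, limited to ASCII alphanumerics and hyphens, with no leading zeros on purely numeric identifiers. The function name `is_valid_prerelease` is hypothetical and this is not the `semver` crate's actual parser; it only illustrates why `.11.0` is rejected while `2.2` is accepted as a pre-release, and why the handful of versions such as `0.3.0-alpha.01` need a compatibility mapping.

```rust
/// Check a pre-release string (the part after the `-`) against the
/// SemVer 2.0.0 rules. Hypothetical helper for illustration only.
fn is_valid_prerelease(pre: &str) -> bool {
    pre.split('.').all(|ident| {
        // Empty identifiers are not allowed (e.g. the leading one in ".11.0").
        if ident.is_empty() {
            return false;
        }
        // Only ASCII alphanumerics and hyphens are permitted.
        if !ident.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-') {
            return false;
        }
        // Purely numeric identifiers must not have leading zeros.
        let numeric = ident.bytes().all(|b| b.is_ascii_digit());
        !(numeric && ident.len() > 1 && ident.starts_with('0'))
    })
}
```

Under these rules `is_valid_prerelease(".11.0")` and `is_valid_prerelease("alpha.01")` are false, while `is_valid_prerelease("2.2")` and `is_valid_prerelease("alpha.1")` are true.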
@Eh2406
Contributor

Eh2406 commented Jun 4, 2021

I am a little confused, how did #247 not get caught by this check?

@ehuss

ehuss commented Jun 4, 2021

I ran my test against the index, but crates.io normalized the version req to 0.11.*. I didn't test against the actual Cargo.toml files. I can run a test on that.

@Eh2406
Contributor

Eh2406 commented Jun 4, 2021

That makes sense. I just wanted to understand.

@ehuss

ehuss commented Jun 4, 2021

Here is an updated list. My clone of crates.io is somewhat out of date, so this is current as of about a year ago. Here is the script I used. I think most of them are due to `x` being used as a wildcard (as in `0.5.x`), which I never knew about!
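
For readers unfamiliar with the `x` form, the sketch below shows one way such requirements could be rewritten to the `*` wildcard before parsing. It is an assumption-laden illustration, not something Cargo or the `semver` crate is claimed to do: `normalize_x_wildcards` is a hypothetical helper that treats a dot-separated component consisting solely of `x` or `X` as a wildcard and leaves everything else untouched.

```rust
/// Rewrite `x`/`X` wildcard components to `*` in a version requirement.
/// Hypothetical pre-processing step for illustration only.
fn normalize_x_wildcards(req: &str) -> String {
    req.split(',')
        .map(|comparator| {
            comparator
                .trim()
                .split('.')
                // Only a component that is exactly "x" or "X" is a wildcard.
                .map(|part| if part == "x" || part == "X" { "*" } else { part })
                .collect::<Vec<_>>()
                .join(".")
        })
        .collect::<Vec<_>>()
        .join(", ")
}
```

With this helper, `0.5.x` becomes `0.5.*` and `0.x.x` becomes `0.*.*`. Requirements such as `0.*.1, >= 0.12.1` pass through unchanged, so they would still need separate handling.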

riscv-regs-0.3.0/Cargo.toml.orig" "0.3.0" -> 0.5.x
    error: unexpected character 'x' while parsing patch version number
riscv-regs-0.3.0/Cargo.toml" "0.3.0" -> 0.5.x
    error: unexpected character 'x' while parsing patch version number
tari_comms-0.2.0/Cargo.toml.orig" "0.2.0" -> 0.5.x
    error: unexpected character 'x' while parsing patch version number
tari_comms-0.2.0/Cargo.toml" "0.2.0" -> 0.5.x
    error: unexpected character 'x' while parsing patch version number
rand_funcs-0.1.3/Cargo.toml.orig" "0.1.3" -> 0.x.x
    error: unexpected character 'x' while parsing minor version number
rand_funcs-0.1.3/Cargo.toml" "0.1.3" -> 0.x.x
    error: unexpected character 'x' while parsing minor version number
tokio-file-0.5.2/Cargo.toml.orig" "0.5.2" -> 0.*.1, >= 0.12.1
    error: unexpected character after wildcard in version req
tokio-file-0.5.2/Cargo.toml" "0.5.2" -> 0.*.1, >= 0.12.1
    error: unexpected character after wildcard in version req
easy-plugin-0.11.8/Cargo.toml" "0.11.8" -> 0.*.0
    error: unexpected character after wildcard in version req
easy-plugin-0.11.8/Cargo.toml" "0.11.8" -> 0.*.0
    error: unexpected character after wildcard in version req
easy-plugin-0.11.8/Cargo.toml" "0.11.8" -> 0.*.0
    error: unexpected character after wildcard in version req
easy-plugin-0.11.8/Cargo.toml" "0.11.8" -> 0.*.0
    error: unexpected character after wildcard in version req
cortex-a-3.0.4/Cargo.toml.orig" "3.0.4" -> 0.5.x
    error: unexpected character 'x' while parsing patch version number
cortex-a-3.0.4/Cargo.toml" "3.0.4" -> 0.5.x
    error: unexpected character 'x' while parsing patch version number
strfile-0.1.2/Cargo.toml" "0.1.2" -> 0.3.1.3
    error: expected comma after patch version number, found '.'
curved_gear-0.1.0/Cargo.toml" "0.1.0" -> 0.x
    error: unexpected character 'x' while parsing minor version number
tma-0.1.1/Cargo.toml.orig" "0.1.1" -> 0-.11.0
    error: expected comma after major version number, found '-'
tma-0.1.1/Cargo.toml" "0.1.1" -> 0-.11.0
    error: expected comma after major version number, found '-'
derive-error-chain-0.11.2/Cargo.toml.orig" "0.11.2" -> 0.4.x
    error: unexpected character 'x' while parsing patch version number
derive-error-chain-0.11.2/Cargo.toml.orig" "0.11.2" -> 0.6.x
    error: unexpected character 'x' while parsing patch version number
derive-error-chain-0.11.2/Cargo.toml.orig" "0.11.2" -> 0.14.x
    error: unexpected character 'x' while parsing patch version number
derive-error-chain-0.11.2/Cargo.toml.orig" "0.11.2" -> 0.5.x
    error: unexpected character 'x' while parsing patch version number
derive-error-chain-0.11.2/Cargo.toml.orig" "0.11.2" -> 0.11.x
    error: unexpected character 'x' while parsing patch version number
derive-error-chain-0.11.2/Cargo.toml" "0.11.2" -> 0.4.x
    error: unexpected character 'x' while parsing patch version number
derive-error-chain-0.11.2/Cargo.toml" "0.11.2" -> 0.6.x
    error: unexpected character 'x' while parsing patch version number
derive-error-chain-0.11.2/Cargo.toml" "0.11.2" -> 0.14.x
    error: unexpected character 'x' while parsing patch version number
derive-error-chain-0.11.2/Cargo.toml" "0.11.2" -> 0.5.x
    error: unexpected character 'x' while parsing patch version number
derive-error-chain-0.11.2/Cargo.toml" "0.11.2" -> 0.11.x
    error: unexpected character 'x' while parsing patch version number
register-0.5.1/Cargo.toml.orig" "0.5.1" -> 0.5.x
    error: unexpected character 'x' while parsing patch version number
register-0.5.1/Cargo.toml" "0.5.1" -> 0.5.x
    error: unexpected character 'x' while parsing patch version number
udi-0.0.1/Cargo.toml.orig" "0.0.1" -> 0.11.x
    error: unexpected character 'x' while parsing patch version number
udi-0.0.1/Cargo.toml" "0.0.1" -> 0.11.x
    error: unexpected character 'x' while parsing patch version number
easy-plugin-parsers-0.11.8/Cargo.toml" "0.11.8" -> 0.*.0
    error: unexpected character after wildcard in version req
easy-plugin-parsers-0.11.8/Cargo.toml" "0.11.8" -> 0.*.0
    error: unexpected character after wildcard in version req
easy-plugin-parsers-0.11.8/Cargo.toml" "0.11.8" -> 0.*.0
    error: unexpected character after wildcard in version req
easy-plugin-plugins-0.9.1/Cargo.toml" "0.9.1" -> 0.*.0
    error: unexpected character after wildcard in version req
easy-plugin-plugins-0.9.1/Cargo.toml" "0.9.1" -> 0.*.0
    error: unexpected character after wildcard in version req
qemu-exit-0.1.2/Cargo.toml.orig" "0.1.2" -> 0.7.x
    error: unexpected character 'x' while parsing patch version number
qemu-exit-0.1.2/Cargo.toml" "0.1.2" -> 0.7.x
    error: unexpected character 'x' while parsing patch version number
mio-aio-0.4.1/Cargo.toml.orig" "0.4.1" -> 0.*.1, >= 0.12.1
    error: unexpected character after wildcard in version req
mio-aio-0.4.1/Cargo.toml" "0.4.1" -> 0.*.1, >= 0.12.1
    error: unexpected character after wildcard in version req

@Eh2406
Contributor

Eh2406 commented Jun 4, 2021

Thanks for doing that! It does not look so bad!

Repository owner locked and limited conversation to collaborators Jan 28, 2022