Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a VersionSet trait instead of Range #108

Merged
merged 16 commits into from
May 24, 2022
Merged

Use a VersionSet trait instead of Range #108

merged 16 commits into from
May 24, 2022

Conversation

mpizenberg
Copy link
Member

In this PR, we have now Range as an implementation of the newly introduced VersionSet trait, defined as follows:

/// Trait describing sets of versions.
pub trait VersionSet: Debug + Display + Clone + Eq {
    /// Version type associated with the sets manipulated.
    type V: Debug + Display + Clone + Ord;

    // Constructors
    /// Constructor for an empty set containing no version.
    fn empty() -> Self;
    /// Constructor for a set containing exactly one version.
    fn singleton(v: Self::V) -> Self;

    // Operations
    /// Compute the complement of this set.
    fn complement(&self) -> Self;
    /// Compute the intersection with another set.
    fn intersection(&self, other: &Self) -> Self;

    // Membership
    /// Evaluate membership of a version in this set.
    fn contains(&self, v: &Self::V) -> bool;

    // Automatically implemented functions ###########################

    /// Constructor for the set containing all versions.
    /// Automatically implemented as `Self::empty().complement()`.
    fn full() -> Self { ... }

    /// Compute the union with another set.
    /// Thanks to set properties, this is automatically implemented as:
    /// `self.complement().intersection(&other.complement()).complement()`
    fn union(&self, other: &Self) -> Self { ... }
}

Compilation and tests are passing while deactivating the serde feature. I wonder how to properly write the fact that VersionSet and its V associate type must be serializable when we have the serde feature so that OfflineDependencyProvider can be serializable.

// This currently does not compile with the "serde" feature
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[cfg_attr(feature = "serde", serde(transparent))]
pub struct OfflineDependencyProvider<P: Package, VS: VersionSet> {
    dependencies: Map<P, BTreeMap<VS::V, DependencyConstraints<P, VS>>>,
}

@Eh2406
Copy link
Member

Eh2406 commented Aug 8, 2021

In my Range trait branch I got it working some how, let me look. Here it is.

@mpizenberg
Copy link
Member Author

In my Range trait branch I got it working some how, let me look. Here it is.

Thanks, I will have a look at it!

@mpizenberg
Copy link
Member Author

Question: Why do we require Ord on the version type? And how may it impact automatic multi-dimension version sets defined based on Either, such as:

impl<L, R> RangeSet for EitherSet<L, R>
where
    L: RangeSet,
    R: RangeSet,
{
    type VERSION = Either<L::VERSION, R::VERSION>;

In resolve function, a BTreeSet of versions is used.

let mut added_dependencies: Map<P, BTreeSet<V>>

This one does not seem like is needed. It seems it is only used to check if we have already added dependencies of a given package to the solver incompatibilities. That could be replaced by a hash map, to require Hash instead of Ord, which is easier.

The OfflineDependencyProvider uses the order in a BTreeMap:

pub struct OfflineDependencyProvider<P: Package, VS: VersionSet> {
    dependencies: Map<P, BTreeMap<VS::V, DependencyConstraints<P, VS>>>,
}

Since this is only an implementation, we could add the Ord trait bound on VS::V here. But even then, the offline dependency provider relies on the order to provide the newest versions first. I can see it being a problem with the order automatically derived for Either versions since dimensions would be decoupled.

Side remark. It could also make sense to implement RangeSet/VersionSet (same thing, different branches) for the same type on each dimension: EitherSet<V,V>. This is because constraints on pre-releases are often expressed with one bound in each dimension such as 2.0-alpha <= v < 2.0. So we might want the full expressiveness of versions on both dimensions. And in such cases, if EitherSet::Version is just V, then versions in the BTreeMap will show up in the correct order.
But now the problem is how to implement contains for EitherSet? since we don't have the marker on the version, we don't know if we should look in the left or in right set of version ... The ideal would be to have a marker, but that the marker does not count for the order.

Anyway, a lot of food for thought.

@mpizenberg
Copy link
Member Author

In my Range trait branch I got it working some how, let me look. Here it is.

Thanks, that's what I was looking for, what a mouthful ^^ https://github.com/pubgrub-rs/pubgrub/blob/range-trait/src/solver.rs#L306-L314

#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[cfg_attr(feature = "serde", serde(bound(
    serialize = "R::VERSION: serde::Serialize, R: serde::Serialize, P: serde::Serialize", 
    deserialize = "R::VERSION: serde::Deserialize<'de>, R: serde::Deserialize<'de>, P: serde::Deserialize<'de>"
)))]
#[cfg_attr(feature = "serde", serde(transparent))]
pub struct OfflineDependencyProvider<P: Package, R: RangeSet> {
    dependencies: Map<P, BTreeMap<R::VERSION, DependencyConstraints<P, R>>>,
}

@Eh2406
Copy link
Member

Eh2406 commented Aug 8, 2021

Why do we require Ord on the version type?

To check if we have already added the dependencies. A hashmap would work just as well, but was not compatible with the 0.2 API.

OfflineDependencyProvider uses it to pick the newest version. So the Ord and why, should be on OfflineDependencyProvider.

And how may it impact automatic multi-dimension version sets?

The provided types will have some set of behaviors, probably:

  • OfflineDependencyProvider will pick based on Ord
  • Either will have a tag, and will have all Rs > any Ls

If a user does not want that behavure, they can change the DependencyProvider, the VersionSet, or the Version as needed. For Cargo's Semver impl of VersionSet, we don't need the tag (as we can call .pre.is_empty()). Different users can have different needs, then the question becomes "witch should we have built in".

@mpizenberg
Copy link
Member Author

mpizenberg commented Aug 8, 2021

Why do we require Ord on the version type?

To check if we have already added the dependencies. A hashmap would work just as well, but was not compatible with the 0.2 API.

Indeed. Since we are changing the API, it seems more appropriate to me to ask for Hash than to ask for Ord since hash enables more use cases. Then we can add an Ord trait bound on VersionSet::V for OfflineDependencyProvider. So I think I'll change that and see how that goes.

Now for the rest of this PR, I'd like to try the following (making a todo list for me to remember):

  • Add EitherSet<L,R> for version sets
  • Try if possible to have DoubleSet<V> with Ord bound and a bound to differentiate the dimension. This could make implementations of pre-release version sets easy I think.
  • Experiment with OfflineDependencyProvider and multi-dimensional version sets.
  • Rename Range into DiscreteSet and Version into DiscreteVersion. Relocate them?

@Eh2406
Copy link
Member

Eh2406 commented Aug 13, 2021

Also we should look into the Display bound, if it is only used in reporting then we may want to move the bound to there. Could be that our default reporter needs Display, but a user can have a Reporter with a different bound.

@mpizenberg
Copy link
Member Author

Also we should look into the Display bound

When I remove the bound, there are compiler errors for

  • errors.rs: errors use Display to include versions into error messages. (Same for Package by the way)
  • report.rs: for obvious reasons
  • partial_solution.rs: one branch in the satisfier() function returns an unreachable!() where terms are displayed

We could probably format the unreachable in partial_solution.rs with debug instead of display. For report, we could probably add trait bounds at a few places. For errors.rs I'm not sure what to do. I suppose we could use Debug but that would be a lot less nice?

Here is one of the errors:

    /// Error arising when the implementer of
    /// [DependencyProvider](crate::solver::DependencyProvider)
    /// returned an error in the method
    /// [get_dependencies](crate::solver::DependencyProvider::get_dependencies).
    #[error("Retrieving dependencies of {package} {version} failed")]
    ErrorRetrievingDependencies {
        /// Package whose dependencies we want.
        package: P,
        /// Version of the package for which we want the dependencies.
        version: VS::V,
        /// Error raised by the implementer of
        /// [DependencyProvider](crate::solver::DependencyProvider).
        source: Box<dyn std::error::Error>,
    },

@mpizenberg
Copy link
Member Author

Display is also very convenient for logging, though it's not something we have deeply investigated yet.

@Eh2406
Copy link
Member

Eh2406 commented Aug 13, 2021

Logging and errors seams like enough resin to keep it. Thanks for the investigation!

@raftario
Copy link

Is the checklist mentioned earlier in the thread still up to date ? I'd be happy to do some of the work required to help get this merged

@mpizenberg
Copy link
Member Author

Is the checklist mentioned earlier in the thread still up to date ? I'd be happy to do some of the work required to help get this merged

@raftario indeed, this is still relevant! Don't hesitate to try a few things if you are interested in investigating this. You might want to have a look at some conv in this zulip chanel for more context on that issue and checklist: https://rust-lang.zulipchat.com/#narrow/stream/260232-t-cargo.2FPubGrub/topic/pre-releases

A few remarks though, we want to proceed with cautious with this one and #104 since those are the two with the most flexibility and performance benefits but at the price of API changes. I've personally had a very busy time at work so could not spend much time on pubgrub. Plus we are currently preparing for Packaging Con where we'll be presenting this work in a few weeks :)

@Eh2406
Copy link
Member

Eh2406 commented Jan 12, 2022

I made a DoubleSet example. I have not had a chance to try it, so I don't know how good the ergonomics are.
One thing that was harder then expected, none and any make a new DoubleSet out of nothing. I was planning for F to just be F: Clone + Fn(&V::VERSION) -> bool and just call self.differentiate.clone() where needed. But none and any have no self to use.
So I needed to make a new trait for "tell me if this is a true or false" that did not need self. It is not impled on V::VERSION in case you want to be able to split one V::VERSION into different categories at compile time.

This can be extended from ; 2] to ; N] where const N usize, but I did not want to figure out how to initialize an array, so it is left for the reader.

@baszalmstra
Copy link
Contributor

Hi! I was wondering what the status of this PR is. I'm trying to use Pubgrup to resolve package constraints in a conda environment. Dependency constraints in conda can become quite complex through the matchspec format. A package can be queried not only by version but also by a lot of other constraints like its md5 hash or filename. Most commonly though it's a version specifier (or a combination of version specifiers) (e.g. >=1.2.3,<2.0.0) and a build string (an identification of how the package was build, e.g.: *_cpython). To be able to support this I assume that I need to use the features from this PR to create a VersionSet of matchspec.

I've started working on this in public repository here where for now I only support version specifiers, but I've been running into numerous crashes in pubgrub, especially this one:

index out of bounds: the len is 1 but the index is 1

on this location:

let dd = &satisfier_pa.dated_derivations[satisfier_index];

and this one:

add_derivation should not be called after a decision

on this location:

panic!("add_derivation should not be called after a decision")

I'm not sure if it's my code or that this PR simply isn't done yet.

Any help would be greatly appreciated! I would be happy to jump on a chat or video call too!

@mpizenberg
Copy link
Member Author

Hi @baszalmstra thanks for the feedback!

Regarding the crashes, are they happening with this specific branch or with the normal release? From feedback we've had from other people having crashes, in the normal case it turned out that crashes always happened because they did not strictly follow the requirements of the version traits (about order). But maybe this is something new. If that's new, I hope we can figure it out.

Regarding the state of this PR, it's in pause right now. I'm finishing my current work contract in a month from now, and since last year, end of 2021 I've been too busy to continue here and continue reviewing Jacob's commits. My plan is to take some time off between the end of my contract and a new contract so that I can spend a bit of time on my OSS projects, and here in particular!

I hope it's not too much of a blocker for you and you are able to get things done otherwise. That's also why we wanted to have a solid version 0.2 release with clearly stated limitations in the docs.

@baszalmstra
Copy link
Contributor

Hey @mpizenberg!

I get that, OSS works does take a significant amount of time! I really appreciate all the work you and the others put into this already!

I'm running the version-set branch. I've used the latest release from cargo before and did not encounter these issues. However, that was some time ago. I can see if I can rewrite the code and see if it happens there. The rest of the code changed a lot too, so I might have made a mistake somewhere else. Ill also add some more tests to verify that the version ordering works correctly.

It is kind of a blocker right now unfortunately, you wouldn't have any other tips on debugging these issues?

@baszalmstra
Copy link
Contributor

You pointed me in the right direction, I just implemented exactly that. I have a constraint that represents a DNF of structs that contains multiple constraints. The complement is then easily computed as well as the intersection, see: https://github.com/baszalmstra/rattler/blob/main/crates/rattler/src/match_spec_constraints.rs . I just implemented two simple constraints (version and build_number) but I think I can easily extend this to more.

I haven't encountered any crashes anymore but I did run into the problem that for some queries the solver goes into an infinite loop. I find it a bit hard to read the code. Would you have an idea why this could happen? You can try out my code from:

https://github.com/baszalmstra/rattler

Its very experimental currently but you can run it with: RUST_LOG=debug cargo run --release -- create.

Any help would be greatly appreciated!

@Eh2406
Copy link
Member

Eh2406 commented Mar 11, 2022

When we get this working right, an example should definitely be in this repo.
Does that command reliably reproduce the infinite loop? Is there any way to minimize the inputs?
I hope to take a look this weekend.

@baszalmstra
Copy link
Contributor

It does reliably reproduce the infinite loop. Im sure there is still a bug somewhere in my code too, but I find it very hard to find. Would it be possible to add some logging to the solver to be able to make debugging easier?

@Eh2406
Copy link
Member

Eh2406 commented Mar 14, 2022

So it is stuck in this loop https://github.com/baszalmstra/rattler/blob/main/crates/rattler/src/match_spec_constraints.rs#L143-L149 on an input where &self.groups.len() = 30, and indeed ~2^30 is going to be a lot of iterations. (Found by profiling the code when it got stuck, and then adding a dbg!(&self.groups.len()); to the method)

There has to be a non exponential way to combine these, but I have not thought of it yet.

@baszalmstra
Copy link
Contributor

Ah crap that was my fear. And that will obviously become worse if I add more constraints to it. Im going to think about this some more too. Hopefully, we can figure something out..

@baszalmstra
Copy link
Contributor

Mm I "optimized" the complement function by caching the result. There is still an outer loop stuck somewhere because after about 10-20 complements no "new" complements are computed.

@Eh2406
Copy link
Member

Eh2406 commented Mar 19, 2022

Caching can only solve the problem of a function being called too often. In this case the problem is that a single call takes too long. Putting a print before and after: https://github.com/baszalmstra/rattler/blob/main/crates/rattler/src/match_spec_constraints.rs#L123-L131
reveals that that's still where its stuck. I think this will be faster in practice, although not in theory.

However I am now seeing the stuck outer loop you reported. I do not have time to investigate now. Guessing we are going to find another property related to how we test for subsets.

@Eh2406
Copy link
Member

Eh2406 commented Mar 21, 2022

I did some more investigation but don't have a lot to show for it.
Something is creating an infinite loop in the resolver. I think It is not isolated to one small component, it is the system as a whole not making progress.
There are many things that it could be, but none that I have thought of match all the facts.
Whatever is causing this, we should probably have it in our test suite.
In order to make this a standalone test, and what would enable me to continue investigating, is to make a branch that reads the input from a file with only the fields that are used.
For example the file can store the requirements as a tuple of PubGrub requirements, instead of in the condo format your parsing. Similarly versions can be a tuple of the two fields the resolver reads.

@baszalmstra
Copy link
Contributor

@Eh2406 Ill try to create something like that!

@baszalmstra
Copy link
Contributor

I've started creating a smaller test case.

I've also created a PR to get my BoundedRange (yesyes, terrible name) merged because I think its much more flexible than the current Range while still allowing VersionSet to be implemented. #111 Curious about your thoughts!

@mpizenberg
Copy link
Member Author

Sorry for the long time without activity. I'm getting some attention back to this so that we can progress towards v0.3. I'm going to recap what happened here, with additional context from other discussions, so that we know where we are, and what we can do to move forward.

Changes in this PR

The goal here is to give more flexibility to implementors of dependency providers, by defining a trait VersionSet with minimum requirements to be compatible with the PubGrub version solving algorithm. Our previous Range type is now one implementation of that trait, which can be viewed as a helper when more constraints are imposed. The new trait will enable more flexible implementations, for example able to represent correctly pre-releases, which were incompatible with the set of constraints imposed on our previous Range type. The VersionSet is defined (roughly) as follows:

/// Trait describing sets of versions.
pub trait VersionSet: Debug + Display + Clone + Eq {
    /// Version type associated with the sets manipulated.
    type V: Debug + Display + Clone + Ord;

    // Constructors: empty(), singleton(Self::V)
    // Operations: complement(&self), intersection(&self, &Self)
    // Membership: contains(&self, &Self::V)

    // Automatically implemented functions ###################

    // Constructor: full()
    // Operations: union(&self, &Self)
}

One key difference is that the associate type VersionSet::V now only asks for an order, where previously, we explicitely asked for a minimal version, and a bump function to increment a version. This combined with the implementation freedom to handle sets how you want, instead of relying on our Range type gives a lot of possibilities. But with great power comes great responsability, meaning users have the responsability to implement their set operations correctly!

Questions and current choices

This change brings many questions on the current design decisions.

  1. Is it necessary to have the Display trait bounds on versions and sets?
  2. Is it necessary to have the Ord trait bound on the associate version type?
  3. Should we provide more ready-to-use implementations in addition to the current Range?

(1) The Display bound is very convenient for both error reporting (for derived error messages) and console logging. So we will keep it for now, and reevaluate later if a change is needed.

(2) The Ord bound is used at two places currently. First, it's used to store already picked versions in a btreeset for an important optimization. Second it's used in the offline dependency provider to sort versions. We could replace the Ord bound by a Hash bound for the first thing, but we'd have to add an Ord bound to the offline dependency provider anyway. In addition, an order is often at the core of the efficient handling of version sets operations, like in our Range operations. So it makes sense to keep the Ord bound for now, and reevaluate later if the need arises.

(3) It is important to provide at least one implementation of the VersionSet trait, both because it serves as an example, and for property tests. Range is a good first candidate for both since it's one of the simplest and most efficient implementations. Now the question is how many more implementations? For now, 3 directions have been mentionned. One consists in using continuous version sets, with bounded intervals instead of the current discrete version constraints imposed on the version type used with Range. Inspiration for this can be taken from PR #99 (itself inspired by PhotonQuantum fork using the ranges crate) and from PR #111 of @baszalmstra. Another direction consists in providing an implementation ready for version types with pre-releases. Finally, another direction consists in building a base with EitherSet(VS1, VS2) and DoubleSet(VS) helpers of multi-dimensions version sets that can be composed to easily implement any kind of multi-dimensions version sets, with automatic derivations of the trait.

The EitherSet and DoubleSet feels powerful and empowering but it also has drawbacks. Typically, the EitherSet(VS1, VS2) would provide automatic trait implementation where version sets would be combined of two independent version types. This can typically be useful for a solving handling packages of two different ecosystems like Rust & C++, or Python and C++. So it could make sense but it's probably better to first provide one showcasing concrete example and see what needs arise from the community. A DoubleSet(VS) helper could be useful to automatically derive the trait for version types that embed multiple incompatible spaces that can be differentiated in a single version type. For example pre-release tags are usually embeded in the same type definition (as in SemVer spec) but they can be used to define version sets that do not obey the strict rules of ordering if they shared the same space with normal versions. For example v2.0-alpha is lower than v2.0 yet is not considered to be contained within the v < 2.0 requirement. So the DoubleSet(VS) generic helper sounds like a good idea for composition, but in practice, it brings wrong ergonomics. This is because we would need the inner VS type to also implement the trait. And in doing so, we may lead them to use directly their VS type with pubgrub while their VS::V semantics are wrong.

For these reasons, ergonomics + unknowns, in addition to the fact that this is something that could still be implemented later in a minor release, it makes more sense to not consider implementations of generic EitherSet and DoubleSet for now. Instead, I think we should provide concrete examples, explained in the guide, for both the pre-release case, and the multi ecosystem case.

Now regarding the continuous, bounded interval trait implementation, my feeling is that we do not need to replace the current Range impl with this. The continuous bounded approach is more generic than the discrete one so it could replace it entirely in theory. My feeling is that it will have a non-negligeable performance cost so this needs to be evaluated first. A non-blocking way forward is to keep the current Range impl, and to provide the continuous bounded impl as another more generic helper. This can be done directly for v0.3, or kept as an extension for v0.3.1 to not block the release of v0.3.

Other small remarks

We should probably rename the Range type to something more precise like DiscreteSet and rename Version into DiscreteVersion. We should also move the OfflineDependencyProvider outside of the solver module and into its own module to be closer to what other external implementation modules would look like. All this can be done in another PR but should happen before v0.3 to limit breaking changes.

We will also need an "Upgrade from v0.2 to v0.3" section in the guide, as well as an explanation for how to update RON files from the old to the new types. Finally, we'll need a guide section on "How to test my dependency provider implementation?" for people implementing their own VersionSet or we will have to handle many GitHub issues related to people having wrong implementations of their set operations.

We may also want to reconsider the "interior mutability" pattern we are imposing on users due to not using &mut in our types?

Summary

I hope I have not forgotten important info or made big mistakes, don't hesitate to correct me if that's the case or to voice your concerns if you don't agree with some of my points. So I propose we move forward as follows, let me know what you think.

  1. Mark this PR ready for review and wait for your approval to merge
  2. Write a guide section + reference impl in the advanced_dependency_providers repo for pre-release example
  3. Update the Priority queue PR perf!: use a priority queue #104 to fit with the new trait system
  4. Renames of the ranges and version types, move offline deps provider to it's own module
  5. Update the guide to match v0.3 of the repo (Trait + renames + Priority queue API changes)
  6. Write a guide section for the v0.2 -> v0.3 upgrade
  7. Complement the guide section about property testing with info for people implementing on their own VersionSet
  8. Publish version v0.3

Many of the above points can be done in parallel after this PR is merged (1). And after publishing v0.3 we can refocus on the nice-to-have and new code. In no particular order:

  • Write a guide section + reference impl in the advanced_dependency_providers repo for multi-ecosystem example
  • Provide a continous impl with interval bounds
  • Write an example script for RON upgrade

@mpizenberg mpizenberg marked this pull request as ready for review May 21, 2022 16:34
@Eh2406
Copy link
Member

Eh2406 commented May 21, 2022

First off, thank you for the summary! You said a lot of really valuable stuff. For the sake of brevity I am not going to mention all the times I agree with what you said.
Also I looked over the code and it seems good. I'm not seeing anything that obviously needs to be fixed before merge, and if I missed something we can always do an additional PR.

I think the decision about how much this repo shuld support continuous ranges, can better be had in a separate thread. This conversation is already very long. Let's merge this PR and open separate issues for each of the points you brought up! (And all the other valuable conversations that have happened in this thread, like tags + bounds and how to test.)

@baszalmstra
Copy link
Contributor

Thanks for the detailed summary @mpizenberg! Getting this MR merged sounds great! :)

@mpizenberg
Copy link
Member Author

Let's merge this PR and open separate issues for each of the points

@Eh2406 did you forget to check the required review to be able to merge this PR or are you planning on doing it later?

@mpizenberg mpizenberg requested a review from Eh2406 May 24, 2022 12:56
Copy link
Member

@Eh2406 Eh2406 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot what the ceremony was in this repo. Sorry.

@mpizenberg mpizenberg merged commit f9c8238 into dev May 24, 2022
@mpizenberg mpizenberg deleted the version-set branch May 24, 2022 15:12
@mpizenberg
Copy link
Member Author

No worry, yeah we still have 1 review requirement before merging stuff.
That's one thing done! Now onto the rest of v0.3 :)

@mpizenberg mpizenberg mentioned this pull request May 23, 2023
zanieb pushed a commit to astral-sh/pubgrub that referenced this pull request Nov 8, 2023
* refactor: introduce a trait for sets of versions

* refactor: port report and incompatibility to version set

* refactor: port errors to version set

* refactor: port partial solution to version set

* refactor: move DependencyConstraints to type_aliases

* refactor: port core to version set

* refactor: port solver to version set

* refactor: replace old modules with ones based on version_set

* refactor: update tests to version_set

* feat: add serde bounds to OfflineDependencyProvider

* refactor: update proptest.rs to VersionSet

* docs: fix links

* refactor: allow clippy type_complexity

* Small docs changes
This was referenced Nov 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants