Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Version set unions as solvable requirements #56

Merged
merged 9 commits into from
Aug 5, 2024

Conversation

eviltak
Copy link
Contributor

@eviltak eviltak commented Jul 29, 2024

This PR introduces the concept of version set unions and allows them to be used to specify solvable requirements that can be fulfilled by more than one version set (belonging to more than one package).

The change is facilitated by the introduction of a Requirement type, which is used instead of VersionSetIds to specify a required dependency of a solvable. A requirement can either be comprised of a single version set (a VersionSetId), or multiple version sets (a VersionSetUnionId). This allows existing code bases to continue using a single version set (VersionSetId) as a requirement specification without ever having to touch version set unions (VersionSetUnionId).

Breaking changes

This PR introduces breaking changes - the main ones being the following:

  • Added the Interner::version_sets_in_union trait method
  • Changed the type of the requirements field of KnownDependencies from Vec<VersionSetId> to Vec<Requirement>.
  • Renamed the requirements field of DependencySnapshot to version_sets
  • Added the version_set_unions field to DependencySnapshot (only affects those who are constructing DependencySnapshot instances manually)

API migration

Migrating existing depending projects that do not rely on version set unions to the new API is quite straightforward.

  • Add a placeholder implementation for version_sets_in_union in impls of Interner. It is perfectly fine for the implementation to be unimplemented!, since if version set unions are unused this method will never be called.
  • Modify implementations of DependencyProvider::get_dependencies to convert the VersionSetIds used in the requirements field of KnownDependencies to Requirements. Since Requirement implements From<VersionSetId>, this is as simple as calling From::from or Into::into on each VersionSetId.
  • For those using DependencySnapshot, the requirements field has been renamed to version_sets; all other functionality remains the same.

Copy link
Contributor

@baszalmstra baszalmstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!

I think it could be useful to modify the decide function as well to be aware of multiple version sets in a single requirement. That way you get to conflicts of individual packages faster.

src/internal/mapping.rs Outdated Show resolved Hide resolved
src/utils/pool.rs Show resolved Hide resolved
tests/solver.rs Outdated Show resolved Hide resolved
@eviltak eviltak force-pushed the union-requirements branch 2 times, most recently from 8d2d378 to 68e5ea7 Compare July 29, 2024 14:46
@eviltak
Copy link
Contributor Author

eviltak commented Jul 29, 2024

I think it could be useful to modify the decide function as well to be aware of multiple version sets in a single requirement. That way you get to conflicts of individual packages faster.

@baszalmstra Can you elaborate on how this might work? As far as I understand, decide currently picks the first selectable candidate in an undecided clause that has the least number of undecided candidates. The candidates list contains the candidates for all the version sets in the requirement, so it should already take into account this case.

@baszalmstra
Copy link
Contributor

I think you only have to look at the first versionset in a requirement that still has undecided candidates. The idea is to force conflicts as early as possible to reduce backtracking. By only looking at the first versionset instead of the whole requirement you find smaller sets faster is my theory.

E.g.

foo ==1 | bar >0,<1000

has a large number of candidates but foo==1 only has 1. We want to select foo=1 first to make sure that if there are other conflicting requirements we find them soon.

Does that make sense?

@eviltak
Copy link
Contributor Author

eviltak commented Jul 29, 2024

I think that makes sense. Would sorting the candidates list for a version set union such that the version sets with fewer candidates come earlier suffice (in SolverCache::get_or_cache_sorted_candidates), or would you rather Solver::decide look at the candidates of the smaller version sets first?

@baszalmstra
Copy link
Contributor

Changing the order would also change the outcome of the solution. I can imagine that the order of the union is therefore important. So I think this order should recide in the decide function itself.

@baszalmstra
Copy link
Contributor

you also cannot know the number of available candidates beforehand because it depends on previous decisions

@eviltak eviltak force-pushed the union-requirements branch 2 times, most recently from a75c683 to 6b248e2 Compare July 30, 2024 01:26
@eviltak eviltak force-pushed the union-requirements branch from 6b248e2 to 05ac114 Compare July 30, 2024 01:32
@eviltak
Copy link
Contributor Author

eviltak commented Jul 30, 2024

I seemed to have misunderstood your suggestion at first. Having re-read it a few times, I now think I have understood what you are suggesting - to consider the size of only the first version set with available candidates in a requirement in the "least available candidates" heuristic of Solver::decide.

If my new understanding of your suggestion is correct (and please correct me if I am still wrong), I fail to see how this will reduce the amount of backtracking. In the example you gave of foo == 1 | bar >0,<1000, though there is only 1 candidate for foo that fulfills the requirement, there are 999 candidates of bar that will also fulfill the requirement. If selecting foo=1 causes conflicts, there are still the 999 candidates of bar to be considered. In fact, it would be easier to first try and solve other clauses with fewer available candidates, in case one of them needs a version of foo other than 1 (i.e. conflicts with foo=1). Then when we finally choose to decide a solvable for this clause, foo=1 would have already been set to false, letting us pick one of the bar candidates instead to fulfill this requirement (which is the same justification for the heuristic when requirements can only span a single version set).

For example, consider a solution which needs two requirements foo == 1 | bar >0,<1000 and foo >1,<5. With the current logic, the latter requirement will be decided first (since it has 4 candidates vs. the former's 1000), setting say foo=4 to true. Then the former requirement will be decided, at which point foo=1 is no longer a viable candidate, and hence a solvable for bar is chosen. With your proposal (assuming I understood it correctly), we would decide the former requirement first (since the first version set, foo == 1, only has 1 candidate) and choose foo=1 to satisfy it. However, after this choice, the propagation step would report a conflict with the latter requirement, and we would learn that foo=1 cannot be part of the solution and have to backtrack.

@baszalmstra
Copy link
Contributor

I now think I have understood what you are suggesting - to consider the size of only the first version set with available candidates in a requirement in the "least available candidates" heuristic of Solver::decide.

Indeed.

Maybe instead of theorizing about the implications, we should just benchmark it? I assume you have a large set of cases that you could test this with?

@eviltak
Copy link
Contributor Author

eviltak commented Jul 31, 2024

I don't have test cases on hand, but I can definitely come up with some. How are you currently benchmarking and comparing the performance of various solution strategies? Do you look at the number of solver iterations, the number of times it backtracks, or some other metrics? If it works for you, I think it would be better to merge this PR as is and tackle this question in a separate issue/PR.

Also, would you be able to look at the updated C++ bindings? I added support for using Requirements as I saw fit, but the API is unfortunately not as ergonomic as it is in Rust because of the lack of support of sum types in C++.

@baszalmstra
Copy link
Contributor

I don't have test cases on hand, but I can definitely come up with some. How are you currently benchmarking and comparing the performance of various solution strategies? Do you look at the number of solver iterations, the number of times it backtracks, or some other metrics? If it works for you, I think it would be better to merge this PR as is and tackle this question in a separate issue/PR.

You can create a snapshot of a couple of cases with your DependencyProvider implementation. We can include those snapshots in this repository and use them for tests and benchmarks.

I have a very large snapshot locally of the entire conda ecosystem (using this tool), I then solve for each individual package in that set (20k+) and measure the performance using this tool included in the repo. You can then see a nice distribution of the solve times and compare them against another implementation.

Also, would you be able to look at the updated C++ bindings? I added support for using Requirements as I saw fit, but the API is unfortunately not as ergonomic as it is in Rust because of the lack of support of sum types in C++.

I can take a look at that, but next week at the earliest.

Thanks again for this PR, you are doing amazing work! 🚀

@eviltak
Copy link
Contributor Author

eviltak commented Aug 2, 2024

That's great information, thanks a lot! I should be able to generate real-world use cases with union requirements as snapshots from my APT DependencyProvider implementation once this PR is merged. I also think it will be better to include these benchmarks as part of the suite of checks run on every commit to keep track of performance improvements and regressions, if you are open to it. I'll wait for you to review the C++ binding API considering this PR ready to merge, and tackle the benchmarking in a subsequent PR.

@eviltak eviltak requested a review from baszalmstra August 5, 2024 15:25
@baszalmstra
Copy link
Contributor

Thanks @eviltak, also for the C++ bindings! This looks good to me!

@baszalmstra baszalmstra merged commit 31cf851 into mamba-org:main Aug 5, 2024
13 checks passed
@baszalmstra baszalmstra mentioned this pull request Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants