-
Notifications
You must be signed in to change notification settings - Fork 778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
propagate markers to sibling dependencies of forks #5163
Conversation
@@ -42,6 +43,9 @@ pub enum PubGrubPackageInner { | |||
/// A Python version. | |||
Python(PubGrubPython), | |||
/// A Python package. | |||
/// | |||
/// Note that it is guaranteed that `extra` and `dev` are never both | |||
/// `Some`. That is, if one is `Some` then the other must be `None`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure if this is actually true.
// either. | ||
if dev.is_some() { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure of what to do in this case. I haven't given it too much thought yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could happen if you fork at the root and that determines whether to use a workspace package which has dev deps
{ name = "typing-extensions", marker = "python_full_version <= '3.10' or python_full_version > '3.10'" }, | ||
{ name = "attrs", marker = "(python_full_version <= '3.10' and python_version == '3.10') or (python_full_version <= '3.10' and python_version < '3.10') or (python_full_version <= '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version == '3.10') or (python_full_version > '3.10' and python_version < '3.10') or (python_full_version > '3.10' and python_version > '3.10')" }, | ||
{ name = "iniconfig", marker = "(python_full_version <= '3.10' and python_version == '3.10') or (python_full_version <= '3.10' and python_version < '3.10') or (python_full_version <= '3.10' and python_version < '3.10' and python_version > '3.10') or (python_full_version <= '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version == '3.10') or (python_full_version > '3.10' and python_version < '3.10') or (python_full_version > '3.10' and python_version < '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version > '3.10')" }, | ||
{ name = "typing-extensions", marker = "(python_full_version <= '3.10' and python_version == '3.10') or (python_full_version <= '3.10' and python_version < '3.10') or (python_full_version <= '3.10' and python_version < '3.10' and python_version > '3.10') or (python_full_version <= '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version == '3.10') or (python_full_version > '3.10' and python_version < '3.10') or (python_full_version > '3.10' and python_version < '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version > '3.10')" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brutal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it time for some resolution
crates/uv/tests/pip_compile.rs
Outdated
@@ -6980,7 +6980,7 @@ fn universal_no_repeated_unconditional_distributions() -> Result<()> { | |||
# via sphinx | |||
tomli==2.0.1 ; python_version < '3.11' | |||
# via pylint | |||
tomlkit==0.12.4 | |||
tomlkit==0.12.4 ; python_version < '3.11' or python_version >= '3.12' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure if this new snapshot is correct. The marker expressions look awfully suspicious here.
crates/uv/tests/lock_scenarios.rs
Outdated
@@ -1919,7 +1919,7 @@ fn fork_marker_selection() -> Result<()> { | |||
version = "0.1.0" | |||
source = { editable = "." } | |||
dependencies = [ | |||
{ name = "package-a" }, | |||
{ name = "package-a", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the status quo is correct, because this update leaves a "hole" (e.g., on Windows) where previously a
would be installed but now it isn't.
@@ -28,6 +29,10 @@ pub(crate) struct PubGrubDependency { | |||
} | |||
|
|||
impl PubGrubDependency { | |||
pub(crate) fn and_markers(&mut self, marker: &MarkerTree) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make those Fn(Self, T) -> Self
instead? iirc we don't pass &mut PubGrubDependency
around, so only the owner can mutate them anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but MarkerTree::and
takes &mut Self
.
I think either way can work where we avoid cloning MarkerTree
. This way, it means using Arc::make_mut
. If we took ownership, it would be Arc::unwrap_or_clone
. But, in the make_mut
case where there is only one reference, we don't need to re-create the Arc
.
How much this matters, I dunno. But I opted for the option that I thought was less costly. (And there are assuredly designs that are even less costly than what I have, but I think it would be a bigger refactor.)
// information from this package as we can to avoid allocs. It | ||
// is possible that the `Arc::make_mut` will do a deep clone | ||
// here, thereby defeating our efforts, but I think it is | ||
// likely that in most cases it will not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be doing those changes directly after instance creation and before passing it pubgrub (which clones the package), so this should hold true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thank you. I added that to this comment.
// either. | ||
if dev.is_some() { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could happen if you fork at the root and that determines whether to use a workspace package which has dev deps
{ name = "typing-extensions", marker = "python_full_version <= '3.10' or python_full_version > '3.10'" }, | ||
{ name = "attrs", marker = "(python_full_version <= '3.10' and python_version == '3.10') or (python_full_version <= '3.10' and python_version < '3.10') or (python_full_version <= '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version == '3.10') or (python_full_version > '3.10' and python_version < '3.10') or (python_full_version > '3.10' and python_version > '3.10')" }, | ||
{ name = "iniconfig", marker = "(python_full_version <= '3.10' and python_version == '3.10') or (python_full_version <= '3.10' and python_version < '3.10') or (python_full_version <= '3.10' and python_version < '3.10' and python_version > '3.10') or (python_full_version <= '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version == '3.10') or (python_full_version > '3.10' and python_version < '3.10') or (python_full_version > '3.10' and python_version < '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version > '3.10')" }, | ||
{ name = "typing-extensions", marker = "(python_full_version <= '3.10' and python_version == '3.10') or (python_full_version <= '3.10' and python_version < '3.10') or (python_full_version <= '3.10' and python_version < '3.10' and python_version > '3.10') or (python_full_version <= '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version == '3.10') or (python_full_version > '3.10' and python_version < '3.10') or (python_full_version > '3.10' and python_version < '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version > '3.10')" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it time for some resolution
22d0935
to
a2a8dff
Compare
"### | ||
); | ||
|
||
Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a regression test for #5086.
a2a8dff
to
626d29b
Compare
626d29b
to
403fc11
Compare
dependencies: vec![], | ||
markers, | ||
}); | ||
let mut new_forks_for_remaining_universe = forks.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a single fork that we build by using the original deps and removing all that are disjoint with this fork's markers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. That was how I had originally done it (except without adding original deps). Crucially, this new "remaining universe" partitioning is added in the same way as other partitionings are. That is, it needs to be considered in the context of any extant forks. Basically, when a fork is created, it starts by copying all extant forks at the time the partitioning was determined, adding dependencies corresponding to the fork along with the markers.
Let me see if I can come up with an example using packse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I came up with an example here: astral-sh/packse#204
For that scenario, I believe your suggestion would lead to the "remaining universe" fork created by b
in the dependencies of a
having a marker expression of just os_name != 'linux' and os_name != 'darwin'
. But it actually should be combined with the marker expressions of every parent fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scenario doesn't seem to use that code path, it looks like with it we're only doing one iteration with the parent universal fork:
if let Some(markers) = fork_groups.remaining_universe() {
trace!("Adding split to cover possibly incomplete markers: {markers}");
println!(
"forks: {} {}",
forks.len(),
forks
.iter()
.map(|fork| format!("({})", fork.markers))
.join(" || ")
);
let mut new_forks_for_remaining_universe = forks.clone();
for fork in &mut new_forks_for_remaining_universe {
fork.markers.and(markers.clone());
fork.dependencies.retain(|dep| {
let Some(dep_marker) = dep.package.marker() else {
return true;
};
// After we constrain the markers on an existing
// fork, we should ensure that any existing
// dependencies that are no longer possible in this
// fork are removed. This mirrors the check we do in
// `add_nonfork_package`.
!crate::marker::is_disjoint(&fork.markers, dep_marker)
});
}
println!(
"new_forks_for_remaining_universe: {} {}",
new_forks_for_remaining_universe.len(),
new_forks_for_remaining_universe
.iter()
.map(|fork| format!("({})", fork.markers))
.join(" || ")
);
new_forks.extend(new_forks_for_remaining_universe);
}
$ rm -f uv.lock && cargo run lock -v 2> a.txt
forks: 1 ()
new_forks_for_remaining_universe: 1 (sys_platform != 'illumos' and sys_platform != 'windows')
forks: 1 ()
new_forks_for_remaining_universe: 1 (os_name != 'darwin' and os_name != 'linux')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will merge this as-is and follow-up on it later, overlapping markers are more important to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're 100% correct. It doesn't! I even had my own debug prints and was not careful in looking at what they were telling me. I spent some time trying to up with a better example, but failed. I filed #5482 to track investigation into this possible issue.
The description for the scenario attempts to explain the utility of the test. This was added in response to a question here: astral-sh/uv#5163 (comment)
The description for the scenario attempts to explain the utility of the test. This was added in response to a question here: astral-sh/uv#5163 (comment)
Basically, and'ing or or'ing the same expression can be entire skipped. And we try harder to avoid singleton conjunctions or disjunctions, as these are considered unequal otherwise. (Thus defeating our attempts to avoid and'ing or or'ing a superfluous marker.)
Interestingly, the empty string appears to be valid for these types. I'm not sure if that's intended, but having a Default impl is useful for use with `std::mem::take`.
The snapshot saved here is wrong, but we'll update it in a subsequent commit.
When a fork occurs, we divide not just the dependencies that provoked a fork into distinct groups, but we also add the corresponding sibling dependencies to each fork. Previously, while we track markers on the fork itself, the individual dependencies that had markers only corresponded to markers written from the dependency specification. This meant that the sibling dependencies that got added to each fork would not themselves have markers attached to them. This in turn meant they would not have markers associated with them in the lock file. In many cases, this is actually okay, because the resolver will pick a version that is "universal" across all forks in most cases. But in some cases, this just simply isn't possible as the marker expressions in the fork can and do influence resolution. In which case, it is possible for the same package with different versions to show up in the lock file unconditionally. Which is a big no-no. So in this commit, after we determine the forks, we intersect the markers on each fork with each of its dependencies. This does seem to balloon the marker expressions in some cases. I plucked one low hanging fruit to avoid doing `x and x` in trivial cases. (And this eliminated a portion of the snapshot diffs.) But some pretty gnarly diffs remain. This commit also fixes another bug: previously, when we created a fork to capture the "remaining" universe of an incomplete set of markers, we left out dependencies that should be included in that fork. We rectify that here. Fixes #5086 Partially addresses #4732
403fc11
to
c7fe419
Compare
## Summary This PR rewrites the `MarkerTree` type to use algebraic decision diagrams (ADD). This has many benefits: - The diagram is canonical for a given marker function. It is impossible to create two functionally equivalent marker trees that don't refer to the same underlying ADD. This also means that any trivially true or unsatisfiable markers are represented by the same constants. - The diagram can handle complex operations (conjunction/disjunction) in polynomial time, as well as constant-time negation. - The diagram can be converted to a simplified DNF form for user-facing output. The new representation gives us a lot more confidence in our marker operations and simplification, which is proving to be very important (see #5733 and #5163). Unfortunately, it is not easy to split this PR into multiple commits because it is a large rewrite of the `marker` module. I'd suggest reading through the `marker/algebra.rs`, `marker/simplify.rs`, and `marker/tree.rs` files for the new implementation, as well as the updated snapshots to verify how the new simplification rules work in practice. However, a few other things were changed: - [We now use release-only comparisons for `python_full_version`, where we previously only did for `python_version`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/algebra.rs#L522). I'm unsure how marker operations should work in the presence of pre-release versions if we decide that this is incorrect. - [Meaningless marker expressions are now ignored](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/parse.rs#L502). This means that a marker such as `'x' == 'x'` will always evaluate to `true` (as if the expression did not exist), whereas we previously treated this as always `false`. It's negation however, remains `false`. - [Unsatisfiable markers are written as `python_version < '0'`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/tree.rs#L1329). - The `PubGrubSpecifier` type has been moved to the new `uv-pubgrub` crate, shared by `pep508-rs` and `uv-resolver`. `pep508-rs` also depends on the `pubgrub` crate for the `Range` type, we probably want to move `pubgrub::Range` into a separate crate to break this, but I don't think that should block this PR (cc @konstin). There is still some remaining work here that I decided to leave for now for the sake of unblocking some of the related work on the resolver. - We still use `Option<MarkerTree>` throughout uv, which is unnecessary now that `MarkerTree::TRUE` is canonical. - The `MarkerTree` type is now interned globally and can potentially implement `Copy`. However, it's unclear if we want to add more information to marker trees that would make it `!Copy`. For example, we may wish to attach extra and requires-python environment information to avoid simplifying after construction. - We don't currently combine `python_full_version` and `python_version` markers. - I also have not spent too much time investigating performance and there is probably some low-hanging fruit. Many of the test cases I did run actually saw large performance improvements due to the markers being simplified internally, reducing the stress on the old `normalize` routine, especially for the extremely large markers seen in `transformers` and other projects. Resolves #5660, #5179.
This PR rewrites the `MarkerTree` type to use algebraic decision diagrams (ADD). This has many benefits: - The diagram is canonical for a given marker function. It is impossible to create two functionally equivalent marker trees that don't refer to the same underlying ADD. This also means that any trivially true or unsatisfiable markers are represented by the same constants. - The diagram can handle complex operations (conjunction/disjunction) in polynomial time, as well as constant-time negation. - The diagram can be converted to a simplified DNF form for user-facing output. The new representation gives us a lot more confidence in our marker operations and simplification, which is proving to be very important (see #5733 and #5163). Unfortunately, it is not easy to split this PR into multiple commits because it is a large rewrite of the `marker` module. I'd suggest reading through the `marker/algebra.rs`, `marker/simplify.rs`, and `marker/tree.rs` files for the new implementation, as well as the updated snapshots to verify how the new simplification rules work in practice. However, a few other things were changed: - [We now use release-only comparisons for `python_full_version`, where we previously only did for `python_version`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/algebra.rs#L522). I'm unsure how marker operations should work in the presence of pre-release versions if we decide that this is incorrect. - [Meaningless marker expressions are now ignored](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/parse.rs#L502). This means that a marker such as `'x' == 'x'` will always evaluate to `true` (as if the expression did not exist), whereas we previously treated this as always `false`. It's negation however, remains `false`. - [Unsatisfiable markers are written as `python_version < '0'`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/tree.rs#L1329). - The `PubGrubSpecifier` type has been moved to the new `uv-pubgrub` crate, shared by `pep508-rs` and `uv-resolver`. `pep508-rs` also depends on the `pubgrub` crate for the `Range` type, we probably want to move `pubgrub::Range` into a separate crate to break this, but I don't think that should block this PR (cc @konstin). There is still some remaining work here that I decided to leave for now for the sake of unblocking some of the related work on the resolver. - We still use `Option<MarkerTree>` throughout uv, which is unnecessary now that `MarkerTree::TRUE` is canonical. - The `MarkerTree` type is now interned globally and can potentially implement `Copy`. However, it's unclear if we want to add more information to marker trees that would make it `!Copy`. For example, we may wish to attach extra and requires-python environment information to avoid simplifying after construction. - We don't currently combine `python_full_version` and `python_version` markers. - I also have not spent too much time investigating performance and there is probably some low-hanging fruit. Many of the test cases I did run actually saw large performance improvements due to the markers being simplified internally, reducing the stress on the old `normalize` routine, especially for the extremely large markers seen in `transformers` and other projects. Resolves #5660, #5179.
When a fork occurs, we divide not just the dependencies that
provoked a fork into distinct groups, but we also add the
corresponding sibling dependencies to each fork. Previously,
while we track markers on the fork itself, the individual
dependencies that had markers only corresponded to markers
written from the dependency specification.
This meant that the sibling dependencies that got added to
each fork would not themselves have markers attached to them.
This in turn meant they would not have markers associated with
them in the lock file.
In many cases, this is actually okay, because the resolver will
pick a version that is "universal" across all forks in most
cases. But in some cases, this just simply isn't possible as
the marker expressions in the fork can and do influence resolution.
In which case, it is possible for the same package with different
versions to show up in the lock file unconditionally. Which is a
big no-no.
So in this commit, after we determine the forks, we intersect the
markers on each fork with each of its dependencies.
This does seem to balloon the marker expressions in some cases.
I plucked one low hanging fruit to avoid doing
x and x
intrivial cases. (And this eliminated a portion of the snapshot
diffs.) But some pretty gnarly diffs remain. So we will likely
want to invest more in marker simplification.
Fixes #5086, Fixes #5162
Partially addresses #4732