Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pep508: add requires-python simplification/complexification to MarkerTree #7134

Merged
merged 3 commits into from
Sep 7, 2024

Conversation

BurntSushi
Copy link
Member

In #6268, we refactored how we handled requires-python. Before
that PR, we simplified markers inside resolution in lock-step with
requires-python version narrowing. But this turns out to be quite hard
to do correctly. After that PR, we now generally only simplify markers
when we write the lock file.

So for example, if a package has this marker:

python_full_version >= '3.8' and python_full_version <= '3.10'

And we have requires-python = ">=3.8", then its simplified form is this:

python_full_version <= '3.10'

The problem is that if we use the simplified form in a place where we
shouldn't, perhaps in a place where we don't handle the context of
requires-python correctly or if it changes, then the marker can lead
to incorrect results.

So #6268 fixed this, but there was a hiccup: previously, in order to
actually do the simplification, we did this:

  1. Asked MarkerTree to "restrict" any Python version expressions
    such that they were limited to the intersection of what was in the
    marker and whas was provided by requires-python. 2. When printing
    a MarkerTree, we specifically set the lower and upper bounds to
    negative and positive infinity, respectively.

The end result is that we had a simplified marker, but it relied on a
subtle trick during printing to work. This made it difficult to reason
about markers since they might internally have finite lower/upper
bounds, but when printed, they had non-finite lower/upper bounds.

In order to access the simplified marker, #6268 would run the
"restrict" operation, print the marker as a string and then parse it.
This removed any "hidden" state and made the internal representation of
the marker match its printed form.

But this is bad juju to do, and having the hidden state when we don't
need it any more makes things very difficult to reason about. So in
this PR, we essentially port the old restrict+print approach to one
routine, and then also add a dedicated routine to do the reverse
operation as well (instead of relying on marker conjunction, as we did
before).

I split this change into its own commit because I'm hoping it
crystalizes what it means when we say "a `MarkerTree` has hidden state."
That is, it isn't so much that there is some explicit member of a
`MarkerTree` that is omitted, but rather, the lower and upper version
bounds on `python_full_version` are are rewritten as "unbounded" when
traversing the ADD for display.

We will actually retain this functionality, but rejigger it so that it's
explicit when we do this. In particular, this simplification has been
problematic for us because it fundamentally changes the truth tables of
a marker expression *unless* you are extremely careful to interpret it
only under the original context in which it was simplified. This is
quite difficult to do generally, and in prior work in #6268, we
completed a refactor where we worked around this type of simplification
and moved it to the edges of uv.

In subsequent commits, we'll re-implement this form of simplification as
a more explicit step.
This adds new routines to `MarkerTree` for "simplifying" and
"complexifying" a tree with respect to lower and upper Python version
bounds.

In effect, "simplifying" a marker is what you do when you write it to a
lock file. Namely, since `uv.lock` includes a `requires-python` bound at
the top, one can say that it acts as a bound on the supported Python
versions. That is, it establishes a context in which one can assume that
bound is true. Therefore, the markers we write can be simplified using
this assumption.

The reverse is "complexifying" a marker, and it's what you do when you
read a marker from the lock file. Namely, once a marker is read, it can
be very difficult in code to keep the corresponding requires-python
context from the lock file. If you lose track of it and decide to
operate on the "simplified" marker, then it's trivial for that to
produce an incorrect result.
This finally gets rid of our hack for working around "hidden"
state. We no longer do a roundtrip marker serialization and
deserialization just to avoid the hidden state.
Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally makes sense to me, though may want @ibraheemdev to look too.

@BurntSushi
Copy link
Member Author

I've been chatting loosely with @ibraheemdev about this, so I feel pretty good about this being a generally good approach. (Although I do think we discussed some alternative implementations of complexify.) In any case, I'd be happy to address any feedback in a follow-up PR!

@BurntSushi BurntSushi merged commit f6bc701 into main Sep 7, 2024
58 checks passed
@BurntSushi BurntSushi deleted the ag/remove-hidden-state branch September 7, 2024 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants