Pip 20.3+ and its new dependency resolver #1109

Closed
edmorley opened this issue Nov 9, 2020 · 27 comments · Fixed by #1259
@edmorley
Member

edmorley commented Nov 9, 2020

The soon to be released pip 20.3 comes with a new dependency resolver that's enabled by default:
https://pip.pypa.io/en/latest/user_guide/#changes-to-the-pip-dependency-resolver-in-20-3-2020

This dependency resolver is a great improvement for pip, however it is by design more strict so in some cases will cause pip install failures in apps whose builds previously succeeded. Some of these failures will be fixed by upstream package maintainers fixing their package's declared dependencies - others may require changes by end users of apps using this buildpack.

As such the Python buildpack will not be enabling the new resolver straight away (in order to give the community time to fix upstream broken packages and/or discover any last bugs in the resolver itself), and when it is enabled, we'll have documentation for how to switch back to the legacy resolver (using requirements.txt option flags), a changelog entry and likely error message handling in the buildpack itself directing users to the docs if a resolver related error is encountered.

Note: The Python buildpack pins the version of pip it uses, so users of this buildpack won't see any change in behaviour on pip 20.3 release day, only when we explicitly upgrade.

@edmorley
Member Author

edmorley commented Nov 30, 2020

Pip 20.3 has just been released:
https://pythoninsider.blogspot.com/2020/11/pip-20-3-release-new-resolver.html
https://discuss.python.org/t/announcement-pip-20-3-release/5948

As explained above, we won't be upgrading to it straight away, so apps using this buildpack will be unaffected for now.

I'd imagine we'll try updating some time at the start of next year.

@edmorley edmorley pinned this issue Nov 30, 2020
@adamchainz

pip 21 has now been released, which drops Python <3.6 support. To install the latest pip supported by the current Python there's this dummy package: https://github.com/graingert/pip-with-requires-python .

@edmorley
Member Author

edmorley commented Feb 8, 2021

@adamchainz Hi!

That package is only needed for scenarios where someone is both using an environment with a very old pip version that does not support the new "requires Python version X" metadata (for example an older Linux distro's system pip) and also wants to pip install -U pip without specifying a version. Neither of those is the case for the Python buildpack, which doesn't rely on system pip and always (by design) specifies a pip version rather than pulling in the latest version.

In terms of updating the Python buildpack's pip to a newer version, it's on the list for later this year, however:

  1. It will cause breakage for users whose dependencies conflict, who will be annoyed that "something that worked" now breaks. That will require careful handholding: grepping for common error patterns and printing error messages that explain how they can opt out.
  2. There has not yet been any demand for a newer pip to fix cases where the new resolver or newer pip features are required. There will no doubt be such cases in the future, but there have been no requests so far.
  3. There are still papercuts being fixed in pip related to the new resolver (I read through Pip's issue tracker and open PRs at least once a week)
  4. The longer we leave it, the more of the packaging ecosystem will fix instances of broken/conflicting packages, reducing (1)

That said, I'm looking forward to being on newer Pip as soon as viable :-)

@graingert

That package is only needed for scenarios where someone is both using an environment with a very old pip version that does not support the new "requires Python version X" metadata (for example an older Linux distro's system pip) and also wants to pip install -U pip without specifying a version.

yes, but also it has the serendipitous effect of installing "the latest pip supported by the current Python"

@edmorley
Member Author

edmorley commented Feb 8, 2021

This does not help us, since for performance reasons the Python buildpack intentionally installs the chosen pip version from the outset, rather than installing a bootstrap version and then upgrading it.

@brianhelba

brianhelba commented Feb 12, 2021

@edmorley I'll present a case where the old resolver fails (and this affects multiple Heroku-hosted projects of mine). In the requirements structure:

  • foo
    • bar[baz]
      • baz
  • bar

the list of installed packages is just:

  • foo
  • bar

In other words, in the old resolver, if a transitive requirement (bar[baz]) is superficially satisfied by a top-level requirement (bar), then the extras on the transitive requirement (baz) are ignored.

The workaround is just to repeat any extras on the top-level requirement, despite the fact that they may not actually be required there.
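
To make the failure mode above concrete, here is a toy model of it in Python. It is only a sketch: the `INDEX` data, the `resolve` function and the "first seen wins" rule are illustrative assumptions that mimic the behaviour described, not pip's actual resolver code.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# A requirement is (name, extras); ("bar", frozenset({"baz"})) stands for bar[baz].
Requirement = Tuple[str, FrozenSet[str]]

# Hypothetical index: dependencies keyed by (package, extra), with None
# meaning the package's unconditional dependencies.
INDEX: Dict[Tuple[str, Optional[str]], List[Requirement]] = {
    ("foo", None): [("bar", frozenset({"baz"}))],
    ("bar", None): [],
    ("bar", "baz"): [("baz", frozenset())],
    ("baz", None): [],
}


def resolve(top_level: List[Requirement], merge_extras: bool) -> List[str]:
    """Return the sorted names of the packages that would be installed."""
    seen: Dict[str, Set[str]] = {}  # package name -> extras already expanded
    queue = list(top_level)
    while queue:
        name, extras = queue.pop(0)
        if name not in seen:
            seen[name] = set()
            queue.extend(INDEX[(name, None)])
        elif not merge_extras:
            # "First seen wins": a later requirement on the same package is
            # treated as already satisfied, so its extras are dropped.
            continue
        for extra in sorted(extras - seen[name]):
            seen[name].add(extra)
            queue.extend(INDEX[(name, extra)])
    return sorted(seen)
```

With top-level requirements `foo` and `bar`, `merge_extras=False` yields only `bar` and `foo`, while `merge_extras=True` also pulls in `baz`. Listing `bar[baz]` at the top level gets `baz` installed in both modes, which is the workaround described above.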


For what it's worth, I am requesting the option to enable newer versions of pip (with the new resolver) on an opt-in basis, though I certainly appreciate the reasons why you're hesitant to make this the default.

@edmorley
Member Author

edmorley commented Feb 12, 2021

@brianhelba Thank you for the example :-)

Would an update to pip 20.2.4 (the version immediately before the new resolver became the default) in the meantime be of any use here? I believe the new resolver could then be enabled by adding --use-feature=2020-resolver to requirements.txt (this flag was only added in pip 20.2 so not in the 20.1.1 currently used by the buildpack).

The only issue I can see is that 20.2.4 does not include a number of more recent fixes to the new resolver so may not work well enough to be usable, depending on use-case.

@brianhelba

Would an update to pip 20.2.4 (the version immediately before the new resolver became the default) in the meantime be of any use here?

I'd be happy to try it out.

The only issue I can see is that 20.2.4 does not include a number of more recent fixes to the new resolver so may not work well enough to be usable, depending on use-case.

I only started using the new resolver once it was released in 20.3. I haven't locally tested 20.2.4, but I could if you want to know in advance of making buildpack changes.

@edmorley
Member Author

Hmm I'm leaning towards updating to 20.2.4 as an interim step regardless, if only to reduce the number of potentially breaking changes unrelated to the new resolver, when we finally update even further.

@edmorley
Member Author

edmorley commented Apr 8, 2021

Hmm I'm leaning towards updating to 20.2.4 as an interim step regardless

Opened #1192 for this.

edmorley added a commit that referenced this issue Apr 8, 2021
Updates pip from 20.1.1 to 20.2.4 for Python 2.7 and Python 3.5+.

Python 3.4 continues to use 19.1.1, the last pip version to support it.

We're not updating to pip 20.3+ yet, since those versions enable the new
pip dependency resolver by default, which has compatibility implications,
so needs additional UX work first, as well as as many upstream fixes as
possible (see #1109).

Changes:
https://pip.pypa.io/en/latest/news/#v20-2-4
pypa/pip@20.1.1...20.2.4

Closes GUS-W-7944197.
@uranusjr

Hi, I’m a maintainer of pip and worked on the dependency resolver. I see the step still installs 20.2.4 by default; is there something we can do to help with upgrading?

@edmorley
Member Author

edmorley commented Apr 19, 2021

@uranusjr Hi! Thank you for reaching out (and all your hard work on pip!).

I'm very keen to update to a post 20.3 release once the rate of new resolver bugs/issues has settled. I've been checking the pip repo's issues and PR activity a few times a week, and it seems that has mostly occurred by now. There are a few new resolver fixes in 21.1, so I was thinking of updating to 21.1 once it's released.

That said, pypa/pip/issues/9187 is still concerning as it means some users will have their builds hit the Heroku build timeout. Unless I add a wrapper around the pip invocation (that ends the pip install before the overall build timeout; which is problematic since the overall timeout is dynamic), I'll have no way to identify these cases and output messaging to users as to how to resolve. Which means I either don't output anything (and leave users to google or check the Heroku knowledge base articles after failures), or have to output a "using new pip resolver, visit <URL> if your build times out" warning before the pip install step for all users, even those whose runs don't fail - which doesn't seem ideal.
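
For illustration, a wrapper of the kind mentioned above might look roughly like the following Python sketch. The function names and the idea of a fixed deadline are assumptions; as noted, the real overall build timeout is dynamic, which is exactly what makes picking `deadline_seconds` hard.

```python
import subprocess
import sys
from typing import List, Optional


def pip_install_command(requirements_path: str) -> List[str]:
    """Build a pip install invocation for the given requirements file."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]


def run_with_deadline(cmd: List[str], deadline_seconds: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if the deadline was exceeded."""
    try:
        completed = subprocess.run(cmd, timeout=deadline_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising this.
        return None
    return completed.returncode
```

A caller could treat a `None` result from `run_with_deadline(pip_install_command("requirements.txt"), deadline_seconds)` as the cue to print guidance about resolver backtracking, rather than leaving the user with a bare build timeout.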

I'll also need to catalogue the various non-timeout pip new resolver failure modes (specifically the error message strings that can occur), so the Python buildpack can check for them during the "analyse pip errors" step, and output useful suggestions. For example linking to the pip docs on resolving dependency conflicts, and also saying that users can temporarily disable the new resolver by adding --use-deprecated=legacy-resolver to their requirements.txt.

@uranusjr

Thanks for the feedback. The undetermined resolution time is definitely a big issue, and is really an issue for every dependency resolver. It is however very difficult to set a reasonable cap on resolution resources, since it is both highly subjective and varies across scenarios; the same problems you have with a timeout apply to everything else as well. On the other hand though, if the new resolver takes a lot of time, the legacy resolver would just populate an environment that’s going to fail at runtime in most (if not all) scenarios. So using the legacy resolver is more or less punting the issue back to the user (which is of course a perfectly fine solution for a service).

Pip intends to shift this complexity back to the user in the long run as well, since the user is really the only person that knows how long is too long for a resolution to take. This would bring Python’s packaging experience closer to ecosystems like e.g. Rust, where all versions are resolved ahead of time in a “lock file”, and the service would simply install packages listed in it. So instead of running pip install on an arbitrary requirements.txt, a service would only accept a requirements.txt with all packages “pinned” with ==, and run a pip install --no-deps against it (which is guaranteed to finish in linear time).

But that’s probably not a useful thing for Heroku until the Python packaging ecosystem can produce a smooth-ish transition to the new workflow (and the pip version is no longer relevant after that; pip install --no-deps works pretty much for every pip version released in the last 5+ years), so I guess I’m only trying to say in too many words that I don’t have a good solution for you right now.
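
As a rough illustration of the "only accept fully pinned requirements" idea, a service could pre-check a requirements.txt along these lines. This is a sketch under simplifying assumptions: the regex covers only plain `name==version` lines (optionally with extras and an environment marker), option lines starting with `-` are skipped rather than validated, and `unpinned_requirements` is a hypothetical helper, not part of pip.

```python
import re
from typing import List

# Simplified pattern: a name, optional [extras], "==", a version with no
# wildcard, and an optional "; marker" suffix.
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;=*]+\s*(;.*)?$"
)


def unpinned_requirements(requirements_text: str) -> List[str]:
    """Return the requirement lines that are not pinned with ==."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop inline comments, which start with whitespace followed by "#".
        line = raw_line.split(" #", 1)[0].strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # blank line, comment, or pip option line
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

If the returned list is empty, the file could then be installed with pip install --no-deps as described above; otherwise the service could reject it and name the offending lines.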

@edmorley
Member Author

edmorley commented Jul 8, 2021

I've seen at least one package on PyPI that has recently switched to only shipping PEP 600 style wheels (eg manylinux_2_24), with no sdist or older wheel formats provided. These require the PEP 600 support added in packaging 20.5, and thus require pip 20.3+ (pypa/pip/issues/9077), otherwise installation fails with an ERROR: No matching distribution found etc.

I guess time to now update? 😆

I'll also need to find a set of example failing cases with the new resolver, so I can add handling for them in this buildpack (eg looking for specific failure modes and suggesting that the user try --use-deprecated=legacy-resolver). I'll likely end up searching historic Pip issues and/or pip's test suite, however if anyone has some good examples/links prior to then, that would be helpful :-)

@uranusjr

uranusjr commented Jul 9, 2021

For a dependency error (that would previously succeed in the legacy resolver), the new resolver would emit a message containing the string ResolutionImpossible. So whenever you find that string it’s time to suggest falling back to the legacy resolver or downgrading pip.

List of issues on the new resolver: https://github.com/pypa/pip/issues?q=is%3Aopen+is%3Aissue+label%3A%22C%3A+dependency+resolution%22

Currently there are three main categories of problem that the new resolver does not handle well (not including ResolutionImpossible, which the new resolver is explicitly designed to fail on):

  • Upgrading complex dependencies (with extras, from an explicit URL, etc.). This does not apply here since Heroku always bootstraps the environment from scratch.
  • Inaccurate diagnosis when ResolutionImpossible is triggered on complex dependencies.
  • Explicitly ignoring a known resolution error. This also emits a ResolutionImpossible, but instead of a dependency graph error, it is caused by a mis-packaged dependency. This is something the user can’t fix, and they need to rely on the maintainers of that dependency.

The latter two are both ResolutionImpossible variants, and there’s really not much Heroku can do for either, so they should probably be handled the same as a “normal” ResolutionImpossible: suggest that the user use the legacy resolver or ask for help on general Python forums.
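
The string check suggested above could be sketched in Python roughly as follows. The function name and the wording of the hint are hypothetical; only the `ResolutionImpossible` marker and the docs URL come from this thread.

```python
from typing import Optional

# Hypothetical help text shown to the user; the wording is illustrative.
RESOLUTION_HELP = (
    "pip could not find a compatible set of dependencies.\n"
    "See https://pip.pypa.io/en/stable/topics/dependency-resolution/ "
    "for guidance on fixing dependency conflicts."
)


def hint_for_pip_log(pip_log: str) -> Optional[str]:
    """Return a hint if the captured pip output shows a resolver conflict."""
    if "ResolutionImpossible" in pip_log:
        return RESOLUTION_HELP
    return None
```

Since all three failure variants surface the same marker, a single substring test is enough here; telling them apart would need more detailed parsing of pip's output, which this sketch does not attempt.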

@edmorley
Member Author

I'm still actively following pip releases and known issues for the new resolver. I was very much hoping that we could update to the 21.2.x releases, however there appear to be further regressions, for example:
pypa/pip#10201

I would love to see pypa/pip#10415 fixed in a release soon (there is a WIP PR), given it sounds like it would help with that regression.

@uranusjr

uranusjr commented Sep 14, 2021

Please note that the author of both pypa/pip#10415 and the associated draft PR explicitly (and very correctly) states the fix is not likely to help many cases affected in pypa/pip#10201.

@edmorley
Member Author

Ah I was thinking of pypa/pip#10201 (comment) which mentions it includes a fix for that (in addition to the new ordering approach) - sounds like the impact was minimal compared to the new ordering approach itself.

Anyway, I mainly wanted to update this issue to say I hadn't forgotten about it, but am waiting for things like pypa/pip#10201 to be fixed.

@notatallshaw

notatallshaw commented Sep 16, 2021

Just saw this issue, is there a particular set of problematic requirements or is this a general worry?

I've put together a summary of what I think will be a big optimization for most real world dependency trees: pypa/pip#10479

You can test it with this tag if you have specific examples: python -m pip install git+git://github.com/notatallshaw/pip@third_attempt_at_prefer_non_conflicts

Though I would appreciate if you could share those specific examples so I can test myself.

@edmorley
Member Author

edmorley commented Sep 16, 2021

@notatallshaw Hi! I don't have specific examples yet (I won't until we upgrade and people file support tickets), it's more that:

  • at Heroku scale, something that affects a small percentage of users still means many many users impacted (and thus potentially hundreds of support tickets)
  • part of the premise of Heroku is that things "just work", and users understandably get very unhappy if things break when they didn't change anything in their app
  • some of the failure modes relating to the new resolver (eg hours of backtracking with no hint of how to resolve) are quite painful from a UX point of view, and not something I'm able to easily work around in the buildpack (more at Pip 20.3+ and its new dependency resolver #1109 (comment))

Thank you for working on pypa/pip#10479 -- it looks like a promising improvement that may fix pypa/pip#10201 and friends enough that we can finally update here :-)

@notatallshaw

notatallshaw commented Sep 16, 2021

  • part of the premise of Heroku is that things "just work", and users understandably get very unhappy if things break when they didn't change anything in their app

It's interesting though because Pip 20.2 is very much able to create broken environments that may not work at all. So "just works" is a tricky one; the longer I've been looking at this issue, the more I'm surprised pip ever worked in the first place with any moderately complex dependencies.

  • some of the failure modes relating to the new resolver (eg hours of backtracking with no hint of how to resolve) are quite painful from a UX point of view, and not something I'm able to easily work around in the buildpack (more at Pip 20.3 and its new dependency resolver #1109 (comment))

I also agree, which is why I've been thinking about options like this: pypa/pip#10417

@edmorley
Member Author

It's interesting though because Pip 20.2 is very much able to create broken environments that may not work at all. So "just works" is a tricky one

I agree the old resolver can lead to broken environments. The issue is that for those Heroku app environments that were actually non-functioning, users will have corrected their requirements months/years ago so that their application works. Those that remain are those whose pip check might exit non-zero, but to the user "the app works".

@edmorley edmorley changed the title Pip 20.3 and its new dependency resolver Pip 20.3+ and its new dependency resolver Oct 12, 2021
@edmorley
Member Author

Pip 21.3 was released yesterday:
https://pip.pypa.io/en/stable/news/#v21-3

Of note...

When backtracking during dependency resolution, prefer the dependencies which are involved in the most recent conflict. This can significantly reduce the amount of backtracking required. (pypa/pip/pull/10481)

This will help with a significant proportion of the cases that would have exceeded the build timeout.

Add link to the appropriate documentation in pip backtracking output (pypa/pip/pull/10516)

This will help slightly with the UX issue - that is, users of this buildpack not knowing why package installation is seemingly taking a long time or timing out. We may still need some additional messaging in the buildpack build log output, but this means we won't have to explain quite as much - and does a better job than we could, since it's only shown when needed (whereas for our messaging we would have to show it unconditionally given we can't just show it at the end due to dynamic build timeouts).

@edmorley edmorley self-assigned this Oct 29, 2021
edmorley added a commit that referenced this issue Oct 29, 2021
Update pip from 20.2.4 to:
  - 20.3.4 for Python 2.7 and 3.5
  - 21.3.1 for Python 3.6+

Of note Pip 20.3+ includes the new dependency resolver (only enabled by default
when using Python 3+). This new dependency resolver is more strict, see:
https://pip.pypa.io/en/stable/user_guide/#changes-to-the-pip-dependency-resolver-in-20-3-2020
https://pip.pypa.io/en/stable/topics/dependency-resolution/

Release notes:
https://pip.pypa.io/en/stable/news/#v21-3-1

Changelog:
pypa/pip@20.2.4...21.3.1

The new versions of pip have been synced to S3 using:

```
$ pip download --no-cache pip==20.3.4
...
Saved ./pip-20.3.4-py2.py3-none-any.whl
Successfully downloaded pip
$ pip download --no-cache pip==21.3.1
Collecting pip==21.3.1
...
Saved ./pip-21.3.1-py3-none-any.whl
Successfully downloaded pip
$ aws s3 sync . s3://heroku-buildpack-python/common/ --exclude "*" --include "*.whl" --dryrun
(dryrun) upload: ./pip-20.3.4-py2.py3-none-any.whl to s3://heroku-buildpack-python/common/pip-20.3.4-py2.py3-none-any.whl
(dryrun) upload: ./pip-21.3.1-py3-none-any.whl to s3://heroku-buildpack-python/common/pip-21.3.1-py3-none-any.whl
$ aws s3 sync . s3://heroku-buildpack-python/common/ --exclude "*" --include "*.whl"
upload: ./pip-20.3.4-py2.py3-none-any.whl to s3://heroku-buildpack-python/common/pip-20.3.4-py2.py3-none-any.whl
upload: ./pip-21.3.1-py3-none-any.whl to s3://heroku-buildpack-python/common/pip-21.3.1-py3-none-any.whl
```

Closes #1109.
GUS-W-8493316.
edmorley added a commit that referenced this issue Nov 1, 2021
@edmorley
Member Author

edmorley commented Nov 1, 2021

temporarily disable the new resolver by adding --use-deprecated=legacy-resolver to their requirements.txt.

So I investigated this approach as part of #1259, however unfortunately the set of pip options that can be configured via requirements.txt doesn't include --use-deprecated, even though it did include --use-feature (which was the flag prior to it becoming the default; and what I'd tested using before):
https://pip.pypa.io/en/stable/reference/requirements-file-format/#global-options

As such, the only way to provide a user-controllable toggle would be to do so via custom buildpack environment variable or some non-standard requirements.txt comment syntax to signal to the buildpack that it should manually pass --use-deprecated=legacy-resolver to the pip install command. Unlike the native pip feature, the buildpack would then be responsible for handling this, including when the feature is removed warning that it's redundant. Doable, but a lot more temporary custom code to work around dependencies that really need to be fixed.
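
For reference, an environment-variable toggle of the kind described above could have looked something like this Python sketch. The variable name `EXAMPLE_PIP_LEGACY_RESOLVER` and the function are made up for illustration; the buildpack defines no such option.

```python
import sys
from typing import List, Mapping

# Hypothetical variable name, used only for this example.
LEGACY_RESOLVER_ENV_VAR = "EXAMPLE_PIP_LEGACY_RESOLVER"


def build_pip_install_command(
    env: Mapping[str, str], requirements_path: str = "requirements.txt"
) -> List[str]:
    """Build the pip install command, honouring the opt-out toggle."""
    cmd = [sys.executable, "-m", "pip", "install", "-r", requirements_path]
    if env.get(LEGACY_RESOLVER_ENV_VAR) == "1":
        # Passed on the command line because requirements.txt cannot carry it.
        cmd.append("--use-deprecated=legacy-resolver")
    return cmd
```

The flag has to be appended by the caller because it is not among the options a requirements.txt can set, and that caller-side handling is the extra temporary code being weighed up here.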

In addition, it's now been a year since upstream pip has been using the new resolver, meaning that most of the commonly encountered dependency issues in the wild have been fixed.

Given all of that, for now I've decided not to add an opt-out, especially since apps/dependencies will eventually need to adapt regardless (pip will be removing support for the legacy resolver in a future release), and since those that need a workaround in the meantime can pin to the previous buildpack release (v201):
https://github.com/heroku/heroku-buildpack-python/blob/main/CHANGELOG.md#v201-2021-10-20
https://devcenter.heroku.com/articles/buildpacks#buildpack-references

In the changelog entry for this change, I've linked to the pip docs on resolving ResolutionImpossible errors and reducing backtracking, which are strongly recommended over pinning buildpack version longer term:
https://devcenter.heroku.com/changelog-items/2288

@uranusjr

uranusjr commented Nov 5, 2021

It's possible to add --use-deprecated support to requirements.txt, I think it's just nobody felt the need until now. Feel free to open an issue in pypa/pip for this.

@edmorley
Member Author

edmorley commented Nov 8, 2021

@uranusjr Yeah I'd contemplated filing an issue/opening a PR, however it would presumably now have to wait until the v22 pip release in 3-4 months, by which time the need for this will have reduced on our side.

@edmorley edmorley unpinned this issue Apr 14, 2022