-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix slowness on Python 3.11 when updating an existing large environment. #12080
Conversation
@@ -376,7 +377,9 @@ def name(self) -> str: | |||
|
|||
@property | |||
def version(self) -> CandidateVersion: | |||
return self.dist.version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know why the self.dist.version
line is slow? And how is it related to Python 3.11?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reiterate for clarity: it is not Python 3.11 per se, but it is because 3.11 uses importlib.metadata
by default, instead of pkg_resources
. You see the same on Python 3.10 with _PIP_USE_IMPORTLIB_METADATA=1
, and 3.11 is fast with _PIP_USE_IMPORTLIB_METADATA=0
.
Getting one version is also not slow, but when doing bin/pip install Plone -c https://dist.plone.org/release/6.0.5/constraints.txt
it gets called 440,106 times, so even milliseconds add up. I see that with the code from this PR, self.dist.version
only gets called one time per package, or 236 times total. For setuptools
alone, self.version
gets called 48,891 times.
The difference is in the class of the distribution, and how it gets the version. It is either pip._internal.metadata.pkg_resources.Distribution
or pip._internal.metadata.importlib._dists.Distribution
.
I get confused following the pkg_resources
case, but for importlib
you get into the standard lib:
> /Users/maurits/.pyenv/versions/3.11.2/lib/python3.11/importlib/metadata/__init__.py(629)version()
-> @property
(Pdb) l
629 -> @property
630 def version(self):
631 """Return the 'Version' metadata for the distribution package."""
632 return self.metadata['Version']
That line 632 actually reads the METADATA
file, making it slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so the main difference is probably pkg_resources
caches this value in-memory, while importlib.metadata
needs to re-parse the file every time. The fix makes sense then.
It would be best to modify the news fragment to reflect that this is about importlib.metadata
(and not 3.11 per se), and when pip repeatedly accesses an already-installed distribution (and not all “large environments” are affected, only if that environment has a lot of overlaps with the to-be-installed requirement set).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (and rebased on main).
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
@uranusjr Thanks for the review and the guidance! |
Fixes issue #12079 by adding a simple cache on
AlreadyInstalledCandidate.version
.