Install directly from wheels, without unpacking into an intermediate directory #8562
Dropping the top-level directory creation allows us to make the processing completely dependent on files to be installed, and not on the top-level directory they happen to be installed in. We already create the parent directory in the loop below, so this call should be redundant for files that get installed.
By removing this dependency of the "file installation" part of `clobber` on the "file finding" part of `clobber`, we can more easily factor out the "file installation" part.
Hiding the file-specific implementation we currently use will let us trade out the implementation for a zip-backed one later. We can also use this interface to represent the other kinds of files that we have to generate as part of wheel installation. We use a Protocol instead of a base class because there's no need for shared behavior right now, and using Protocol is less verbose.
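A minimal sketch of that idea, assuming Python 3.8+ for `typing.Protocol`. The names `File`, `DiskFile`, and `install_all` are illustrative here and are not necessarily the ones used in this PR:

```python
import os
import shutil
from typing import Iterable, Protocol


class File(Protocol):
    """Anything that knows how to write itself to its destination."""

    dest_path: str

    def save(self) -> None:
        ...


class DiskFile:
    """Hypothetical disk-backed implementation: copies from an unpacked tree."""

    def __init__(self, src_path: str, dest_path: str) -> None:
        self.src_path = src_path
        self.dest_path = dest_path

    def save(self) -> None:
        # The parent directory is created per file, so nothing depends on a
        # top-level directory having been created up front.
        parent = os.path.dirname(self.dest_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        shutil.copyfile(self.src_path, self.dest_path)


def install_all(files: Iterable[File]) -> None:
    # The installer only sees the protocol, never the concrete class.
    for f in files:
        f.save()
```

Because `DiskFile` satisfies `File` structurally, a zip-backed class with the same attribute and method could be passed to `install_all` without touching it.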
"getting files" is one of the places that requires files to be on disk. By extracting this out of `clobber` we can make it simpler and then trade it out for a zip-based implementation.
Thanks a lot 👍
Great improvement! On one test installing a bunch of wheels from a directory, this saves ~11 seconds out of 53, with another ~11 seconds saved by #8552. That is significant.
This is a brilliantly broken up PR that was a breeze to review. Thank you so much for taking the time and making the effort to do so! ^>^
4 minor comments, but the overall PR looks ✨
We always pass a file path to this function, so assert as much. We want the return type to be consistent so we can assign the result to non-Optional types.
This makes `clobber` much simpler, and aligns the interface of root_scheme files and data_scheme files, so we can process them in the same way.
Simplifying the file-finding function will make it easier to drive our whole wheel installation from a single list of files later.
With this approach, we can add the rest of our generated files into the same iterable and they can undergo the same processing.
When we start processing files directly from the wheel, all we will have are the files with their zip path (which should match a `RECORD` entry). Separating this from the source file path (used for copying) and annotating it with our `RecordPath` type makes it clear what the format of this public property is, and that it should match what is in `RECORD`.
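As a rough illustration of the typing idea, assuming `RecordPath` is a `NewType` over `str` (the helper `to_record_path` is hypothetical and only shows the expected format):

```python
import os
from typing import NewType

# A path exactly as it would appear in RECORD: relative and "/"-separated.
RecordPath = NewType("RecordPath", str)


def to_record_path(disk_path: str, root: str) -> RecordPath:
    """Convert an on-disk path under `root` to its RECORD-style form."""
    rel = os.path.relpath(disk_path, root)
    return RecordPath(rel.replace(os.path.sep, "/"))
```

At runtime a `RecordPath` is still a plain `str`; the distinction only exists for the type checker, which can then flag places where a disk path is passed where a `RECORD` entry is expected.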
From https://docs.python.org/3/library/itertools.html, adapted for Python 2 and with types added. This will be used in the next commit.
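The commit message does not say which recipe was copied. Assuming it is `partition`, the classic Python 3 form from the itertools recipes looks roughly like this (type hints added for illustration):

```python
from itertools import filterfalse, tee
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def partition(
    pred: Callable[[T], bool], iterable: Iterable[T]
) -> Tuple[Iterator[T], Iterator[T]]:
    """Split entries into (those failing pred, those passing pred)."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
```

A helper like this would let a single list of paths be split in two, for example into root-scheme and data-scheme entries, without walking the source twice.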
At the beginning of our wheel processing we are going to have the list of contained files. By splitting this into its own function, and deriving it from disk in the same way it will appear in the zip, we can incrementally refactor our approach using the same interface that will be available at that time. We start with the root-scheme paths (that end up in lib_dir) first.
Now we rely solely on the list of RECORD-like paths derived from the filesystem, and can easily trade out the implementation for one that comes from the wheel file directly.
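To make the "same interface" point concrete, here is a sketch with two hypothetical helpers, `paths_from_dir` and `paths_from_zip`, that produce the same RECORD-like list from an unpacked tree and from an archive. It is not the code from this PR:

```python
import os
import zipfile
from typing import List


def paths_from_dir(root: str) -> List[str]:
    """RECORD-like paths for every file under an unpacked directory."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            paths.append(rel.replace(os.path.sep, "/"))
    return sorted(paths)


def paths_from_zip(zf: zipfile.ZipFile) -> List[str]:
    """The same list, read straight from the archive."""
    # Directory entries end with "/"; only files are of interest here.
    return sorted(n for n in zf.namelist() if not n.endswith("/"))
```

Anything written against the output of `paths_from_dir` keeps working when the caller switches to `paths_from_zip`.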
One less dependency on the wheel being extracted.
Just to prevent anyone potentially installing a vulnerable pip when we know there's an easily-fixed issue, I added a commit to address the path traversal issue I mentioned in the original description.
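For readers unfamiliar with this class of bug, a check of this kind can be sketched as follows. The helper names `is_within_directory` and `safe_dest` are illustrative, and this is not the commit's actual code:

```python
import os


def is_within_directory(directory: str, target: str) -> bool:
    """True if `target` resolves to `directory` or something beneath it."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_dest(lib_dir: str, record_path: str) -> str:
    """Join an archive path onto lib_dir, refusing anything that escapes it."""
    dest = os.path.join(lib_dir, record_path)
    if not is_within_directory(lib_dir, dest):
        raise ValueError(
            "{!r} would be installed outside {!r}".format(record_path, lib_dir)
        )
    return dest
```

Both `../..` style entries and absolute paths are rejected, because `os.path.abspath` normalizes the joined path before the comparison.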
If there are no concerns with the addition I'll merge this in another 12 hours.
Thanks everyone for taking a look!!
I just checked and it doesn't impact this code, since we aren't transitively using
def make_data_scheme_file(record_path):
    # type: (RecordPath) -> File
    normed_path = os.path.normpath(record_path)
    _, scheme_key, dest_subpath = normed_path.split(os.path.sep, 2)
This caused a bit of a regression with a dependency I have.
This dependency incorrectly installed a file to `python_apt-0.0.0.data/purelib`, rather than creating the purelib directory first. `pip install` (20.2) now fails with:
ERROR: Exception:
Traceback (most recent call last):
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 216, in _main
    status = self.run(options, args)
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 421, in run
    pycompile=options.compile,
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 90, in install_given_reqs
    pycompile=pycompile,
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 831, in install
    requested=self.user_supplied,
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/operations/install/wheel.py", line 830, in install_wheel
    requested=requested,
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/operations/install/wheel.py", line 658, in _install_wheel
    for file in files:
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/operations/install/wheel.py", line 587, in make_data_scheme_file
    _, scheme_key, dest_subpath = normed_path.split(os.path.sep, 2)
ValueError: not enough values to unpack (expected 3, got 2)
Previously this would have been ignored. While it's the package's fault, perhaps this could use a little improvement on error handling? I'm willing to put up a PR, but am unsure how this case should be handled.
Thanks! 😄
Thanks for the research and creating a separate issue, much appreciated. :)
I submitted #8656 to handle this case in a way that will be hopefully more clear.
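For context, one way to turn the bare unpacking failure into a readable error is sketched below. This is only an illustration of the idea; the function name `split_data_scheme_path` and the message wording are made up and do not reflect what #8656 actually does:

```python
import os
from typing import Tuple


def split_data_scheme_path(record_path: str) -> Tuple[str, str]:
    """Split '<name>.data/<scheme>/<subpath>' into (scheme, subpath)."""
    normed_path = os.path.normpath(record_path)
    parts = normed_path.split(os.path.sep, 2)
    if len(parts) != 3:
        # A file sitting directly at '<name>.data/<scheme>' has no subpath,
        # so report that explicitly instead of failing on tuple unpacking.
        raise ValueError(
            "Unexpected file in wheel: {!r}. Expected a path of the form "
            "<name>.data/<scheme>/<subpath>.".format(record_path)
        )
    _, scheme_key, dest_subpath = parts
    return scheme_key, dest_subpath
```

With the path from the report above, `python_apt-0.0.0.data/purelib`, this raises a `ValueError` that names the offending entry.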
Over the past several days, we have removed almost all code that requires a wheel to be extracted to a temporary directory before installation. In this last piece, we separate out the file-specific operations from wheel installation and then replace it with code to read directly from the wheel file.
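Reading directly from the wheel could look roughly like the following. `ZipBackedFile` is a hypothetical name and this is a simplified sketch, not the class added in this PR:

```python
import os
import shutil
import zipfile


class ZipBackedFile:
    """Writes one archive member straight to its destination."""

    def __init__(
        self, src_record_path: str, dest_path: str, zip_file: zipfile.ZipFile
    ) -> None:
        self.src_record_path = src_record_path
        self.dest_path = dest_path
        self._zip_file = zip_file

    def save(self) -> None:
        parent = os.path.dirname(self.dest_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        # Stream the member to disk; no intermediate extraction directory.
        with self._zip_file.open(self.src_record_path) as src:
            with open(self.dest_path, "wb") as dest:
                shutil.copyfileobj(src, dest)
```

Each file is decompressed once and written once, which is consistent with the time savings a reviewer reported above.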
This is a pretty big PR. I have tried to show a clear progression commit-by-commit, along with justifications for individual decisions. Please let me know if I can do anything to make this easier to review!
A few followups will be needed:
`/` and path traversals (e.g. `../..`), similar to what we do in `unpacking.unzip_file`.
I think we can handle those in separate PRs, to keep this one focused, but I am also OK if we want to incorporate them in this one. I can work on those shortly regardless.
Closes #6030.