Add simple, explicit CLI to install a .whl file #92
Thanks for filing this! I imagine that the tests will complain about not having coverage. I'm pretty sure that we should have #58 satisfied (happy with that being a follow up), but beyond that, I'm totally on board with this. |
Would redistributors have use for something so skimpy?
Yeah, but they still have to do all the work collecting the paths and switching the prefix out, right? And they don't have the facility to do that more generally, I imagine. How would this work in practice? Would it be...
And at that point, why not write everything in Python, including the installer invocation? Of course, it might be simpler than I'm imagining, but we should still have a little think about how the CLI's going to be used.
Yeah, it would be helpful as it provides a cleaner way to interface with installer from the cli.
We're pretty much set up for this already, for us at least we generally pass around cli params like this in make variables already, like this.
We have to do invocations from make to python at some point anyways, this generally works best if there is a built in cli interface in the tool we're calling. |
def main():
    """Entry point for CLI."""
    ap = argparse.ArgumentParser("python -m installer")
    ap.add_argument("wheel_file", help="Path to a .whl file to install")
Does this support installing the wheel file if passed as a glob, like dist/*.whl?
No, it won't be unglobbed in Python, but why would it not be unglobbed in the shell?
Currently we invoke python setup.py install while in the package directory. Maybe we should make the wheel_file argument optional, autodetect the dist folder, and install from there if a .whl file is present.
It will work if the shell expands the glob to a single wheel. At present, it won't accept multiple wheels to install - that's easy enough to do if necessary, but I wanted to start with the simplest thing.
maybe we should make the wheel_file argument optional and autodetect...
I think the crucial 'what to install' input should be 100% explicit. That's in keeping with the general pattern of this library. If you know there's a single wheel under dist, and you're running it from a shell, you can pass dist/*.whl and let the shell expand it.
it won't accept multiple wheels to install
Oh, not wanting it to do that, just pick up a single wheel from dist actually.
If you know there's a single wheel under dist, and you're running it from a shell, you can pass dist/*.whl and let the shell expand it.
Yeah, was going to do something like that otherwise, figured it would be kinda nice to have similar semantics to existing install methods. Maybe having it use only the newest file in dist/*.whl would make sense if a glob is passed?
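The "newest file in dist" idea could be sketched as below. This is an illustration of the suggestion only, not behaviour the PR implements; newest_wheel is a hypothetical name, and "newest" is assumed to mean most recently modified, which says nothing about version ordering.

```python
from pathlib import Path


def newest_wheel(dist_dir):
    """Return the most recently modified .whl in dist_dir (hypothetical helper)."""
    wheels = list(Path(dist_dir).glob("*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no .whl files in {dist_dir}")
    # Modification time is only a heuristic: a rebuilt old version would win.
    return max(wheels, key=lambda p: p.stat().st_mtime)
```

The mtime heuristic is one reason to keep this kind of selection in the caller's hands rather than in the installer CLI.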
Will you be hardcoding the paths? So, for example, instead of:
You'll have:
Is that correct? Or do you have some other way of retrieving the paths? |
Should be a little different. Probably something like this I think:
I see, thank you. |
@layday I don't imagine this will be useful for everyone. Certainly, if you have to get the destination paths from Python as a JSON blob and then use jq to extract them and parse them back, you're better off staying in Python and calling @pradyunsg I'll look at adding some tests. For destdir support, I can see 3 possible approaches:
Do you have a preference? |
Let's leave it fully manual for now? If folks strongly feel that is not "good enough", then we can add things based on feedback then. Removing a separate option or implicit behaviour would be more tricky. :) |
I have been meaning to write a reply in my PR detailing my opinion on your feedback and how to move forward regarding the CLI as I abandon that work, but since this is moving fast, I will try to quickly summarize it here in the hope it can provide some clarity before any decision is made.
IMO this is the wrong approach to this issue. We are currently having a lot of trouble due to downstream customizations of the Python install layout, to the point that we rushed suboptimal changes to Python core that turned out to be problematic (see bpo-45413, which is a result of the mechanism that was introduced to support the downstream patching, mainly affecting pip; see pypa/distutils#88 (comment), introduced in python/cpython#24549, which aims to make distutils rely on sysconfig to support vendor patching in sysconfig instead as distutils is moved out of core).
I believe this approach just further propagates the same mentality and will result in people hardcoding install locations or abusing this CLI to do things it wasn't intended to. The hardcoded install locations will sooner or later need to be manually updated, and having them manually hardcoded can only result in more bugs. Install locations in Python are not even canonically static anymore and can vary at runtime, introducing further bugs when this CLI is used in a slightly different environment, resulting in non-obvious broken installs.
Figuring out the install locations of a Python should not be hard, but we currently do have an issue, as presented in bpo-44445. This means wheel installation can not yet be done without querying the
I have dedicated a big amount of time trying to understand and sort out all this mess, and this is my opinion from the experience gathered from that. I don't have the energy to try to prevent everybody from making mistakes, so I am not going to; I am just gonna leave this warning here.
@pradyunsg, if you think this is more than you are willing to handle, please just don't provide a CLI, instead of merging an arguably bad interface like this. (No hard feelings @takluyver, I really appreciate your work here, I just think the approach is very understandably misguided. There are a lot of moving pieces here, and something like this will have a lot of implications for the rest of the ecosystem. This feedback is based on my experience actively working on sorting all of the install stuff out for the past 1~2 years, and still, I might be wrong in some of my takeaways. I think this proposal is understandable, but overly optimistic.)
TLDR: Please please please, can we stop with all this install location customization stuff? IMO the only sane, unproblematic way to deal with this is by querying the install locations from the Python interpreter. Manually specifying install locations will result in more bugs and extend the mess that we are already currently facing.
@FFY00 This kind of criticism is absolutely fine - I was hoping to provoke this sort of discussion, and the code in this PR didn't take a lot of effort, so I won't be upset if we decide it's not useful. I just wanted to make the alternative approach concrete. FWIW, the issues and PRs you point to seem to me - at least at a quick look - like an argument for this sort of interface, where paths are specified explicitly. Linux distros (and possibly other downstream environments) want to install files in different places to the default locations Python would normally use. In the absence of an explicit way to specify destination paths, they resort to patching or monkeypatching sysconfig and/or distutils, which ultimately leads to more complexity. I can't stop them customising install locations, so the best way to avoid that complexity seems to be providing a simple way to specify destination paths (as in this PR). This is also useful if you want to do the installation on a different platform, or with a different Python, to the one which will run the code. I think this is @jameshilliard's use case, and I've also wanted to do something similar in Pynsist, to build Windows installers on Linux or Mac. We may well want an 'install for this Python' entry point as well - although for downstreams which are happy using pip with its bundled dependencies, that's already covered. But I think there is value in a lower-level 'install to these directories' interface. |
Like I said in my reply, I don't have the energy to stop people from making mistakes, so I am not gonna engage in an argument, I am just gonna try to clarify what I mean.
The current issues we have are due to people bypassing existing mechanisms, which is problematic because things are very tightly coupled and changing the paths in one place will result in a plethora of inconsistencies across the ecosystem, which all need to keep up with each other, and that is simply not maintainable. The approach in this PR plays into and contributes to that, which I believe is the wrong choice and will result in more long-term pain. I have put a lot of effort into trying to get people to move away from this approach and let the Python interpreter be the source of truth for install locations (see https://gist.github.com/FFY00/625f65681fbcd7fc039dd4d727bb2c2f), because as we have seen over the years, the current approach of just letting everyone change locations where they want and hoping everything works properly together is just not maintainable. It is painful for users, who will have stuff break. It is painful for developers, as they will have to implement custom handling for specific users (e.g. Debian), make sure their codebase takes inconsistencies into account, and make sure it works with external changes (see pypa/pip#9617, mesonbuild/meson#9288, etc.). And it's painful for Python core, as the developers lose a lot of room for change and improvement as the whole ecosystem becomes dependent on how things are implemented, even private APIs, which makes the code incredibly fragile and makes any sort of change potentially problematic for thousands of users (an example of this is the distutils deprecation, see https://ffy00.github.io/blog/02-python-debian-and-the-install-locations/).
That is a very specific use-case, and the API is more than enough to satisfy it, but IMO this shouldn't be exposed as the CLI for this project. It can be easily integrated in pynsist and cross compiling tooling.
I am not challenging the value of it, but rather what it enables; this is somewhere I think we should tread very carefully. Of course this has value, and the tooling that needs that value can still make use of this project, but I don't think we should be exposing this to general users, who will use and abuse the interface without understanding the impact it has. Hopefully that clarifies a bit better my point, and what I am trying to say. Cheers,
@FFY00 your inputs and opinions are very welcome! ^.^
Is there a single unambiguous portable-across-redistributors way to do this? My understanding is that there isn't, and that's why I want to not get involved in this. Honestly, if there's even a single mechanism that we want to "bless" that's 3.7+ or even 3.11+, I'm onboard for having that be the only thing that the CLI provides support for, as long as:
The issue here is people patching distutils for the install locations and not sysconfig (https://ffy00.github.io/blog/02-python-debian-and-the-install-locations/ gets into this a bit). This should not be happening as of Python 3.10, when the downstream customization mechanism in sysconfig was introduced. It will certainly not be happening as of Python 3.12.
What about validating compatibility? And the "spread" step specified in PEP 427?
There is no getting around this. The interpreter is the only reasonable source of truth for the install locations. We can have an introspection script that we run on the target Python to get the install location if you really want this. |
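The "introspection script that we run on the target Python" could look roughly like the sketch below. It assumes the target interpreter is runnable from the machine doing the install (which, as discussed elsewhere in this thread, is not true when cross compiling). query_install_paths is a hypothetical helper name; sysconfig.get_paths() is the standard-library call it relies on.

```python
import json
import subprocess
import sys

# One-liner executed by the *target* interpreter, so the answer reflects
# that interpreter's own (possibly vendor-patched) sysconfig.
_QUERY = "import json, sysconfig; print(json.dumps(sysconfig.get_paths()))"


def query_install_paths(interpreter):
    """Ask `interpreter` for its sysconfig install paths (hypothetical helper)."""
    out = subprocess.run(
        [interpreter, "-c", _QUERY],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    # Example: introspect the interpreter running this script.
    for name, path in sorted(query_install_paths(sys.executable).items()):
        print(f"{name}: {path}")
```

The returned dictionary has the purelib, platlib, scripts and data keys that a wheel installer needs, so no path is hardcoded by the caller.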
https://github.com/layday/pyproject-install does this if anybody wants it, which I honestly don't think they will; distros don't care about installing with/into arbitrary Pythons.
But distutils does allow passing scheme paths via the command line like you are doing here. If they'd rather patch distutils, there must be another reason that they do that. |
Yeah, for cross compilation I think we need explicit overrides here since we can't actually run the target interpreter during install at all(only the host interpreter).
Seems like a good starting point.
So how would we actually do this if we can't even run the target interpreter during install(ie when cross compiling?) and need to choose if a package is being installed for the host or the target?
Agreed, we already modify sysconfig a good bit, trying to minimize downstream customizations is ideal, making it more difficult for downstreams to make required overrides isn't going to make things better overall IMO.
The existing mechanisms simply don't provide the required customization needed for cross compilation from my understanding, at least if the mechanism is provided upstream it can be reworked in one place instead of in a bunch of downstream locations.
Except this assumption seems to be effectively entirely ignoring normal cross compilation scenarios if I'm understanding things correctly. We're not actually trying to modify the runtime install layout(we want everything installed in the same We do try to correctly patch/override Note we actually have effectively multiple
We really prefer to interface with this sort of thing from the CLI, a python API is much more annoying and less maintainable since our cross compilation tooling is
Except it's not possible in cross-compilation scenarios, the interpreter doing the install is the host toolchain interpreter, not the target one(we can't even run the target interpreter during install since it is generally built for an incompatible architecture), we have to be able to change this on the fly as well since we are doing both host and target builds+installations from the same host interpreter. |
I think I'm convinced that the need for this is a niche use case (where it's not possible to run the target Python to find the paths), and the main
From his comments on #66, I think Pradyun would rather that the CLI, like the Python API, just install the wheel it's given, with no extra checks. I'm inclined to agree - you've already made those checks soft-fail if dependencies are missing, for bootstrapping purposes, and that feels like a messy compromise for a low level tool.
From how |
I would naturally prefer it to be internal to use-case specific tooling, but I think exposing a CLI like that could be a reasonable compromise, though I am really not jazzed about it. What I am hard against is that it be the main or only CLI provided by this package. So far, I do not think any of the presented use-cases warrant it.
You should customize the Python you build to select a different install scheme when cross compiling (via https://docs.python.org/3/library/sysconfig.html#sysconfig._get_preferred_schemes, or https://gist.github.com/FFY00/625f65681fbcd7fc039dd4d727bb2c2f#--with-vendor-config if that gets accepted). When doing this, you will need to ensure the paths are correct, so you need to be careful, but you have to do the same thing for every other cross compiling customization. |
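As a rough illustration of the consumer side of this mechanism: on Python 3.10+ an installer can ask sysconfig which scheme the (possibly vendor-customized) interpreter prefers, instead of assuming one. This is a sketch only; preferred_install_paths is a hypothetical name, and the fallback for older Pythons is an assumption made so the example runs everywhere.

```python
import sysconfig


def preferred_install_paths(key="prefix"):
    """Return install paths for the interpreter's preferred scheme.

    `key` is one of "prefix", "home" or "user", as accepted by
    sysconfig.get_preferred_scheme() on Python 3.10+.
    """
    get_scheme = getattr(sysconfig, "get_preferred_scheme", None)
    if get_scheme is None:
        # Python < 3.10 has no public hook; use the default scheme's paths.
        return sysconfig.get_paths()
    return sysconfig.get_paths(scheme=get_scheme(key))
```

A downstream that overrides _get_preferred_schemes in its own Python build would then see its custom scheme picked up here without the installer knowing anything about that downstream.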
I can't find the link right now, but IIRC Fedora does a similar thing to select their custom install scheme in Python 3.10. Perhaps @hroncok can point you to it. |
For e.g. packages which build using meson, cross compilation would require the use of Similarly, it would be possible (if not perfectly convenient) to use a pyproject-install CLI that expected an --interpreter argument. You could set it to Under this scenario, --interpreter is useful/usable in probably all cases, while manually hardcoding every single path as CLI arguments is useful nowhere other than "buildroot where we want to use the python API for maximum flexibility, but using CLI command arguments to pass function data because buildroot itself isn't written in python". tl;dr I'm skeptical that trying to more or less implement the python API as CLI arguments is the ideal implementation form. It could exist as an advanced mode, but I strongly advise against it being the primary, or the only, mode of operation. The worst thing about it is that it encourages people to think it's a good idea to use it for purposes other than "a workaround for cross compilation because the interpreter in question isn't built for this CPU and OS". Yes, that's right. Manually specifying every path is absolutely a workaround, not an inherent use case. It's not something you want to do, it's something you do because you can't (or find it difficult to) do the --interpreter method. As such, it is a bad default design. |
fedora-python/cpython@f77f87b#diff-d593bd299ba58e440ba411ffa0640ccd9d20d518b0cf2644ed4bdb75a82a3e70 |
Thanks! I see you are not using |
Eventually, we intend to get there, but not all tools respected that when we tried. |
This would be fine.
I'm really trying to avoid having to maintain something like this downstream since it means we are less likely to be in sync with upstream changes, if a maintained cli option is simply removed/changed in the future we want it to throw a hard error so the failure can be identified easily and so that we can migrate as needed, if we have to use the python API we're more likely to hit subtle difficult to trace breakage IMO.
Yeah, I think that's fine for our use case, we're fairly experienced with debugging these sort of issues in general so it's not a big deal as long as the functionality needed to fix the issue is properly exposed.
Yeah, this also seems very error prone, and we also need to make sure some stuff like headers ends up not in the target rootfs but rather the sysroot/staging directory(where headers and stuff for the target package builds live, this is separate from the target rootfs and is not present at runtime on the target) so that they are usable by other packages during build. So I would expect something like this to not be desirable as we may accidentally end up with stuff like headers in the target rootfs.
Yeah, I'm not against having a normal non-manual CLI, that functionality is just fairly well covered by other tools like pip I think.
It's not really clear if this would be sufficient for cross compilation, cross compilation requires a special hybrid setup effectively that requires dynamic scheme selection based for each package build/install invocation, some of which are mixed scheme in a way.
Our build system is
We don't use We override the binaries in the meson cross compilation config with scripts that call qemu wrappers.
I mean I think at a minimum this doesn't make much sense for our use case since stuff like the headers directory isn't even something that will exist on the target rootfs, so the target interpreter config isn't even going to be really what we want from my understanding. From my understanding we need dynamic override capability in some form.
I mean I don't really think we have a good way to get this from the target interpreter since it's not expected to have the cross compilation target environment configured for itself in general but rather the target runtime configuration...it's complicated.
Hmm, interesting, so is the intention to be able to use env variables to override site scheme on the fly in upstream python? |
I am aware of the needs of cross-compilation. The approach I described should be sufficient. |
It's just somewhat unclear how it would fit/integrate into our python package infrastructure, keep in mind that something sufficient for a generalized cross-compilation build may not be sufficient for integration with our infrastructure which often has additional requirements since we also aren't a normal distribution(we don't support binary package management/installation like |
Meson's exe_wrapper isn't about running the target system's native compiler using emulation. It's quite reasonable to use cross compilers rather than emulating native compilers. The exe_wrapper is used for things like... running the cross target's python, which is required in order to get include paths, the EXT_SUFFIX, whether to link to libpython, etc. But also when, say, running the testsuite. ... So essentially the problem is that you want to install everything as per sysconfig paths, except for headers because "the embedded rootfs doesn't need that". And the solution is to override the paths and install headers outside of the staging directory, instead of for example to have a cleanup stage immediately before baking the image, which prunes unwanted contents? Personally, I think this is a confusing setup. But if that's what you really need, okay. I do think that this sounds like a choice rather than a requirement, and that a CLI installer should not feel bound to support it, even if it does choose to do so anyway in an advanced, non-default mode. |
Yeah, I'm aware...we don't want testsuite running so we only use qemu when absolutely needed for gobject-introspection...and since the target python will have correct info in a number of cases here so it's kinda pointless to use it for configuration IMO, also it's really not necessary.
Well sysconfig paths I think are mixed between sysroot and host interpreter right now, but we def don't want headers ending up in the target rootfs since that's the wrong location(target rootfs directory is where the runtime stuff lives, not compile time dependencies).
Staging is the target sysroot(ie does not get installed into the rootfs but is where stuff needed for building target binaries lives). |
Closing this out in favour of #94. I’m convinced that such a CLI being the default / easier-to-use than install-using-sysconfig paths is a bad idea. If someone really wants this interface, they can write a Python script that does this, using the API. |
So maybe this should be reworked to go under |
I'm happy to move this CLI to a different name under the installer module if @pradyunsg wants. But I also get @FFY00's argument that it could be a solution that lets people easily get round an immediate problem, but leads to more problems down the line. The CLI I proposed here is built entirely on the public, documented Python API of the installer module, so if there are only niche use cases, I do think it's reasonable to say the downstream users who have that use case can maintain a wrapper like this for themselves. I don't agree that changes in a Python API are any more likely or harder to debug than changes in a CLI.
Isn't that just going to lead to significant implementation fragmentation? I mean that's a big issue right now in general with python build/install integrations. |
At present, our guess is that there aren't many downstreams that actually need this fully manual interface, in which case fragmentation isn't too big a concern. If that turns out to be wrong, I think we'd probably want to incorporate this. |
I think anyone cross compiling will need something like this, although it seems the important part is going to be a root override functionality you can see what my latest integration attempt looks like here.
It's a concern for downstream maintainers...we can of course manage to work around these issues but it's just adding more to the pile of hacks needed for python cross compilation due to the lack of upstream support... |
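One way to approximate a "root override" without patching sysconfig is its documented vars parameter, which substitutes values into a scheme's path templates. The sketch below is an assumption-laden illustration, not what any of the tools in this thread do: staged_paths is a hypothetical name, the posix_prefix scheme is assumed to be the right template, and the version and ABI components still come from the interpreter running the code (the host), which is exactly the limitation being discussed for cross compilation.

```python
import sysconfig


def staged_paths(staging_prefix, scheme="posix_prefix"):
    """Expand `scheme` with base/platbase pointed at a staging prefix.

    Hypothetical helper. Variables that are not overridden keep the
    values of the interpreter running this code.
    """
    overrides = {"base": staging_prefix, "platbase": staging_prefix}
    return sysconfig.get_paths(scheme, vars=overrides)


if __name__ == "__main__":
    # Example staging prefix; the path is made up for illustration.
    for name, path in sorted(staged_paths("/opt/staging/usr").items()):
        print(f"{name}: {path}")
```

On a POSIX host, purelib and platlib then land under the staging prefix, while keys whose templates use other variables are left untouched.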
Yours is the only downstream I've actually heard from that really focuses on cross compiling. Most downstream packagers we deal with (Linux distros, things like Anaconda & Spack) build on the target platform. Anaconda has a 'noarch' shortcut to allow pure-Python packages to be built once, but I don't think they'd use If there are other packaging systems in a similar situation, I think we'd be interested in hearing from them. But hearing repeatedly from one downstream system doesn't really make the case that it's a general need. |
As I have said, you can achieve this with #66 or #94 by customizing sysconfig. You will need to add an install scheme with those paths, and make |
There are others like openembedded and openwrt, we're significantly more upstream first focused than those though. |
Cross-compilation is also supported in Nixpkgs. Whenever the Python interpreter is used, a hook is executed that sets
In the interests of moving the CLI question forwards, here's my take on a minimal CLI. That's minimal in the sense of a low-level interface to what installer can do, with no additional abstractions.
As a starting point, this requires that all the destination directories - purelib, platlib, headers, scripts, data - are specified explicitly as inputs. This would obviously be pretty verbose, but I think it's still useful, because downstream packaging is often built around shell commands, so even a long one is more convenient than embedding Python code to use the library. I also like that it makes the destination directories explicit, rather than downstreams finding ways to patch or hack around sysconfig or distutils to control where files end up.
But this is a starting point, and we could decide to add various convenience features:
- Make --script-kind optional and write POSIX launchers by default, which is probably what 90% of downstreams want, given the diverse packaging ecosystems on Linux especially.
- Make --platlib optional, use the same directory as --purelib by default
- Make --headers required only if the wheel contains a foo-1.2.data/headers directory (AFAIK this is either never or almost never used - the only thing I'm familiar with that ships headers is numpy, and it has them inside the package directory)
- Add a --destdir or --root option, for a path to prefix to all destination directories (Enable installation under a specified prefix #58). This is purely a convenience, because the caller can add the prefix themselves (--purelib ${destdir}/path/to/site-packages etc.), but maybe passing it 5 times is inconvenient enough to be worth a shortcut?
- Add python -m installer.thispython, to do an installation for the Python & platform where installer is running (like pip, or the proposal in Add CLI #66).
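To make the shape of this proposal concrete, here is a minimal argument-parsing sketch with two of the conveniences applied (--platlib defaulting to --purelib, and a --destdir prefix). It is not the code from this PR: the helper names are hypothetical, the prefixing is a plain POSIX-style string join, and the actual installation step is left out. The resulting dictionary is the kind of scheme mapping that, as far as I understand the installer API, would be handed to its scheme-dictionary destination.

```python
import argparse

SCHEME_KEYS = ("purelib", "platlib", "headers", "scripts", "data")


def _with_destdir(destdir, path):
    """Prefix an absolute POSIX-style path with destdir (plain string join)."""
    if not destdir:
        return path
    return destdir.rstrip("/") + "/" + path.lstrip("/")


def build_parser():
    ap = argparse.ArgumentParser("python -m installer")
    ap.add_argument("wheel_file", help="Path to a .whl file to install")
    ap.add_argument("--purelib", required=True)
    ap.add_argument("--platlib", help="defaults to the value of --purelib")
    ap.add_argument("--headers", required=True)
    ap.add_argument("--scripts", required=True)
    ap.add_argument("--data", required=True)
    ap.add_argument("--destdir", default="", help="prefix added to every path")
    return ap


def scheme_from_args(argv):
    """Parse argv and return (wheel_file, scheme dictionary)."""
    args = build_parser().parse_args(argv)
    scheme = {key: getattr(args, key) for key in SCHEME_KEYS}
    if scheme["platlib"] is None:
        scheme["platlib"] = scheme["purelib"]
    scheme = {k: _with_destdir(args.destdir, v) for k, v in scheme.items()}
    return args.wheel_file, scheme
```

Every destination still comes from the command line, so nothing is inferred from the running interpreter; that is both the appeal and the risk debated in this thread.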