Add simple, explicit CLI to install a .whl file #92

Closed
takluyver wants to merge 5 commits

Conversation

@takluyver (Member)

In the interests of moving the CLI question forwards, here's my take on a minimal CLI. That's minimal in the sense of a low-level interface to what installer can do, with no additional abstractions.

As a starting point, this requires that all the destination directories - purelib, platlib, headers, scripts, data - are specified explicitly as inputs. This would obviously be pretty verbose, but I think it's still useful, because downstream packaging is often built around shell commands, so even a long one is more convenient than embedding Python code to use the library. I also like that it makes the destination directories explicit, rather than downstreams finding ways to patch or hack around sysconfig or distutils to control where files end up.
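
To make that concrete: the CLI is just a thin argument-parsing layer over the existing library API, so one invocation maps to roughly the sketch below (the paths, wheel filename and interpreter are placeholder values, not defaults):

# Rough sketch of what one CLI invocation maps to in installer's library API.
# All paths, the wheel filename and the interpreter are placeholder values.
from installer import install
from installer.destinations import SchemeDictionaryDestination
from installer.sources import WheelFile

destination = SchemeDictionaryDestination(
    {
        "purelib": "/opt/target/lib/python3.10/site-packages",
        "platlib": "/opt/target/lib/python3.10/site-packages",
        "headers": "/opt/target/include/python3.10",
        "scripts": "/opt/target/bin",
        "data": "/opt/target",
    },
    interpreter="/usr/bin/python3",  # what script shebangs will point at
    script_kind="posix",             # which launcher format to write
)

with WheelFile.open("example-1.0-py3-none-any.whl") as source:
    install(source=source, destination=destination, additional_metadata={})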

But this is a starting point, and we could decide to add various convenience features:

  • Make --script-kind optional and write POSIX launchers by default, which is probably what 90% of downstreams want, given the diverse packaging ecosystems on Linux especially.
  • Make --platlib optional, use the same directory as --purelib by default
  • Make --headers required only if the wheel contains a foo-1.2.data/headers directory (AFAIK this is either never or almost never used - the only thing I'm familiar with that ships headers is numpy, and it has them inside the package directory)
  • Add a --destdir or --root option, for a path to prefix to all destination directories (Enable installation under a specified prefix #58). This is purely a convenience, because the caller can add the prefix themselves (--purelib ${destdir}/path/to/site-packages etc.), but maybe passing it 5 times is inconvenient enough to be worth a shortcut?
  • Add an option - or an entirely separate entry point, like python -m installer.thispython - to do an installation for the Python & platform where installer is running (like pip, or the proposal in Add CLI #66).

@takluyver mentioned this pull request Jan 3, 2022
@pradyunsg (Member) commented Jan 3, 2022

Thanks for filing this!

I imagine that the tests will complain about not having coverage. I'm pretty sure that we should have #58 satisfied (happy with that being a follow up), but beyond that, I'm totally on board with this.

@layday (Member) commented Jan 3, 2022

Would redistributors have use for something so skimpy?

because downstream packaging is often built around shell commands, so even a long one is more convenient than embedding Python code to use the library

Yeah, but they still have to do all the work collecting the paths and switching the prefix out, right? And they don't have the facility to do that more generally, I imagine. How would this work in practice? Would it be...

paths=$(python -c '
    a whole lotta path mangling
')
python -m installer \
    --interpreter $(command -v python) \
    --script-kind posix \
    --purelib $(paths | jq??) \
    --platlib $(paths | jq??) \
    --headers $(paths | jq??) \
    --scripts $(paths | jq??) \
    --data $(paths | jq??) \
    wheel.whl

And at that point, why not write everything in Python, including the installer invocation?

Of course, it might be simpler than I'm imagining, but we should still have a little think about how the CLI's going to be used.

@jameshilliard

Would redistributors have use for something so skimpy?

Yeah, it would be helpful as it provides a cleaner way to interface with installer from the cli.

Yeah, but they still have to do all the work collecting the paths and switching the prefix out, right? And they don't have the facility to do that more generally, I imagine. How would this work in practice?

We're pretty much set up for this already, for us at least we generally pass around cli params like this in make variables already, like this.

And at that point, why not write everything in Python, including the installer invocation?

We have to do invocations from make to python at some point anyway; this generally works best if there is a built-in cli interface in the tool we're calling.

def main():
    """Entry point for CLI."""
    ap = argparse.ArgumentParser("python -m installer")
    ap.add_argument("wheel_file", help="Path to a .whl file to install")


Does this support installing the wheel file if passed as a glob, like dist/*.whl?

Member

No, the glob won't be expanded in Python - but why would it not be expanded by the shell?


Currently we invoke python setup.py install while in the package directory. Maybe we should make the wheel_file argument optional, and have it autodetect the dist folder and install from there if it's present and contains a .whl file.

Member Author (takluyver)

It will work if the shell expands the glob to a single wheel. At present, it won't accept multiple wheels to install - that's easy enough to do if necessary, but I wanted to start with the simplest thing.

maybe we should make the wheel_file argument optional and autodetect...

I think the crucial 'what to install' input should be 100% explicit. That's in keeping with the general pattern of this library. If you know there's a single wheel under dist, and you're running it from a shell, you can pass dist/*.whl and let the shell expand it.


it won't accept multiple wheels to install

Oh, I'm not wanting it to do that - just to pick up a single wheel from dist, actually.

If you know there's a single wheel under dist, and you're running it from a shell, you can pass dist/*.whl and let the shell expand it.

Yeah, I was going to do something like that otherwise; I figured it would be kinda nice to have similar semantics to existing install methods. Maybe having it use only the newest file in dist/*.whl would make sense or something, if a glob is passed?

@layday (Member) commented Jan 3, 2022

We're pretty much set up for this already, for us at least we generally pass around cli params like this in make variables already, like this.

Will you be hardcoding the paths? So, for example, instead of:

PKG_PYTHON_SETUPTOOLS_INSTALL_TARGET_OPTS = \
    --executable=/usr/bin/python \
    --script-kind posix \
    --root=$(TARGET_DIR)

You'll have:

PKG_PYTHON_INSTALLER_TARGET_OPTS = \
    --interpreter=/usr/bin/python \
    --script-kind=posix \
    --purelib=$(TARGET_DIR)/lib/python?.?/site-packages \
    --platlib=$(TARGET_DIR)/lib/python?.?/site-packages \
    --headers=$(TARGET_DIR)/include/python?.? \
    --scripts=$(TARGET_DIR)/bin \
    --data=$(TARGET_DIR)

Is that correct? Or do you have some other way of retrieving the paths?

@jameshilliard commented Jan 3, 2022

Is that correct?

Should be a little different.

Probably something like this I think:

PKG_PYTHON_INSTALLER_TARGET_OPTS = \
    --interpreter=/usr/bin/python \
    --script-kind=posix \
    --purelib=$(TARGET_DIR)/lib/python$(PYTHON3_VERSION_MAJOR)/site-packages \
    --platlib=$(TARGET_DIR)/lib/python$(PYTHON3_VERSION_MAJOR)/site-packages \
    --headers=$(STAGING_DIR)/usr/include/python$(PYTHON3_VERSION_MAJOR)  \
    --scripts=$(TARGET_DIR)/usr/bin \
    --data=$(TARGET_DIR)/usr

@layday (Member) commented Jan 3, 2022

I see, thank you.

@takluyver (Member Author)

@layday I don't imagine this will be useful for everyone. Certainly, if you have to get the destination paths from Python as a JSON blob and then use jq to extract them and parse them back, you're better off staying in Python and calling installer as a library. I can see the case for having a separate 'install for this Python' CLI, but I think it's in keeping with the design of installer to have a fully explicit CLI that doesn't take anything from the running Python. And it seems like this has at least one use case. 🙂

@pradyunsg I'll look at adding some tests.

For destdir support, I can see 3 possible approaches:

  1. Fully manual: python -m installer --purelib ${DESTDIR}/path/to/site-packages --scripts ${DESTDIR}/path/to/bin ...
  2. Separate argument: python -m installer --destdir ${DESTDIR} --purelib /path/to/site-packages --scripts /path/to/bin ... (or 2a, --root, in keeping with setuptools/distutils)
  3. Use the environment variable implicitly: python -m installer --purelib /path/to/site-packages --scripts /path/to/bin ...

Do you have a preference?
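
(For illustration, options 2 and 3 both boil down to a prefixing operation on the scheme dictionary before it is handed to the destination - something like the hypothetical helper below, which is not part of this PR:)

# Hypothetical sketch of what a --destdir/--root option would do internally:
# prefix every destination directory with the staging root.
import os

def apply_destdir(scheme: dict, destdir: str) -> dict:
    return {
        key: os.path.join(destdir, os.path.relpath(path, os.sep))
        for key, path in scheme.items()
    }

scheme = {"purelib": "/usr/lib/python3.10/site-packages", "scripts": "/usr/bin"}
print(apply_destdir(scheme, "/tmp/stage"))
# {'purelib': '/tmp/stage/usr/lib/python3.10/site-packages', 'scripts': '/tmp/stage/usr/bin'}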

@pradyunsg (Member) commented Jan 4, 2022

Let's leave it fully manual for now?

If folks strongly feel that is not "good enough", then we can add things later based on feedback. Removing a separate option or implicit behaviour would be trickier. :)

@FFY00 (Member) commented Jan 4, 2022

I have been meaning to write a reply in my PR detailing my opinion on your feedback and how to move forward regarding the CLI as I abandon that work, but since this is moving fast, I will try to quickly summarize it here in the hope it can provide some clarity before any decision is made.

IMO this is the wrong approach to this issue. We are currently having a lot of trouble due to downstream customizations of the Python install layout, to the point that we rushed suboptimal changes to Python core that turned out to be problematic (see bpo-45413, which is a result of the mechanism that was introduced to support the downstream patching, mainly affecting pip; see pypa/distutils#88 (comment), introduced in python/cpython#24549, which aims to make distutils rely on sysconfig so that vendors can patch sysconfig instead, as distutils is moved out of core). I believe this approach just further propagates the same mentality and will result in people hardcoding install locations or abusing this CLI to do things it wasn't intended for. Hardcoded install locations sooner or later need to be updated manually, and that can only result in more bugs. Install locations in Python are not even canonically static anymore and can vary at runtime, which introduces yet more bugs when this CLI is used in a slightly different environment, resulting in non-obvious broken installs.

Figuring out the install locations of a Python should not be hard, but we currently do have an issue, as presented in bpo-44445. This means wheel installation cannot yet be done without querying the headers path from distutils, which is a big problem. Optimally, you would just fetch the install location map from sysconfig and plug it into SchemeDictionaryDestination - that is how easy it should be - but some work still needs to be done for that to be achieved, and it needs to happen before the distutils deprecation.
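
To illustrate what I mean by "how easy it should be" - roughly the sketch below, modulo the missing headers path (an illustration, not a finished recipe):

# Sketch: build the destination for the running interpreter from sysconfig.
# sysconfig has no "headers" key - that gap is exactly bpo-44445 - so
# "include" is used here as an approximation.
import sys
import sysconfig

from installer.destinations import SchemeDictionaryDestination

paths = sysconfig.get_paths()
destination = SchemeDictionaryDestination(
    {
        "purelib": paths["purelib"],
        "platlib": paths["platlib"],
        "scripts": paths["scripts"],
        "data": paths["data"],
        "headers": paths["include"],  # approximation, see bpo-44445
    },
    interpreter=sys.executable,
    script_kind="posix",  # placeholder; Windows would need a win-* kind
)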

I have dedicated a great deal of time to trying to understand and sort out all this mess, and this is my opinion from the experience gathered doing that. I don't have the energy to try to prevent everybody from making mistakes, so I am not going to; I am just going to leave this warning here.

@pradyunsg, if you think this is more than you are willing to handle, please just don't provide a CLI, instead of merging an arguably bad interface like this. (No hard feelings @takluyver - I really appreciate your work here, I just think the approach is, very understandably, misguided. There are a lot of moving pieces here, and something like this will have a lot of implications for the rest of the ecosystem. This feedback is based on my experience actively working on sorting all of the install stuff out for the past 1~2 years, and still, I might be wrong in some of my takeaways. I think this proposal is understandable, but overly optimistic.)

TLDR: Please please please, can we stop with all this install location customization stuff? IMO the only sane, unproblematic way to deal with this is by querying the install locations from the Python interpreter. Manually specifying install locations will result in more bugs and extend the mess that we are already facing.

@takluyver (Member Author)

@FFY00 This kind of criticism is absolutely fine - I was hoping to provoke this sort of discussion, and the code in this PR didn't take a lot of effort, so I won't be upset if we decide it's not useful. I just wanted to make the alternative approach concrete.

FWIW, the issues and PRs you point to seem to me - at least at a quick look - like an argument for this sort of interface, where paths are specified explicitly. Linux distros (and possibly other downstream environments) want to install files in different places to the default locations Python would normally use. In the absence of an explicit way to specify destination paths, they resort to patching or monkeypatching sysconfig and/or distutils, which ultimately leads to more complexity. I can't stop them customising install locations, so the best way to avoid that complexity seems to be providing a simple way to specify destination paths (as in this PR).

This is also useful if you want to do the installation on a different platform, or with a different Python, to the one which will run the code. I think this is @jameshilliard's use case, and I've also wanted to do something similar in Pynsist, to build Windows installers on Linux or Mac.

We may well want an 'install for this Python' entry point as well - although for downstreams which are happy using pip with its bundled dependencies, that's already covered. But I think there is value in a lower-level 'install to these directories' interface.

@FFY00 (Member) commented Jan 4, 2022

Like I said in my reply, I don't have the energy to stop people from making mistakes, so I am not going to engage in an argument; I am just going to try to clarify what I mean.

FWIW, the issues and PRs you point to seem to me - at least at a quick look - like an argument for this sort of interface, where paths are specified explicitly. Linux distros (and possibly other downstream environments) want to install files in different places to the default locations Python would normally use. In the absence of an explicit way to specify destination paths, they resort to patching or monkeypatching sysconfig and/or distutils, which ultimately leads to more complexity. I can't stop them customising install locations, so the best way to avoid that complexity seems to be providing a simple way to specify destination paths (as in this PR).

The current issues we have are due to people bypassing existing mechanisms, which is problematic because things are very tightly coupled and changing the paths in one place will result in a plethora of inconsistencies across the ecosystem, which all need to keep up with each other, and that is simply not maintainable. The approach in this PR plays into and contributes to that, which I believe is the wrong choice and will result in more long-term pain.

I have put a lot of effort in trying to get people to move away from this approach and let the Python interpreter be the source of truth for install locations (see https://gist.github.com/FFY00/625f65681fbcd7fc039dd4d727bb2c2f), because as we have seen over the years, the current approach of just letting everyone change locations where they want and hope everything works properly together is just not maintainable. It is painful to users, who will have stuff break; it is painful for developers, as they will have to implement custom handling for specific users (e.g. Debian), make sure their codebase takes inconsistencies into account, and make sure it works with external changes (see pypa/pip#9617, mesonbuild/meson#9288, etc.); and it's painful for Python core, as the developers lose a lot of room for change and improvement when the whole ecosystem becomes dependent on how things are implemented, even private APIs, which makes the code incredibly fragile and any sort of change potentially problematic for thousands of users (an example of this is the distutils deprecation, see https://ffy00.github.io/blog/02-python-debian-and-the-install-locations/).

This is also useful if you want to do the installation on a different platform, or with a different Python, to the one which will run the code. I think this is @jameshilliard's use case, and I've also wanted to do something similar in Pynsist, to build Windows installers on Linux or Mac.

That is a very specific use-case, and the API is more than enough to satisfy it, but IMO this shouldn't be exposed as the CLI for this project. It can be easily integrated in pynsist and cross compiling tooling.

We may well want an 'install for this Python' entry point as well - although for downstreams which are happy using pip with its bundled dependencies, that's already covered. But I think there is value in a lower-level 'install to these directories' interface.

I am not challenging the value of it, but rather what it enables; this is somewhere I think we should tread very carefully. Of course this has value, and the tooling that needs that value can still make use of this project, but I don't think we should be exposing this to general users, who will use and abuse the interface without understanding the impact it has.


Hopefully that clarifies my point, and what I am trying to say, a bit better.

Cheers,
Filipe

@pradyunsg (Member) commented Jan 4, 2022

@FFY00 your inputs and opinions are very welcome! ^.^

querying the install locations from the Python interpreter

Is there a single unambiguous portable-across-redistributors way to do this? My understanding is that there isn't, and that's why I want to not get involved in this.

Honestly, if there's even a single mechanism that we want to "bless" that's 3.7+ or even 3.11+, I'm onboard for having that be the only thing that the CLI provides support for, as long as:

  • it does not do anything other than unpacking the wheel.
  • it does not bake a "this Python" assumption into the CLI (avoiding a subprocess when using the same interpreter is fine, but require users to specify it explicitly).

@FFY00 (Member) commented Jan 4, 2022

Is there a single unambiguous portable-across-redistributors way to do this? My understanding is that there isn't, and that's why I want to not get involved in this.

The issue here is people patching distutils for the install locations and not sysconfig (https://ffy00.github.io/blog/02-python-debian-and-the-install-locations/ gets into this a bit). This should not be happening as of Python 3.10, when the downstream customization mechanism in sysconfig was introduced. It will certainly not be happening as of Python 3.12.

it does not do anything other than unpacking the wheel.

What about validating compatibility? And the "spread" step specified in PEP 427?

it does not bake a "this Python" assumption into the CLI (avoiding a subprocess when using the same interpreter is fine, but require users to specify it explicitly).

There is no getting around this. The interpreter is the only reasonable source of truth for the install locations. We can have an introspection script that we run on the target Python to get the install location if you really want this.
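
Something along these lines, for instance (a sketch of the introspection idea, not an agreed interface):

# Sketch: ask an arbitrary (runnable) target interpreter for its install paths.
import json
import subprocess

def query_install_paths(interpreter: str) -> dict:
    """Run `interpreter` and return its sysconfig path dictionary."""
    script = "import json, sysconfig; print(json.dumps(sysconfig.get_paths()))"
    result = subprocess.run(
        [interpreter, "-c", script],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)

# e.g. query_install_paths("/usr/bin/python3.10")["purelib"]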

@layday (Member) commented Jan 4, 2022

There is no getting around this. The interpreter is the only reasonable source of truth for the install locations. We can have an introspection script that we run on the target Python to get the install location if you really want this.

https://github.com/layday/pyproject-install does this if anybody wants it, which I honestly don't think they will; distros don't care about installing with/into arbitrary Pythons.

Linux distros (and possibly other downstream environments) want to install files in different places to the default locations Python would normally use. In the absence of an explicit way to specify destination paths, they resort to patching or monkeypatching sysconfig and/or distutils, which ultimately leads to more complexity.

But distutils does allow passing scheme paths via the command line like you are doing here. If they'd rather patch distutils, there must be another reason that they do that.

@jameshilliard

@takluyver

I can see the case for having a separate 'install for this Python' CLI, but I think it's in keeping with the design of installer to have a fully explicit CLI that doesn't take anything from the running Python.

Yeah, for cross compilation I think we need explicit overrides here, since we can't actually run the target interpreter during install at all (only the host interpreter).

@pradyunsg

Let's leave it fully manual for now?

Seems like a good starting point.

@FFY00

IMO the only sane, unproblematic way to deal with this is by querying the install locations from the Python interpreter.

So how would we actually do this if we can't even run the target interpreter during install (i.e. when cross compiling) and need to choose whether a package is being installed for the host or the target?

@takluyver

In the absence of an explicit way to specify destination paths, they resort to patching or monkeypatching sysconfig and/or distutils, which ultimately leads to more complexity.

Agreed - we already modify sysconfig a good bit. Trying to minimize downstream customizations is ideal, but making it more difficult for downstreams to make required overrides isn't going to make things better overall, IMO.

@FFY00

The current issues we have are due to people bypassing existing mechanisms, which is problematic because things are very tightly coupled and changing the paths in one place will result in a plethora of inconsistencies across the ecosystem, which all need to keep up with each other, and that is simply not maintainable.

The existing mechanisms simply don't provide the customization required for cross compilation, from my understanding. At least if the mechanism is provided upstream, it can be reworked in one place instead of in a bunch of downstream locations.

I have put a lot of effort in trying to get people to move away from this approach and let the Python interpreter be the source of truth for install locations (see https://gist.github.com/FFY00/625f65681fbcd7fc039dd4d727bb2c2f), because as we have seen over the years, the current approach of just letting everyone change locations where they want and hope everything works properly together is just not maintainable.

The site packages directories from the default install scheme should have a constant value, independently from the distribution we are using, and should always be used by the site module.

Except this assumption seems to be effectively entirely ignoring normal cross compilation scenarios if I'm understanding things correctly.

We're not actually trying to modify the runtime install layout (we want everything installed in the same site-packages for the target interpreter), but we have to use a different interpreter (the build host interpreter) from the target interpreter for build+installation, and that has to have a separate runtime site-packages that is entirely independent from the target site-packages, since we can't use the target site-packages for build+installation at all (the target interpreter can't be executed during build+installation, as that process happens entirely on the build host).

We do try to correctly patch/override sysconfig for cross building at least, and we outright don't support any sort of runtime python package installation/building on the target, which means there's no need for us to have a separate distro-packages at all (as that seems to be designed to deal with an entirely different issue). On the target, it's expected that all python packages will be installed in the same site-packages by the time the target interpreter runs for the first time.

Note that we effectively have multiple sysconfigs here being used by the same host interpreter, because our host build dependencies must be installed to the host interpreter's site-packages and built against host libs, while target runtime dependencies must go in the target site-packages and be built against target libs. We currently use our infrastructure to set this when packages are built/installed, so that things end up in the right place.

That is a very specific use-case, and the API is more than enough to satisfy it, but IMO this shouldn't be exposed as the CLI for this project. It can be easily integrated in pynsist and cross compiling tooling.

We really prefer to interface with this sort of thing from the CLI; a python API is much more annoying and less maintainable, since our cross compilation tooling is make-based and not python-based.

There is no getting around this. The interpreter is the only reasonable source of truth for the install locations. We can have an introspection script that we run on the target Python to get the install location if you really want this.

Except it's not possible in cross-compilation scenarios: the interpreter doing the install is the host toolchain interpreter, not the target one (we can't even run the target interpreter during install, since it is generally built for an incompatible architecture). We also have to be able to change this on the fly, since we are doing both host and target builds+installations from the same host interpreter.

@takluyver (Member Author)

I think I'm convinced that the need for this is a niche use case (where it's not possible to run the target Python to find the paths), and the main python -m installer CLI should install for the running Python, as in #66. Maybe this CLI can live at something like python -m installer.manual, or maybe it can just be maintained as an internal tool by people like @jameshilliard who need it.

What about validating compatibility?

From his comments on #66, I think Pradyun would rather that the CLI, like the Python API, just install the wheel it's given, with no extra checks. I'm inclined to agree - you've already made those checks soft-fail if dependencies are missing, for bootstrapping purposes, and that feels like a messy compromise for a low level tool.

And the "spread" step specified in PEP 427?

From how installer already works, I'm pretty sure Pradyun was including the 'spread' step as part of unpacking. PEP 427's suggestion that you first unpack the entire wheel into site-packages and then move files around is somewhat strange to me - it's surely more reliable to unpack files directly to the destination directories.

@takluyver (Member Author)

I've opened #94, building off @FFY00's branch for #66, as an alternative for consideration. That one specifically installs for the Python interpreter running installer.

@FFY00 (Member) commented Jan 5, 2022

I think I'm convinced that the need for this is a niche use case (where it's not possible to run the target Python to find the paths), and the main python -m installer CLI should install for the running Python, as in #66. Maybe this CLI can live at something like python -m installer.manual, or maybe it can just be maintained as an internal tool by people like @jameshilliard who need it.

I would naturally prefer it to be internal to use-case specific tooling, but I think exposing a CLI like that could be a reasonable compromise, though I am really not jazzed about it. What I am hard against is that it be the main or only CLI provided by this package. So far, I do not think any of the presented use-cases warrant it.

So how would we actually do this if we can't even run the target interpreter during install (i.e. when cross compiling) and need to choose whether a package is being installed for the host or the target?

You should customize the Python you build to select a different install scheme when cross compiling (via https://docs.python.org/3/library/sysconfig.html#sysconfig._get_preferred_schemes, or https://gist.github.com/FFY00/625f65681fbcd7fc039dd4d727bb2c2f#--with-vendor-config if that gets accepted). When doing this, you will need to ensure the paths are correct, so you need to be careful, but you have to do the same thing for every other cross compiling customization.
You currently already have to patch sysconfig; this should be no different. There is already an upstream-supported mechanism to customize the selected layout at runtime, so there's no need to have a custom installer for that use-case - we already have hooks to do that in Python directly.
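
Very roughly, the kind of customization I mean looks like the sketch below (it targets the private Python 3.10 hooks; the scheme name, paths and CROSS_COMPILING trigger are made up, and a real vendor would carry this as a patch to sysconfig itself rather than monkeypatching it at runtime):

# Sketch: register a cross-compilation install scheme and prefer it when a
# hypothetical CROSS_COMPILING environment variable is set (Python 3.10+).
import os
import sysconfig

sysconfig._INSTALL_SCHEMES["vendor_cross"] = {
    "stdlib": "{installed_base}/lib/python{py_version_short}",
    "platstdlib": "{platbase}/lib/python{py_version_short}",
    "purelib": "{base}/lib/python{py_version_short}/site-packages",
    "platlib": "{platbase}/lib/python{py_version_short}/site-packages",
    "include": "{installed_base}/include/python{py_version_short}{abiflags}",
    "platinclude": "{installed_platbase}/include/python{py_version_short}{abiflags}",
    "scripts": "{base}/bin",
    "data": "{base}",
}

_default_preferred = sysconfig._get_preferred_schemes

def _get_preferred_schemes():
    # Prefer the vendor scheme only while cross compiling.
    if os.environ.get("CROSS_COMPILING"):
        return {"prefix": "vendor_cross", "home": "posix_home", "user": "posix_user"}
    return _default_preferred()

sysconfig._get_preferred_schemes = _get_preferred_schemes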

@FFY00 (Member) commented Jan 5, 2022

I can't find the link right now, but IIRC Fedora does a similar thing to select their custom install scheme in Python 3.10. Perhaps @hroncok can point you to it.

@eli-schwartz (Contributor)

Except it's not possible in cross-compilation scenarios: the interpreter doing the install is the host toolchain interpreter, not the target one (we can't even run the target interpreter during install, since it is generally built for an incompatible architecture). We also have to be able to change this on the fly, since we are doing both host and target builds+installations from the same host interpreter.

For packages which build using meson, for example, cross compilation would require the use of exe_wrapper in order to have meson's target python introspection (essentially the same thing as @layday's example, although meson does scrape for both sysconfig paths and sysconfig vars) - and target introspection as a whole - run the target python using, say, qemu-user.

Similarly, it would be possible (if not perfectly convenient) to use a pyproject-install CLI that expected an --interpreter argument. You could set it to ${hostbins}/qemu-target-python or something and have that be a shell script that invokes python using qemu. (Then this project doesn't need to explicitly code support for a cross compilation wrapper that is prepended to the interpreter argument).

Under this scenario, --interpreter is useful/usable in probably all cases, while manually hardcoding every single path as CLI arguments is useful nowhere other than "buildroot where we want to use the python API for maximum flexibility, but using CLI command arguments to pass function data because buildroot itself isn't written in python".

tl;dr I'm skeptical that trying to more or less implement the python API as CLI arguments is the ideal implementation form. It could exist as an advanced mode, but I strongly advise against it being the primary, or the only, mode of operation. The worst thing about it is that it encourages people to think it's a good idea to use it for purposes other than "a workaround for cross compilation because the interpreter in question isn't built for this CPU and OS".

Yes, that's right. Manually specifying every path is absolutely a workaround, not an inherent use case. It's not something you want to do, it's something you do because you can't (or find it difficult to) do the --interpreter method.

As such, it is a bad default design.

@hroncok commented Jan 5, 2022

I can't find the link right now, but IIRC Fedora does a similar thing to select their custom install scheme in Python 3.10. Perhaps @hroncok can point you to it.

fedora-python/cpython@f77f87b#diff-d593bd299ba58e440ba411ffa0640ccd9d20d518b0cf2644ed4bdb75a82a3e70

@FFY00 (Member) commented Jan 5, 2022

Thanks! I see you are not using sysconfig._get_preferred_schemes, so it actually differs a bit from what I was proposing 😅

@hroncok commented Jan 5, 2022

Eventually, we intend to get there, but not all tools respected that when we tried.

@jameshilliard commented Jan 5, 2022

@takluyver

Maybe this CLI can live at something like python -m installer.manual

This would be fine.

maybe it can just be maintained as an internal tool by people like @jameshilliard who need it

I'm really trying to avoid having to maintain something like this downstream, since it means we are less likely to be in sync with upstream changes. If a maintained cli option is simply removed/changed in the future, we want it to throw a hard error so the failure can be identified easily and so that we can migrate as needed; if we have to use the python API, we're more likely to hit subtle, difficult-to-trace breakage IMO.

From his comments on #66, I think Pradyun would rather that the CLI, like the Python API, just install the wheel it's given, with no extra checks.

Yeah, I think that's fine for our use case; we're fairly experienced with debugging these sorts of issues in general, so it's not a big deal as long as the functionality needed to fix the issue is properly exposed.

PEP 427's suggestion that you first unpack the entire wheel into site-packages and then move files around is somewhat strange to me - it's surely more reliable to unpack files directly to the destination directories.

Yeah, this also seems very error prone, and we also need to make sure some stuff like headers ends up not in the target rootfs but rather in the sysroot/staging directory (where headers and other things needed for target package builds live - this is separate from the target rootfs and is not present at runtime on the target), so that they are usable by other packages during build. So I would expect something like this to not be desirable, as we may accidentally end up with stuff like headers in the target rootfs.

@FFY00

What I am hard against is that it be the main or only CLI provided by this package.

Yeah, I'm not against having a normal non-manual CLI; that functionality is just fairly well covered by other tools like pip, I think.

You should customize the Python you build to select a different install scheme when cross compiling (via https://docs.python.org/3/library/sysconfig.html#sysconfig._get_preferred_schemes, or https://gist.github.com/FFY00/625f65681fbcd7fc039dd4d727bb2c2f#--with-vendor-config if that gets accepted). When doing this, you will need to ensure the paths are correct, so you need to be careful, but you have to do the same thing for every other cross compiling customization.

It's not really clear if this would be sufficient for cross compilation; cross compilation effectively requires a special hybrid setup, with dynamic scheme selection for each package build/install invocation, some of which are mixed-scheme in a way.

there's no need to have a custom installer for that use-case - we already have hooks to do that in Python directly

Our build system is make-based, so calling python APIs directly is not very maintainable.

@eli-schwartz

For packages which build using meson, for example, cross compilation would require the use of exe_wrapper in order to have meson's target python introspection (essentially the same thing as @layday's example, although meson does scrape for both sysconfig paths and sysconfig vars) - and target introspection as a whole - run the target python using, say, qemu-user.

We don't use exe_wrapper for that; we have some special-cased overrides along those lines for gobject-introspection, but exe_wrapper doesn't work well in the general case for us, since our infrastructure generally expects host build tools to be compiled for the host architecture. Also, using qemu wrappers for everything would be quite slow I think.

We override the binaries in the meson cross compilation config with scripts that call qemu wrappers.

Similarly, it would be possible (if not perfectly convenient) to use a pyproject-install CLI that expected an --interpreter argument. You could set it to ${hostbins}/qemu-target-python or something and have that be a shell script that invokes python using qemu. (Then this project doesn't need to explicitly code support for a cross compilation wrapper that is prepended to the interpreter argument).

I mean, I think at a minimum this doesn't make much sense for our use case, since stuff like the headers directory isn't even something that will exist on the target rootfs, so the target interpreter config isn't really going to be what we want, from my understanding. We need dynamic override capability in some form.

Yes, that's right. Manually specifying every path is absolutely a workaround, not an inherent use case. It's not something you want to do, it's something you do because you can't (or find it difficult to) do the --interpreter method.

I mean I don't really think we have a good way to get this from the target interpreter since it's not expected to have the cross compilation target environment configured for itself in general but rather the target runtime configuration...it's complicated.

@hroncok @FFY00

fedora-python/cpython@f77f87b#diff-d593bd299ba58e440ba411ffa0640ccd9d20d518b0cf2644ed4bdb75a82a3e70

Hmm, interesting, so is the intention to be able to use env variables to override site scheme on the fly in upstream python?

@FFY00 (Member) commented Jan 5, 2022

It's not really clear if this would be sufficient for cross compilation; cross compilation effectively requires a special hybrid setup, with dynamic scheme selection for each package build/install invocation, some of which are mixed-scheme in a way.

I am aware of the needs of cross-compilation. The approach I described should be sufficient.

@jameshilliard

I am aware of the needs of cross-compilation. The approach I described should be sufficient.

It's just somewhat unclear how it would fit/integrate into our python package infrastructure. Keep in mind that something sufficient for a generalized cross-compilation build may not be sufficient for integration with our infrastructure, which often has additional requirements, since we also aren't a normal distribution (we don't support binary package management/installation like deb/rpm/opkg) but rather are essentially a source-based rootfs generator tool.

@eli-schwartz (Contributor)

We don't use exe_wrapper for that; we have some special-cased overrides along those lines for gobject-introspection, but exe_wrapper doesn't work well in the general case for us, since our infrastructure generally expects host build tools to be compiled for the host architecture. Also, using qemu wrappers for everything would be quite slow I think.

We override the binaries in the meson cross compilation config with scripts that call qemu wrappers.

Meson's exe_wrapper isn't about running the target system's native compiler using emulation. It's quite reasonable to use cross compilers rather than emulating native compilers. The exe_wrapper is used for things like... running the cross target's python, which is required in order to get include paths, the EXT_SUFFIX, whether to link to libpython, etc. But also when, say, running the testsuite.

...

So essentially the problem is that you want to install everything as per sysconfig paths, except for headers because "the embedded rootfs doesn't need that".

And the solution is to override the paths and install headers outside of the staging directory, instead of, for example, having a cleanup stage immediately before baking the image which prunes unwanted contents?

Personally, I think this is a confusing setup. But if that's what you really need, okay. I do think that this sounds like a choice rather than a requirement, and that a CLI installer should not feel bound to support it, even if it does choose to do so anyway in an advanced, non-default mode.

@jameshilliard

The exe_wrapper is used for things like... running the cross target's python, which is required in order to get include paths, the EXT_SUFFIX, whether to link to libpython, etc. But also when, say, running the testsuite.

Yeah, I'm aware... we don't want the testsuite running, so we only use qemu when absolutely needed for gobject-introspection... and since the target python will have correct info in a number of cases here, it's kinda pointless to use it for configuration IMO - also, it's really not necessary.

So essentially the problem is that you want to install everything as per sysconfig paths, except for headers because "the embedded rootfs doesn't need that".

Well, sysconfig paths I think are mixed between sysroot and host interpreter right now, but we def don't want headers ending up in the target rootfs, since that's the wrong location (the target rootfs directory is where the runtime stuff lives, not compile-time dependencies).

And the solution is to override the paths and install headers outside of the staging directory, instead of for example to have a cleanup stage immediately before baking the image, which prunes unwanted contents?

Staging is the target sysroot (i.e. it does not get installed into the rootfs, but is where stuff needed for building target binaries lives).

@pradyunsg (Member)

Closing this out in favour of #94.

I’m convinced that such a CLI being the default / easier-to-use than install-using-sysconfig paths is a bad idea. If someone really wants this interface, they can write a Python script that does this, using the API.

@pradyunsg closed this Jan 6, 2022
@jameshilliard

I’m convinced that such a CLI being the default / easier-to-use than install-using-sysconfig paths is a bad idea.

So maybe this should be reworked to go under python -m installer.manual or something as a non-default cli?

@takluyver (Member Author)

I'm happy to move this CLI to a different name under the installer module if @pradyunsg wants. But I also get @FFY00's argument that it could be a solution that lets people easily get round an immediate problem, but leads to more problems down the line.

The CLI I proposed here is built entirely on the public, documented Python API of the installer module, so if there are only niche use cases, I do think it's reasonable to say the downstream users who have that use case can maintain a wrapper like this for themselves. I don't agree that changes in a Python API are any more likely or harder to debug than changes in a CLI.

@jameshilliard

I do think it's reasonable to say the downstream users who have that use case can maintain a wrapper like this for themselves.

Isn't that just going to lead to significant implementation fragmentation? I mean that's a big issue right now in general with python build/install integrations.

@takluyver (Member Author)

At present, our guess is that there aren't many downstreams that actually need this fully manual interface, in which case fragmentation isn't too big a concern. If that turns out to be wrong, I think we'd probably want to incorporate this.

@jameshilliard

@takluyver

our guess is that there aren't many downstreams that actually need this fully manual interface

I think anyone cross compiling will need something like this, although it seems the important part is going to be a root override functionality. You can see what my latest integration attempt looks like here.

in which case fragmentation isn't too big a concern

It's a concern for downstream maintainers...we can of course manage to work around these issues but it's just adding more to the pile of hacks needed for python cross compilation due to the lack of upstream support...

@takluyver (Member Author)

Yours is the only downstream I've actually heard from that really focuses on cross compiling. Most downstream packagers we deal with (Linux distros, things like Anaconda & Spack) build on the target platform. Anaconda has a 'noarch' shortcut to allow pure-Python packages to be built once, but I don't think they'd use installer for that in any case. Spack supports some kind of cross-compiling, but it's not the main way it's used, and they're working on using pip for building Python packages.

If there are other packaging systems in a similar situation, I think we'd be interested in hearing from them. But hearing repeatedly from one downstream system doesn't really make the case that it's a general need.

@FFY00 (Member) commented Jan 10, 2022

I think anyone cross compiling will need something like this, although it seems the important part is going to be a root override functionality. You can see what my latest integration attempt looks like here.

As I have said, you can achieve this with #66 or #94 by customizing sysconfig. You will need to add an install scheme with those paths, and make sysconfig._get_preferred_schemes return it when you are cross compiling.

@jameshilliard

Yours is the only downstream I've actually heard from that really focuses on cross compiling.

There are others like openembedded and openwrt; we're significantly more upstream-first focused than those, though.

@FRidh commented Jan 16, 2022

Cross-compilation is also supported in Nixpkgs. Whenever the Python interpreter is used, a hook is executed that sets

        export _PYTHON_HOST_PLATFORM='${pythonHostPlatform}'
        export _PYTHON_SYSCONFIGDATA_NAME='${pythonSysconfigdataName}'
