
there's no documentation of how to package namespace packages #265

Closed
glyph opened this issue Aug 23, 2016 · 46 comments

@glyph

glyph commented Aug 23, 2016

People seem to really want to do this, and they resort to black magic in order to make it happen. See for example protocolbuffers/protobuf#1296. There should probably be clear documentation of how to achieve namespace package packaging in a reliable way.

I think https://setuptools.readthedocs.io/en/latest/pkg_resources.html?highlight=namespace#namespace-package-support is authoritative in this case; should it be referenced?

@dstufft
Member

dstufft commented Aug 23, 2016

The best answer here is to use PEP 420 namespaces, but given that those are Python 3.3+ only and most people can't/won't move to 3.3+ only, we need another solution. Unfortunately there is no slam-dunk solution here; there are basically two options:

  • Don't do namespace packages, because they're kinda broken; instead use a flat namespace_package name and just deal with the fact that an underscore is slightly uglier than a dot.
  • Use the docs you linked (the pkg_resources-style approach, sketched below), and deal with the fact that installing multiple packages from the same namespace may not work, depending on which mechanisms you used to install them (pip, pip with -e, and setuptools).
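
For reference, the pkg_resources-style approach from those setuptools docs boils down to roughly the following sketch (the distribution, namespace, and package names here are placeholders):

```python
# mynamespace/__init__.py -- the same file ships in every distribution
# that contributes to the namespace
__import__('pkg_resources').declare_namespace(__name__)
```

```python
# setup.py -- each contributing distribution also declares the namespace
from setuptools import setup, find_packages

setup(
    name='mynamespace-subpackage-a',
    version='0.1',
    packages=find_packages(),
    namespace_packages=['mynamespace'],
)
```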

@ncoghlan
Member

Option 3: we ask @ericsnowcurrently to remove some of the scary warnings from the importlib2 README and actively promote it as a modernisation technique for Python 2.7 virtual environments

That is, PEP 420 namespaces do work on Python 2.7 if the environment is set up to use importlib2 as its import system implementation (and you get a bunch of other nice improvements as well, like the handling of Unicode paths not being completely broken, and a significantly reduced number of stat calls for successful imports).

@ericsnowcurrently

I'm not opposed...just have to find the time to make sure importlib2 is in a happy place. :)

@theacodes
Member

Bump on this issue

I'm happy to help with the task of updating the user guide with namespace package documentation (I've had to figure this out extensively for all of the google.* packages). Do we have any consensus on what to recommend?

@dstufft
Member

dstufft commented Jan 28, 2017

I think that newer setuptools have fixed setuptools-style namespace packages so that they work in all cases. Someone would need to go through and see if problems like pypa/pip#3 are fixed (and ideally it would use PEP 420 namespace packages where possible), but if all of that checks out I think that is a reasonable recommendation now.

@theacodes
Member

@dstufft thanks, I'll see if I can write a suite of tests to verify.

I think these are all the cases to check, but please let me know if I'm missing anything:

  1. Install two namespace packages with plain pip install.
  2. Install two namespace packages with python setup.py install.
  3. Install one package with pip install and the other with python setup.py install.
  4. Install one package with pip install and the other with pip install -e.
  5. Install one package with pip install and the other with python setup.py develop.

Repeat 3-5 with the steps reversed.
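
(For each of these cases, the pass/fail check amounts to something like the sketch below, using placeholder names for the two test packages.)

```python
# After running both install commands in the same environment, the pair
# "works" if both subpackages are importable from the shared namespace.
import mynamespace.package_a
import mynamespace.package_b

print(mynamespace.package_a.__file__)
print(mynamespace.package_b.__file__)
```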

@dstufft
Member

dstufft commented Feb 1, 2017

@jonparrott Yes, I think those are the cases. Somewhere in pypa/pip#3 or one of the related issues someone had steps to reproduce the issue from scratch that might be useful.

@theacodes
Member

@dstufft thanks, stay tuned. :)

@theacodes
Member

Okay, initial results:

| Package A command | Package B command | Status |
| --- | --- | --- |
| pip install . | pip install . | passed |
| pip install . | pip install -e . | passed |
| pip install . | python setup.py install | failed |
| pip install . | python setup.py develop | passed |
| pip install -e . | pip install . | passed |
| pip install -e . | pip install -e . | passed |
| pip install -e . | python setup.py install | failed |
| pip install -e . | python setup.py develop | passed |
| python setup.py install | pip install . | failed |
| python setup.py install | pip install -e . | failed |
| python setup.py install | python setup.py install | failed |
| python setup.py install | python setup.py develop | failed |
| python setup.py develop | pip install . | passed |
| python setup.py develop | pip install -e . | passed |
| python setup.py develop | python setup.py install | failed |
| python setup.py develop | python setup.py develop | passed |

This is with PEP 420 packages using Python 3. Does that seem about right to you?

You can see the source here: https://github.com/jonparrott/namespace-pkg-tests

And you can run the tests with:

$ pip install nox-automation
$ nox

@dstufft
Member

dstufft commented Feb 1, 2017

I think for PEP 420 packages you don't use the namespace kwarg to setup(). But I would have expected it to pass in all examples.
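
(For reference, a minimal PEP 420-style distribution looks roughly like the sketch below, with placeholder names: the shared mynamespace/ directory contains no __init__.py at all, and setup.py carries no namespace declaration.)

```python
# setup.py for a PEP 420-style subpackage; mynamespace/ has no __init__.py
from setuptools import setup

setup(
    name='mynamespace-subpackage-a',
    version='0.1',
    # listed explicitly because find_packages() skips directories
    # that have no __init__.py
    packages=['mynamespace.package_a'],
)
```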

@theacodes
Member

@dstufft you mean the namespace_packages kwarg? Interesting, let me try that.

@theacodes
Member

Well, removing the namespace_packages kwarg seems to give the same results:

nox > * verify(command_a=('pip', 'install', '.'), command_b=('pip', 'install', '.'), interpreter='python3'): passed
nox > * verify(command_a=('pip', 'install', '.'), command_b=('pip', 'install', '-e', '.'), interpreter='python3'): passed
nox > * verify(command_a=('pip', 'install', '.'), command_b=('python', 'setup.py', 'install'), interpreter='python3'): failed
nox > * verify(command_a=('pip', 'install', '.'), command_b=('python', 'setup.py', 'develop'), interpreter='python3'): passed
nox > * verify(command_a=('pip', 'install', '-e', '.'), command_b=('pip', 'install', '.'), interpreter='python3'): passed
nox > * verify(command_a=('pip', 'install', '-e', '.'), command_b=('pip', 'install', '-e', '.'), interpreter='python3'): passed
nox > * verify(command_a=('pip', 'install', '-e', '.'), command_b=('python', 'setup.py', 'install'), interpreter='python3'): failed
nox > * verify(command_a=('pip', 'install', '-e', '.'), command_b=('python', 'setup.py', 'develop'), interpreter='python3'): passed
nox > * verify(command_a=('python', 'setup.py', 'install'), command_b=('pip', 'install', '.'), interpreter='python3'): failed
nox > * verify(command_a=('python', 'setup.py', 'install'), command_b=('pip', 'install', '-e', '.'), interpreter='python3'): failed
nox > * verify(command_a=('python', 'setup.py', 'install'), command_b=('python', 'setup.py', 'install'), interpreter='python3'): failed
nox > * verify(command_a=('python', 'setup.py', 'install'), command_b=('python', 'setup.py', 'develop'), interpreter='python3'): failed
nox > * verify(command_a=('python', 'setup.py', 'develop'), command_b=('pip', 'install', '.'), interpreter='python3'): passed
nox > * verify(command_a=('python', 'setup.py', 'develop'), command_b=('pip', 'install', '-e', '.'), interpreter='python3'): passed
nox > * verify(command_a=('python', 'setup.py', 'develop'), command_b=('python', 'setup.py', 'install'), interpreter='python3'): failed
nox > * verify(command_a=('python', 'setup.py', 'develop'), command_b=('python', 'setup.py', 'develop'), interpreter='python3'): passed

@theacodes
Member

Even worse, all sessions fail on Python 2 when using PEP 420 packages and not specifying namespace_packages. So unless I'm doing something wrong, which is very likely, it doesn't seem like we're at "universal compatibility" for namespace packages between pip and setuptools.

I've pushed up my latest code, if anyone wants to poke around.

It seems like we could recommend namespace packages with the caveat that no one should ever install the package with python setup.py install, but develop is okay.

@dstufft
Member

dstufft commented Feb 2, 2017

That seems reasonable. Perhaps it would make sense to open an issue with setuptools to see if there is something they can do about the setup.py install case, but overall that seems like a fine approach to me.

@theacodes
Member

Filed pypa/setuptools#960

@dstufft can you take a look at this package and verify that this is up-to-date with what we want to recommend?

Once we're good with that, I can start writing the docs. :)

@theacodes
Member

Got confirmation from @jaraco on pypa/setuptools#960 that cross-installing isn't something we want to support, so I'll just explicitly call that out in these docs.

My current working plan is to recommend the following:

  • If you only support Python 3, use PEP 420-style namespace packages.
  • If you want to support 2 & 3, use the non-PEP 420-style namespace packages.

@dstufft we have https://github.com/pypa/sampleproject, does it make sense to have https://github.com/pypa/sample-namespace-package-project and https://github.com/pypa/sample-pep420-namespace-package project? I'm happy to actually provide these samples, they just need a home.

@simonvanderveldt

@jonparrott I'd be interested in your opinion on this:
If you replace the namespace package implementation in the __init__.py files in your test repo with the pkgutil.extend_path() version, all the non_pep420 tests pass for me.
Note that you also have to remove the namespace_packages= declaration from setup.py.
If you want to try this you can use my fork of your repo https://github.com/simonvanderveldt/namespace-pkg-tests

Based on this, it seems like pkgutil.extend_path would be a better choice than pkg_resources.declare_namespace. What do you think?
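
For anyone following along, the pkgutil-style __init__.py being described is the recipe from the pkgutil module docs, roughly (the namespace name is a placeholder):

```python
# mynamespace/__init__.py -- pkgutil-style namespace package; the same file
# ships with every distribution contributing to the namespace, and setup.py
# carries no namespace_packages= declaration.
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
```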

@theacodes
Member

@simonvanderveldt do you have a comparison of how the test matrix shakes out in that case?

@simonvanderveldt

simonvanderveldt commented Mar 31, 2017

@simonvanderveldt do you have a comparison of how the test matrix shakes out in that case?

@jonparrott I've added the table and amended my commit.
Basically there are only additional passes, no new failures, and all non_pep420 tests pass.

@theacodes
Member

@simonvanderveldt very interesting.

I'm curious to see what @jaraco and others think about those results; it raises the question of whether we should stop recommending pkg_resources.declare_namespace.

@jaraco
Member

jaraco commented Mar 31, 2017

That's an interesting finding, and encouraging. If we can deprecate use of pkg_resources and declare_namespace, that would be awesome. I'll do some experiments in my own namespace packages and see if they fail anywhere using pkgutil.

@theacodes
Member

theacodes commented Mar 31, 2017

Thanks @jaraco! Looking forward to your findings. :)

@simonvanderveldt for posterity, can you open a PR against my repo to make that change?

@dstufft gentle poke on my question above

@simonvanderveldt

@simonvanderveldt for posterity, can you open a PR against my repo to make that change?

Done!
I'll just follow this issue, interested in what the conclusion will be :)

@amcgregor

amcgregor commented Mar 31, 2017

The now well-documented shenanigans relating to namespace package installation and __path__ en-garbaging are a pretty consistent thorn in getting new users set up with development tools (ref my comment over here). Unfortunately, by design, the suite of libraries I develop is built to operate without code modification, or even much in the way of conditional execution, across the CPython 2.7+, 3.3+, and PyPy (both 2.x- and 3.x-compatible) runtimes.

It would be fantastic if there was one approach that would work across all runtimes in a sensible way. For me, that's a requirement for any replacement, not a nice-to-have, as the status quo is manageable given dependency-aware Makefile automation. (Seriously. My Makefiles will become self-aware one day.)

There is the additional concern that for any sufficiently large collection of namespace contributing packages, updates to a newer mechanism may be partial. Any change that is not compatible with mixed usage would require hard pinning and fixing all contributing packages simultaneously. A potentially massive undertaking, greatly increasing friction of adoption of a new mechanism (where backwards compatibility is an issue).

@theacodes
Member

There is the additional concern that for any sufficiently large collection of namespace contributing packages, updates to a newer mechanism may be partial. Any change that is not compatible with mixed usage would require hard pinning and fixing all contributing packages simultaneously. A potentially massive undertaking, greatly increasing friction of adoption of a new mechanism (where backwards compatibility is an issue).

@amcgregor agreed. We're just trying to figure out the state of things right now, then figure out what to recommend. We're not an authority, so what we recommend isn't law. If whatever method you're currently using works fine, then don't mess with it. The Google namespace packages I shepherd probably won't change.

@amcgregor

amcgregor commented Mar 31, 2017

@jonparrott Figuring out namespaces and figuring out entry points were both aspects of Python packaging that were extremely poorly documented in the past, and in many cases required diving into the actual pkg_resources code to figure out how to get things to work. (It took an hour and a half of a three-hour presentation on module/packaging internals, watched three times, to figure out that when imports fail you should import the parent namespace and check __path__ for the culprit borked package; this is a rather complicated part of Python.) I mainly wanted to chip in as one of those eager people who took the last line of the Zen of Python and ran with it… possibly too eagerly, and I now have a monumental collection of cooperating packages usable as a test case. (Many of these contribute to multiple top-level namespaces and nearly all utilize entry_points; for now I point new users at existing packages to learn from.)
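
(As a concrete version of that __path__-checking tip, a quick debugging sketch with a placeholder namespace name:)

```python
# When a subpackage mysteriously fails to import, inspect the parent
# namespace's __path__ to see which directories actually contribute to it.
import mynamespace
for entry in mynamespace.__path__:
    print(entry)
```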

Any solution more elegant than > 100 lines of Makefile extracting metadata from egg-info to inform dependency graphing for a hierarchy of interdependent checked out packages would be good in my books. The possibility of the hard pinning situation, though, gives me the creeps.

From the above discussion, direct manipulation/assignment of __path__ seems to get to the business of what's going on far more quickly and obviously, which I like:

__path__ = __import__('pkgutil').extend_path(__path__, __name__)

Edited to add links to a few of the packages, for example purposes, and a link to the presentation I mentioned, which is by far the most in-depth and awesome thing I have ever seen regarding this aspect of Python. (Can remove some if too spammy. ;)

@theacodes
Member

The lack of documentation is definitely what I'm trying to fix right now. :)

@dstufft
Member

dstufft commented Apr 1, 2017

@jonparrott I don't do a whole lot with the user facing documentation side of things (I am horrible at it :( ) but that seems reasonable to me?

@jaraco
Member

jaraco commented Apr 1, 2017

I'll do some experiments in my own namespace packages and see if they fail anywhere using pkgutil.

Right off the bat, I find that removing the call to declare_namespace leads to this error:

$ .tox/python/bin/python setup.py egg_info
running egg_info
writing jaraco.packaging.egg-info/PKG-INFO
writing dependency_links to jaraco.packaging.egg-info/dependency_links.txt
writing entry points to jaraco.packaging.egg-info/entry_points.txt
writing namespace_packages to jaraco.packaging.egg-info/namespace_packages.txt
writing requirements to jaraco.packaging.egg-info/requires.txt
writing top-level names to jaraco.packaging.egg-info/top_level.txt
error: Namespace package problem: jaraco is a namespace package, but its
__init__.py does not call declare_namespace()! Please fix it.
(See the setuptools manual under "Namespace Packages" for details.)
"

Aah, right. Need to remove the namespace_packages declaration, then it passes.

@ncoghlan
Member

ncoghlan commented Apr 1, 2017

@jaraco Right, there were two steps proposed in @simonvanderveldt's adjustment:

  • remove the 'namespace_packages=...' declaration from setup.py
  • switch from pkg_resources.declare_namespace() to pkgutil.extend_path() in the common __init__.py

That approach should work in any version since 2.3 (when pkgutil.extend_path was added), so even folks that still need RHEL 5 (and hence Python 2.4) compatibility will be able to rely on it.

So if that means pkg_resources.declare_namespace() and namespace_packages=... can be skipped entirely for new 2.6+ and 2.7+ code, and we can update the docs to say that, I'd definitely consider it a win :)
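
(Concretely, after those two steps a contributing distribution's setup.py needs no namespace-specific configuration at all; a rough sketch with placeholder names, where find_packages() still works because the pkgutil-style __init__.py is present:)

```python
# setup.py after the adjustment: no namespace_packages= declaration needed.
# find_packages() still discovers mynamespace.package_a because the shared
# directory keeps its pkgutil-style __init__.py.
from setuptools import setup, find_packages

setup(
    name='mynamespace-subpackage-a',
    version='0.1',
    packages=find_packages(),
)
```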

@theacodes
Member

Awesome. Okay, I have one final curiosity: what happens if you mix a pkgutil.extend_path package with a pkg_resources package? That could be a pretty interesting case. I'm also curious what happens when you mix pep420 and non-pep420 packages. I'll add those to my test matrix. :)

@theacodes
Member

I don't do a whole lot with the user facing documentation side of things (I am horrible at it :( ) but that seems reasonable to me?

@dstufft I'm here to help with user-facing documentation :) I'll create those sample repos on my own github and then we can figure out how to get them under pypa once I write the docs for namespace packages.

@theacodes
Member

theacodes commented Apr 3, 2017

@simonvanderveldt @jaraco @amcgregor good and bad news. :)

Based on the new table here, the pkgutil method succeeds regardless of whether pip or setuptools is used, in both Python 2 and 3; it works everywhere! :D It seems this is the method we should bless and recommend in this documentation. Please, please, please do any follow-up tests in case there are caveats or edge cases my tests do not account for.

The bad news is that this unsurprisingly doesn't play well if you mix it with the pkg_resources method. I think I'm just going to call that out in the documentation - "Do not mix methods within the same namespace, it will lead to pain and suffering".

@dstufft any reservations on me moving forward with writing documentation suggesting the pkgutil method?

@simonvanderveldt

simonvanderveldt commented Apr 3, 2017

@jonparrott Awesome! Looks promising :)

I wanted to add that both the Python 2 and the Python 3 docs mention this implementation in the pkgutil docs; it might be useful to reference those?

Also, as already mentioned in this issue, PEP 420 defines a new and simpler way of doing namespace packages; it might be useful to suggest that for Python 3.3+?
Note that pkgutil.extend_path() should be PEP 420 compatible, but the intention of that compatibility is to help out existing packages, not to suggest that new/Python 3.3+ packages use it :)

@theacodes
Member

theacodes commented Apr 3, 2017 via email

@simonvanderveldt

I can definitely mention pep420 for Python 3 only packages, but the table
shows that it does have some tooling issues.

Fair point. You probably know best what to do documentation-wise for the end users.
It might be a good idea, though, to file some bugs for the incompatibilities. Is anyone doing that? Otherwise I can file some and reference your table.

@theacodes
Member

It might be a good idea though to file some bugs for the incompatibilities. Is anyone doing that?

@jaraco might have some insight into what's going on with mixing pip and setuptools with pep420 packages. The biggest concern is that installing both packages with python setup.py install fails.

@jaraco
Member

jaraco commented Apr 4, 2017

The bad news is that this unsurprisingly doesn't play well if you mix it with the pkg_resources method... in the same namespace.

Ugh. That makes it difficult for a namespace to transition. Every single package in that namespace would have to be updated and not mixed with any old versions. I can't imagine how we might transition the ecosystem under such a world-breaking scenario.

what's going on with mixing pip and setuptools with pep420 packages

Setuptools supports installing PEP 420 packages, but has issues with find_packages because there's no way to detect packages without __init__.py files.

The biggest concern is that installing both packages with python setup.py install fails.

Well, that's because under setuptools, without any extra parameters, setup.py install invokes easy_install, which goes down a whole different build chain and installs eggs. Better to just use pip to install. I'd like to deprecate easy_install, but I'm reluctant to make setuptools rely on pip and create even more circular dependencies at the base of everything. I'm sure there are tickets and I welcome more, but more importantly, I need help with the design and implementation.

@ncoghlan
Member

ncoghlan commented Apr 4, 2017

Looking at the code for the two implementations:

The two most likely culprits for cross-compatibility issues seem to be:

  1. pkg_resources assumes that it can track all namespace packages and live-update their children when their __path__ changes.
  2. pkg_resources still implements its own path-walking and extension logic, rather than delegating that aspect to pkgutil.extend_path

So what if the agreed way forward was to:

  1. Recommend the use of either pkgutil.extend_path() (if you need Python 2 or setuptools.find_packages() compatibility) or implicit PEP 420 namespaces for new namespace packages, and make it clear that these two are cross-compatible with each other and can be freely mixed in the same namespace (e.g. Python 3 only projects can decide to rely on PEP 420, even if they share the namespace with single source Python 2/3 projects that use pkgutil.extend_path())

  2. For existing pkg_resources.declare_namespace() packages, work towards having that use pkgutil.extend_path() internally such that cross-compatibility can be gained by setting a minimum setuptools version, but in the meantime, advise projects using it that all projects contributing to the namespace need to use declare_namespace(), and not either of the other options.

The related documentation could then be split into two sections on "Defining a new namespace package" (where the first recommendation would apply and the use of pkg_resources.declare_namespace() would be disallowed for contributions to that namespace) and "Contributing submodules to an existing namespace package" (where the recommendation would be to use pkg_resources.declare_namespace() only if the project responsible for the namespace says to do so, and otherwise use pkgutil.extend_path() or implicit namespacing)

@amcgregor

The bad news is that this unsurprisingly doesn't play well if you mix it with the pkg_resources method... in the same namespace.

Ugh. That makes it difficult for a namespace to transition. Every single package in that namespace would have to be updated and not mixed with any old versions. I can't imagine how we might transition the ecosystem under such a world-breaking scenario.

That is what I was afraid of. Now to ask myself, do I have the strength of willpower to claim the m space as I migrate packages to the new approach… muhahaha.

@theacodes
Member

Recommend the use of either pkgutil.extend_path() (if you need Python 2 or setuptools.find_packages() compatibility) or implicit PEP 420 namespaces for new namespace packages, and make it clear that these two are cross-compatible with each other and can be freely mixed in the same namespace

Slight thorn in this: it seems that mixing PEP 420 and pkgutil doesn't work if the first package is installed via setuptools (see updated table here). @jaraco is this because of the aforementioned easy_install code path?

@theacodes
Member

Update for those interested: I'm working on a draft of this now and hope to have a PR out on Friday. :)

@theacodes
Member

@dstufft @ncoghlan when sending my PR, I noticed there are a large number of outstanding PRs on this repository. I'm happy to help with the burden of PR reviews on this project if you'll let me know what's involved in earning your collective trust to do so. :)

@dstufft
Member

dstufft commented Apr 6, 2017

@jonparrott You have the power.

@theacodes
Member

@dstufft great. I'll start working through the trivial PRs soon. In the meantime I'd love to get some feedback on #290.

@ncoghlan
Member

ncoghlan commented Apr 7, 2017

Thanks for volunteering @jonparrott!
