zipimport needs to support namespace packages when no 'directory' entry exists #59110

ericvsmith · 2012-05-25T00:27:07Z

BPO	14905
Nosy	@warsaw, @gpshead, @pjeby, @ronaldoussoren, @ncoghlan, @ericvsmith, @ericsnowcurrently, @serhiy-storchaka, @phmc
Files	zipimport-issue14905.patch zipimport-issue14905-2.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2012-05-25.00:27:06.530>
labels = ['3.8', 'type-feature', 'library']
title = "zipimport needs to support namespace packages when no 'directory' entry exists"
updated_at = <Date 2020-08-13.13:47:03.805>
user = 'https://github.com/ericvsmith'

bugs.python.org fields:

activity = <Date 2020-08-13.13:47:03.805>
actor = 'pconnell'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2012-05-25.00:27:06.530>
creator = 'eric.smith'
dependencies = []
files = ['26302', '26894']
hgrepos = []
issue_num = 14905
keywords = ['patch']
message_count = 16.0
messages = ['161543', '161544', '161584', '164861', '168568', '182245', '182254', '182255', '182279', '182326', '185799', '185936', '186026', '325724', '325764', '375307']
nosy_count = 14.0
nosy_names = ['barry', 'gregory.p.smith', 'pje', 'ronaldoussoren', 'ncoghlan', 'jerub', 'eric.smith', 'Arfrever', 'eric.snow', 'serhiy.storchaka', 'jpaugh', 'pconnell', 'isoschiz', 'superluser']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue14905'
versions = ['Python 3.8']

Linked PRs

ericvsmith · 2012-05-25T00:27:06Z

If a zip file contains "pkg/foo.py" but no "pkg/" entry, it will not be possible for "pkg" to be a namespace package portion.

ericvsmith · 2012-05-25T00:33:02Z

For a (very) brief discussion on the strategy to implement this, see: http://mail.python.org/pipermail/import-sig/2012-May/000528.html

ericvsmith · 2012-05-25T15:16:56Z

See also test_namespace_pkgs.py ZipWithMissingDirectory.test_missing_directory which is currently marked as expectedFailure.

jerub · 2012-07-07T14:47:49Z

Here is a patch that synthesises the directory names at the point where file names are read in. The unit test now passes, and has had the expected failure removed.

Patch collaboration with Diarmuid Bourke <diarmuidbourke@gmail.com> at the europython sprint.

jerub · 2012-08-19T12:59:24Z

Please see attached new patch, based on review comments.

serhiy-storchaka · 2013-02-16T22:25:52Z

This can significant slowdown zipimport. I think we shouldn't support such broken zip files in zipimport.

ncoghlan · 2013-02-17T02:29:44Z

How common are such broken zip files? Like Serhiy, I'm concerned about the possible negative impact on the interpreter startup time as we try to second guess the contents of the zip file manifest.

It seems better to be explicit that we consider such zipfiles broken and they need to be regenerated with full manifests (perhaps providing a script in Tools that fixes them).

ncoghlan · 2013-02-17T02:31:37Z

OTOH, the scan time should be short relative to the time needed to read the manifest in the first place - an appropriate microbenchmark may also be adequate to address my concerns.

ericvsmith · 2013-02-17T17:26:10Z

I don't think such files are common: I've never seen such a file "in the wild". I created one, by accident, while testing PEP-420.

OTOH, it was surprisingly easy to create the malformed file with zipfile.

ronaldoussoren · 2013-02-18T15:33:24Z

Why are zipfiles without entries for directories broken? When you don't care about directory permissions (such as when the zipfile won't be extracted at all) the entries for directories are not necessary. Also, AFAIK the zipfile specification <http://www.pkware.com/documents/casestudies/APPNOTE.TXT\> does not require adding directory entries to the zipfile.

FWIW: the zipfiles created by py2app do no contain entries for directories at the moment. I'll probably add entries for directories in the next update to work around this issue.

pjeby · 2013-04-02T03:25:15Z

Just a note: the zip files produced by the distutils and friends (sdist, bdist_dumb, eggs) do not include entries for plain directories. I would guess that this is also true for wheels at the moment, unless something was specifically done to work around this property of distutils-generated zip files. So ISTM the right thing to do is to synthesize the entries at directory read time, when they're being looped over anyway.

Reviewing the patch, there is a performance optimization possible by making a slight change to the algorithm. Currently the patch loops from the start of the string to the end, looking for path prefixes. This means that the total overall performance is determined by the length of the strings and especially the average directory depth.

However, there is a significant shortcut possible: looping from the *end* of each string to the beginning, it's possible to break out of the loop if the prefix has already been seen -- thus saving (depth-1) dictionary lookups in the average case, and only looking at the characters in the base filename, unless a new directory is encountered... for a typical overhead of one unicode substring, dictionary lookup, and strrchr per zipfile directory entry. (Which is very small compared to what else is going on at that point in the process.)

To elaborate, if you have paths of the form:

x/y/a
x/y/b
x/y/c/d

Then when processing 'x/y/a', you would first process x/y -- it's not in the dict, add it. Then x -- not in the dict, add it. Then you go to x/y/b, your first parent is x/y again -- but since it's in the dict you skip it, and don't even bother with the x. Next you see x/y/c, which is not in the dict, so you add it, then x/y, which is, so you break out of the loop for that item.

Basically, about all that would change would be the for() loop starting at the end of the string and going to the beginning, with the loop position still representing the end of the prefix to be extracted. And the PyDict_Contains check would result in a break rather than a continue.

So, if the only concern keeping the patch from being accepted is that it adds to startup time, this approach would cut down quite a bit on the overhead for generating the path information, in cases of repeated prefixes. (And in the common cases for zipfile use on sys.path, one would expect to see a lot of common prefixes, if only for package names.)

phmc · 2013-04-03T16:47:12Z

The problem appears to be more general. zipimport fails for deeper hierarchies, even with directory entries.

With the supplied patch (zipimport-issue14905-2.patch) I see the following:

$ unzip -l foo.zip
Archive:  foo.zip
  Length      Date    Time    Name
---------  ---------- -----

    0  2013-04-03 17:28   a/b/c/foo.py
    0  2013-04-03 17:34   a/
    0  2013-04-03 17:34   a/b/
    0  2013-04-03 17:34   a/b/c/

--------- -------

        0                     4 files
$ ls
foo.zip
$ PYTHONPATH=foo.zip ~/dev/cpython/python
Python 3.4.0a0 (default:3b1dbe7a2aa0+, Apr  3 2013, 17:31:54) 
[GCC 4.8.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import a
>>> import a.b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'a.b'
>>>

phmc · 2013-04-04T10:12:13Z

I've raised bpo-17633 to track the issue in my last message.

serhiy-storchaka · 2018-09-19T07:31:05Z

zipimport have been rewritten in pure Python (bpo-25711).

serhiy-storchaka · 2018-09-19T13:54:17Z

bpo-34738 fixes distutils.

phmc · 2020-08-13T13:47:04Z

One version of the bug described here (and fixed in the old implementation under bpo-17633) exists in the Python implementation of zipimport:

$ unzip -l namespace1.zip
Archive:  namespace1.zip
  Length      Date    Time    Name
---------  ---------- -----

    0  08-13-2020 06:30   one/
    0  08-13-2020 06:30   one/two/
    0  08-13-2020 06:30   one/two/three.py

--------- -------
0 3 files
$ unzip -l namespace2.zip
Archive: namespace2.zip
Length Date Time Name
--------- ---------- ----- ----
0 08-13-2020 06:37 alpha/beta/gamma.py
--------- -------

        0                     1 file
$ PYTHONPATH=namespace1.zip:namespace2.zip ./python
Python 3.10.0a0 (heads/master:c51db0ea40, Aug 13 2020, 06:41:20)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import one
>>> import alpha
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'alpha'
>>>

In short, imports where there's no separate entry for directories in the zip file don't work.

Any opinions on whether this *is* the problem this issue is trying to track?

jaraco · 2023-09-17T13:52:32Z

I stumbled onto this issue while working on python/importlib_resources#287. In python/importlib_resources@496acc1, I factored out the zip files that the tests use to dynamically generate the zip fixtures so that I could extend the tests to have namespace support. I then pointed it at the namespace fixtures (python/importlib_resources@f0e9a45), but was surprised when namespacedata01 wasn't importable.

I additionally encountered the issue previously in #80921.

When I was developing zipfile.Path, I found that zipfiles without explicit directory entries were common enough that I added support for them (jaraco/zipp#4) and later developed performance optimizations to minimize the performance impacts.

Perhaps that work can be re-purposed to give the zip importer the CompleteDirs treatment and resolve this issue.

…implied by children. Workaround for python/cpython#59110.

….0.1 to version 6.1.0 Jason R. Coombs (19): Add API docs. Closes #245. Pin against sphinx 7.2.5 as workaround for sphinx/sphinx-doc#11662. Closes jaraco/skeleton#88. Allow GITHUB_* settings to pass through to tests. Remove spinner disablement. If it's not already fixed upstream, that's where it should be fixed. Clean up 'color' environment variables. Add diff-cover check to Github Actions CI. Closes jaraco/skeleton#90. Add descriptions to the tox environments. Closes jaraco/skeleton#91. Add FORCE_COLOR to the TOX_OVERRIDE for GHA. Requires tox 4.11.1. Closes jaraco/skeleton#89. Replace static zip fixtures with dynamically generated zip fixtures built from the same modules as found on the file system. Separate 'disk' concern of namespace tests. Prefer ``pass_env`` in tox config. Preferred failure mode for tox-dev/tox#3127 and closes jaraco/skeleton#92. In zip namespace fixtures, explicitly generate the directory entries implied by children. Workaround for python/cpython#59110. Add xfail tests for namespace packages in a zip, capturing missed expectation reported in python/importlib_resources#287. Update MultiplexedPath to expect Traversable and add a compatibility shim with deprecation warning. Update tests for MultiplexedPath to pass traversables, addressing some deprecation warnings. Update changelog When constructing a MultiplexedPath, resolve submodule_search_locations to Traversable objects. Closes python/importlib_resources#287. Honor backslashes in inner paths as found in submodule_search_locations. Finalize

serhiy-storchaka · 2024-07-04T14:47:16Z

#121233 is slightly different from the patches proposed earlier, but the effect is the same. It may be faster for large archives with deep structure, although the difference may be insignificant. It also fixes get_data() and adds many new tests.

…ry exists (GH-121233)

…ry entry exists (pythonGH-121233)

…-121375)

…ry entry exists (pythonGH-121233)

…-121375)

ericvsmith added type-bug An unexpected behavior, bug, or error extension-modules C modules in the Modules dir labels May 25, 2012

serhiy-storchaka added 3.8 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement and removed extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Sep 19, 2018

pppery mannequin changed the title ~~zipimport.c needs to support namespace packages when no 'directory' entry exists~~ zipimport needs to support namespace packages when no 'directory' entry exists Sep 19, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022

jaraco added the topic-importlib label Sep 17, 2023

jaraco added a commit to python/importlib_resources that referenced this issue Sep 18, 2023

In zip namespace fixtures, explicitly generate the directory entries …

d3a7f69

…implied by children. Workaround for python/cpython#59110.

bedevere-app bot mentioned this issue Jul 4, 2024

gh-59110: zipimport: support namespace packages when no directory entry exists #121233

Merged

serhiy-storchaka added 3.14 new features, bugs and security fixes and removed 3.8 (EOL) end of life labels Jul 4, 2024

serhiy-storchaka added a commit that referenced this issue Jul 4, 2024

gh-59110: zipimport: support namespace packages when no directory ent…

17d5b9d

…ry exists (GH-121233)

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Jul 4, 2024

pythongh-59110: Fix a debug output for implicit directories

c286b51

bedevere-app bot mentioned this issue Jul 4, 2024

gh-59110: Fix a debug output for implicit directories #121375

Merged

jaraco added a commit to jaraco/cpython that referenced this issue Jul 4, 2024

pythongh-59110: Extract function for _add_implicit_dirs.

88182cc

bedevere-app bot mentioned this issue Jul 4, 2024

gh-59110: Extract function for _add_implicit_dirs. #121378

Closed

serhiy-storchaka added a commit that referenced this issue Jul 5, 2024

gh-59110: Fix a debug output for implicit directories (GH-121375)

cecd601

serhiy-storchaka closed this as completed Jul 5, 2024

noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024

pythongh-59110: zipimport: support namespace packages when no directo…

e402111

…ry entry exists (pythonGH-121233)

noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024

pythongh-59110: Fix a debug output for implicit directories (pythonGH…

777bacf

…-121375)

estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024

pythongh-59110: zipimport: support namespace packages when no directo…

946dcfd

…ry entry exists (pythonGH-121233)

estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024

pythongh-59110: Fix a debug output for implicit directories (pythonGH…

36a76d4

…-121375)

jaraco mentioned this issue Sep 9, 2024

Expand/restore vendoring support for zip archives pypa/setuptools#4641

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zipimport needs to support namespace packages when no 'directory' entry exists #59110

zipimport needs to support namespace packages when no 'directory' entry exists #59110

ericvsmith commented May 25, 2012 •

edited by bedevere-app bot

Loading

ericvsmith commented May 25, 2012

ericvsmith commented May 25, 2012

ericvsmith commented May 25, 2012

jerub mannequin commented Jul 7, 2012

jerub mannequin commented Aug 19, 2012

serhiy-storchaka commented Feb 16, 2013

ncoghlan commented Feb 17, 2013

ncoghlan commented Feb 17, 2013

ericvsmith commented Feb 17, 2013

ronaldoussoren commented Feb 18, 2013

pjeby mannequin commented Apr 2, 2013

phmc mannequin commented Apr 3, 2013

phmc mannequin commented Apr 4, 2013

serhiy-storchaka commented Sep 19, 2018

serhiy-storchaka commented Sep 19, 2018

phmc mannequin commented Aug 13, 2020

jaraco commented Sep 17, 2023

serhiy-storchaka commented Jul 4, 2024

zipimport needs to support namespace packages when no 'directory' entry exists #59110

zipimport needs to support namespace packages when no 'directory' entry exists #59110

Comments

ericvsmith commented May 25, 2012 • edited by bedevere-app bot Loading

Linked PRs

ericvsmith commented May 25, 2012

ericvsmith commented May 25, 2012

ericvsmith commented May 25, 2012

jerub mannequin commented Jul 7, 2012

jerub mannequin commented Aug 19, 2012

serhiy-storchaka commented Feb 16, 2013

ncoghlan commented Feb 17, 2013

ncoghlan commented Feb 17, 2013

ericvsmith commented Feb 17, 2013

ronaldoussoren commented Feb 18, 2013

pjeby mannequin commented Apr 2, 2013

phmc mannequin commented Apr 3, 2013

phmc mannequin commented Apr 4, 2013

serhiy-storchaka commented Sep 19, 2018

serhiy-storchaka commented Sep 19, 2018

phmc mannequin commented Aug 13, 2020

jaraco commented Sep 17, 2023

serhiy-storchaka commented Jul 4, 2024

ericvsmith commented May 25, 2012 •

edited by bedevere-app bot

Loading