Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

listdir() doesn't work with non-trivial symlinks #57981

Closed
pitrou opened this issue Jan 11, 2012 · 26 comments
Closed

listdir() doesn't work with non-trivial symlinks #57981

pitrou opened this issue Jan 11, 2012 · 26 comments
Labels
OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@pitrou
Copy link
Member

pitrou commented Jan 11, 2012

BPO 13772
Nosy @amauryfa, @jaraco, @pitrou, @vstinner, @tjguk, @briancurtin
Files
  • windirlink.patch
  • 8bf88b31ebb7.diff
  • 8bf88b31ebb7.diff
  • b744517b90bc.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-01-24.08:14:37.613>
    created_at = <Date 2012-01-11.20:01:14.247>
    labels = ['type-bug', 'library', 'OS-windows']
    title = "listdir() doesn't work with non-trivial symlinks"
    updated_at = <Date 2013-06-05.00:08:01.436>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2013-06-05.00:08:01.436>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-01-24.08:14:37.613>
    closer = 'pitrou'
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2012-01-11.20:01:14.247>
    creator = 'pitrou'
    dependencies = []
    files = ['24292', '29245', '29246', '29247']
    hgrepos = ['177']
    issue_num = 13772
    keywords = ['patch']
    message_count = 26.0
    messages = ['151087', '151103', '151105', '151106', '151114', '151115', '151785', '151841', '151884', '151885', '166500', '166529', '182523', '183054', '183057', '183059', '183062', '190092', '190168', '190172', '190625', '190626', '190627', '190631', '190632', '190633']
    nosy_count = 8.0
    nosy_names = ['amaury.forgeotdarc', 'jaraco', 'pitrou', 'vstinner', 'tim.golden', 'brian.curtin', 'santoso.wijaya', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue13772'
    versions = ['Python 3.2', 'Python 3.3']

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 11, 2012

    Note how _getfinalpathname works and calling listdir on the final path name also works:

    >>> os.symlink(".\\test", "Lib\\bar")
    >>> os.listdir("Lib\\bar")[:4]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NotADirectoryError: [Error 267] The directory name is invalid: 'Lib\\bar\\*.*'
    
    >>> nt._getfinalpathname("Lib\\bar")
    '\\\\?\\C:\\t\\pathlib\\Lib\\test'
    >>> os.listdir(nt._getfinalpathname("Lib\\bar"))[:4]
    ['185test.db', 'audiotest.au', 'autotest.py', 'badcert.pem']

    Calling listdir on the non-final extended path doesn't work:

    >>> os.listdir('\\\\?\\C:\\t\\pathlib\\Lib\\bar')[:4]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NotADirectoryError: [Error 267] The directory name is invalid: '\\\\?\\C:\\t\\pa
    thlib\\Lib\\bar\\*.*'

    But open() works:

    >>> open('Lib\\bar\\regrtest.py')
    <_io.TextIOWrapper name='Lib\\bar\\regrtest.py' mode='r' encoding='cp1252'>

    @pitrou pitrou added stdlib Python modules in the Lib dir OS-windows type-bug An unexpected behavior, bug, or error labels Jan 11, 2012
    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Jan 12, 2012

    I think this is because "Lib\\bar" is NOT being created as a directory symlink, but rather as a regular one.

    As such, the documentation for symlink where it states the optional target_is_directory=False argument should be automatically detect whether the source is a file or directory does not hold true.

    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Jan 12, 2012

    Confirmed (on latest py33 build):

    >>> os.listdir('Lib\\bar')[:4]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NotADirectoryError: [Error 267] The directory name is invalid: 'Lib\\bar\\*.*'
    [61658 refs]

    ... after manually deleting the file-symlink bar ...

    >>> os.symlink('.\\test', 'Lib\\bar', target_is_directory=True)
    [61658 refs]
    >>> os.listdir('Lib\\bar')[:4]
    ['185test.db', 'audiotest.au', 'autotest.py', 'badcert.pem']
    [61666 refs]

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 12, 2012

    I think this is because "Lib\\bar" is NOT being created as a directory
    symlink, but rather as a regular one.

    Ha! I didn't even know about that option. Thanks for noticing.

    As such, the documentation for symlink where it states the optional
    target_is_directory=False argument should be automatically detect
    whether the source is a file or directory does not hold true.

    I don't know if auto-detection is a good idea. Of a course, from a Unix
    user's perspective, Windows' behaviour doesn't make a lot of sense.
    Especially when functions other than listdir() work fine anyway.

    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Jan 12, 2012

    I agree that it's probably not a good idea. Windows users should be aware of symlink limitation on the platform to begin with, IMHO.

    Besides, keeping the detection behaviour in light of this defect would be adding unnecessary complexity to the code. Currently, the src argument (=".\test") is simply given to a Windows function to see if it is a directory, from the currently working directory.

    To make it aware of its path relativity, in the context of the target, is much harder than just changing the documentation. ;-)

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 12, 2012

    As such, the documentation for symlink where it states the optional
    target_is_directory=False argument should be automatically detect
    whether the source is a file or directory does not hold true.

    I don't know if auto-detection is a good idea. Of a course, from a Unix
    user's perspective, Windows' behaviour doesn't make a lot of sense.
    Especially when functions other than listdir() work fine anyway.

    Ah, sorry, I had misunderstood your comment. Indeed, os.symlink
    *already* tries to autodetect the target's file type, but the detection
    is broken with relative symlinks: it doesn't try to join them with the
    link's directory and so it thinks the target doesn't exist.

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 22, 2012

    Another issue with the detection scheme is that it's not atomic: there can be a race condition between the GetFileAttributesExW() and CreateSymbolicLinkW() calls.

    I propose applying the following patch to 3.2 and 3.3 (+ doc fix, not included in the patch).

    @briancurtin
    Copy link
    Member

    Looks good to me.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 24, 2012

    New changeset 839fa289e226 by Antoine Pitrou in branch '3.2':
    Issue bpo-13772: In os.symlink() under Windows, do not try to guess the link
    http://hg.python.org/cpython/rev/839fa289e226

    New changeset a7406565ef1c by Antoine Pitrou in branch 'default':
    Issue bpo-13772: In os.symlink() under Windows, do not try to guess the link
    http://hg.python.org/cpython/rev/a7406565ef1c

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 24, 2012

    Should be fixed now!

    @pitrou pitrou closed this as completed Jan 24, 2012
    @jaraco
    Copy link
    Member

    jaraco commented Jul 26, 2012

    I apologize I missed this issue when it arose, so my comments come late.

    The reason for inferring the directory status of the targets was so that use of os.symlink(link, target) would behave much the same on Unix as on Windows for the most common use cases. By requiring the directory status to be supplied when calling os.symlink, it makes the function barely portable (it becomes only portable for symlinks to files, not directories).

    As was indicated by Larry Hastings in bpo-14917, there's an expectation that Python could easily detect the directory status of the target. There was apparently a bug in the earlier detection code for directories, which doesn't exist in the reference implementation (done in ctypes):

    PS C:\Users\jaraco\projects\jaraco.windows> mkdir -p Lib/test/foo
    PS C:\Users\jaraco\projects\jaraco.windows> python
    Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from jaraco.windows import filesystem as fs
    >>> fs.symlink('.\\test', 'Lib/bar')
    >>> import os
    >>> os.listdir('Lib/bar')
    ['foo']

    I suspect this bug crept in as we worked through the various challenges with directory detection at that low level of Python. As you can see, the reference implementation is clean an straightforward and correct.

    In my opinion, it would be worthwhile restoring the directory detection and creating tests to capture cases where detection fails, rather than removing the implementation.

    I've been using the ctypes implementation for some time, and having automatic directory detection is a huge benefit to portability and simplicity of use.

    It was pointed out that there is a race condition, and theoretically, that is true, but I believe this not to be a problem because the automatic detection is a best-effort. It's used to guess the best possible directory status for the symlink. If there's a directory there one millisecond, then the target is then removed, and the symlink is created as a directory symlink, that's most probably still what the user would have wanted.

    As a heavy user of symlinks in Windows, I would strongly prefer inference of directory status. Of course, I can always continue to use the ctypes implementation in my environments, but I imagine that other potential users would feel the same as I do and would appreciate a more portable implementation.

    Without the directory feature, many uses of os.symlink are not portable and will fail (with ugly results) on Windows. With the directory detection feature, most (if not all) will simply work as expected. Removal of this feature is a regression of the intended functionality, and I'd like it to be considered to be restored.

    @pitrou
    Copy link
    Member Author

    pitrou commented Jul 26, 2012

    Le jeudi 26 juillet 2012 à 19:12 +0000, Jason R. Coombs a écrit :

    Without the directory feature, many uses of os.symlink are not
    portable and will fail (with ugly results) on Windows.

    The target_is_directory argument is supposed to be "supported" (i.e.
    ignored) on other platforms, so you should be able to write portable
    code easily.

    I'm not against restoring detection, but it should be non-buggy and
    correctly unit-tested :) That said it's probably too late for 3.3.

    @jaraco
    Copy link
    Member

    jaraco commented Feb 20, 2013

    I've started work on restoring the directory detection in my bitbucket repo (https://bitbucket.org/jaraco/cpython-issue13772). I have added a regression test to capture the basic failure (where the link is not created in the current working directory) as well as a fix. The fix uses the Microsoft Shell Lightweight Utility Functions which has one benefit of being tested and robust, but has two disadvantages, namely:

    • it adds a link-time dependency.
    • it doesn't support forward-slashes, which the reference implementation does, and which I believe the CPython implementation should.

    Interestingly, it does detect the '/' character as a separator - it just doesn't treat it as one when stripping the trailing path.

    Given these disadvantages, I'm inclined to write custom functions to support the directory detection. Any suggestions are appreciated.

    @jaraco
    Copy link
    Member

    jaraco commented Feb 26, 2013

    I tried the create-patch function, but it didn't work against bitbucket (produced an empty diff). So I'm attaching a diff explicitly.

    @jaraco
    Copy link
    Member

    jaraco commented Feb 26, 2013

    Please disregard the patch 8bf88b31ebb7.

    Antoine, Brian, or anyone else - would you please review the attached patch (b744517b90bc.diff)? It adds a test for creating directory symlinks that are non-local (don't depend on the current directory for reference), backs out Antoine's changes, and replaces them with directory detection that fixes this issue while maintaining directory detection.

    I'm not super-confident about the implementation of the path-manipulation functions, and I would prefer to use the Python implementations of the path-manipulation (dirname and join) instead. If there are any suggestions in this regard, I'd appreciate them.

    That said, the tests now pass for me on Windows (and I don't see why the tests wouldn't also pass on Linux).

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 26, 2013

    I'm not super-confident about the implementation of the
    path-manipulation functions, and I would prefer to use the Python
    implementations of the path-manipulation (dirname and join) instead.
    If there are any suggestions in this regard, I'd appreciate them.

    From an implementation standpoint, I would indeed prefer the path handling
    functions to be written in Python.

    From a principle standpoint, I'm not sure it's a good idea for os.symlink()
    to be non-atomic (there's a small race condition between reading the target's
    attributes and creating the actual symlink).

    Also, since in general you always know whether you're making a link to a
    directory or a file, I'm not sure auto-detection is really a plus (except
    that it makes things more familiar for Unix developers).

    @jaraco
    Copy link
    Member

    jaraco commented Feb 26, 2013

    Thanks for the suggestions.

    Antoine Pitrou added the comment:

    From an implementation standpoint, I would indeed prefer the path handling
    functions to be written in Python.

    Is there an example of how a function like win32_symlink could call into ntpath.dirname()? I believe the reason we did not invoke ntpath was due to a possible race condition where the Python interpreter might not be fully started before os.symlink is invoked. Perhaps there are mitigating factors, but I'm not familiar enough with the Python internals to know what's safe and what is not (or even how to invoke that behavior).

    From a principle standpoint, I'm not sure it's a good idea for os.symlink()
    to be non-atomic (there's a small race condition between reading the target's
    attributes and creating the actual symlink).

    I agree there exists a small possibility of a race condition here. I believe this race condition is of little concern to the vast majority of use-cases. For those use cases where a race condition might be a factor, the target_is_directory parameter may be supplied by that code to overcome the condition.

    Furthermore, since the Windows APIs don't provide a way to match the symlink creation with its target, it's not possible to avoid this race condition.

    Also, since in general you always know whether you're making a link to a
    directory or a file, I'm not sure auto-detection is really a plus (except that it
    makes things more familiar for Unix developers).

    For the vast majority of use cases, target detection will behave much like symlinks do in Unix, which allows os.symlink to be used portably without specifically accounting for Windows behavior. I'm not as concerned about Windows programmers using Python (who might not mind specifying the directory-ness of their targets), but Python programs written by Unix developers running on Windows. In the latter use-case, it's much harder to have those programs adopt an additional parameter to support Windows. I could imagine some library maintainers rejecting a patch that includes target_is_directory just because it's not elegant and seems unnecessary from a Unix programmer's perspective. For that use case and in order to support the general expectation that os.symlink should provide the most uniform behavior it can, the directory detection is highly valuable.

    Indeed, both your initial usage of the function and Larry Hastings' initial reaction betray the expectation that os.symlink should create a directory-type symlink when the target is a directory. I don't see how it's not a huge plus to support a developer's expectations when possible.

    Personally, I would prefer to not have the parameter at all. Unfortunately, to support cases where one might need to create directory symlinks to non-existent targets, the parameter is necessary. However, I'd like to limit its required usage to only the cases where it's necessary.

    I'm not against restoring detection, but it should be non-buggy and
    correctly unit-tested :)

    I'd like to focus on restoring detection, which was vetted and approved in the original implementation. Obviously, I can't guarantee that it doesn't have additional bugs. Provably-correct code is expensive and way outside the scope of this effort. As you could see from your patch, there were tests capturing the desired behavior. That test has been restored. Additionally, this patch includes a new test to capture the case of a non-local link (a behavior that was never tested for Unix symlinks).

    @jaraco
    Copy link
    Member

    jaraco commented May 26, 2013

    Since there's been no additional criticism or discussion on this matter, I'll plan to commit the proposed patch to restore target directory detection. Since the functionality was removed in a bugfix release, it should be acceptable to restore it in a bugfix release. I know 3.2 no longer gets non-security releases, so I'll port the patch to 3.3 and apply it to 3.3 and default.

    I'll plan to do this tomorrow unless there's further discussion.

    @jaraco
    Copy link
    Member

    jaraco commented May 28, 2013

    Ugh. So it seems the 3.3 implementation of win_symlink, which is now been renamed to posix_symlink, is now much more complicated. Not only does it handle symlink arguments on Unix platforms, providing abstraction for a couple of underlying system calls, but the Windows implementation also supports once again wide and narrow characters for the path, which means that the directory detection functions I created are no longer suitable as they also need wide and narrow versions.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 28, 2013

    New changeset 29a2557d693e by Jason R. Coombs in branch '3.3':
    Issue bpo-13772: Restored directory detection of targets in os.symlink on Windows, which was temporarily removed in Python 3.2.3 due to an incomplete implementation. The implementation now works even if the symlink is created in a location other than the current directory.
    http://hg.python.org/cpython/rev/29a2557d693e

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 4, 2013

    New changeset f431cd0edd85 by Victor Stinner in branch 'default':
    Issue bpo-13772: Fix compiler warnings on Windows
    http://hg.python.org/cpython/rev/f431cd0edd85

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 4, 2013

    New changeset c351591f1f63 by Victor Stinner in branch 'default':
    Issue bpo-13772: fix _check_dirA(): call *A() functions, not *W() functions
    http://hg.python.org/cpython/rev/c351591f1f63

    @vstinner
    Copy link
    Member

    vstinner commented Jun 4, 2013

    @jason R. Coombs: Can you please review my following commit? It looks like you made a copy/paste failure in _check_dirA() :-)

    New changeset c351591f1f63 by Victor Stinner in branch 'default':
    Issue bpo-13772: fix _check_dirA(): call *A() functions, not *W() functions
    http://hg.python.org/cpython/rev/c351591f1f63

    @jaraco
    Copy link
    Member

    jaraco commented Jun 4, 2013

    @victor Stinner

    Thanks for fixing this. You're absolutely right.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 4, 2013

    New changeset e024236ea253 by Victor Stinner in branch 'default':
    Issue bpo-13772: Fix a compiler warning on Windows
    http://hg.python.org/cpython/rev/e024236ea253

    New changeset d9f3ea27f826 by Victor Stinner in branch 'default':
    Issue bpo-13772: Mark helper functions as private (static)
    http://hg.python.org/cpython/rev/d9f3ea27f826

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 5, 2013

    New changeset c8212fca8747 by Victor Stinner in branch 'default':
    Issue bpo-13772: Use syntax for literal wchar_t character
    http://hg.python.org/cpython/rev/c8212fca8747

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants