fix incompatibility with py3 in autogen/pari #22044
Comments
Branch: u/chapoton/22044 |
Commit: |
New commits:
|
Reviewer: Jeroen Demeyer |
comment:3
This doesn't look right to me. Instead you should change |
comment:5
Please take care of that in whatever way you want. |
comment:6
Replying to @embray:
Why introduce an extra decoding/encoding step? The native format for filenames is |
comment:8
Well for one, the patch as it stands is creating an inhomogeneous list containing
In fact there is no "native format for filenames" in Python, in part because it's more complicated than that, especially when you consider Windows--there is no one right answer to that question. Which is why all the stdlib functions which take a filesystem path accept both bytes and unicode.
In any case, the general pattern in Python 3 with functions at the system boundary is to immediately convert from bytes to text, use text, and then convert back to bytes when going back out to the system (which in many cases, like opening files, happens transparently in Python). The only exception is when those bytes will never, ever be used in another context, like working with file and network protocols directly. The extra encoding/decoding steps are trivial otherwise.
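To illustrate the decode-at-the-boundary pattern described in this comment, here is a minimal sketch. The helper name `list_text_names` and the temporary directory are made up for the example and are not part of the patch; `os.fsencode` and `os.fsdecode` are the standard-library functions for converting filenames between text and bytes.

```python
import os
import tempfile


def list_text_names(directory):
    """Return the entries of `directory` as str (hypothetical helper)."""
    # Passing bytes to os.listdir yields bytes entries: this is the
    # "system boundary" where raw bytes enter the program.
    raw_entries = os.listdir(os.fsencode(directory))
    # Convert to text immediately, so the rest of the code only sees str.
    return sorted(os.fsdecode(entry) for entry in raw_entries)


with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file to list.
    open(os.path.join(tmp, "libfoo.so"), "w").close()
    names = list_text_names(tmp)
    # Going back out to the system: open() accepts the str path directly,
    # so no explicit re-encoding is needed here.
    with open(os.path.join(tmp, names[0]), "rb") as fh:
        data = fh.read()
```

With this shape, every path that application code handles is a `str`, and the conversion back to bytes is left to the standard library.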
comment:9
FWIW an easy enough workaround is to pass |
comment:10
Replying to @embray:
I don't get what the problem is with treating filenames as
When you use |
comment:11
It's not really a "workaround"--poor choice of words. What I mean is it's an easy way to go from binary to text in that context.
That's where you're wrong. The further away bytes get from their original source, the harder it is to know where they came from or how they should be interpreted. This is compounded when you have bytes coming from multiple sources, in possibly different encodings, and you try to combine them while ignoring where they came from. Many examples of where this can go wrong are discussed here: http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#why-not-just-assume-utf-8-and-avoid-having-to-decode-at-system-boundaries and even in the rather recent PEP 529: https://www.python.org/dev/peps/pep-0529/
You're not in poor company having doubts about this. Armin Ronacher has written extensively about it, such as here: http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/ I don't agree with him (and for some reasons he doesn't give in his list of possible objections), but he's not without a point, and in fact has since pushed for many useful updates to Python 3 to make dealing with bytes and strings a bit less of a hassle.
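To make the point about combining bytes from different sources concrete, here is a small sketch. The two "sources" and the variable names are invented for illustration; the only assumption is that one source produces UTF-8 and the other Latin-1.

```python
# The same text, "café", as delivered by two sources with different encodings.
from_utf8_source = "café".encode("utf-8")      # b'caf\xc3\xa9'
from_latin1_source = "café".encode("latin-1")  # b'caf\xe9'

# Combining the raw bytes while ignoring where they came from works silently...
combined = from_utf8_source + b"/" + from_latin1_source

# ...but the result is no longer valid in any single encoding we expect.
try:
    combined.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Decoding each piece at its own boundary keeps the text consistent.
as_text = (
    from_utf8_source.decode("utf-8")
    + "/"
    + from_latin1_source.decode("latin-1")
)
```

Once the two byte strings have been joined, nothing in `combined` records which part used which encoding, which is the difficulty the comment describes.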
comment:12
Or if you want just a more practical argument, going back to my point that this is mixing bytes and strings in a single list, this easily results in something like:
>>> import os
>>> paths = [b'/usr/lib', '/usr/local/lib']
>>> filename = [os.path.join(p, 'libfoo.so') for p in paths]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "/usr/lib/python3.4/posixpath.py", line 89, in join
    "components") from None
TypeError: Can't mix strings and bytes in path components
Although that may or may not apply in this specific case, it's just bad practice in general for reasons like this. |
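One possible way to avoid the `TypeError` in the traceback above is to normalize the list to text before joining. This is only a sketch using the same example paths, not what the ticket's branch does; `os.fsdecode` returns `str` input unchanged and decodes `bytes` with the filesystem encoding.

```python
import os

# Mixed bytes and str entries, as in the failing example.
paths = [b'/usr/lib', '/usr/local/lib']

# Normalize everything to str first.
text_paths = [os.fsdecode(p) for p in paths]

# Now every component is str, so os.path.join no longer raises.
filenames = [os.path.join(p, 'libfoo.so') for p in text_paths]
```

After normalization the list is homogeneous, so later code does not need to check the type of each entry.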
comment:13
Replying to @embray:
Easy: as a
This is not the case here.
That's a general discussion about using unicode in Python. I am specifically talking about filenames
I see. If I understand things correctly, using
Replying to @embray:
Right. But I would argue to just use |
comment:14
Replying to @embray:
I just read this and I agree almost with everything he says... Python 3 insists too much on unicode. One particular pet peeve of mine not mentioned in that essay is |
Changed branch from u/chapoton/22044 to |
Changed commit from |
comment:16
Filenames are byte strings on Linux. I think there is a Windows issue somewhere here, but hopefully that's ok if the path is plain ASCII. Python is moving towards byte string filenames on Windows, too, to facilitate POSIX compatibility. |
comment:17
Sorry, but no, this is absolutely wrong. Why are you mixing bytes and str objects in a single list in application code? |
comment:18
It is absolutely, 100% against the philosophy and design of Python 3 to be passing around
You're confusing what filenames are as internally represented by the OS, and what they actually represent, semantically. If I'm a human being, and my home directory is "C:\Users\риго́рий Перельма́н", then my home directory is "C:\Users\риго́рий Перельма́н", not
I speak from hard-fought experience here having helped lead the Python 3 porting effort of a large, 10-15 year old codebase, which interacts with legacy ASCII-based applications and file formats, and is used by international users across platforms and had to have backwards compatibility with users' assumptions that strings (i.e., when they type |
comment:19
Replying to @jdemeyer:
You can agree with Armin, and you wouldn't be altogether wrong. But where you are wrong is trying to defy the design and intent of Python 3. Armin brought up these issues in part to influence how Python 3 moved forward in treating these issues more sanely for people like him who frequently work on boundary code. However, nowhere does he argue that Python 3 should just be used incorrectly if you don't have to. |
comment:20
Replying to @jdemeyer:
It's not "somehow"--it's using the |
comment:22
Replying to @embray:
The "most of the time" is going to bite you someday. |
comment:23
Whether you or I are right or wrong, you should never re-open a ticket that the release manager has closed. Feel free to continue the discussion on a new ticket or to complain directly to the release manager to revert this. |
comment:24
That's fair. |
comment:25
Replying to @vbraun:
This doesn't make any sense. In Python filenames are neither "bytes" nor "str". Python does not have a "filename" type (though it has been argued for in the past, and in fact there's a relevant PEP being drafted somewhere, though I think instead they went with yet another magic method of some kind :)
If you're writing code that is solely interfacing with POSIX interfaces then yes, filenames are just collections of bytes and can stay in bytes form. But outside that narrow context you have to think about filenames in the abstract, which is more often than not a text string of human-readable glyphs (I think this is one thing Windows got right, though the POSIX approach has its advantages as well). Filenames are a user interface. That's why they're not even stored in inodes, but rather in dirents as a convenient way for humans (and programs written and used by humans) to locate files.
While it's true that in POSIX a filename can be any arbitrary sequence of bytes, and the kernel is agnostic to such sordid details as character encodings, that's an implementation choice. In Python, although there are wrappers around POSIX interfaces, you're writing Python, not POSIX, and there text is treated as text (especially in Python 3) and in most cases filenames too are text. In order to wedge the POSIX notion of "filenames are |
caused by #21613
CC: @embray @mkoeppe @jdemeyer
Component: python3
Author: Frédéric Chapoton
Branch:
02c5061
Reviewer: Jeroen Demeyer
Issue created by migration from https://trac.sagemath.org/ticket/22044