py3: simplified string conversion utilities #24222
Comments
Branch pushed to git repo; I updated commit sha1. New commits:
|
Commit: |
comment:3
OK, so now we have to discuss encodings... first of all, I really don't like the I don't see why you would pick
For efficiency, I would rather have several sets of functions, each with a hard-coded encoding. I would also suggest implementing these functions (or at least the functions where you care about performance) as |
comment:4
You don't need
by
You could also just use |
comment:5
Replying to @jdemeyer:
That's what I thought originally too, but I was having some problems trying to do it as a one-liner. I'll try your suggestion though and double-check whether it works. |
comment:6
I don't think the "encoding" argument slows anything down by any significant amount, especially in cases where it isn't used. I'd be amenable to encoding-specific functions for some cases, as Python has those as well in its API. But right now I'm really trying to avoid breadth of API surface. It is good to have generic versions of these functions that accept any encoding, and this is the bare minimum needed to get Python 3 support off the ground. I think that later we can add encoding-specific functions where specific use cases for them can be demonstrated. I don't want to get into the weeds with this right now. As for the default encoding, So on *NIX platforms like we care about the most it's actually mostly a moot point. But a lot of the software Sage interfaces with is locale-aware, so it's better to take that into account as the default than not. |
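For readers following the thread, here is a minimal pure-Python sketch of what "generic versions of these functions that accept any encoding" with a locale-aware default could look like. It is not the code from the branch: `str_to_bytes` is named in the ticket description, while `bytes_to_str`, `default_encoding`, and the choice of `locale.getpreferredencoding` as the default are assumptions made for illustration.

```python
import locale


def default_encoding():
    # Assumption: the locale's preferred encoding stands in for the
    # "locale-aware default" discussed above; fall back to UTF-8.
    return locale.getpreferredencoding(False) or "utf-8"


def bytes_to_str(b, encoding=None, errors="strict"):
    """Convert bytes to the native str type (hypothetical helper)."""
    if not isinstance(b, bytes):
        raise TypeError("expected bytes, got %s" % type(b).__name__)
    if str is bytes:
        # Python 2: str and bytes are the same type, nothing to convert.
        return b
    return b.decode(encoding or default_encoding(), errors)


def str_to_bytes(s, encoding=None, errors="strict"):
    """Convert the native str type to bytes (sketch, not Sage's code)."""
    if not isinstance(s, str):
        raise TypeError("expected str, got %s" % type(s).__name__)
    if str is bytes:
        # Python 2: already bytes.
        return s
    return s.encode(encoding or default_encoding(), errors)
```

Callers that know the encoding pass it explicitly, for example `bytes_to_str(data, "utf-8")`; the default is only consulted when `encoding` is left as `None`.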
comment:7
I was wondering about that. But in that case the |
comment:8
Maybe the best solution would be to not have a default encoding and require users to think what encoding they want? |
comment:9
Replying to @embray:
The Python 3 C API has specific functions to encode/decode for certain particular encodings:
I haven't profiled, but I would hope that these are faster than the generic API. |
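The list of functions in this comment did not survive extraction. CPython does provide codec-specific entry points such as `PyUnicode_DecodeUTF8`, `PyUnicode_DecodeASCII` and `PyUnicode_DecodeLatin1`, though whether those are the ones meant here is an assumption. A rough Python-level analogue of "one function per hard-coded encoding", with a small timing helper for anyone who wants to profile, might look like this (all names below are hypothetical):

```python
import timeit


def decode_utf8(b, errors="strict"):
    # Encoding is fixed at definition time; no per-call encoding argument.
    return b.decode("utf-8", errors)


def decode_ascii(b, errors="strict"):
    return b.decode("ascii", errors)


def decode_latin1(b, errors="strict"):
    return b.decode("latin-1", errors)


def decode_generic(b, encoding, errors="strict"):
    # Generic variant: the codec is chosen on every call.
    return b.decode(encoding, errors)


def time_decode(func, data, number=10000):
    """Return the total seconds taken by `number` calls of func(data)."""
    return timeit.timeit(lambda: func(data), number=number)
```

Comparing `time_decode(decode_utf8, data)` against `time_decode(lambda b: decode_generic(b, "utf-8"), data)` gives a first impression at the Python level only; it says nothing definitive about the C API functions themselves.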
comment:10
Replying to @embray:
It doesn't need to exist.
I propose to remove those variables anyway. If you do need them for some reason, they must be in the |
comment:11
Two more things:
when See also #24215 for Cython-compile-time constants in general.
|
comment:12
fails to build, see patchbot report:
|
comment:13
oh, I see. Name conflict between the global "string" module and the local "string" module.. |
comment:14
Replying to @jdemeyer:
That's far too onerous and anxiety-provoking for the user :) Python itself uses default encodings all over the place when in doubt (hence e.g. PyUnicode_DecodeFSDefault). It's an unfortunate fact that there isn't always one "right answer" here; the best we can do is provide sensible defaults and the ability for user-specified encoding where applicable; that is, where we know we want a specific encoding. Moving forward we can also do more, for example, to ensure that any locale-aware code run by Sage is handled well. I could definitely agree to adding more encoding-specific helper functions, especially for ASCII and UTF-8. But as a first pass, for the sake of getting Python 3 support off the ground, I'd prefer to leave this as is and then make adjustments as specific use cases arise. It will be difficult to even find those specific use cases until and unless we get further along on getting Python 3 working in general (with these functions, plus a few other fixes I'll be posting soon, I've gotten the Sage doctest runner working, so that will help expose a lot of interesting cases quickly). I'll look at the rest of your suggestions; they seem reasonable. |
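To make the point about Python's own defaults concrete: `os.fsdecode` is the Python-level counterpart of decoding with the filesystem default, and the standard library exposes several distinct "default" encodings. The sketch below only inspects them; `describe_defaults` and `decode_with_default` are hypothetical names, and falling back to `os.fsdecode` when no encoding is given is an illustrative choice, not what the branch necessarily does.

```python
import locale
import os
import sys


def describe_defaults():
    """Collect the different default encodings Python 3 knows about."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "locale": locale.getpreferredencoding(False),
        "default": sys.getdefaultencoding(),
    }


def decode_with_default(raw, encoding=None):
    # Explicit encoding wins; otherwise defer to the filesystem default,
    # which os.fsdecode applies together with its own error handler.
    if encoding is None:
        return os.fsdecode(raw)
    return raw.decode(encoding)
```

The three values returned by `describe_defaults()` can differ from one another on the same machine, which is one reason a single "right answer" is hard to pin down.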
comment:15
Replying to @jdemeyer:
Ah, I was actually really looking for something like this but I couldn't find it anywhere in the Cython documentation. Am I just blind? |
comment:16
Replying to @embray:
Never mind; I see now that we explicitly pass that in to |
comment:17
This sounds great! I was hoping that the doctest framework could be made to work at some point, but was not expecting it soon. |
comment:18
Replying to @fchapoton:
I had to get it working, in part, so that I could run the doctests for this module :) I think it will help things go much faster. |
comment:19
I'm seeing now how wanting to have module-level global variables in conjunction with inline For functions inlined from another module--at least if those functions access global variables from their original module--it should import that module during module initialization and use the correct module dict for globals lookups (just as normal Python functions do, basically). That's an issue beyond this one though, so I'll rework things for now to get rid of the use of global variables by these functions (I still want to have default encodings though). |
comment:20
So it turns out Perhaps I'll just stick with that functionality, and take care to use things like |
comment:21
Replying to @embray:
Right, but there are several defaults (each of UTF-8, |
comment:22
Also, I feel that error handling should be different for the different cases: if you are communicating with locale-aware software using |
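The comment above is cut off, so the following is only an illustration of how Python's standard error handlers behave differently, not a reconstruction of what was proposed. It assumes Python 3, and the helper names are hypothetical.

```python
def decode_strict(raw, encoding="utf-8"):
    # Raises UnicodeDecodeError on any byte sequence invalid in `encoding`.
    return raw.decode(encoding, "strict")


def decode_lossy(raw, encoding="utf-8"):
    # Invalid bytes become U+FFFD; the original bytes cannot be recovered.
    return raw.decode(encoding, "replace")


def decode_roundtrip(raw, encoding="utf-8"):
    # Invalid bytes become lone surrogates, so encoding back with the same
    # handler reproduces the original bytes exactly.
    return raw.decode(encoding, "surrogateescape")


def encode_roundtrip(text, encoding="utf-8"):
    return text.encode(encoding, "surrogateescape")
```

With `raw = b"caf\xe9"` (Latin-1 bytes that are not valid UTF-8), `decode_strict(raw)` raises, `decode_lossy(raw)` loses the byte, and `encode_roundtrip(decode_roundtrip(raw))` returns `raw` unchanged.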
comment:23
Replying to @embray:
No problem for me, although I would prefer |
comment:84
Fine but it really doesn't make much difference. |
Branch pushed to git repo; I updated commit sha1. New commits:
|
comment:87
Good for me if it passes testing. |
comment:88
one green bot. I am setting to positive this important ticket. |
comment:89
Merge conflict |
comment:90
Volker, do you by chance have any idea what the conflicting ticket was? |
comment:91
Merge conflict with what? If there were a normal "master" branch into which tickets were merged regularly against which I could compare that would be one thing, but you can't just secretly merge a bunch of tickets all at once, claim "merge conflict", and expect me to guess what it conflicts with. |
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
|
comment:93
Rebased on current develop branch if it helps, but there was no merge conflict. New commits:
|
comment:94
Replying to @embray:
It's not secret: https://github.com/vbraun/sage/tree/develop |
Changed dependencies from #24246 to none |
comment:95
I don't see any conflict... |
Changed branch from u/embray/python3/string-conversions to |
A possible alternative to #24186, implementing simple conversion from C char arrays or bytes objects to str objects, and of str objects to bytes objects. Here "str" and "bytes" are to be read exactly for either Python 2 or Python 3, so on Python 2 this means no conversion is performed since str is bytes == True.
One thing this does not do is implement any kind of conversion from Python 2 unicode objects to bytes. This functionality might be worth adding, in some form, to str_to_bytes. But this would add a new feature on Python 2, whereas for now I'm only trying to preserve the existing functionality on Python 2 exactly, while transparently supporting Python 3 strs everywhere that Python 2 strs are supported.
CC: @fchapoton @jdemeyer
Component: python3
Author: Erik Bray, Jeroen Demeyer
Branch/Commit: dec9f3a
Reviewer: Jeroen Demeyer, Erik Bray
Issue created by migration from https://trac.sagemath.org/ticket/24222