-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PyUnicode_FromFormat(): add %T format for an object type name #78776
Comments
The PEP-399 requires that C accelerator behaves exactly as Python, but a lot of C code truncates type name to an arbitrary length: 80, 100, 200, up to 500 (not sure if it's a number of bytes or characters). Py_TYPE(obj)->tp_name is a common pattern: it would be nice to have a new "%T" format in PyUnicode_FromFormat(), so it can be used directly in PyErr_Format(), to format an object type name. Attached PR implements the proposed %T format and modify unicodeobject.c to use it. I propose to then write a second PR to modify all C code of CPython using Py_TYPE(obj)->tp_name to use the new %T type. |
I cannot find %T in printf() manual pages on Fedora 28 (Linux). I can find it in the strftime() documentation: %T The time in 24-hour notation (%H:%M:%S). (SU) But I don't think that it's an issue since printf() and strftime() formatters are exclusive, no? For example, strftime() %s means "number of seconds since the Epoch" whereas printf() %s means a "const char*" byte string. |
Oh, PyUnicode_FromFormat() has %A to format as ASCII, whereas printf() already has %A but for a different meaning: a, A For a conversion, the double argument is converted to hexadecimal notation (using the letters abcdef) in the style [-]0xh.hhhhp+-; for A conversion the prefix 0X, the letters ABCDEF, and the exponent separator P is used. (...) |
I don't think you have to worry about %T being used by other formatting functions. If (heaven forbid) dates were ever supported by PyUnicode_FromFormat(), there would have to be a way to switch from "normal" argument processing to argument-specific formatting specifiers, anyway. |
IIRC there is similar issue or a discussion on one of mailing lists. But the idea about adding this feature on Python side too was considered. It would be better to find this discussion before pushing this change. In some cases we have the type itself, not an instance. So it makes sense to make %T an equivalent of arg->tp_name instead of Py_TYPE(arg)->tp_name. On Python side, you need to output either short name obj.__class__.__name__ or full qualified name obj.__class__.__module__ + '.' + obj.__class__.__qualname__ with exception that the module name should be omitted if it is 'builtins'.
The case of the module name '__main__' can be handled specially too. On C side, the problem is that tp_name means different, depending of the kind of the type. In some cases it is a short name, in other cases it is a full qualified name. It is not easy to write a code that produces the same output in Python and C. I have added a helper _PyType_Name() that helps to solve a part of these issues. If you want to output a short name (just cls.__name__ in Python), use _PyType_Name(cls) instead of cls->tp_name. But this doesn't help for the case when you need to output a full qualified name. |
"arg->tp_name" is rare in the code base, whereas "Py_TYPE(arg)->tp_name" is a very common pattern. I'm not sure that a formatter is needed for "arg->tp_name", you can already use "%s" with "arg->tp_name", no? My intent was to make the C code less verbose, respect the PEP-399, but also indirectly avoid the implicit borrowed reference of Py_TYPE() :-)
Sometimes, the qualified name would be more appropriate, but it's sometimes tricky to decide if the short name, qualified name or fully qualified name is the "right" name... So I chose to restrict this issue to the most common case, Py_TYPE(arg)->tp_name :-) Ah, and changing strings is a risk of breaking the backward compatibility. For example, cause issue with pickle. So it should be discussed on a case by case basis. Moreover, if you want to change a string, the Python code should be updated as well. I suggest to open a new issue to discuss that. Don't get me wrong, I'm interested to do these changes, but it's a wider project :-)
To be honest, I also considered to proposed a second formatter to do something like that :-) But as you explained, I'm not sure which name is the good name: qualified or fully qualified (with module name)? First of all, would it help to have a *function* to get these names? Maybe we could first use such functions before discussing adding a new formatter in PyUnicode_FromFormat()? |
An alternative would be to add multiple formatters. Example:
What do you think Serhiy? |
I think we need to handle only two cases: short and fully qualified names. __qualname__ without __module__ doesn't make sense, and the value of tp_name depends on implementation details (is it Python class or builtin class, heap class or dynamic class?). Maybe use %t and %T? But we may want to support formatting the name of the type itself and the name of the object's type. This give us 4 variants. For old string formatting we can introduce new % codes (with possible modifiers). But in modern string formatting "T" can have meaning for some types (e.g. for datetime). We can implement __format__ for the type type itself (though it can cause confusion if cls.__format__() is different from cls.__format__(instance)), but for formatting the name of the object's type (as in your original proposition) we need to add a new conversion flag like "!r". |
Ok, I wrote PR 9122 to add %t format and modify %T format:
Again, I'm not sure about these ones. _PyType_Name() can be used for %t-like directly on a type. Later we can expose _PyType_FullName() (function that I added in my latest PR) as a private function for %T-like directly on a type.
I'm not sure that I understood directly. Do you want to add a third formatter in PyUnicode_FromFormat() which would use Py_TYPE(obj)->tp_name? I dislike Py_TYPE(obj)->tp_name, since my intent is to conform to the PEP-399: tp_name is not accessible at the Python level, only type(obj).__name__ and type(obj).__qualname__. Or do you want to add a new formatter to type.__format__() to expose %T at the Python level, f"{type(obj).__module__}.{type(obj).__qualname__}"? Currently, type(obj).__name__ is the most popular way to format a string. Would it break the backward compatibility to modify *existing* error messages? |
Nice! I agree that it is easy to use _PyType_Name() directly. But using _PyType_FullName() instead of tp_name can be cumbersome because it returns a new object and needs error handling.
Yes, I think we need a convenient way of formatting fully qualified name that omits the module name for types in the builtins module. It is equivalent to Py_TYPE(obj)->tp_name for extension types which is the most popular way to format a type name in error messages for now. There are several open issues for inconsistency in error messages for Python and C implementations, because the former use type(obj).__name__ or obj.__class__.__name__, and the latter use Py_TYPE(obj)->tp_name. I hope finally we will fix this. |
And in reprs. It is common to format a repr as "{typename}(...)" or "<{typename}(...)>". The difference is whether the typename is a short or fully qualified name. |
It does not. PEP-399 requires that that the C code must pass the same *test suite*. And error messages in particular tend to not be checked in tests. Anyway, I don't see how that applies to replacing I don't think the reasons are strong enough to add new API to PyUnicode_FromFormat(). |
Petr Viktorin asked me to open a wider discussion about this issue on python-dev. I just reverted my first change and posted: |
I close this issue until we can agree on an API. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: