More fixes for the Clinic mapping of converters to format units #68188

larryhastings · 2015-04-19T01:00:34Z

BPO	24000
Nosy	@larryhastings, @zware, @serhiy-storchaka
Files	larry.one.more.clinic.format.unit.map.cleanup.1.txt larry.one.more.clinic.format.unit.map.cleanup.2.txt larry.one.more.clinic.format.unit.map.cleanup.3.txt larry.one.more.clinic.format.unit.map.cleanup.4.txt larry.one.more.clinic.format.unit.map.cleanup.5.txt

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/larryhastings'
closed_at = <Date 2015-05-08.06:31:50.870>
created_at = <Date 2015-04-19.01:00:33.801>
labels = ['type-bug']
title = 'More fixes for the Clinic mapping of converters to format units'
updated_at = <Date 2015-05-08.06:31:50.868>
user = 'https://github.com/larryhastings'

bugs.python.org fields:

activity = <Date 2015-05-08.06:31:50.868>
actor = 'larry'
assignee = 'larry'
closed = True
closed_date = <Date 2015-05-08.06:31:50.870>
closer = 'larry'
components = []
creation = <Date 2015-04-19.01:00:33.801>
creator = 'larry'
dependencies = []
files = ['39113', '39115', '39125', '39288', '39289']
hgrepos = []
issue_num = 24000
keywords = []
message_count = 24.0
messages = ['241468', '241490', '241503', '241509', '241522', '241528', '241529', '241536', '241538', '241558', '242562', '242563', '242567', '242568', '242571', '242572', '242573', '242574', '242575', '242578', '242581', '242582', '242748', '242749']
nosy_count = 4.0
nosy_names = ['larry', 'python-dev', 'zach.ware', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue24000'
versions = ['Python 3.5']

larryhastings · 2015-04-19T01:00:31Z

I found another bug in the mapping of converters to format units. (s#, z#, and y# all allow zeroes.)

I've redone the approach for str_converter in an attempt to make it easier to read.

It'd be swell if, after this gets checked in (or rejected), somebody *else* took a pass to see if they could find any bugs.

larryhastings · 2015-04-19T07:20:13Z

Oh, heavens, yes, that's much nicer. Thanks for the suggestion, Serhiy!

serhiy-storchaka · 2015-04-19T09:53:49Z

Doesn't always zeroes == length?

serhiy-storchaka · 2015-04-19T10:43:59Z

Converters with encoding=True are not used in converted code. "es" is never used in still non-converted code and "et" is used only 6 times (for "idna" and "utf-8" encodings) and can be replaced with custom converter or inlined code. So I think the support of encoding=True should be removed from Argument Clinic soon.

Converters with length=True are used very rarely too, and they are mostly legacy converters. I think we should get rid of most of them in future.

larryhastings · 2015-04-19T17:27:31Z

Doesn't always zeroes == length?

"zeroes" requires "length", but "length" does not require "zeroes". As it happens all the format units supported by str always have both parameters either True or False. But the Py_UNICODE converter accepts "length" for format units u# and Z#, and doesn't support "zeroes".

I want the converters to accept common parameters. That will make comprehension easier for the casual reader. So I want to keep "length" as a separate parameter, even for str.

So I think the support of encoding=True should be removed
from Argument Clinic soon. [...] I think we should get rid
of [length] in future.

I disagree. My goal with Argument Clinic is that third-party developers will use it to write extensions. And we don't know how many extension modules are using es, es#, et, et#, s#, y#, z#, u#, and U#. I don't think we can remove any of this functionality until 4.0.

serhiy-storchaka · 2015-04-19T18:18:30Z

> Doesn't always zeroes == length?

"zeroes" requires "length", but "length" does not require "zeroes". As it happens all the format units supported by str always have both parameters either True or False. But the Py_UNICODE converter accepts "length" for format units u# and Z#, and doesn't support "zeroes".

"length=True" implies "zeroes=True" in the Py_UNICODE converter. "u" and "Z" don't allow null characters, "u#" and "Z#" allow null characters. "zeroes" doesn't add any information and isn't needed.

serhiy-storchaka · 2015-04-19T18:21:23Z

I disagree. My goal with Argument Clinic is that third-party developers will use it to write extensions. And we don't know how many extension modules are using es, es#, et, et#, s#, y#, z#, u#, and U#. I don't think we can remove any of this functionality until 4.0.

But we can deprecate them sooner.

larryhastings · 2015-04-19T18:52:51Z

"u#" and "Z#" allow null characters.

Not according to the documentation. 'u' explicitly says it does not allow NUL characters. 'Z', 'u#', and 'Z#' all say they are "variants" of 'u' but never mention that they might allow NUL characters.

larryhastings · 2015-04-19T18:57:41Z

New diff based on Serhiy's latest round of comments. Thanks, Serhiy! You are inexhaustable!

serhiy-storchaka · 2015-04-19T19:39:30Z

Not according to the documentation. 'u' explicitly says it does not allow NUL characters. 'Z', 'u#', and 'Z#' all say they are "variants" of 'u' but never mention that they might allow NUL characters.

I understand the note in "u" description as explicitly saying that "u#" allows null characters.

The documentation for format units needs an update for other reasons and there is an issue with ready patch for this.

larryhastings · 2015-05-04T15:07:38Z

Here's a freshened version of the patch. I updated the Clinic HOWTO.

Serhiy: You're right, length and zeroes always have the same value. Would you ever want length without allowing zeroes? Like, in the future, would we ever want
str(length=True)
so we're passed in the length but we still don't want to allow zeroes?

serhiy-storchaka · 2015-05-04T15:26:30Z

I would say it is very unlikely. In any case, if we have a pointer and a length, we always can check for zeros after parsing.

May be rename the str converter to pchar and the Py_UNICODE converter to pwchar? Usually the converter is named by C type, not Python type. "y" and "y#" even don't accept str.

larryhastings · 2015-05-04T18:36:12Z

I have no interest in naming "str" to "pchar".

Yes, *most* of the other converters are named after the C type they translate to. But so far converter names doesn't mention whether or not they represent pointers to types--it's "object", not "pobject", it's "PyBytesObject", not "pPyBytesObject". That suggests the name "char" for the converter. But we've already got a "char", and it would be confusing to use that one converter for both chars (small ints / single characters) and strings.

Adding "p" in front is not a convention we've ever used. The word "pchar" does not appear in the Python source tree. So the name "pchar" has no mnemonic value.

If I were to follow your advice, I should prefer the name "char_star". But now we're using nine letters for what is almost certainly the most common converter. And, again, the generic converter for objects is called "object", I do not propose to rename it to "object_star". So this converter's name would be an exception to the rule.

But then again, C strings themselves are an exception to the rule. They're not a built-in type as much as they are a *convention*. So any name we give it will ultimately be something of a compromise. And as compromises go "str" is great. So far nobody has been confused by it. It's short, and universally, instantly clear as to its meaning.

Furthermore, converters don't actually represent a C type. They represent a *mapping*, from a Python type (or types) to a single C type. So while it's a useful and productive convention to name converters after the type they convert to, it's hardly mandatory. And it would be a shame to squander clarity in service to a needless consistency.

p.s. If we hold ourselves to this firm ideal, where every converter is named after its C type, what should we call the "bool" converter? What should we call the "self" converter? What should you call your proposed "boolint" converter?

larryhastings · 2015-05-04T18:46:25Z

Here's an updated patch where I've removed the "length" parameter to converters, instead relying solely on the "zeroes" parameter.

larryhastings · 2015-05-04T19:15:39Z

As for "str doesn't even accept str for y and y#", the name "str" is not for the Python type, it's for the C type.

serhiy-storchaka · 2015-05-04T19:28:01Z

Why not left the length parameter instead? First, current code uses "length". Second, the main effect from C side is that an argument is a pair (pointer, length), not just a pointer. Third, currently everywhere in Python documentation and error messages the used term is "null character/byte", so if left the zeros parameter, it should be renamed to allow_nulls or allow_nuls. Fourth, "y#" needs zeros=True for distinguish from "y", but "y*" allows nulls and has no the zeros parameter.

serhiy-storchaka · 2015-05-04T19:31:10Z

May be "string", or "data", or "buffer" would be better names? "str" looks as Python type.

larryhastings · 2015-05-04T19:37:08Z

I don't think those are better names.

larryhastings · 2015-05-04T19:44:31Z

As for "why not length instead of zeroes": Because the primary reason for the parameter is specifying that the string can contain embedded zeroes. Returning the length is a side-effect of this, not the main point. If the string didn't have embedded zeroes, we wouldn't need the length.

The only reason the code didn't have "zeroes=True" everywhere was because I screwed up and didn't realize all those mappings *should* have specified it.

The documentation is very consistent about calling it a NUL. I don't think "NUL=True" or "allow_NUL=true" is particularly attractive; we never (almost never?) use capital letters in parameter names. So any other name is going to be a compromise. "allow_null" and "allow_nul" are misspellings, and don't convey the idea any better; they can confuse the reader with the related concept of NULL or None. At least "zeroes" has the benefit of being an actual word, representing a related concept.

Will you be done bikeshedding soon?

serhiy-storchaka · 2015-05-04T20:16:53Z

In Python 2 and earlier in Python 3 there were no restrictions that arguments for format units "s", "y", "u", "z" and "Z" (matching Argument Clinic converters with length=False) shouldn't contain NUL. And this was considered as a bug.

As for NUL, ask Victor. It argued for naming it "null character/byte" and it was consistently changed almost everywhere (left some documentation, but this will be fixed soon).

larry.one.more.clinic.format.unit.map.cleanup.5.txt looks almost good to me, but I hesitate about the zeroes parameter. str_converter has the zeroes parameter, but the length attribute.

larryhastings · 2015-05-04T20:39:05Z

The length attribute is an internal implementation detail, so its name is not relevant. It's used in the generation of the accompanying "length" parameter for the impl function for this converter. "length" is a good name for it. "zeroes" is not a good name for it. "zeroes" is, however, a decent name for the converter parameter.

I have no quarrel with the documentation using the term NUL. But I don't think the term translates well to use parameter names.

larryhastings · 2015-05-04T20:41:20Z

I don't know why you're bringing up previous versions of Python. The clinic.py under review here is for 3.5.

python-dev · 2015-05-08T06:30:26Z

New changeset 36d5e04b6cfa by Larry Hastings in branch 'default':
Issue bpo-24000: Improved Argument Clinic's mapping of converters to legacy
https://hg.python.org/cpython/rev/36d5e04b6cfa

larryhastings · 2015-05-08T06:31:50Z

I think this is a definite improvement, so I've checked it in so I can move on. If you guys still want to talk about it, we can still change it before we hit beta.

"format units". Updated the documentation to match. GitHub-Issue-Link: python/cpython#68188

larryhastings self-assigned this Apr 19, 2015

larryhastings added the type-bug An unexpected behavior, bug, or error label Apr 19, 2015

larryhastings closed this as completed May 8, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

AA-Turner pushed a commit to AA-Turner/devguide that referenced this issue Sep 13, 2023

Issue #24000: Improved Argument Clinic's mapping of converters to legacy

e3c5656

"format units". Updated the documentation to match. GitHub-Issue-Link: python/cpython#68188

erlend-aasland pushed a commit to python/devguide that referenced this issue Sep 26, 2023

Issue #24000: Improved Argument Clinic's mapping of converters to legacy

93bb96d

"format units". Updated the documentation to match. GitHub-Issue-Link: python/cpython#68188

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More fixes for the Clinic mapping of converters to format units #68188

More fixes for the Clinic mapping of converters to format units #68188

larryhastings commented Apr 19, 2015

larryhastings commented Apr 19, 2015

larryhastings commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

larryhastings commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

larryhastings commented Apr 19, 2015

larryhastings commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

larryhastings commented May 4, 2015

serhiy-storchaka commented May 4, 2015

larryhastings commented May 4, 2015

larryhastings commented May 4, 2015

larryhastings commented May 4, 2015

serhiy-storchaka commented May 4, 2015

serhiy-storchaka commented May 4, 2015

larryhastings commented May 4, 2015

larryhastings commented May 4, 2015

serhiy-storchaka commented May 4, 2015

larryhastings commented May 4, 2015

larryhastings commented May 4, 2015

python-dev mannequin commented May 8, 2015

larryhastings commented May 8, 2015

More fixes for the Clinic mapping of converters to format units #68188

More fixes for the Clinic mapping of converters to format units #68188

Comments

larryhastings commented Apr 19, 2015

larryhastings commented Apr 19, 2015

larryhastings commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

larryhastings commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

larryhastings commented Apr 19, 2015

larryhastings commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

larryhastings commented May 4, 2015

serhiy-storchaka commented May 4, 2015

larryhastings commented May 4, 2015

larryhastings commented May 4, 2015

larryhastings commented May 4, 2015

serhiy-storchaka commented May 4, 2015

serhiy-storchaka commented May 4, 2015

larryhastings commented May 4, 2015

larryhastings commented May 4, 2015

serhiy-storchaka commented May 4, 2015

larryhastings commented May 4, 2015

larryhastings commented May 4, 2015

python-dev mannequin commented May 8, 2015

larryhastings commented May 8, 2015