Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 657: use 0-indexed offsets #2022

Merged
merged 3 commits into from
Jul 4, 2021
Merged

Conversation

isidentical
Copy link
Member

For the background of this change, see the discussion on: python/cpython#26958 (comment)

@isidentical
Copy link
Member Author

CC: @ammaraskar

pep-0657.rst Outdated
while offsets bigger than these values will be treated as missing (value of 0).
all AST nodes. The output of the public APIs (``co_positions`` and ``PyCode_Addr2Location``)
that deal with these attributes use 0-indexed offsets (just like the AST nodes) and designates
``-1`` as the return code for information not available, but the underlying implementation is
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also more consistent with the PyCode_Addr2Line, since it also returns -1 on error but for the sake of consistency we used to convert -1 to 0 on PyCode_Addr2Location. Now that cast won't be needed.

pep-0657.rst Outdated Show resolved Hide resolved
pep-0657.rst Outdated Show resolved Hide resolved
Copy link
Member

@ammaraskar ammaraskar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with Pablo's suggestion incorporated.

@isidentical isidentical requested a review from warsaw July 4, 2021 17:57
pep-0657.rst Outdated Show resolved Hide resolved
@pablogsal pablogsal merged commit 5fc4119 into python:master Jul 4, 2021
@terryjreedy
Copy link
Member

The merged version looks good to me.

@gvanrossum
Copy link
Member

Whoa, don’t we use 1-based offsets in SyntaxError attributes?

@ammaraskar
Copy link
Member

There's a bit of discussion around that here: python/cpython#26958 (comment)

I think the consensus ended up being that SyntaxError is the odd one out here and we'd rather simplify the places we have the 1 to 0 conversion logic and stay consistent with the ast.

@gvanrossum
Copy link
Member

Hm, possibly, but we chose 1-based because editors like vim and emacs interpret column numbers as 1-based. Where else do we use 0-based offsets te refer to source locations? (In Python code. Of course in C code it’s 0-based all the way down. :-)

@pablogsal
Copy link
Member

pablogsal commented Jul 5, 2021

Where else do we use 0-based offsets te refer to source locations?

In the AST, all offsets are 0 based and that's one of the primary motivations as we want to be as close as those as possible:

>>> import ast
>>> ast.dump(ast.parse("x"), include_attributes=True)
"Module(body=[Expr(value=Name(id='x', ctx=Load(), 
    lineno=1, col_offset=0, end_lineno=1, end_col_offset=1
), 
    lineno=1, col_offset=0, end_lineno=1, end_col_offset=1
)], type_ignores=[])"

so one of the advantages of 0-based is that every tool that currently deals with these can be used without modifications to deal with the new positions. We asked around some authors and seems that everyone frefers 0-based positions (so they can also do line[offset:end_offset] directly.

@terryjreedy
Copy link
Member

I not sure exact what "In Python code." is meant to say, but both Python and tk (and hence tkinter and IDLE) are 0-based for both indexing and slicing.

@gvanrossum
Copy link
Member

What I meant is when you print column offsets for end-user tools like editors to use, 1-based seems to be the norm. OTOH when you are using this to index into an array of lines, 0-based is more convenient and seems conventional. (Everybody agrees that line numbers start at 1 though. :-)

I'm not yet sure whether PEP 657 is more faced towards end users and their editors, or towards developers of Python libraries that will need to slice and dice the source code.

I do observe that in traceback.py the TraecbackException object has a documented offset field that's 1-based (since it is copied from SyntaxError).

I guess there will be a few traps no matter what we decide...

@isidentical isidentical deleted the pep-657-indexing branch July 5, 2021 16:15
@pablogsal
Copy link
Member

What I meant is when you print column offsets for end-user tools like editors to use, 1-based seems to be the norm. OTOH when you are using this to index into an array of lines, 0-based is more convenient and seems conventional. (Everybody agrees that line numbers start at 1 though. :-)

Oh, i see what you mean. Notice that we are not showing the numbers at the moment (and we don't plan to) so users will not see "error in column x". These numbers are mainly focused for tools to use them to grab the correct piece of source code that is associated with an instruction. If we ever show these numbers, we can easily make them 1-based only for the display.

I guess there will be a few traps no matter what we decide...

Right, after asking around seems that there is an opportunity to avoid a slightly bigger subset of tracks for libraries consuming these numbers. So this decision would prioritize reusing existing code and may give some small surprises when developing new code. I think is a good balance but I totally understand if someone sees the other way :)

@gvanrossum
Copy link
Member

Okay, sounds good!

@terryjreedy
Copy link
Member

I see the essence of PEP 657 as making the full AST position info available on the code object as iterable co_positions. bpo-43950 translates the position info, within the existing traceback.py traceback construction method, into a new caret line added under the code line.

bpo-44569, just opened, will factor the per-frame formatting code into a separate method that users can replace. Users can then handle the position info as they wish while customizing frame formatting, all without giving up other traceback features (like truncating tracebacks resulting from runaway recursion). IDLE, for instance. will omit the caret line and instead include the position info so that Shell can highlight the error slice within the code line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants