-
-
Notifications
You must be signed in to change notification settings - Fork 30.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-88116: Use a compact format to represent end line and column offsets. #91666
Conversation
Looks like this results in a 6% memory improvement over
|
else { | ||
int ldelta = get_line_delta(&data[offset]); | ||
*output++ = 0xe8 | blength; | ||
output += write_signed_varint(output, ldelta); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this reimplementing write_location_info_no_column from compile.c?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably get an even tighter representation for the ! code_debug_ranges case (which is presumably the perf critical one). Here we are only using two of the codes, and using a whole byte 4 bits to specify which one. Am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although there are only two codes, there is the implicit bit of information about whether code_debug_ranges
is set.
So there are 6 bits of information in the byte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this extra info in each entry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which extra info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether code_debug_ranges is set. It's the same across all entries, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. If that information is not stored in the entry, then we effectively have two distinct formats.
Which means two different sets of code for creating and parsing the tables.
I doubt that it is worth it.
I think we need a similar comparison with debug_ranges turned off. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think https://github.com/python/cpython/pull/91666/files#r854367808 can be addressed now (replace the 15 by the enum), but the other comments we can look at later.
It would be good in the future to have all the logic about the structure of this table in one place. Currently it's split between compile.c, codeobject.c and the tests. We can look at this (and the exceptions table) as part of the compiler work for 3.12.
We can also think later about an even-more optimised representation for line-numbers-only if we want.
Is the documentation up to date? https://docs.python.org/dev/c-api/code.html#c.PyCode_New Should it be documented in What's New in Python 3.11? |
The documentation is correct again, now I've removed the two extra tables. It wasn't before. |
No, it is wrong as it is missing the exception table. |
The Python 3.10 doc looks correct, no? If not, it should be fixed ;-) |
@markshannon There is something wrong with commit 944fffe and the new encoding format. Check out #93662 |
Reduces the memory consumption for extended location information to less than half.
Location information now takes ~3.5 bytes per instruction, instead of 8. (Note this is per instruction, not code unit).
All the information is saved in one table, instead of three.
#88116