Replies: 4 comments 1 reply
-
What does
-
Ah, I see.
-
After offline discussions and some comments made here, I initially thought that it would be better to close this issue down. I'm not doing this, because I still want to continue investigating possible optimizations for small strings -- maybe just in a different way. The main advantages of this specific optimization are:
Since the buffer follows the object header in reality, this specific optimization isn't in fact as useful as I initially thought it would be, at least as-is, especially now with the work I was unaware of to remove the

This is probably possible to do today, it has some benefits, and it's ultimately something I'd like to work on. However, I have another idea that might be useful for all short-lived objects, strings or not. But, for this, I need to put on my mad scientist lab coat; hold on a minute.

OK, so, there's a fifth point I haven't listed above: in languages such as C++, where objects can be constructed on the stack, small string objects can avoid the malloc round trip altogether while maintaining the same API surface. Even with good memory allocators that maintain thread-local arenas to avoid locking, it's very unlikely that a dynamic memory allocator will ever beat something that's just "subtract a value from a register and you're done".

This doesn't really apply to Python as-is, of course, but here's a thought that needs some maturing: track the allocation size of all short-lived objects in non-quickened code objects, and, in quickened code objects, use this information to build an arena inside a call frame and freely carve space from there -- without even taking the GIL for all the short-lived allocations/deallocations. (Going one step further, one could even elide all the deallocation overhead if we know that, after a particular instruction, no more allocations from this arena will be performed. But we're in diminishing-returns territory at this point.)

I imagine this would pair really well with the small string optimization suggested in this discussion (point #4 specifically). Of course, the allocation size tracker is kinda hand-wavy at the moment -- I still need to leave this percolating for a while -- but I wanted to open this idea for discussion.
-
One trick that I've seen elsewhere and seems to work really well is to store a string in some struct similar to this one:
This would mean that, on 32-bit architectures, strings up to 7 bytes would be stored inline in the string object; on 64-bit architectures, strings up to 15 bytes would be stored in the same manner.
The cleverness of this scheme is that, instead of considering the second field as the actual length, we consider it the `remaining_length`: when that's zero, it doubles as the terminating NUL, making it possible to pass the string to C standard library functions that expect NUL-terminated strings. The buffer length ends up being some expression like `flags & IS_SMALL ? sizeof(small.buffer) - small.remaining_length : large.length`.

We already have a lot of space for flags in the unicode string object; maybe this is something we should evaluate too? Especially for ASCII strings, used for identifiers and such, this should yield quite a bit of memory savings and reduce a lot of pointer chasing.
Thoughts?