ctl::string small-string optimization #1199
A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string.

The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared). Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared.

Morally speaking, our class's storage is a union over two POD C structs. It may be that this winds up being the best way to actually write it, but for now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. Only in writing this now do I realize that we may be able to relatively easily sidestep those rules.

TODO:
- [ ] tests are currently segfaulting
- [ ] think about operator string_view
- [ ] maybe migrate to POD anonymous union
- [ ] benchmark and see if this is even worth it
- [ ] __ namespace needs documented, at least here
- [ ] we are probably incorrectly setting size in a few places
- [ ] explain why assign-by-value and "swapperator", at least here
private:
    inline bool isbig() const noexcept
    {
        return *(__builtin_launder(blob) + __::sso_max) & 0x80;
Nice, is there one of these for std::move and std::forward? Also the last time I got truly autistic about C++ was back in 2012. Could you give me a two sentence tutorial on what this launder does?
move and forward are both trivial plain C++ functions; there is no reason we couldn't write a brief ctl::move and ctl::forward of our own.
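A minimal sketch of what such helpers could look like, assuming C++11 and no dependency on the standard headers. The namespace `sketch`, the local `remove_reference`, and the `Probe`, `category`, and `relay` helpers are hypothetical names used only for illustration; this is not code from the PR or from libkj.

```cpp
#include <cassert>

namespace sketch {  // hypothetical stand-in for a ctl-like namespace

// Local remove_reference so the sketch does not need <type_traits>.
template <typename T> struct remove_reference { using type = T; };
template <typename T> struct remove_reference<T&> { using type = T; };
template <typename T> struct remove_reference<T&&> { using type = T; };

// move: unconditionally cast the argument to an rvalue reference.
template <typename T>
constexpr typename remove_reference<T>::type&& move(T&& t) noexcept {
    return static_cast<typename remove_reference<T>::type&&>(t);
}

// forward: cast back to the value category encoded in T.
template <typename T>
constexpr T&& forward(typename remove_reference<T>::type& t) noexcept {
    return static_cast<T&&>(t);
}

template <typename T>
constexpr T&& forward(typename remove_reference<T>::type&& t) noexcept {
    return static_cast<T&&>(t);
}

// Helpers that report which value category reached the callee.
struct Probe {};
inline int category(const Probe&) { return 0; }  // lvalue overload
inline int category(Probe&&) { return 1; }       // rvalue overload

template <typename T>
int relay(T&& x) {
    return category(sketch::forward<T>(x));
}

}  // namespace sketch
```

Passing an lvalue `Probe` to `relay` reaches the lvalue overload, while passing a temporary or the result of `sketch::move` reaches the rvalue overload.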
E.g. capnproto's libkj does this:
std::launder is different. It is an escape hatch from the compiler's ability to make crazy assumptions based on the object lifetime rules of C++ - I have not carefully studied this in a bit but my understanding is, for example, if you have a byte that is part of a union where one arm is const and the other isn't, and you change it in the non-const arm, then unless you call launder when referencing the const arm, the compiler can just assume that the value did not change and not bother to look.
Since you obviously can't implement this in terms of the actual C++ language itself, there is going to be a __builtin_launder or something that your STL provider is going to collaborate with your compiler author to use.
Basically the smooth-brained rule to learn is "if you're in a situation where you're using the same backing memory for two different C++ objects, then you need to use launder on your pointers so the compiler doesn't do something crazy."
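To make that rule concrete, here is a small sketch that reuses one byte buffer for two successive objects with a const member and reads the second one back through a laundered pointer. It assumes GCC or Clang, where `__builtin_launder` is available (the standard spelling is `std::launder` from `<new>` in C++17); `Tagged` and `reuse_and_read` are hypothetical names, not part of the PR.

```cpp
#include <cassert>
#include <new>

// Hypothetical type with a const member: the compiler may assume `tag`
// never changes for a given object.
struct Tagged {
    const int tag;
};

inline int reuse_and_read() {
    // Raw storage with the right size and alignment for a Tagged.
    alignas(Tagged) unsigned char blob[sizeof(Tagged)];

    new (blob) Tagged{1};  // first object lives in blob
    new (blob) Tagged{2};  // storage is reused for a second object

    // Launder the pointer derived from the raw storage so the compiler
    // treats it as pointing at whichever object currently lives there.
    return __builtin_launder(reinterpret_cast<Tagged*>(blob))->tag;
}
```

`Tagged` is trivially destructible, so the sketch does not need explicit destructor calls before the storage is reused.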
Thank you. This information has much more clarity than Google results.
I'm liking this so far. I'm glad to see Facebook's alpha coming to the Cosmopolitan codebase. Since we're replacing simple code with intelligent code, I'd like to see a benchmark too that demonstrates the advantage. Please don't use my legacy benchmark macros. Something simple and less frameworky like this should do:

#define BENCH(ITERATIONS, WORK_PER_RUN, CODE) \
  do { \
    struct timespec start = timespec_real(); \
    for (int i = 0; i < ITERATIONS; ++i) { \
      asm volatile("" ::: "memory"); \
      CODE; \
    } \
    long long work = WORK_PER_RUN * ITERATIONS; \
    double nanos = (timespec_tonanos(timespec_sub(timespec_real(), start)) + work - 1) / (double)work; \
    printf("%10g ns %2dx %s\n", nanos, ITERATIONS, #CODE); \
  } while (0)
Clearly we aren't exercising the capacity increase logic very hard yet.
It clamps to size() + 1, not just size(), i.e. we reserve the null byte. We also check the exact requested capacity against the sso_max before we do the (??) alignment (??) stuff.
master:
ctl-sso:
Had not pushed this from my local branch. :(
These commits were sitting on a local branch that I neglected to push before merging. :(

* Use memcpy for string::reserve
* Remove fence comments
Making a string_view from a string appears to take about 1.3ns no matter what. 100% definitely no point deviating from the STL API over that.
A small-string optimization is a way of reusing inline storage space for sufficiently small strings, rather than allocating them on the heap. The current approach takes after an old Facebook string class: it reuses the highest-order byte for flags and small-string size, in such a way that a maximally-sized small string will have its last byte zeroed, making it a null terminator for the C string.
The only flag we have is in the highest-order bit, that says whether the string is big (set) or small (cleared.) Most of the logic switches based on the value of this bit; e.g. data() returns big()->p if it's set, else small()->buf if it's cleared. For a small string, the capacity is always fixed at sizeof(string) - 1 bytes; we store the length in the last byte, but we store it as the number of remaining bytes of capacity, so that at max size, the last byte will read zero and serve as our null terminator.
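The remaining-capacity trick described above can be sketched as follows. This is an illustration under assumptions, not the PR's actual layout: it assumes a 24-byte string object (so 23 bytes of small capacity), and `SmallSketch`, `kSmallCap`, and the member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumed small capacity: sizeof(string) - 1 for a 24-byte string object.
constexpr std::size_t kSmallCap = 23;

// Hypothetical small-string arm: character bytes followed by one byte
// that holds the REMAINING capacity (top bit reserved for the big flag).
struct SmallSketch {
    char buf[kSmallCap];
    unsigned char rem;

    // Store n bytes from s; the caller guarantees n <= kSmallCap.
    void set(const char* s, std::size_t n) {
        std::memcpy(buf, s, n);
        if (n < kSmallCap) buf[n] = '\0';  // explicit terminator when there is room
        rem = static_cast<unsigned char>(kSmallCap - n);  // 0 at max size
    }

    bool isbig() const { return (rem & 0x80) != 0; }
    std::size_t size() const { return kSmallCap - (rem & 0x7f); }

    // View the whole object as bytes: at max size the zero in `rem`
    // sits right after the 23 characters and acts as the terminator.
    const char* c_str() const { return reinterpret_cast<const char*>(this); }
};

static_assert(sizeof(SmallSketch) == 24, "sketch assumes a 24-byte layout");
```

Because the remaining capacity never exceeds 23, the top bit of the last byte stays clear for small strings, which is what leaves it free to serve as the big/small flag.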
Morally speaking, our class's storage is a union over two POD C structs. For now I gravitated towards a slightly more obtuse approach: the string class itself contains a blob of the right size, and we alias that blob's pointer for the two structs, taking some care not to run afoul of object lifetime rules in C++. If anyone wants to improve on this, contributions are welcome.
This commit also introduces the ctl::__ namespace. It can't be legally spelled by library users, and serves as our version of boost's "detail".

We introduced a string::swap function, and we now use that in operator=. operator= now takes its argument by value, so we never need to check for the case where the pointers are equal and can just swap the entire store of the argument with our own, leaving the C++ destructor to free our old storage afterwards.
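The assign-by-value plus swap pattern can be illustrated with a deliberately simplified, heap-only string. `Str` and its members are hypothetical and omit the small-string storage entirely; the point is only how operator= avoids a self-assignment check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical heap-only string used to show copy-and-swap assignment.
class Str {
    char* p_ = nullptr;
    std::size_t n_ = 0;

public:
    Str() = default;
    Str(const char* s) : p_(new char[std::strlen(s) + 1]), n_(std::strlen(s)) {
        std::memcpy(p_, s, n_ + 1);
    }
    Str(const Str& o) : p_(o.p_ ? new char[o.n_ + 1] : nullptr), n_(o.n_) {
        if (p_) std::memcpy(p_, o.p_, n_ + 1);
    }
    Str(Str&& o) noexcept : p_(o.p_), n_(o.n_) {
        o.p_ = nullptr;
        o.n_ = 0;
    }
    ~Str() { delete[] p_; }

    // Exchange the entire store of two strings.
    void swap(Str& o) noexcept {
        char* tp = p_; p_ = o.p_; o.p_ = tp;
        std::size_t tn = n_; n_ = o.n_; o.n_ = tn;
    }

    // By-value parameter: the copy (or move) already happened at the call
    // site, so we swap with it and let its destructor free our old storage.
    Str& operator=(Str o) noexcept {
        swap(o);
        return *this;
    }

    const char* c_str() const { return p_ ? p_ : ""; }
    std::size_t size() const { return n_; }
};
```

Self-assignment works without a pointer comparison because the parameter is always a distinct object from `*this`.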
There are probably still a few places where our capacity is slightly off and we grow too fast, although there don't appear to be any where we are too slow. I will leave these to be fixed in future changes.
TODO:
maybe migrate to POD anonymous union (I like this way better)