Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make big_string pod #1204

Merged
merged 1 commit into from
Jun 8, 2024
Merged

Make big_string pod #1204

merged 1 commit into from
Jun 8, 2024

Conversation

alkis
Copy link
Contributor

@alkis alkis commented Jun 8, 2024

big_string is not pod which means it needs to be properly constructed and destroyed. Instead make it POD and destroy it manually in string destructor.

alkis referenced this pull request Jun 8, 2024
A small-string optimization is a way of reusing inline storage space for
sufficiently small strings, rather than allocating them on the heap. The
current approach takes after an old Facebook string class: it reuses the
highest-order byte for flags and small-string size, in such a way that a
maximally-sized small string will have its last byte zeroed, making it a
null terminator for the C string.

The only flag we have is in the highest-order bit, that says whether the
string is big (set) or small (cleared.) Most of the logic switches based
on the value of this bit; e.g. data() returns big()->p if it's set, else
small()->buf if it's cleared. For a small string, the capacity is always
fixed at sizeof(string) - 1 bytes; we store the length in the last byte,
but we store it as the number of remaining bytes of capacity, so that at
max size, the last byte will read zero and serve as our null terminator.

Morally speaking, our class's storage is a union over two POD C structs.
For now I gravitated towards a slightly more obtuse approach: the string
class itself contains a blob of the right size, and we alias that blob's
pointer for the two structs, taking some care not to run afoul of object
lifetime rules in C++. If anyone wants to improve on this, contributions
are welcome.

This commit also introduces the `ctl::__` namespace. It can't be legally
spelled by library users, and serves as our version of boost's "detail".

We introduced a string::swap function, and we now use that in operator=.
operator= now takes its argument by value, so we never need to check for
the case where the pointers are equal and can just swap the entire store
of the argument with our own, leaving the C++ destructor to free our old
storage afterwards.

There are probably still a few places where our capacity is slightly off
and we grow too fast, although there don't appear to be any where we are
too slow. I will leave these to be fixed in future changes.
Copy link
Collaborator

@mrdomino mrdomino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks!

ctl/string.h Outdated Show resolved Hide resolved
`big_string` is not pod which means it needs to be properly
constructed and destroyed. Instead make it POD and destroy it
manually in `string` destructor.
@mrdomino mrdomino merged commit a0410f0 into jart:master Jun 8, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants