-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce cost of resetting row attributes #15497
Conversation
cf9356d
to
4d6a2ba
Compare
// This is a very unsafe shortcut to free the buffer and get a direct | ||
// hold to the _buffer. The caller can then fill it with `size` items. | ||
[[nodiscard]] T* unsafe_shrink_to_size(size_t size) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on why this is unsafe and why it is okay for us to do it anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an excellent question. I would love to have it in a comment.
My best guess is, "it returns a raw pointer to the internal storage of the small_vector"... which literally lets us poke at its innards.
Leonard, quick question: why is it faster for us to do this (poke at the innards) than to have an operator=
or a .resize_to_one_and_reset()
or something on rle
?
Does this introduce a primitive that is used in future PRs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading from uninitialized memory with automatic storage duration (= allocated on the stack) is technically undefined behavior, even for trivial structs and arrays. Trivial structs are basically structs that consist entirely of primitive types like char
or int
. The pointer this function returns points into the small-vector's internal buffer that contains exactly such uninitialized memory and so the caller has to technically make sure to not read from it, or otherwise the CPU might trap / throw an exception, etc. (This doesn't really apply to any modern CPU architecture.)
More importantly however, since the array is not initialized, the returned pointer cannot be used for non-trivial structs. For instance, imagine I have this code:
// A vector with space for 4 items on the stack, without the need for heap allocations.
til::small_vector<std::string, 4> vec;
// Make room for 1 item on the stack.
const auto ptr = vec.unsafe_shrink_to_size(1);
// Initialize that first item in the vector.
*ptr = std::string{ "foobar" };
This would fail in terrible ways, because the backing buffer on the stack is uninitialized. Its memory may contain any potential garbage, leftover data. When you then assign a std::string
to it, the string's operator=()
will try to free any potential memory the target (*ptr
) may have had, in order to assign the new string contents. When it then reads the uninitialized/garbage values, it'll think that the target / *ptr
had something allocated and try to free it. But that's wrong. It didn't have anything.
For cases like this the right choice is usually to use std::uninitialized_default_construct_n
, which would initialize the memory and ensure that the returned pointer always points to valid data, in case of non-trivial structs. And even better, for trivial structs uninitialized_default_construct_n
will leave the memory uninitialized, which is great for performance. (If this behavior is surprising, you can compare it with how in C/C++ you need to always write char c = 0;
and int x = 123;
, because int x;
by itself will leave x
with an undefined value. If this behavior is unwanted, the alternative is std::uninitialized_value_construct_n
.)
But there's a problem... TextAttributes
is not a trivial struct. Any struct that has a constructor - or even worse, destructor - cannot be trivial anymore. After all, if it has a constructor it surely wants to be initialized to some value. For instance, this struct:
struct Foo {
int x = 0;
};
is not trivial anymore, because the = 0
forces the addition of an implicit constructor, which initializes the value of x
during construction to 0. Now you can safely write Foo foo;
in some function and rest assured that reading foo.x
will always return some initialized value. But while this is great for the general case, "97%" of all code basically, this is not great for the "3%" case, the code that's the hot path. Because then this added initialization does have a noticeable impact, if the compiler fails to optimize it away. (And in this case, it does.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leonard, quick question: why is it faster for us to do this (poke at the innards) than to have an
operator=
or a.resize_to_one_and_reset()
or something onrle
?
resize_to_one_and_reset
would only work for one specific use case: Writing one value. A operator=
would have the same limitation unless we have it accept a std::span
or similar, but a std::span
is not a trivial struct and thus not cost free (especially not on the Windows x64 ABI and especially-especially not with MSVC's still poor ability to remove struct parameter copies after inlining a call). The unsafe_shrink_to_size
function on the other hand works with arbitrary size
parameter values and can be fully inlined into the caller / optimized away.
Basically, IMO we should either keep code simple or fast. So, if an optimization can't be simple, it better be fast. I'm basically min-maxing. 😄
_attr = { _columnCount, attr }; | ||
// Constructing and then moving objects into place isn't free. | ||
// Modifying the existing object is _much_ faster. | ||
*_attr.runs().unsafe_shrink_to_size(1) = til::rle_pair{ attr, _columnCount }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clever
// This is a very unsafe shortcut to free the buffer and get a direct | ||
// hold to the _buffer. The caller can then fill it with `size` items. | ||
[[nodiscard]] T* unsafe_shrink_to_size(size_t size) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an excellent question. I would love to have it in a comment.
My best guess is, "it returns a raw pointer to the internal storage of the small_vector"... which literally lets us poke at its innards.
Leonard, quick question: why is it faster for us to do this (poke at the innards) than to have an operator=
or a .resize_to_one_and_reset()
or something on rle
?
Does this introduce a primitive that is used in future PRs?
@lhecker if you want to update the numbers before merging, I'm cool with. otherwise, leave em! :) |
@carlos-zamora Could you give this another ✅? |
Performance of printing enwik8.txt at the following block sizes:
4KiB (printf): 54MB/s -> 54MB/s
128KiB (cat): 101MB/s -> 104MB/s
Validation Steps Performed
This change is easily verifiable via review.