Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(decode): optimize Vec alloc/resize #179

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AaronO
Copy link

@AaronO AaronO commented Mar 6, 2022

Avoid calloc overhead of initializing buffer we'll write into, improving decode_small_input/decode/3 by -33%

Unfortunately requires unsafe Vec::set_len so we can get a mutable ref to the uninit portion of the Vec's buffer

Before

decode_small_input/decode/3
                        time:   [56.722 ns 56.776 ns 56.833 ns]
                        thrpt:  [50.341 MiB/s 50.391 MiB/s 50.439 MiB/s]
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) high mild
  16 (16.00%) high severe

After

decode_small_input/decode/3
                        time:   [37.232 ns 38.653 ns 41.313 ns]
                        thrpt:  [69.252 MiB/s 74.017 MiB/s 76.843 MiB/s]
                 change:
                        time:   [-34.739% -33.757% -32.200%] (p = 0.00 < 0.05)
                        thrpt:  [+47.492% +50.959% +53.231%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe

Avoid calloc overhead of initializing buffer we'll write into

Unfortunately requires unsafe Vec::set_len so we can get a mutable ref to the uninit portion of the Vec's buffer
@AaronO
Copy link
Author

AaronO commented Mar 6, 2022

FYI this also fixes a "bug" in decode_engine where we over alloc the buffer's size:

let mut buffer = Vec::<u8>::with_capacity(input.as_ref().len() * 4 / 3);

This approximately allocates with a ratio of 4/3 instead of 3/4 (without accounting for padding, etc...)

Comment on lines +162 to +165
buffer.reserve(len_estimate);
unsafe {
buffer.set_len(total_len_estimate);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safety documentation of set_len is pretty clear that this is UB. Namely it states

The elements at old_len..new_len must be initialized.

But total_len_estimate = starting_output_len + len_estimate, which means the last len_estimate elements are not properly initialized.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonasbb It's not UB or unsafe in aggregate since write into that "unintialized" portion of the buffer and then truncate to the true length.

This avoids the overhead of callocing/initializing a buffer that we'll immediately overwrite

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation mentions that as a safety requirement, and this code is not upholding it. Later, the code also creates a reference to this uninitialized memory, which is also not allowed.
Some functions like decode could also panic and unwind before truncate could set the correct length. Now, the drop implementation of Vec might potentially read the uninitialized values.

If you really need uninitialized memory you need to write to it via a pointer and update the length only after writing. Alternatively, using MaybeUninit is an option. If the buffer would be of type Vec<MaybeUninit<u8>>, then I think the set_len would be ok, but it isn't.

@marshallpierce
Copy link
Owner

Thanks for this work. However, I'm not sure who the target user would be -- if you're ok with unsafe, the forthcoming AVX2 version will surely outperform it, and if you're not, you'd probably want the existing safe version. 🤔 Good catch on the over-allocation, too!

@Nugine
Copy link

Nugine commented Jun 25, 2022

Hi. There is another crate base64-simd for performance. It is highly unsafe but much faster than base64 crate.

@marshallpierce
Copy link
Owner

I have some in progress work to address #182, after which #170 will be addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants