Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer: Reallocate() instead of creating a new backing store #53552

Closed
wants to merge 1 commit into from

Conversation

mildsunrise
Copy link
Member

in Buffer::New(..., node::encoding), when the prediction returned by StringBytes::Size() doesn't match the actual size emitted by StringBytes::Write(), we currently create a second backing store and then memcpy() to it.

AFAIU, we can instead reallocate the backing store into the new size, which should contract the allocation if possible.

this is especially relevant for base64 (see discussion at #53550), which since #52428 overestimates fairly often.

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Jun 22, 2024
@mildsunrise mildsunrise force-pushed the buffer-new-reallocate branch from a647a40 to 3c2c597 Compare June 22, 2024 22:43
@mildsunrise mildsunrise added the request-ci Add this label to start a Jenkins CI on a PR. label Jun 22, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jun 22, 2024
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@mildsunrise
Copy link
Member Author

this seems to improve performance substantially when decoding large (order of MBs or more) inputs, but not that much in smaller inputs. but even then, it simplifies the code so I think it's valuable

@mildsunrise mildsunrise marked this pull request as ready for review June 22, 2024 23:33
@anonrig anonrig added needs-benchmark-ci PR that need a benchmark CI run. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Jun 22, 2024
@mildsunrise mildsunrise requested review from addaleax and targos June 23, 2024 00:17
@lemire
Copy link
Member

lemire commented Jun 23, 2024

I thought that's what we had, and then @targos deliberately reverted it.

#52292

Note that we should be concerned by the fact that if we are even just 1 byte off, we create a new buffer and copy. Copies are cheap and that's never going a massive issue, but we are potentially wasting time for no good reason. I think that @joyeecheung raised this issue in a comment to #52292

@mildsunrise
Copy link
Member Author

ah, dang, I did not realize Reallocate() was deprecated... sad. that'll teach me to always look at the history before trying stuff :)

@mildsunrise
Copy link
Member Author

there is something I don't understand, though. #52234 claims there is no longer a performance benefit since we are no longer overriding the allocator's reallocation... but the performance increase I observed (in main) was too large to have been a fluke 🤔

@joyeecheung
Copy link
Member

joyeecheung commented Jun 23, 2024

Just a guess: the performance difference may be caused by the fact that when allocating a new backing store, V8 always zero-initializes the buffer even though its content (or most of it) will be overwritten by the copy later; whereas the realloc implementation in V8 only allocates an uninitialized buffer first, copy the data and if the new length is bigger, zero-initialize the uncopied tail.

@joyeecheung
Copy link
Member

Also in a micro-benchmark scenario: repeatedly allocating a new backing store can urge the GC a lot more eagerly than simply reallocating it (the latter isn’t incorporated into the GC schedule as much AFAICT), though technically that would be a pitfall of the microbenchmark itself.

@mildsunrise
Copy link
Member Author

Just a guess: the performance difference may be caused by the fact that when allocating a new backing store, V8 always zero-initializes the buffer even though its content (or most of it) will be overwritten by the copy later; whereas the realloc implementation in V8 only allocates an uninitialized buffer first, copy the data and if the new length is bigger, zero-initialize the uncopied tail.

yeah, I thought about that memset but it doesn't seem to be supported by profiling...

Also in a micro-benchmark scenario: repeatedly allocating a new backing store can urge the GC a lot more eagerly than simply reallocating it (the latter isn’t incorporated into the GC schedule as much AFAICT), though technically that would be a pitfall of the microbenchmark itself.

so you're saying that even if the reallocation is implemented as allocate + copy, it is still treated differently in terms of GC?

@joyeecheung
Copy link
Member

joyeecheung commented Jun 23, 2024

Yes because the GC scheduling code is in V8. A Realloc call from Node.js’s side doesn’t affect GC much AFAICT, whereas NewBackingStore does.

My comment in the other PR was talking about certain use cases where we don’t need the buffer to be allocated through the array buffer allocator and they can just be allocated normally. IIUC we sometimes prefer to allocate native memory using the array buffer allocator so that when the amount of external memory used is high enough V8 would perform GC to hopefully reduce the memory footprint (and if those external memory are set up to be released when some JS values get GC’ed, it would do the job). However this setup is not always doing more good than harm. Performance wise it may not be bad to just directly manage the external using stdlib calls and use Isolate::AdjustAmountOfExternalAllocatedMemory() to tune GC instead of going through the array buffer allocator machinery. However I don’t think this applies to Buffers because they are Uint8Arrays now and are always backed by real array buffers anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-benchmark-ci PR that need a benchmark CI run. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants