Buffer.from(str) memory leak #38300
Comments
rss seems to stabilize around 5 GB on my machine (macOS).
macOS 11, Node.js 14:
Node.js 12:
I'm experiencing this issue running Node v15.14.0 on Ubuntu 20.04 (LTS) x64. Basically, I have a long-running server process that runs out of memory and crashes periodically. It makes frequent Buffer.from() calls. The problem started after upgrading to v15.14.0.
It seems the issue did not happen in v14.0.0. I'll bisect to find which change is suspicious.
I just tested it: Node.js v15.11.0 / Windows 10 / x64 leaks memory on my computer with the supplied code.
I'm seeing a relatively large change between v14.4 and v14.5.
I can confirm it broke with v14.5.0.
Try this code?

const str = 'x'.repeat(10000);
console.log('start', process.memoryUsage());

function wait(ms) {
  let resolve;
  const promise = new Promise((_resolve, _reject) => { resolve = _resolve; });
  setTimeout(() => {
    resolve();
  }, ms);
  return promise;
}

(async () => {
  for (let i = 0; i < 1e6; i++) {
    Buffer.from(str); // memory leak happens here
    if (i % 1e5 === 0) {
      await wait(100);
      gc();
      console.log('step', i, 'rss', process.memoryUsage().rss);
    }
  }
})().then(() => {
  gc();
  console.log('end', process.memoryUsage());
});
v14.16.0, XadillaX's code:

start {
  rss: 20185088,
  heapTotal: 4206592,
  heapUsed: 2555408,
  external: 828505,
  arrayBuffers: 9898
}
step 0 rss 21876736
step 100000 rss 251977728
step 200000 rss 279126016
step 300000 rss 287125504
step 400000 rss 297897984
step 500000 rss 304611328
step 600000 rss 257728512
step 700000 rss 280563712
step 800000 rss 270651392
step 900000 rss 264871936
end {
  rss: 1112817664,
  heapTotal: 7667712,
  heapUsed: 2237200,
  external: 33190872,
  arrayBuffers: 9898
}
Yeah, it does not happen if control goes back to the event loop. But still, it's not a solution, given that a lot of libraries / code rely on synchronous processing of buffers. It would be a serious disadvantage of using Buffers compared to Uint8Arrays.
I think it's because of ee6ec14. This commit was using …
Using ee6ec14:
Using its parent 39f42d1:
And I found the parent commit is much faster than the refactored one. Shall we change it back to libuv's …?
It looks like the length of the string is also important.
Related to: #33291
Related to: #33381
From my tests and experience, v15.12.x still runs stably (no crashes), but releases newer than that crash periodically (mostly SIGSEGV), so I'm not sure this is related to this particular issue.
It seems the memory regression was introduced by #33291 (e2cd715 specifically) on the master branch, as @XadillaX mentioned. Result:
Its previous commit:
I'll have a look tomorrow during the day.
I don't believe that #33291 and e2cd715 are the issue. Running the test case here, I don't see …
That's because the code path that introduces the leak is not invoked for sizes < 4096. This further confirms that it's likely c1ee70e that's the culprit here.
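(Side note, not from the thread: the 4096-byte cutoff matches Buffer's internal pool. Strings whose encoded length is below Buffer.poolSize >>> 1 (4096 bytes with the default 8 KiB pool) are copied into the pre-allocated pool rather than taking the native allocation path discussed here, so a shorter string should not reproduce the growth. A rough sketch, assuming the script is run with node --expose-gc; sizes and loop counts are illustrative.)

// Compare a string below the pool threshold with one above it.
const small = 'x'.repeat(1000);   // < 4096 bytes when UTF-8 encoded -> pooled path
const large = 'x'.repeat(10000);  // >= 4096 bytes -> native allocation path

function hammer(str, label) {
  for (let i = 0; i < 1e6; i++) {
    Buffer.from(str);
  }
  gc();
  console.log(label, 'rss after gc:', process.memoryUsage().rss);
}

hammer(small, 'small'); // rss should stay roughly flat
hammer(large, 'large'); // rss grows on the affected versions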
Actually, scratch that. The issue is the for loop here...

const str = 'x'.repeat(10000);
console.log('start', process.memoryUsage());
for (let i = 0; i < 1e6; i++) {
  Buffer.from(str); // memory leak happens here
  if (i % 1e5 === 0) {
    gc();
    console.log('step', i, 'rss', process.memoryUsage().rss);
  }
}
gc();
console.log('end', process.memoryUsage());

The Buffers are being allocated in a synchronous for loop. For every … Bottom line is: there's not actually a problem here from Node.js' point of view. Everything is working as it should. The challenge is that the memory management here is not very intuitive. …
I also dug into it. Lines 184 to 199 in 053aa6d:
OnBackingStoreFree() is being called synchronously, but there is a …
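(A rough way to observe this deferral from JS; my own sketch rather than anything from the thread. It assumes node --expose-gc, and exact RSS numbers depend on the allocator, so treat the output as indicative only.)

// On affected versions, the backing stores collected by gc() are only released
// after control returns to the event loop (where the SetImmediate runs).
const str = 'x'.repeat(10000);
let bufs = [];
for (let i = 0; i < 1e5; i++) bufs.push(Buffer.from(str));
bufs = null;                  // drop all references
gc();                         // collect the JS objects synchronously
console.log('after sync gc :', process.memoryUsage().rss);
setImmediate(() => {
  gc();
  console.log('after one tick:', process.memoryUsage().rss);
});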
@Daninet Unfortunately, the existing native API contract (i.e. towards addons and embedders) for Buffers is that their callback is a) called on the original thread where the Buffer was created, and b) that it is generally possible to run JS code in that callback. That requires some kind of asynchronous solution, at least in the general case.

We could also change the API contract in a semver-major change, but it’s hard to communicate the changed API to addons (which are often unmaintained, esp. those not using N-API, which is what this is about). I think for this particular issue, we could avoid using …
Yeah, I was thinking the same, although I'm not entirely convinced it would make much difference. We'd free the memory up a bit sooner, sure, but the test case is pretty pathological anyway. I'm not convinced there's a lot of justification to optimize for it.
It's imaginable that some people run into this issue while doing benchmarks or data science from fully synchronous JS. It can also cause memory spikes leading to OOM crashes on Node.js backends running low on RAM. But I see, it's tricky to fix... Wouldn't it make sense to add some words to the documentation about this async GC behavior, if it's not being fixed?
Documentation would be good, for sure. It's tricky though because it's a fairly complicated issue that doesn't apply to the majority of situations. I'll give some thought to how to document it, and I'm always open to suggestions! :-) ... Whether we change this is still up in the air. @addaleax's suggestion should work and should be fairly simple -- and I certainly would not block it! I'm just thinking that there are better ways of addressing the problem. For instance, it would be conceivable to allocate a large slab of memory at the start of the benchmark or data analysis and simply slice off from it as needed -- effectively doing your own memory management rather than repeatedly allocating, freeing, and relying on gc behavior in a tight loop like this -- it would be much more efficient at the cost of some increased code complexity.
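(To make the slab idea above concrete, here is a rough sketch of my own, not from the thread: pre-allocate one large Buffer and encode strings into it, reusing the memory instead of creating a million short-lived Buffers. The slab size and the wrap-around reset are assumptions for the example.)

const str = 'x'.repeat(10000);

// One 16 MiB slab; views into it are handed out instead of new allocations.
const SLAB_SIZE = 16 * 1024 * 1024;       // illustrative size
const slab = Buffer.allocUnsafe(SLAB_SIZE);
let offset = 0;

function fromStringPooled(s) {
  const needed = Buffer.byteLength(s);    // UTF-8 by default
  if (offset + needed > SLAB_SIZE) {
    offset = 0;                           // simplistic reset; real code would recycle more carefully
  }
  const written = slab.write(s, offset);  // encode directly into the slab
  const view = slab.subarray(offset, offset + written);
  offset += written;
  return view;                            // a Buffer view, no new allocation
}

for (let i = 0; i < 1e6; i++) {
  fromStringPooled(str);                  // reuses the slab; RSS stays flat
}
console.log('done', process.memoryUsage().rss);

This only works if each view is consumed before the slab wraps around, which is why it fits the benchmark / data-analysis case rather than being a general replacement for Buffer.from().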
If somebody is aware of this issue, then surely it can be avoided, and there are a lot of ways to do so: more efficient algorithms, TextEncoder, or even async sleeps to let the buffers be GC'd. The problematic part lies in the unexpected and undocumented behavior, which does not occur with …
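(For completeness, a small sketch of the TextEncoder route mentioned above, again my own example: the encoder returns a plain Uint8Array whose memory is managed by V8's garbage collector and does not go through Node's Buffer free-callback path.)

// TextEncoder is available as a global in modern Node.js.
const encoder = new TextEncoder();
const str = 'x'.repeat(10000);

for (let i = 0; i < 1e6; i++) {
  const bytes = encoder.encode(str);      // Uint8Array with the UTF-8 bytes
  // ... use `bytes` synchronously ...
}
console.log('end', process.memoryUsage().rss);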
Previously, the code path would allocate a tracked ArrayBuffer that defers cleanup and deallocation of the underlying data with a SetImmediate. Avoid the unnecessary deferral by just allocating a `BackingStore` directly and writing into it. Fixes: nodejs#38300 Refs: nodejs#38336
OK, I have two possible approaches and two PRs here. The first is just a documentation fix in #38336. The second, #38337, follows @addaleax's suggestion to avoid the …
After testing:

const str = 'x'.repeat(10000);
console.log('start', process.memoryUsage());
console.time('1');
for (let i = 0; i < 1e6; i++) {
  Buffer.from(str); // memory leak happens here
  if (i % 1e5 === 0) {
    // await wait(100);
    gc();
    console.log('step', i, 'rss', process.memoryUsage().rss);
  }
}
console.timeEnd('1');
gc();
console.log('end', process.memoryUsage());

The new PR #38337 runs in about 3.9 s on my computer. And 39f42d1a2d (before using …
I like #38337; it fixes the memory leak issue. On my machine it finishes the loop in 7.9 s; master (with the memory leak) does the same in 9.3 s.
I think the “performance loss” is really just the deallocations happening earlier.
Yes, that's precisely what it is. Overall I still suggest that the code sample as-is is problematic and that a different approach should be taken.
Previously, the code path would allocate a tracked ArrayBuffer that defers cleanup and deallocation of the underlying data with a SetImmediate. Avoid the unnecessary deferral by just allocating a `BackingStore` directly and writing into it. Fixes: #38300 Refs: #38336 PR-URL: #38337 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Khaidi Chu <i@2333.moe> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
When Buffer::New passes in existing data, it cannot be garbage collected during synchronous JS execution. Fixes: nodejs/node#40828 Refs: nodejs/node#38300 PR-URL: nodejs/node#42695 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
What steps will reproduce the bug?
Run the following code with node --expose-gc:
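(The repro snippet itself is not reproduced here. As a side note, a minimal guard of my own showing that the explicit gc() calls used throughout this thread only exist when that flag is passed:)

// gc() is only defined when Node is started with --expose-gc.
if (typeof global.gc !== 'function') {
  console.error('Re-run with: node --expose-gc <script>');
  process.exit(1);
}
gc(); // safe to call from here on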
How often does it reproduce? Is there a required condition?
It reproduces consistently with v14.16.1 and v15.14.0.
It only reproduces with a string parameter; arrays do not cause the issue.
I also tested some older versions (v10.22.0, v12.8.1) and the leak is not present there.
What is the expected behavior?
Created buffers should be garbage collected, and memory usage shouldn't increase too much during execution.
What do you see instead?
Memory usage increases to over 10 GB with the supplied code snippet. I get OOM errors when I increase the number of iterations.