Memory usage regression in 1.21 #210

matthewvalentine · 2024-06-03T21:12:24Z

Thank you again for fixing #194! Though, one result of the fix is increased memory requirements if you use multiple regexes in a pipeline.

For example

s.replace(whitespaceRegex, ' ').replace(weirdCharacterRegex, '')

now keeps two extra translated strings in memory persistently, whereas previously it would

(1) only have one in memory at a time, and
(2) not keep it persistently after the replacing is done.

I believe that (1) is a more important issue than (2). (2) just means holding onto the same memory for longer, but (1) means the actual max memory requirements go up, potentially a lot if you have a process that uses many regexes.

It would be hard to fix this in general. But I think it should be fixable at least for everything that goes through the whole string, by dropping the cache when it completes. Such as:

Whenever lastIndex gets reset to 0
At the end of functions like replaceAll that go all the way through the string

I tried to make a PR that calls dropLastString() in those places, but I wasn't able to get the memory usage to go down, and I am not sure why. I might try again later.

Another possible fix might be if there was a single global cache for all the RE2s. Although it seems a bit unclean, it would be guaranteed to have at most 1 string's worth of overhead, and should keep the linear performance on the assumption that people generally don't iterate through two different regexes simultaneously.
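To make the single-global-cache idea concrete, here is a minimal JavaScript sketch. It is an illustration only: the names `sharedCache`, `prepare`, and `dropSharedCache` are hypothetical and do not exist in node-re2, and `Buffer.from(str, 'utf8')` merely stands in for whatever per-string translation the library actually performs.

```javascript
// Hypothetical sketch: one cache entry shared by every regex object.
// None of these names come from node-re2.
const sharedCache = { source: null, prepared: null };

// Stand-in for the expensive per-string translation step (assumed here
// to be a UTF-16 to UTF-8 conversion).
function prepare(str) {
  if (sharedCache.source !== str) {
    // A different string replaces the single entry, so at most one
    // prepared copy is alive at any time.
    sharedCache.source = str;
    sharedCache.prepared = Buffer.from(str, 'utf8');
  }
  return sharedCache.prepared;
}

// Explicit drop, e.g. after an operation that has walked the whole string.
function dropSharedCache() {
  sharedCache.source = null;
  sharedCache.prepared = null;
}

const a = prepare('hello');
const b = prepare('hello'); // same string: reuses the cached buffer
const c = prepare('world'); // different string: replaces the entry
console.log(a === b, a === c);
```

The trade-off is the one described above: alternating between two different strings would defeat the cache, but the overhead can never exceed one prepared string.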

Here is a script that unambiguously displays the problem. On my machine, in 1.20.12 this uses 50 MB, while on 1.21 it uses 4 GB.

node --expose-gc script.js

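const RE2 = require('./re2');
// Build a subject string of 20 * 1024 * 1024 characters.
let s = '';
for (let i = 0; i < 20 * 1024 * 1024; i++) {
	s += 'a';
}
// Keep every regex reachable so anything each one caches stays alive.
const regexes = [];
for (let i = 0; i < 200; i++) {
	const r = new RE2('x', 'g');
	regexes.push(r);
	s.replace(r, '');
	global.gc();
}
console.log('Done');
// Spin forever so the process's memory usage can be inspected externally.
while (true) {}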
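The script above relies on watching the process from outside. As an alternative, a small helper could print the resident set size directly. This is only a sketch: `rssAfter` is a hypothetical name, and the workload below is a placeholder using a built-in RegExp so the snippet runs without re2 installed.

```javascript
// Hypothetical helper: run a function, then report resident set size in MB.
function rssAfter(fn) {
  fn();
  if (typeof global.gc === 'function') {
    global.gc(); // only defined when node is started with --expose-gc
  }
  return Math.round(process.memoryUsage().rss / (1024 * 1024));
}

const mb = rssAfter(() => {
  // Placeholder workload; substitute the RE2 loop from the repro script.
  const subject = 'a'.repeat(1024 * 1024);
  subject.replace(/x/g, '');
});
console.log(`RSS after workload: ${mb} MB`);
```

RSS includes memory allocated outside the JavaScript heap, which matters here because the cached strings are held by the native addon.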

uhop · 2024-06-04T01:03:27Z

> now keeps two extra translated strings in memory persistently

No strings are kept persistently.

uhop · 2024-06-04T03:14:19Z

Hmm, I stand corrected. It appears to be a relatively easy fix…

uhop · 2024-06-05T01:29:58Z

It turned out that:

I had a commit missing (didn't push it from the computer I worked on it).
I had a bug (missing a virtual destructor).
The idea to keep a prepped version of a string depending on its lifetime was not a good choice for all use cases.

I think my last commits (the current master) fixed the problem with matchAll().

@matthewvalentine — please retest and let me know if it works for you.

PS: And thank you for the repro script!

PPS: Another idea is to keep lastStringValue as a weak pointer, so it can be collected by GC if needed. That's the proper way to do caches. I'll see what I can do about that.
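For readers unfamiliar with weakly held caches, the idea can be sketched in plain JavaScript with `WeakRef`. This is not how node-re2 implements it (the real cache lives in the native addon); `WeakPreparedCache` is a hypothetical name, and because `WeakRef` only accepts objects, the sketch wraps the translated bytes in a `Buffer`.

```javascript
// Hypothetical sketch of a weakly held cache entry; names are illustrative.
class WeakPreparedCache {
  constructor() {
    this.source = null; // string the entry was built from (held strongly here)
    this.ref = null;    // WeakRef to the prepared Buffer
  }

  get(str) {
    // deref() returns undefined once the GC has collected the buffer.
    const cached = this.ref === null ? undefined : this.ref.deref();
    if (cached !== undefined && this.source === str) {
      return cached;
    }
    // Cache miss or collected entry: rebuild and hold it weakly.
    const prepared = Buffer.from(str, 'utf8');
    this.source = str;
    this.ref = new WeakRef(prepared);
    return prepared;
  }
}

const cache = new WeakPreparedCache();
const first = cache.get('abc');
const second = cache.get('abc'); // still referenced by `first`, so it is reused
console.log(first === second);
```

Under memory pressure the prepared copy can be reclaimed and is simply rebuilt on the next call, so the cache costs time rather than peak memory. Note that this sketch still holds the source string strongly for the comparison.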

matthewvalentine · 2024-06-05T02:50:02Z

@uhop With the code on master, I see no memory issue anymore using the repro script with replace, replaceAll, match, matchAll, test or exec.

uhop · 2024-06-05T03:13:06Z

I'll play around with the weak pointer idea and will release a new version. Thanks for finding the problem and helping to repro it!

uhop · 2024-06-05T20:44:20Z

Published as 1.21.1.

uhop self-assigned this Jun 4, 2024

uhop added the bug and confirmed labels Jun 4, 2024

uhop added a commit that referenced this issue Jun 5, 2024

The missing commit. Refs #210.

e6d8cc1

viceice mentioned this issue Jun 5, 2024

fix: Revert "build(deps): update dependency re2 to v1.21.0" renovatebot/renovate#29455

Merged

uhop added a commit that referenced this issue Jun 5, 2024

Added a guard against calling a weak callback on already deleted instance. Refs #210.

5e779ce

uhop mentioned this issue Jun 5, 2024

Make internal data garbage-collectable #211

Closed

uhop closed this as completed in c23cd4e Jun 5, 2024
