Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory usage regression in 1.21 #210

Closed
matthewvalentine opened this issue Jun 3, 2024 · 6 comments
Closed

Memory usage regression in 1.21 #210

matthewvalentine opened this issue Jun 3, 2024 · 6 comments
Assignees
Labels
bug A reported bug. confirmed A confirmed bug.

Comments

@matthewvalentine
Copy link

Thank you again for fixing #194! Though, one result of the fix is increased memory requirments if you use multiple regexes in a pipeline.

For example

s.replace(whitespaceRegex, ' ').replace(weirdCharacterRegex, '')

now keeps two extra translated strings in memory persistently, whereas previously it would

  1. only have one in memory at a time, and
  2. not keep it persistently after the replacing is done.

I believe that (1) is a more important issue than (2). (2) just means holding onto the same memory for longer, but (1) means the actual max memory requirements go up, potentially a lot if you have a process that uses many regexes.

It would be hard to fix this in general. But I think it should be fixable at least for everything that goes through the whole string, by dropping the cache when it completes. Such as:

  • Whenever lastIndex gets reset to 0
  • At the end of functions like replaceAll that go all the way through the string

I tried to make a PR that calls dropLastString() in those places, but I wasn't able to get the memory usage to go down, I am not sure why. I might try again later.

Another possible fix might be if there was a single global cache for all the RE2s. Although it seems a bit unclean, it would be guaranteed to have at most 1 string's worth of overhead, and should keep the linear performance on the assumption that people generally don't iterate through two different regexes simultaneously.

Here is a script that unambiguously displays the problem. On my machine, in 1.20.12 this uses 50 MB, while on 1.21 it uses 4 GB.

node --expose-gc script.js
const RE2 = require('./re2');
let s = '';
for (let i = 0; i < 20 * 1024 * 1024; i++) {
	s += 'a';
}
const regexes = [];
for (let i = 0; i < 200; i++) {
	const r = new RE2('x', 'g');
	regexes.push(r);
	s.replace(r, '');
	global.gc();
}
console.log('Done');
while (true) {}
@uhop
Copy link
Owner

uhop commented Jun 4, 2024

now keeps two extra translated strings in memory persistently

No strings are kept persistently.

@uhop
Copy link
Owner

uhop commented Jun 4, 2024

Hmm, I stand corrected. It appears to be a relatively easy fix…

@uhop uhop self-assigned this Jun 4, 2024
@uhop uhop added bug A reported bug. confirmed A confirmed bug. labels Jun 4, 2024
uhop added a commit that referenced this issue Jun 5, 2024
@uhop
Copy link
Owner

uhop commented Jun 5, 2024

It turned out that:

  • I had a commit missing (didn't push it from the computer I worked on it).
  • I had a bug (missing a virtual destructor).
  • The idea to keep a prepped version of a string depending on its life time was not a good choice for all use cases.

I think my last commits (the current master) fixed the problem with matchAll().

@matthewvalentine — please retest and let me know if it works for you.

PS: And thank you for the repro script!

PPS: Another idea is to keep lastStringValue as a weak pointer, so it can be collected by GC if needed. That's the proper way to do caches. I see what I can do about that.

@matthewvalentine
Copy link
Author

@uhop With the code on master, I see no memory issue anymore using the repro script with replace, replaceAll, match, matchAll, test or exec.

@uhop
Copy link
Owner

uhop commented Jun 5, 2024

I'll play around with the weak pointer idea and will release a new version. Thanks for finding the problem and helping to repro it!

@uhop
Copy link
Owner

uhop commented Jun 5, 2024

Published as 1.21.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A reported bug. confirmed A confirmed bug.
Projects
None yet
Development

No branches or pull requests

2 participants