Memory usage regression in 1.21 #210
Thank you again for fixing #194! However, one result of the fix is increased memory requirements if you use multiple regexes in a pipeline.
For example, a pipeline that runs two regexes over the same input (sketched below) now keeps two extra translated strings in memory persistently, whereas previously it would […]
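A sketch of the kind of pipeline meant here; the patterns and the helper function are placeholders, not the original example:

```js
const RE2 = require('re2');

// Placeholder patterns: two RE2 instances used back to back over one input.
const stripTags = new RE2(/<[^>]*>/g);
const collapseSpaces = new RE2(/\s+/g);

function clean(text) {
  // Per the issue: since the #194 fix, each RE2 instance retains the
  // translated copy of the last string it processed, so this pipeline
  // keeps two such copies alive between calls.
  return text.replace(stripTags, '').replace(collapseSpaces, ' ');
}
```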
I believe that (1) is a more important issue than (2). (2) just means holding onto the same memory for longer, but (1) means the actual max memory requirements go up, potentially a lot if you have a process that uses many regexes.
It would be hard to fix this in general. But I think it should be fixable at least for everything that goes through the whole string, by dropping the cache when it completes. Such as:

- `lastIndex` gets reset to 0
- `replaceAll`
- other calls that go all the way through the string

I tried to make a PR that calls `dropLastString()` in those places, but I wasn't able to get the memory usage to go down; I am not sure why. I might try again later.

Another possible fix might be a single global cache for all the RE2s. Although it seems a bit unclean, it would be guaranteed to have at most one string's worth of overhead, and it should keep the linear performance on the assumption that people generally don't iterate through two different regexes simultaneously.
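To make the trade-off concrete, the global-cache idea could look roughly like this in JavaScript terms (hypothetical: node-re2's actual cache lives in its C++ binding, and `translate` is a stand-in name):

```js
// Hypothetical module-level cache shared by every RE2 instance, so at most
// one translated string is retained at a time, however many regexes exist.
let lastSource = null;
let lastTranslated = null;

function translate(source) {
  if (source !== lastSource) {
    lastSource = source;
    // Stand-in for the real UTF-16 to UTF-8 translation done by the binding.
    lastTranslated = Buffer.from(source, 'utf8');
  }
  return lastTranslated;
}
```

Repeated calls over the same string stay cheap, while switching strings evicts the previous translation, which is exactly where the assumption about not iterating two regexes simultaneously matters.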
Here is a script that unambiguously demonstrates the problem. On my machine, in 1.20.12 this uses 50 MB, while on 1.21 it uses 4 GB.
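The script itself isn't reproduced above. A minimal sketch in the same spirit might look like the following; the sizes, patterns, and counts are assumptions rather than the original numbers:

```js
const RE2 = require('re2');

// Many distinct RE2 instances, each run once over its own large string.
// If every instance retains the translation of its last input, memory
// grows with the number of regexes instead of staying bounded.
const regexes = [];
for (let i = 0; i < 1000; i++) {
  regexes.push(new RE2(`x${i}`, 'g'));
}

for (const re of regexes) {
  const big = 'a'.repeat(1 << 20); // roughly 1 MB of input per regex
  re.test(big);
}

console.log(`rss: ${(process.memoryUsage().rss / 2 ** 20).toFixed(1)} MB`);
```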
Comments

No strings are kept persistently.

Hmm, I stand corrected. It appears to be a relatively easy fix…

It turned out that: […]

I think my last commits (the current master) fixed the problem. @matthewvalentine, please retest and let me know if it works for you. PS: And thank you for the repro script! PPS: Another idea is to keep […]

@uhop With the code on master, I see no memory issue anymore using the repro script with […]

I'll play around with the weak pointer idea and will release a new version. Thanks for finding the problem and helping to repro it!

Published as 1.21.1.
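The weak pointer idea from the thread can be illustrated in JavaScript terms with `WeakRef`; the real change would live in node-re2's C++ layer, so the class and names below are purely hypothetical:

```js
// Hypothetical per-instance cache that lets the GC reclaim the translation:
// only a weak reference keeps the cached record alive between uses.
class LastStringCache {
  #ref = null;

  get(source) {
    const hit = this.#ref && this.#ref.deref();
    if (hit && hit.source === source) return hit.translated;
    const record = {
      source,
      // Stand-in for the real UTF-16 to UTF-8 translation.
      translated: Buffer.from(source, 'utf8'),
    };
    this.#ref = new WeakRef(record);
    return record.translated;
  }
}
```

The intent is to keep the speed win from #194 while the string is still reachable, without letting the cache alone pin large inputs in memory.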