HashDiff never finishes when comparing two arrays #49
Comments
Thanks a lot for reporting the issue, @nbarrientos. The memory behavior is to some extent explainable. As far as I know, in genetics there are a lot of clever algorithms that avoid such explosion problems, but I am unaware of simple algorithms that do not resort to external storage.
Hi @liufengyun! Thanks for replying. Sorry, when I read the documentation of the library I did indeed see that the complexity of the algorithm was O(n²), but I naively assumed that this was the time complexity. I should have read further about the space complexity 😄 For the fun of it, I allowed a problem of input size 10000 (diffing two arrays of 10000 1-byte elements) to consume as much memory as it needed until it finished. Here's the memory footprint of the process: [plot]. Same plot but increasing the size to 15000: [plot]. I agree that adding a note clarifying this is a good idea, especially taking into account that LCS is enabled by default. Bonne journée! 👋
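To put rough numbers on the two input sizes mentioned above, here is a back-of-the-envelope sketch. It assumes a textbook LCS table with one cell per pair of prefixes and 8 bytes per cell (one machine word on 64-bit MRI). Both are assumptions for illustration, not measurements of Hashdiff itself, and the helper names are hypothetical. Real usage will be higher because of per-row and per-object overhead.

```ruby
# Assumption: one 8-byte slot per table cell (64-bit Ruby, small integers).
WORD_BYTES = 8

# Number of cells in a full (n + 1) x (m + 1) LCS table.
def lcs_table_cells(n, m)
  (n + 1) * (m + 1)
end

# Lower-bound estimate of the table size in bytes under the assumption above.
def lcs_table_bytes(n, m, bytes_per_cell = WORD_BYTES)
  lcs_table_cells(n, m) * bytes_per_cell
end

[10_000, 15_000].each do |n|
  mib = lcs_table_bytes(n, n) / (1024.0 * 1024.0)
  puts format("n = %d: %d cells, at least %.0f MiB", n, lcs_table_cells(n, n), mib)
end
```

Going from 10000 to 15000 elements multiplies the cell count by roughly 2.25, which is the quadratic growth discussed in this thread.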
@nbarrientos Cool benchmarks 👍 Merci beaucoup et bonne journée!
liufengyun/hashdiff#49 phew, already avoided
Hi,
When debugging an issue with octocatalog-diff, we narrowed the problem down to HashDiff being unable to compare a couple of arrays when LCS is used. Here's a reproducer:
When comparing those two arrays using LCS (which is on by default) the application starts to eat up memory very quickly with 100% CPU usage until, in our case, the process is killed by the kernel's OOM killer as the test machine does not have any swap:
$ ruby reproducer_synth.rb
Without LCS
Done!
With LCS
...
Killed
Yes, that's >1GiB!
I understand that LCS is O(n²), but I'm not sure this is the expected behaviour, as the array size is moderate.
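For readers unfamiliar with where the quadratic cost comes from, here is a minimal sketch of the textbook dynamic-programming LCS. It is not Hashdiff's actual implementation, and the method name is hypothetical. It only illustrates that the classic approach keeps one table cell for every pair of prefixes of the two arrays.

```ruby
# Textbook LCS length with a full table (illustrative only, not Hashdiff's code).
# Returns [lcs_length, number_of_table_cells].
def lcs_length_full_table(a, b)
  # table[i][j] holds the LCS length of a[0...i] and b[0...j].
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }

  a.each_with_index do |x, i|
    b.each_with_index do |y, j|
      table[i + 1][j + 1] =
        if x == y
          table[i][j] + 1
        else
          [table[i][j + 1], table[i + 1][j]].max
        end
    end
  end

  # The table has (a.size + 1) * (b.size + 1) cells, hence O(n²) space.
  [table[a.size][b.size], table.sum(&:size)]
end

length, cells = lcs_length_full_table(%w[a b c b d a b], %w[b d c a b a])
puts "LCS length: #{length}, table cells: #{cells}"
```

With two arrays of a few thousand elements each, the cell count reaches the tens of millions, which is consistent with the memory growth described above.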