Improve performance / resource usage for big codebases #10
Comments
Also, it uses loads of memory for such large codebases: around 27 GB for clang.
Yeah, changed to (turns out, different on each OS!) variants of … I have some thoughts about improving performance and reducing memory usage for very large codebases, but didn't get to do that yet.
I've encountered similar issues when trying this. To get it partially working, I had to rewrite parts of the code. My changes aren't ready to share, as I mainly hacked them in, and I'm not sure I even want to clean them up. Some elements I found (runStop):
The analyze phase currently takes way too much memory to be usable (a combined file of 40 GB causes memory usage of 220 GB). From what I can see, the choice of JSON parser isn't adequate for this kind of magnitude. Since it's a DOM parser, it first needs to load the complete file into memory and then translate it into a tree (which takes even more space) before any processing can be done. If you really want to support big files, I think you'll need a SAX parser.
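To illustrate the SAX idea: an event-based parser hands the handler one token at a time, so peak memory stays roughly at the size of a small read buffer instead of the whole document tree. Below is a minimal sketch assuming RapidJSON's SAX `Reader` (the project may well use a different parser); the file name and the duration-summing handler are purely illustrative.

```cpp
#include <cstdio>
#include <cstdint>
#include <cstring>
#include "rapidjson/reader.h"
#include "rapidjson/filereadstream.h"

// Receives parse events one at a time; keeps only what it explicitly accumulates.
struct DurationSummer : rapidjson::BaseReaderHandler<rapidjson::UTF8<>, DurationSummer> {
    bool inDur = false;     // last key seen was "dur"
    uint64_t totalDur = 0;  // running sum of event durations

    bool Key(const char* str, rapidjson::SizeType len, bool /*copy*/) {
        inDur = (len == 3 && std::memcmp(str, "dur", 3) == 0);
        return true;
    }
    bool Uint(unsigned v)   { if (inDur) totalDur += v; inDur = false; return true; }
    bool Uint64(uint64_t v) { if (inDur) totalDur += v; inDur = false; return true; }
    // All other events fall through to BaseReaderHandler, which accepts and discards them.
};

int main() {
    FILE* fp = std::fopen("combined_trace.json", "rb");  // illustrative file name
    if (!fp) return 1;
    char buffer[64 * 1024];  // only this buffer is resident, regardless of file size
    rapidjson::FileReadStream stream(fp, buffer, sizeof(buffer));

    DurationSummer handler;
    rapidjson::Reader reader;
    reader.Parse(stream, handler);
    bool ok = !reader.HasParseError();
    std::fclose(fp);

    std::printf("parse %s, total dur = %llu\n", ok ? "ok" : "failed",
                (unsigned long long)handler.totalDur);
    return ok ? 0 : 1;
}
```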
I was thinking that the majority of the space in the combined JSON file (or even in a single JSON file) is redundant strings, e.g. full paths to where exactly … My plan is to at some point make the "smash all JSONs into one huge JSON" step a bit more intelligent. It could de-duplicate strings and just store their IDs, with a table of ID -> string elsewhere. Maybe then it would not be as huge.
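A minimal sketch of that de-duplication idea, assuming a simple interning table (the names `StringTable` and `Intern` are illustrative, not the project's actual API): each unique path or symbol string is stored once and referred to by a small integer ID everywhere else, and the combined JSON would only need to carry the ID -> string table once.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

class StringTable {
public:
    // Returns the ID for `s`, inserting it into the table on first sight.
    uint32_t Intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end())
            return it->second;
        uint32_t id = static_cast<uint32_t>(strings_.size());
        strings_.push_back(s);
        ids_.emplace(s, id);
        return id;
    }
    // Looks a string back up by its ID.
    const std::string& Get(uint32_t id) const { return strings_[id]; }

private:
    // For simplicity the sketch stores each string twice (vector + map key);
    // a real implementation would likely share the storage.
    std::vector<std::string> strings_;               // ID -> string
    std::unordered_map<std::string, uint32_t> ids_;  // string -> ID
};

int main() {
    StringTable table;
    uint32_t a = table.Intern("/very/long/include/path/foo.h");
    uint32_t b = table.Intern("/very/long/include/path/foo.h");  // second time: same ID, no new storage
    std::printf("a=%u b=%u -> %s\n", a, b, table.Get(a).c_str());
    return 0;
}
```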
Merged the above to master; it should be better than before. If there are still issues on your codebases, please reopen!
The analyzer can't read large json files due to this code:
Maybe it should just use memory-mapped files.
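A minimal POSIX sketch of the memory-mapped-file approach (Windows would need the CreateFileMapping / MapViewOfFile variants, presumably the "different on each OS" part mentioned above). The OS pages the data in on demand, so a very large file does not need a matching heap allocation up front; error handling is kept to a minimum and the file name is illustrative.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char** argv) {
    const char* path = argc > 1 ? argv[1] : "combined_trace.json";  // illustrative default
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { perror("fstat"); close(fd); return 1; }

    // Map the whole file read-only; nothing is copied until pages are actually touched.
    void* data = mmap(nullptr, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    const char* text = static_cast<const char*>(data);
    size_t size = (size_t)st.st_size;
    // ... hand (text, size) to the parser here ...
    std::printf("mapped %zu bytes, first byte: %c\n", size, text[0]);

    munmap(data, size);
    close(fd);
    return 0;
}
```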