Limit Cache Size on Disk #18045
Comments
Thanks for reporting. I would have expected the disk cache to be limited to 200G. Could you share which folders of the codeql database and cache are larger than expected? If you are using Linux or macOS then the
|
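For readers who want to answer the question above (which folders are larger than expected) without relying on a particular shell tool, here is a minimal Python sketch. It only walks a directory tree and sums file sizes; the helper names `dir_size_bytes` and `largest_subdirs` are made up for this illustration and are not part of CodeQL, and the database path in the usage note is a placeholder.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # A file may vanish or be unreadable while we scan; skip it.
                pass
    return total


def largest_subdirs(root, top=10):
    """Return (name, size_in_bytes) for the immediate subdirectories of root,
    largest first, limited to `top` entries."""
    sizes = [
        (entry.name, dir_size_bytes(entry.path))
        for entry in os.scandir(root)
        if entry.is_dir(follow_symlinks=False)
    ]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]
```

Calling `largest_subdirs("/path/to/your-codeql-database")` (a hypothetical path) and printing the pairs should show which subfolder accounts for most of the space; repeating the call on the largest subfolder drills down one level at a time.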
Thank you for the attention to this issue! The I get: Thanks. |
Could you get a more detailed |
Sure.
I checked |
I did a quick estimate myself. It seems to me that half of the items under |
I checked with some of the CodeQL developers and they said: The The word "cache" here is actually a bit of a misnomer -- in production analyses, this space is mostly used for intermediate results that we had to spill out from RAM because there were too many of them to fit there. If the results take up that much space on disk, it's probably a symptom that CodeQL is doing far too much computation, so I'd presume the query also takes far too long. @yyilong335 If you are willing to share the database and the query, we can try to determine whether there's a performance problem with one of our own supported queries. |
For what it's worth, your observations of the file structure are consistent with the cache subsystem working as designed. The individual items stored in the cache are each identified by a hash. The evaluator starts by storing those in files named |
@aibaars Thank you. The database I am analyzing is very big, the pattern I want to find is complex, and I do not know how to optimize my query code. Basically, I am trying to find a store-store-load pattern followed by an indirect branch in a large code database like httpd. The query has been running for days without finishing. To be more specific, here are two queries, The I am a newbie in CodeQL, and I understand that asking for debugging or optimization help could be out of scope for this issue. But I would like to say thank you very much. @hmakholm Thank you for the clarification. I understand now that the cache subsystem is working fine. |
@yyilong335 Debugging QL query performance can be quite tricky. Make sure to read this page first: https://codeql.github.com/docs/writing-codeql-queries/troubleshooting-query-performance/ . It lists the most common causes of performance problems. At a glance I think that maybe If looking for any of the common performance problems does not help, then the next step would be to run the query with evaluator logging and tuple counting switched on. The resulting logs should give insight into which predicates take a lot of time and produce many (intermediate) results. You may want to find a smaller code base to test on first, though: get performance logs there, identify and fix the slowest predicates, and then gradually try larger projects. |
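As a rough illustration of the "evaluator logging and tuple counting" step, the sketch below assembles a `codeql query run` command line in Python. The option names `--database`, `--evaluator-log` and `--tuple-counting` are written as I recall them from the CodeQL CLI manual and are an assumption here, so please check them against `codeql query run --help` for your CLI version. The helper name `build_logged_query_command` and all paths are hypothetical.

```python
import shlex


def build_logged_query_command(database, query, log_path):
    """Assemble a `codeql query run` invocation that writes an evaluator log
    with tuple counts. Flag names are assumptions; verify with --help."""
    return [
        "codeql", "query", "run",
        f"--database={database}",       # database to evaluate against (assumed flag)
        f"--evaluator-log={log_path}",  # where to write the evaluator log (assumed flag)
        "--tuple-counting",             # include tuple counts per step (assumed flag)
        query,
    ]


def as_shell_line(command):
    """Render the argument list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(part) for part in command)
```

For example, `as_shell_line(build_logged_query_command("small-db", "MyQuery.ql", "eval.log"))` gives a line you can paste into a terminal, or the list itself can be passed to `subprocess.run`. Starting with a small database, as suggested above, keeps the log small enough to read.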
@aibaars Thank you for pinpointing the problem and for the suggestion! |
Dear Developers,
I am using CodeQL to analyze my database. In the command, I use
--max-disk-cache=200000
to specify the maximum disk space the cache may take for this query. However, when the query finishes, the cache takes up 1.3 TB. Does
--max-disk-cache
limit the disk usage? Or is there another option that would resolve this issue?
Thank you so much.