-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature Request: Caching #158
Comments
The documentation link was moved to this current location: https://github.com/rubocop-hq/rubocop/blob/master/docs/modules/ROOT/pages/usage/caching.adoc |
First approach to implement a file cache. #158 Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
First approach to implement a file cache. Shopify#158 Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
First approach to implement a file cache. Shopify#158 Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
First approach to implement a file cache. Shopify#158 Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
@eterry1388 an initial caching implementation is complete in #268! If you, @heqianw , or anyone else following this thread want to give it a review and/or try out the branch, I'm definitely game for comments 👍 |
@zachfeldman Incredible! Thanks for putting in the work to get this done. I'll use your fork and see how it goes at CNTRAL. |
Results:Before
After
NOTE: I had to run it 3 times with caching mode on before I saw the increase speed. My assumption was that it would take running it only once with caching mode on for the files to be cached and then the second time would make use of that cache and run quickly. Maybe @zachfeldman can explain why this is. For anybody else wanting to try @zachfeldman's fork, here's what you need to do:
Question @zachfeldman: Where is the cache stored? In development this works out of the box but I'll want to use this in our Github Actions CI and will likely need to cache and restore it specifically in between setups. Thanks again for this. Seriously awesome. Makes the need for multithreading moot. |
Never mind, it's cached to I also added this to |
Amazing work! Here are my results (on a different codebase than from 2 year ago when I originally posted this feature request). My current linters:
Rubocop:
enabled: true
rubocop_config:
inherit_from:
- .rubocop.yml On current
|
Wanted to post one more test, this time with a ridiculous number of errors. TL;DR
First run
Second run
Yikes! With caching made it a LOT longer to run. I check out out the cache directory and some of the files were massive:
This second run was using consistent high CPU the whole time (looking at I wanted to run a third time, but this time without
It worked fine and memory usages stayed low. |
@joshuapinter @eterry1388 thanks so much for trying the cache and posting your results here! Super useful data to see how it works in other projects. As I mentioned in the pull request I unfortunately haven't had time to return to this as I'm only working on it in my spare time, but hope to soon. |
@zachfeldman No problem! Thanks for putting all the effort in with this. It's made a dramatic improvement in our CI runtimes and such a valuable addition. 🙏 |
@eterry1388 I did some testing and I think I identified that the bottleneck is simply in how we store the cache files and restore them. If you take a look here https://github.com/Shopify/erb-lint/pull/268/files#diff-f47f3afa10e485d1d8ca903dd7f3a0da0b7dd4427efc465993d2f940cc5b82c6R40 you'll see I'm parsing the JSON structures in the cache files. I have to read each cache file, parse the JSON, then restore an I'm trying to reconsider how I store and retrieve the cache into a faster method than single files parsed as JSON with source mudged on top. Will report back if I think of something faster (maybe just using a single file? Maybe using YML? etc). |
@eterry1388 I reconsidered exactly what I was caching in this commit #268 deciding to only cache the fields of an again, I'd like others to test this to verify it doesn't degrade any other functionality. |
Ok. Here are my testing results. With everything passingYour recent changes did not affect the speed at all, which is good. Also, when there are no errors, all the files in First Run
Second Run
With a ton of errorsFirst Run
Second Run
There were a lot of exceptions. I'm just showing one of them above. However, they all seemed to be the same exception. As you can see, the number of errors went down a lot. I think it is probably because of the exceptions and erb-lint not being able to process the file. However, it processed it just fine on the first run. As far as the file sizes go, they have been greatly reduced:
The fixI got it to work without exceptions!
Looks like you are trying to call parsed_json[:severity].to_sym Just need to add a safe navigator operator: parsed_json[:severity]&.to_sym This is because some of the {"message"=>"No space detected where there should be a single space.", "line_number"=>"22", "severity"=>nil} |
Hey I thought to test this with
|
Ok I did get all formatters to work with the
I don't think this is is a big deal as it's still 20 times smaller than the original approach, and the time it took to run stayed the same (around 2 seconds). I'll put up my code changes soon. |
@zachfeldman I created a pull request to your fork: zachfeldman#1 to support the json and compact reporters. |
Thanks @eterry1388 really appreciate your contribution! I'll take a look after work today and if all looks good, merge it to my branch. |
@eterry1388 pull request merged. |
The cache has been merged to master! Thanks everyone here for your help testing it <3 @etiennebarrie is calling for one more change before he cuts a new version of |
We have #282 ready for that. 🎉 |
We have a large codebase of 1,277 erb files. Running 12 linters on all of those files, it takes over 3 minutes. This is a bottleneck to our build system. Ideally, we could run these in parallel or with caching (so only changed files are inspected). This problem will only get worse as we enable more linters and add more erb files to our codebase.
Rubocop has a
cache
option explained here: https://github.com/rubocop-hq/rubocop/blob/master/manual/caching.mdWe would love if such a feature existed for
erblint
. Thank you!The text was updated successfully, but these errors were encountered: