Feature Request: Caching #158

Closed
eterry1388 opened this issue Mar 10, 2020 · 19 comments

@eterry1388

We have a large codebase of 1,277 erb files. Running 12 linters on all of those files, it takes over 3 minutes. This is a bottleneck to our build system. Ideally, we could run these in parallel or with caching (so only changed files are inspected). This problem will only get worse as we enable more linters and add more erb files to our codebase.

Rubocop has a cache option explained here: https://github.com/rubocop-hq/rubocop/blob/master/manual/caching.md

We would love if such a feature existed for erblint. Thank you!

@heqianw

heqianw commented Oct 16, 2020

The documentation link was moved to this current location: https://github.com/rubocop-hq/rubocop/blob/master/docs/modules/ROOT/pages/usage/caching.adoc

@ChrisBr ChrisBr mentioned this issue Nov 4, 2020
3 tasks
ChrisBr added a commit that referenced this issue Nov 4, 2020
First approach to implement a file cache.

#158

Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
zachfeldman pushed a commit to zachfeldman/erb-lint that referenced this issue Aug 12, 2022
First approach to implement a file cache.

Shopify#158

Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
zachfeldman pushed a commit to zachfeldman/erb-lint that referenced this issue Aug 20, 2022
First approach to implement a file cache.

Shopify#158

Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
zachfeldman pushed a commit to zachfeldman/erb-lint that referenced this issue Aug 30, 2022
First approach to implement a file cache.

Shopify#158

Co-authored-by: Mike Dalessio <mike.dalessio@gmail.com>
@zachfeldman

@eterry1388 an initial caching implementation is complete in #268! If you, @heqianw, or anyone else following this thread wants to give it a review and/or try out the branch, I'm definitely game for comments 👍

@joshuapinter
Contributor

joshuapinter commented Sep 27, 2022

@zachfeldman Incredible! Thanks for putting in the work to get this done. I'll use your fork and see how it goes at CNTRAL.

@joshuapinter
Contributor

Results:

Before

$ time be rake default:erblint

====
Running ERB Lint...

Linting 596 files with 9 linters...
No errors were found in ERB files
bundle exec rake default:erblint  72.19s user 3.06s system 95% cpu 1:18.75 total

After

$ time be rake default:erblint

====
Running ERB Lint...

Cache mode is on
Linting 596 files with 9 linters...
No errors were found in ERB files
bundle exec rake default:erblint  71.87s user 3.09s system 95% cpu 1:18.28 total

$ time be rake default:erblint

====
Running ERB Lint...

Cache mode is on
Linting 596 files with 9 linters...
No errors were found in ERB files
bundle exec rake default:erblint  71.52s user 3.00s system 95% cpu 1:17.74 total

$ time be rake default:erblint

====
Running ERB Lint...

Cache mode is on
Linting 596 files with 9 linters...
No errors were found in ERB files
bundle exec rake default:erblint  3.01s user 2.70s system 64% cpu 8.896 total

NOTE: I had to run it 3 times with caching mode on before I saw the increased speed. I assumed that a single run with caching mode on would be enough to cache the files, and that the second run would then use that cache and finish quickly. Maybe @zachfeldman can explain why this is.

For anybody else wanting to try @zachfeldman's fork, here's what you need to do:

  1. Add his fork to your Gemfile like this:

    gem "erb_lint", 
      github: "zachfeldman/erb-lint",
      branch: "zfeldman/cbruckmayer/implement-file-cache",
      require: false
  2. Use --with-cache when running the erblint command.

  3. Run the command a few times with caching mode turned on before you see the benefits.

Question @zachfeldman: Where is the cache stored? In development this works out of the box, but I'll want to use this in our GitHub Actions CI and will likely need to cache and restore it explicitly between runs.

Thanks again for this. Seriously awesome. Makes the need for multithreading moot.

@joshuapinter
Contributor

Question @zachfeldman: Where is the cache stored? In development this works out of the box but I'll want to use this in our Github Actions CI and will likely need to cache and restore it specifically in between setups.

Never mind, it's cached to the .erb-lint-cache directory. I updated our CNTRAL CI to cache and restore this directory automatically to keep things fast. It'll take a few runs before all of our Runners' caches are warm.

I also added this to .gitignore. 👍

@eterry1388
Author

Amazing work! Here are my results (on a different codebase than the one from 2 years ago, when I originally posted this feature request). My current .erb-lint.yml file looks like this:

linters:
  Rubocop:
    enabled: true
    rubocop_config:
      inherit_from:
        - .rubocop.yml

On current Shopify:main

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all
Linting 464 files with 1 linters...

No errors were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all  22.66s user 0.90s system 89% cpu 26.343 total

On zachfeldman:zfeldman/cbruckmayer/implement-file-cache

First run

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all --with-cache
Cache mode is on
Linting 464 files with 1 linters...

No errors were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   22.75s user 0.99s system 82% cpu 28.796 total

Second run

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all --with-cache
Cache mode is on
Linting 464 files with 1 linters...

No errors were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   1.42s user 0.80s system 47% cpu 4.691 total

Great improvement! User CPU time went from over 22 seconds to less than 2 seconds.

I checked the created .erb-lint-cache directory and saw the checksum files:

ls .erb-lint-cache | wc -l
     464
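
The one-file-per-template layout suggests a cache keyed by a checksum. Here is a minimal sketch of that idea, not the code from the branch. It assumes the key is a SHA-1 digest of the file contents plus some fingerprint of the linter configuration (the 40-character hex file names listed in this thread are consistent with SHA-1, but the exact inputs to the digest are an assumption). `ChecksumCacheSketch` and its methods are hypothetical names.

```ruby
require "digest/sha1"
require "fileutils"
require "json"
require "tmpdir"

# Hypothetical sketch of a checksum-keyed file cache; not erb-lint's actual code.
class ChecksumCacheSketch
  def initialize(cache_dir, config_fingerprint)
    @cache_dir = cache_dir
    @config_fingerprint = config_fingerprint
    FileUtils.mkdir_p(@cache_dir)
  end

  # The key changes whenever the file contents or the configuration change,
  # so edited files miss the cache and get linted again.
  def key_for(source)
    Digest::SHA1.hexdigest(@config_fingerprint + source)
  end

  # Returns the cached offenses, or nil on a cache miss.
  def get(source)
    path = File.join(@cache_dir, key_for(source))
    File.exist?(path) ? JSON.parse(File.read(path)) : nil
  end

  # Stores the offenses for this exact file content as one JSON file.
  def set(source, offenses)
    File.write(File.join(@cache_dir, key_for(source)), JSON.generate(offenses))
  end
end

Dir.mktmpdir do |dir|
  cache = ChecksumCacheSketch.new(dir, "config-v1")
  cache.set("<p>hi</p>", [])
  p cache.get("<p>hi</p>")  # cached result for unchanged contents
  p cache.get("<p>bye</p>") # nil: changed contents produce a different key
end
```

With a scheme like this, one cache file per linted template is what you would expect to see, which matches the count above.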

Finally, I tested this with files that had errors

First run

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all --with-cache
Cache mode is on
Linting 464 files with 1 linters...

107 error(s) were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   22.79s user 1.01s system 82% cpu 28.869 total

Second run

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all --with-cache
Cache mode is on
Linting 464 files with 1 linters...

107 error(s) were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   4.00s user 1.04s system 54% cpu 9.273 total

I then fixed some of the errors and ran with --with-cache to make sure even with cache the errors would be resolved:

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all --with-cache
Cache mode is on
Linting 464 files with 1 linters...

104 error(s) were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   3.86s user 0.92s system 55% cpu 8.580 total

Which it did!

Some of the cache files do get large:

ls -alhS .erb-lint-cache
total 7968
-rw-r--r--    1 ericterry  staff    32K Sep 27 07:45 a708b91b7d5ab225f5dc30980a73f9c4472e7921
-rw-r--r--    1 ericterry  staff    29K Sep 27 07:45 b21aeb1092895e1d364dd06d7b98e201bf62856f
-rw-r--r--    1 ericterry  staff    26K Sep 27 07:45 dc8cab6f33edcf2a527781e7e2cf8a84d85ff286
-rw-r--r--    1 ericterry  staff    22K Sep 27 07:45 41ac9e4be5242d99247bae207463e125abb52076

But I assume this is expected and probably similar to what rubocop does.

@eterry1388
Author

Wanted to post one more test, this time with a ridiculous number of errors.

TL;DR

  • Using the cache when there are few or no errors greatly improves the speed of the run
  • Using the cache when there are a lot of errors makes the run much slower, consumes a large amount of memory, and produces very large cache files

First run

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all --with-cache
Cache mode is on
Linting 461 files with 18 linters...

14206 error(s) were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   29.23s user 1.24s system 84% cpu 36.071 total

Second run

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all --with-cache
Cache mode is on
Linting 461 files with 18 linters...

14206 error(s) were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   331.17s user 15.63s system 68% cpu 8:28.47 total

Yikes! Caching made it take a LOT longer to run. I checked out the cache directory and some of the files were massive:

ls -alhS .erb-lint-cache
total 103232
-rw-r--r--    1 ericterry  staff   855K Sep 27 08:02 965f5b1d3d0c976a6ee368ac55d881185a8f306f
-rw-r--r--    1 ericterry  staff   609K Sep 27 08:01 0135f4d6d8b53ee8b87e1688489261ad09f98b19
-rw-r--r--    1 ericterry  staff   568K Sep 27 08:02 a2e1c15703c939c395640a39027aae4f83238cd0
-rw-r--r--    1 ericterry  staff   564K Sep 27 08:01 df23b118633784d2179883912dda76a75dc1a02c
...

This second run was using consistent high CPU the whole time (looking at htop) and the memory usage continued to grow slowly up to around 3.5 GB!

I wanted to run a third time, but this time without --with-cache, to make sure it was an issue with this new feature as opposed to general behavior of erblint:

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all
Linting 461 files with 18 linters...

14206 error(s) were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all  27.94s user 0.87s system 89% cpu 32.256 total

It worked fine and memory usage stayed low.

@zachfeldman

@joshuapinter @eterry1388 thanks so much for trying the cache and posting your results here! Super useful data to see how it works in other projects. As I mentioned in the pull request I unfortunately haven't had time to return to this as I'm only working on it in my spare time, but hope to soon.

@joshuapinter
Contributor

@zachfeldman No problem! Thanks for putting all the effort in with this. It's made a dramatic improvement in our CI runtimes and such a valuable addition. 🙏

@zachfeldman

@eterry1388 I did some testing and I think I identified that the bottleneck is simply in how we store the cache files and restore them. If you take a look here https://github.com/Shopify/erb-lint/pull/268/files#diff-f47f3afa10e485d1d8ca903dd7f3a0da0b7dd4427efc465993d2f940cc5b82c6R40 you'll see I'm parsing the JSON structures in the cache files. I have to read each cache file, parse the JSON, then restore an Offense, in some cases with the full source of the original file. I'm reusing the source that cli.rb has already read earlier, so that's probably not the bottleneck. But simply reading a bunch of cache files into memory and then parsing their JSON is probably what's taking up all the RAM. Just running ERB Lint without the cache doesn't have to do any of that work, hence less memory usage.

I'm trying to reconsider how I store and retrieve the cache, looking for a faster method than single files parsed as JSON with the source munged on top. Will report back if I think of something faster (maybe just using a single file? Maybe using YAML? etc).

@zachfeldman

@eterry1388 I reconsidered exactly what I was caching in this commit (#268), deciding to only cache the fields of an Offense necessary for the Reporter to print to the command line at the end of running erb-lint. I'm not sure if this'll cause other problems vs trying to restore the full Offense object, but it seems to work well in my testing and reduces RAM usage a ton. No need to store huge JSON objects and restore them into memory!
b331799

Again, I'd like others to test this to verify it doesn't degrade any other functionality.
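
To illustrate the trade-off described above, here is a rough sketch that compares serializing offenses together with the full file source against serializing only the fields a reporter prints. It is not the code from the commit. The field names `message`, `line_number`, and `severity` match the cached JSON quoted later in this thread; `FullOffenseSketch`, `SlimOffenseSketch`, and the sample data are hypothetical.

```ruby
require "json"

# Hypothetical stand-in for a full offense that drags the file source along.
FullOffenseSketch = Struct.new(:message, :line_number, :severity, :source, keyword_init: true)

# Hypothetical slim record holding only what a reporter prints.
SlimOffenseSketch = Struct.new(:message, :line_number, :severity, keyword_init: true) do
  def self.from_full(offense)
    new(message: offense.message, line_number: offense.line_number, severity: offense.severity)
  end
end

source = "<div>\n" * 5_000 # made-up stand-in for a large ERB template
offenses = Array.new(100) do |i|
  FullOffenseSketch.new(message: "Extra space detected", line_number: i + 1,
                        severity: "error", source: source)
end

# Serialize both shapes and compare how much would land in a cache file.
full_json = JSON.generate(offenses.map(&:to_h))
slim_json = JSON.generate(offenses.map { |o| SlimOffenseSketch.from_full(o).to_h })

puts "full: #{full_json.bytesize} bytes, slim: #{slim_json.bytesize} bytes"
```

In this sketch the full payload grows with the number of offenses times the template size, while the slim payload grows only with the number of offenses, which is in line with the slowdown showing up only on the run with thousands of errors.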

@eterry1388
Author

Ok. Here are my testing results.

With everything passing

Your recent changes did not affect the speed at all, which is good. Also, when there are no errors, all the files in .erb-lint-cache are blank (well, technically []%).

First Run

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all --with-cache
Cache mode is on
Linting 461 files with 1 linters...

No errors were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   22.09s user 1.03s system 84% cpu 27.492 total

Second Run

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all --with-cache
Cache mode is on
Linting 461 files with 1 linters...

No errors were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   1.29s user 0.81s system 46% cpu 4.545 total

With a ton of errors

First Run

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all --with-cache
Cache mode is on
Linting 457 files with 18 linters...

14113 error(s) were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   27.77s user 1.06s system 87% cpu 33.036 total

Second Run

time be bundle exec ../../personal/erb-lint/exe/erblint --lint-all --with-cache
Cache mode is on
Linting 457 files with 18 linters...

Exception occurred when processing: app/views/customers/index.html.erb
If this file cannot be processed by erb-lint, you can exclude it in your configuration file.
undefined method `to_sym' for nil:NilClass
/Users/ericterry/git/personal/erb-lint/lib/erb_lint/cached_offense.rb:35:in `from_json'
/Users/ericterry/git/personal/erb-lint/lib/erb_lint/cache.rb:22:in `block in get'
/Users/ericterry/git/personal/erb-lint/lib/erb_lint/cache.rb:21:in `map'
...[truncated backtrace]...

6095 error(s) were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   2.10s user 1.39s system 47% cpu 7.325 total

There were a lot of exceptions. I'm just showing one of them above. However, they all seemed to be the same exception. As you can see, the number of errors went down a lot. I think it is probably because of the exceptions and erb-lint not being able to process the file. However, it processed it just fine on the first run.

As far as the file sizes go, they have been greatly reduced:

ls -alhS .erb-lint-cache
total 5544
-rw-r--r--    1 ericterry  staff    28K Oct 11 19:51 44ee571f2159b1a7635a77d20c88497ccf29c051
-rw-r--r--    1 ericterry  staff    21K Oct 11 19:51 9ccf865c98c87342ec0560b33e884b00f531b38e
-rw-r--r--    1 ericterry  staff    20K Oct 11 19:51 2c6876d51828d2799979c7ffcaa2e766e4c8c3c0
-rw-r--r--    1 ericterry  staff    19K Oct 11 19:51 770a4950bf01262da1d646ea7fac2520d8acac96
-rw-r--r--    1 ericterry  staff    19K Oct 11 19:51 7e69231c130c783454ebdc3ca49df21e996e7c7e
...[truncated]...

The fix

I got it to work without exceptions!

14113 error(s) were found in ERB files
bundle exec bundle exec ../../personal/erb-lint/exe/erblint --lint-all   1.55s user 0.93s system 46% cpu 5.275 total

Looks like you are trying to call to_sym in the from_json method like this:

parsed_json[:severity].to_sym

Just need to add a safe navigation operator:

parsed_json[:severity]&.to_sym

This is because some of the JSON looks like this, where severity is nil:

{"message"=>"No space detected where there should be a single space.", "line_number"=>"22", "severity"=>nil}
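
A small standalone sketch of why the safe navigation operator fixes this. It assumes the cache file is parsed with symbolized keys, since the fix indexes with `:severity`. The helper name `severity_from`, the second record, and its `"error"` value are made up for illustration.

```ruby
require "json"

# Hypothetical helper mirroring the fix: tolerate a missing severity.
def severity_from(parsed_json)
  parsed_json[:severity]&.to_sym
end

# First record mirrors the cached JSON quoted above; the second is made up.
with_nil = JSON.parse(
  '{"message":"No space detected where there should be a single space.","line_number":"22","severity":null}',
  symbolize_names: true
)
with_value = JSON.parse(
  '{"message":"Example","line_number":"1","severity":"error"}',
  symbolize_names: true
)

p severity_from(with_nil)   # nil instead of raising NoMethodError
p severity_from(with_value) # :error
```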

@eterry1388
Author

Hey, I thought to test this with --format, and it looks like we have some more things to fix.

--format compact

With this, I get this exception:

NoMethodError: undefined method `column' for #<ERBLint::CachedOffense:0x00000001205f99e0>
/Users/ericterry/git/personal/erb-lint/lib/erb_lint/reporters/compact_reporter.rb:35:in `format_offense'
/Users/ericterry/git/personal/erb-lint/lib/erb_lint/reporters/compact_reporter.rb:13:in `block (2 levels) in show'
/Users/ericterry/git/personal/erb-lint/lib/erb_lint/reporters/compact_reporter.rb:12:in `each'
...

--format json

NoMethodError: undefined method `linter' for #<ERBLint::CachedOffense:0x000000010993a670>
/Users/ericterry/git/personal/erb-lint/lib/erb_lint/reporters/json_reporter.rb:59:in `format_offense'
/Users/ericterry/git/personal/erb-lint/lib/erb_lint/reporters/json_reporter.rb:53:in `block in formatted_offenses'
/Users/ericterry/git/personal/erb-lint/lib/erb_lint/reporters/json_reporter.rb:52:in `map'
...

@eterry1388
Author

Ok, I did get all formatters to work with the CachedOffense class, but it about doubled the size of the cached files:

ls -alhS .erb-lint-cache
total 7776
-rw-r--r--    1 ericterry  staff    48K Oct 11 21:21 44ee571f2159b1a7635a77d20c88497ccf29c051
-rw-r--r--    1 ericterry  staff    35K Oct 11 21:21 9ccf865c98c87342ec0560b33e884b00f531b38e
-rw-r--r--    1 ericterry  staff    33K Oct 11 21:21 2c6876d51828d2799979c7ffcaa2e766e4c8c3c0
-rw-r--r--    1 ericterry  staff    33K Oct 11 21:21 770a4950bf01262da1d646ea7fac2520d8acac96
-rw-r--r--    1 ericterry  staff    32K Oct 11 21:21 d9943624c9bb9cd12e5993e03d2f077310b8aa8f
...

I don't think this is a big deal, as it's still 20 times smaller than the original approach, and the time it took to run stayed the same (around 2 seconds). I'll put up my code changes soon.
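
The two NoMethodError traces above come down to the cached object not exposing every reader the reporters call. One way to catch that kind of gap early is a duck-typing check, sketched below. This is not code from the pull request. `column` and `linter` come from the traces; the other method names and both struct names are assumptions for illustration.

```ruby
# Readers a reporter might call on an offense. `column` and `linter` are taken
# from the stack traces in this thread; the rest are assumed for illustration.
REPORTER_INTERFACE = %i[message line_number column severity linter].freeze

# Hypothetical slim cached offense, missing two of the readers.
CachedOffenseSketch = Struct.new(:message, :line_number, :severity, keyword_init: true)

# Hypothetical fuller version that covers the whole interface.
FullerCachedOffenseSketch = Struct.new(*REPORTER_INTERFACE, keyword_init: true)

# Lists the readers a class would fail to answer when handed to a reporter.
def missing_reader_methods(klass)
  REPORTER_INTERFACE.reject { |name| klass.method_defined?(name) }
end

p missing_reader_methods(CachedOffenseSketch)       # [:column, :linter]
p missing_reader_methods(FullerCachedOffenseSketch) # []
```

Each extra reader also means one more field per offense in the cache file, which fits the size increase reported above.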

@eterry1388
Author

@zachfeldman I created a pull request to your fork: zachfeldman#1 to support the json and compact reporters.

@zachfeldman

Thanks @eterry1388 really appreciate your contribution! I'll take a look after work today and if all looks good, merge it to my branch.

@zachfeldman

@eterry1388 pull request merged.

@zachfeldman

zachfeldman commented Oct 26, 2022

The cache has been merged to master! Thanks everyone here for your help testing it <3

@etiennebarrie is calling for one more change before he cuts a new version of erb-lint if anyone is interested in drafting up a PR for it:
#268 (comment)

@etiennebarrie
Member

We have #282 ready for that. 🎉
