Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(gatsby-transformer-sqip): improve sqip performance and debug logging #13349

Merged
merged 5 commits into from
Jun 18, 2019

Conversation

axe312ger
Copy link
Collaborator

Refactor the SQIP generation queuing avoid the primitive previews are generated multiple times for the same image.

Additionally improves debug logging.

Open question:
Seems like gatsby now removes .cache/sqip and .cache/caches when doing gatsby build. This breaks both caches integrated in the sqip transformer. Using cache.set/.get is now completely useless. Was this done on purpose? 😭

@pieh
Copy link
Contributor

pieh commented Apr 15, 2019

Seems like gatsby now removes .cache/sqip and .cache/caches when doing gatsby build. This breaks both caches integrated in the sqip transformer. Using cache.set/.get is now completely useless. Was this done on purpose? 😭

This shouldn't happen always - only when there are changes that trigger cache invalidation (such as updating plugin or editing gatsby-config or gatsby-node) - if it happens always, then that's a bug for sure.

@DSchau
Copy link
Contributor

DSchau commented Apr 19, 2019

@axe312ger could you provide any insight/response to @pieh's comment above? That will help us understand exactly what you're trying to do here! Thanks!

@DSchau DSchau added the status: awaiting author response Additional information has been requested from the author label Apr 19, 2019
@axe312ger
Copy link
Collaborator Author

I'll try to find a way to reproduce without my project. Maybe within a code-sandbox. Will come back here later.

@axe312ger
Copy link
Collaborator Author

Looks like the cache now resets with the expected behaviour.

Upgrading all my projects dependencies + recreating the package-lock.json helped to fix the issue

Still:

Looks like gatsby resets the caches in a smart way tracking changes of/gatsby-*.js and package.json.

SQIPs take a long time to generate and storing the results inside of .cache results in long builds for every new PR or small change that installed/updated a dependency

SQIP already has two layers of caching, the pure SVG result files are stored to the disk and the data which is accessible thourgh GraphQL is stored via gatsby cache.get/set

To workaround the gatsby cache reset, I moved the SQIP cache dir to /.cache-sqip instead of /.cache/sqip. Is this an option how to handle this?

Did the Gatsby team ever consider a long-term cache or something similar for processor intensive task results? Could be very useful for video conversion plugins as well.

Thanks a bunch,
Benedikt

@axe312ger
Copy link
Collaborator Author

Improved the loggings and did some tests to check if the fixed queueing and new cache dir is speeding up things. It clearly does:

Improved queueing: 289.46 sec -> 160.52 sec
Custom cache dir: 160.52 sec -> 139.859 sec

  • gatsby clean this pr, without cached files in custom cache dir: 160.52 sec
  • gatsby clean this pr, with cached files in custom cache dir: 139.859 sec
  • no clean, just gatsby build again (recurring build): 21.344 sec
  • same project without this pr after gatsby clean: 289.46 sec
  • same project without this pr, just gatsby-build again (recurring build): 19.206 sec

@KyleAMathews
Copy link
Contributor

Exciting!

/.cache-sqip sounds like a good solution for now. The ability to add a "permanent" cache is something we've long considered as there are other transformations that we know are "permanently" correct e.g. image transformations.

It's a somewhat complex problem as it's a high-powered ability to give to a plugin — to be able to permanently cache items. As if they're wrong, it's increasingly harder for a user to fix things.

A brief sketch of a design I was thinking through last night.

We could create a "permanent" cache in the home directory of the user and cap it's size at say 5% of the total size of the hard drive or 25% of the remaining free space, whichever is larger (numbers are handwavey — Google Chrome does something similar for their cache so we could look into that). We'd count accesses (reads or writes) to the cache and when the cache gets too large evict on a LRU basis. Plugins wanting to store stuff there would add a new permanent flag (the name isn't right probably) which would instruct Gatsby to copy the item there. Plugins would implement a similar flag for when checking if something is in the cache for Gatsby to go check the permanent cache.

If this works, this is also something we'd like to establish as a service for Gatsby Cloud so collaborators on a project would never need to redo transforms.

The main difficulty is that cache keys or file names (same thing) need to be globally unique and deterministic. Perhaps as a failsafe against problems with this (as with @pieh's recent discovery), plugins could add to keys a cache "version" number so that if there's ever a problem discovered with the key generation, they can increment the key number.

@axe312ger axe312ger force-pushed the improve-sqip-performance-and-logging branch from 3a3d77b to f5dfde4 Compare April 21, 2019 08:54
@axe312ger
Copy link
Collaborator Author

The main difficulty is that cache keys or file names (same thing) need to be globally unique and deterministic.

I thought the same, maybe gatsby can fingerprint these permanent files to reinit the transformation as soon the source file changed. This seems be good input about hashing the files: https://github.com/joliss/fast-js-hash-benchmark

For this PR:

Just adjusted the readme a little bit more and as Kyle is fine with using .cache-sqip for now:

I think its ready for a final code review / merge.

@KyleAMathews
Copy link
Contributor

maybe gatsby can fingerprint these permanent files to reinit the transformation as soon the source file changed

That's exactly how Gatsby works already :-D Whenever Gatsby sees that a file has changed, we create a new contentDigest from it which triggers the rebuilding of transformed assets.

const contentDigest = await md5File(slashedFile.absolutePath)

Does this plugin use the contentDigest for creating file names? If so, then it's already ready for permanent caching. This is how gatsby-transformer-sharp works as well.

@@ -138,7 +134,7 @@ async function sqipContentful({ type, cache, store }) {
const cacheImage = require(`gatsby-source-contentful/cache-image`)

const program = store.getState().program
const cacheDir = resolve(`${program.directory}/.cache/sqip/`)
const cacheDir = resolve(`${program.directory}/.cache-sqip/`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps use node_modules/.cache? #5880

@axe312ger
Copy link
Collaborator Author

I have the updates ready for to use node_modules/.cache/ and contentDigest. But I lost permission to push to this branch:

$ git push
Enumerating objects: 46, done.
Counting objects: 100% (46/46), done.
Delta compression using up to 4 threads
Compressing objects: 100% (38/38), done.
Writing objects: 100% (38/38), 4.49 KiB | 25.00 KiB/s, done.
Total 38 (delta 32), reused 0 (delta 0)
remote: Resolving deltas: 100% (32/32), completed with 8 local objects.
remote: error: GH006: Protected branch update failed for refs/heads/improve-sqip-performance-and-logging.
remote: error: Cannot force-push to a protected branch
To github.com:gatsbyjs/gatsby.git
 ! [remote rejected]     improve-sqip-performance-and-logging -> improve-sqip-performance-and-logging (protected branch hook declined)
error: failed to push some refs to 'git@github.com:gatsbyjs/gatsby.git'

@KyleAMathews
Copy link
Contributor

Hmmm yeah — we changed permissions recently — can you just not force push? We do a squash commit on all PRs so we prefer people don't "clean up" old commits when you're exploring as it's harder to see the history of a PR.

@axe312ger
Copy link
Collaborator Author

Weird, got the branch fresh, cherry-picked my two commits on it, still get similar error:

$ git push
To github.com:gatsbyjs/gatsby.git
 ! [rejected]            improve-sqip-performance-and-logging -> improve-sqip-performance-and-logging (non-fast-forward)
error: failed to push some refs to 'git@github.com:gatsbyjs/gatsby.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

$ git pull
Already up to date.

$ git push
To github.com:gatsbyjs/gatsby.git
 ! [rejected]            improve-sqip-performance-and-logging -> improve-sqip-performance-and-logging (non-fast-forward)
error: failed to push some refs to 'git@github.com:gatsbyjs/gatsby.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

@LekoArts
Copy link
Contributor

@axe312ger Hey, could you please try again, we updated our permissions in the meantime. Thanks!

@KyleAMathews
Copy link
Contributor

Ping

@axe312ger axe312ger force-pushed the improve-sqip-performance-and-logging branch from b990e6e to 3ec3f62 Compare June 3, 2019 18:50
@axe312ger
Copy link
Collaborator Author

Sorry. Done :) @KyleAMathews @LekoArts

@axe312ger axe312ger force-pushed the improve-sqip-performance-and-logging branch from 3ec3f62 to 8164840 Compare June 3, 2019 18:55
@axe312ger axe312ger requested a review from a team as a code owner June 13, 2019 22:19
Copy link
Contributor

@wardpeet wardpeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! seem to work like a charm. Testing it on the gatsby sqip example.

@wardpeet wardpeet changed the title improve sqip performance and debug logging chore(gatsby-transformer-sqip): improve sqip performance and debug logging Jun 18, 2019
@wardpeet wardpeet merged commit b225b92 into master Jun 18, 2019
@wardpeet wardpeet deleted the improve-sqip-performance-and-logging branch June 18, 2019 11:23
@sidharthachatterjee
Copy link
Contributor

Published in gatsby-transformer-sqip@2.0.43

@axe312ger
Copy link
Collaborator Author

axe312ger commented Jun 18, 2019

Thank you so much!

Be aware, I come back soon to push a PR which upgrades SQIP to v1 which is finally async 😏

And adds a huge load of new features:

https://axe312ger.github.io/sqip/

mxxk pushed a commit to mxxk/gatsby that referenced this pull request Jun 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: awaiting author response Additional information has been requested from the author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants