-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(gatsby-plugin-sharp): split single file into more maintainable chunks #10964
Conversation
I've got an issue that my cache directory is not being made which is problematic for this feature. I'll give it a go on linux/mac but we probably need to fix this. Not sure if this is a known issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some preliminary comments. This is looking great. I did test this out with examples/using-gatsby-image, and I occasionally received the following error, which could be related to your comment about the cache directory?
Error: Invariant Violation: Encountered an error trying to infer a GraphQL type for: "localFile___NODE". There is no corresponding node with the id field matching: "3a6b85557cfac6217812183fb1c7cc5e"
aa63055
to
cc8aec0
Compare
So what I haven't fixed yet is removing images that should not be processed because a page or query changed. |
should be good to go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks pretty nice!
There are couple of low-hanging fixes and improvements we can quickly make (inline comments).
But there is also one unsolved issue:
some images can be scheduled to be processed, but later on should be invalidated that currently doesn't happen - for most part this would just mean extra unneded work, but there is edge cases where this would cause straight errors - for example image was scheduled, but later on source image was deleted and now this scheduled task have no way of finishing and we would actually throw errors about trying to read not-existing files? Initially we can just add some sanity checks to ensure input file exists before trying to process it I think, but later on I think we need to keep track of queries to remove stale tasks
Is it ok if I do file test in this PR and do the query stuff in a follow up PR. |
👍 from me - it will be much easier to review |
f444f71
to
e40b829
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wardpeet excellent job with this PR! Very clean and well-tested, awesome stuff.
Generally - this looks great to me. There are clear and obvious performance improvements with the develop lifecycle with this change applied (tested on www/).
(bootstrap from a clean slate was around ~72 seconds with this change, and ~267.217s before this change--and it actually threw a SIGSEGV error the first time I tried)
Even if this doesn't introduce anything breaking, since it is a fairly significant refactor of some core stuff, I'd like to call this a minor bump for gatsby-plugin-sharp. Thoughts @pieh?
@DSchau thanks! I hope we can get more of these integration tests running to test some core pieces. @pieh if it helps I can omit default values from the args to make the cache smaller. I could do the same for inputPath & outputPath (strip the program root, where possible). I added an escape hatch called |
If it's relatively straight forward to do - let's strip as much as we can. But with changes you did to not persist entire File node earlier we are pretty good already (so this is nice to have, not a blocker) I'm still checking and validating this, but initial checks redux-state.json for gatsbyjs.org (with screenshot placeholders - so not trully production build, but still ~2400 image thumbnails) increased from |
@wardpeet Thanks for the heads up on this PR. I'm working on making the graphql resolvers run in other processes. To do this, those resolvers can't rely on accessing the redux store directly, since it runs in another process. This was fine, except that I've been teasing that out so that the job API functions are provided to plugin-sharp instead of it having to require them. This PR makes it even harder to run plugin-sharp in another process because it introduces more globals and dependence on the gatbsy-node APIs. But the benefits are obviously massive so I recommend merging this PR, and I'll have a think about whether it's possible to make this run in parallel resolvers land. |
I think it might be possible to make parallel resolvers work here. Calls to I'll wait till this is merged, and then have more of a think about it. |
@pieh I could run this to remove default ones: I could even remove lodash if needed. WDYT? should I sneak it in and than we're good to go? |
@pieh found an issue where we could remove images from disk between develop & build. At the moment we fail hard 😄. The idea is to do a simple file_exists on all files that are inside the cached version and ignore those errors. We will still fail hard if our current queue has errors. Any issues with this approach? I would love to get this merged 😄 |
I was thinking yesterday and today about this and this approach also have problems:
With silencing error build succeed, but we have missing assets for that image. I think build should fail. I don't have good approach yet. We might need for gatsby to expose more information to |
Turned lazy loading off so we can merge this without having impact 👍 cc @pieh |
Signed-off-by: Ward Peeters <ward@coding-tech.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, let's get that in!
Published in gatsby-plugin-sharp@2.0.26 |
Just a thought — if image processing jobs are marked as a dependency of a query — then when we re-run the query, we would delete all the old jobs associated with it (and recreate them of course when the query runs). This would solve I believe the stale job problem because deleting a file between dev and build would dirty the query making it re-run. |
any plans to turn the |
Please turn this on :) My build takes ~8 mins on a really fast macbook pro before I can even see my site. |
Description
This PR changes how the gatsby-plugin-sharp plugin handles images. Instead of processing our image queue right away we wait processing depending on your environment.
Process
Inside queueImageResizing we are pushing every job into a memory queue and we don't care
about processing here at all. Inside gatsby-node we're distinguishing develop and build.
onPostBootstrap we're going to save our jobs into a cache. This cache is needed when we switch from develop to build mode. If no data changed we won't execute any queries which means no images are scheduled.
When we run
gatsby develop
we will use theonCreateDevServer
hook to add an extra middelware to the express instance. Each file that doesn't exist yet gets fowarded to this route. We look into our queue if we know this url, if we do we process it and send the file over. If not we ignore and let express handle it.gatsby build
looks at our cached queue to see if it needs to do any processing. It loops over the queue and waits until all files are processed. We do this insideonPostBuild
to make sure it's only ran in build mode.Updates
Related Issues
#10815