
[gatsby-plugin-sharp] 'Generating image thumbnails' takes a long time. Option to skip transformations #25827

Closed
josephmarkus opened this issue Jul 17, 2020 · 6 comments
Labels
type: feature or enhancement Issue that is not a bug and requests the addition of a new feature or enhancement.

Comments

@josephmarkus
Contributor

josephmarkus commented Jul 17, 2020

Summary

For a while I've been working on a project that has a lot of images.

One of the biggest pain points is seeing 'Generating image thumbnails' in my terminal.

It takes 12 minutes for a build on my project to complete when Gatsby flushes the cache.

The cache is flushed:

  • every time a dependency is bumped (with dependabot this happens on a daily basis)
  • every time gatsby-config.js is touched. Frankly, this doesn't happen very often, but it is still a pain.

I bumped the Gatsby version in the project to take advantage of Jobs API V2, which runs 'Generating image thumbnails' in parallel with page queries. That barely makes a dent in build times.

I considered other options:

  1. moving image resizing out to a third-party service, such as imgix. Then I could remove gatsby-plugin-sharp, append transformations to my image URLs and get transformed images that way. The downside: additional costs and complexity in setup

  2. using serverless-sharp. Again, I could remove gatsby-plugin-sharp and rely on AWS Lambda to handle image requests, run transformations on them and cache them. It's kind of like do-it-yourself imgix. The downside is all the problems associated with performance, maintenance and a bunch of other things I can't even think of

  3. using a CMS that has imgix integration. However, that's one of those high-effort and high-impact options. Maybe one day

  4. Add another dependency ImageOptim-CLI (see https://github.com/JamieMason/ImageOptim-CLI) to squeeze any unresized/uncompressed original images. Then run this once for all images and add a background task to resize and compress any newly added images. This way, the committed images would get squashed, so every time an image has to go through the resizing process, the number of bytes going through the sharp image-resizing library would be smaller. The downside is that this is more of an incremental improvement than a solution to the problem

  5. conditionally setting a different gatsby-config, so that in NODE_ENV=development I don't have gatsby-plugin-sharp, whereas in NODE_ENV=production I do. The downside is that this would require changes to every GraphQL query:

childImageSharp {
  fluid(maxWidth: 1200) {
    ...GatsbyImageSharpFluid_withWebp
  }
}

The above query would have to change to return just src in development and the current version in production. This could be done via GraphQL fragments and conditional queries. However, this is quite ugly and introduces divergence in the code.

And then every place that consumes it:

<Img fluid={data.something.childImageSharp.fluid} />

Would have to be handled by some sort of function that returns src in development and the current implementation in production.
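For illustration, a minimal sketch of such a function, assuming the development query could fall back to the File node's publicURL (MaybeImg is a made-up name for this example, not an existing API):

import React from 'react';
import Img from 'gatsby-image';

// Hypothetical wrapper: render the full gatsby-image component in
// production, but fall back to a plain <img> in development, where the
// sharp-generated `fluid` data would be absent.
const MaybeImg = ({ node, alt }) =>
  node.childImageSharp ? (
    <Img fluid={node.childImageSharp.fluid} alt={alt} />
  ) : (
    <img src={node.publicURL} alt={alt} />
  );

// Every <Img fluid={...} /> call site would then become:
// <MaybeImg node={data.something} alt="Some image" />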

This seems like a lot of overhead just to bypass time-consuming image resizing.

  6. Add a plugin option to gatsby-plugin-sharp to return the original image src without doing the time-consuming resizing when in development. This way my earlier-mentioned GraphQL would still work, but every value would be the same unresized image src. It's a sort of bypass that doesn't break the application.

It could look something like this:

plugins: [
  {
    resolve: `gatsby-plugin-sharp`,
    options: {
      returnOriginal: process.env.NODE_ENV !== 'production'
    },
  },
]

I'd happily invest some of my time to work on option 6, but first I'd like to hear Gatsby contributors' thoughts on this issue, as I don't know:

  • if this is already on the roadmap
  • if maintainers have discussed this and chucked the idea out the window
  • if maintainers have discussed this and come to a different conclusion

I look forward to hearing your thoughts on this.

Basic example

plugins: [
  {
    resolve: `gatsby-plugin-sharp`,
    options: {
      returnOriginal: process.env.NODE_ENV !== 'production'
    },
  },
]

Motivation

To speed up the developer experience when running Gatsby locally.

@josephmarkus josephmarkus added the type: feature or enhancement Issue that is not a bug and requests the addition of a new feature or enhancement. label Jul 17, 2020
@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Jul 17, 2020
@polarathene
Contributor

4. Add another dependency ImageOptim-CLI (see https://github.com/JamieMason/ImageOptim-CLI) to squeeze any unresized/uncompressed original images. Then run this once for all images and add a background task to resize and compress any newly added images.

We already get image compression and resizing via sharp; layering another alternative into the mix doesn't bring any value. In fact, we recently fixed a bug with webp, where someone had slipped in an additional compression pass on webp so the quality setting would compound (75% × 75% ≈ 56%, 50% × 50% = 25%). That mistake happened because, when webp support was added, there were alternative PNG and JPG compressor options for those formats to gain higher compression ratios, but those didn't suffer compounding since sharp only passed on the buffer before compressing, iirc.

This way, the committed images would get squashed, so every time an image has to go through the resizing process, the number of bytes going through the sharp image-resizing library would be smaller.

Only smaller on disk; images need to be decompressed into memory for actual image operations afaik, so the pixels use the same amount of memory whether or not colour/quality was reduced through quantization and similar techniques. Better to reduce pixel count though.

I had 150MB for ~20 images from Unsplash; after resizing them down to at most 2k wide along with image compression, the on-disk size was down to 6MB. That will read into memory faster, but the processing will be faster due to being 25% or less of the original image size in pixels (which uncompressed is also ~40MB).
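To put the memory point in numbers, a rough back-of-the-envelope calculation (the dimensions here are illustrative):

// Decompressed raster size ≈ width × height × channels bytes,
// no matter how small the compressed file is on disk.
const megabytes = (w, h, channels = 4) => (w * h * channels) / (1024 * 1024);

console.log(megabytes(5472, 3648).toFixed(1)); // ~76.1 MB: full-res photo (RGBA)
console.log(megabytes(2048, 1365).toFixed(1)); // ~10.7 MB: resized to 2k wide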


moving image resizing out to a third-party service, such as imgix.

Warning about imgix: Unsplash apparently uses them and refers to their API for image transformations. When I used that (via Unsplash) I noticed images would sometimes lose tone mapping (or it looked like that), in that warm tones were swapped for cold ones; I guess there was either a bug or a setup issue that stripped away some information from the image. I did not encounter that when I rewrote the transformations as a small sharp JS script to pre-process locally.


It's kind of like do-it-yourself imgix. The downside is all the problems associated with performance, maintenance and a bunch of other things I can't even think of

Probably a good way to go; iirc Gatsby Cloud offloads sharp processing to Google Cloud functions, presumably with some sort of caching system. Not sure what performance issues you're referring to, and maintenance can be less of a burden if it's a community-maintained open-source project. I'm sure something must exist out there; it just might need a Gatsby plugin for a smooth build process.

Personally, while gatsby-plugin-sharp is useful and convenient for build-time processing, I too understand it's a bit of a bother with it being tied to the project cache, especially with long processing times that should be avoidable. Running a local service, not unlike your gatsby develop process, that can be connected to via its own port and provides the API and its own cache would work well, no?


I'd happily invest some of my time to work on option 6, but first I'd like to hear Gatsby contributors' thoughts on this issue, as I don't know

Seems like a good approach to me, but I'm not sure you can just return the src. What about base64 placeholders (or others, like SVG?), image sizes (srcSetBreakpoints), or image variants (art direction)? Perhaps the latter two don't cause any issues with this approach; I haven't thought about it enough.

Might want to choose a better name, like bypass or skipProcessing? Another alternative might be to specify a cache directory outside the Gatsby-controlled cache location (although that'd still require an initial pass to fill the cache).


  • if this is already on the roadmap
  • if maintainers have discussed this and chucked the idea out the window
  • if maintainers have discussed this and come to a different conclusion

I think this issue is pretty much the same problem discussed in #24822. Perhaps chime in there with your suggestion to just return the image src and skip processing.

@josephmarkus
Contributor Author

@polarathene thank you for your thoughtful feedback on this. Much appreciated!

We already get image compression and resizing via sharp; layering another alternative into the mix doesn't bring any value.

What I mean is to take the original image assets and compress them through the tool. Gatsby does compression and resizing through sharp, but then places those images into the cache; this compression doesn't affect the image at the source. If, for example, I have an image that's 10 MB, every time the cache is flushed Gatsby would process all 10 MB of it. Compressing it at the source would mean that every time the cache is flushed, it takes less time to get the right image size as there's simply less data going through the pipeline.

Only smaller on disk [...] Better to reduce pixel count though.

That's a fair point. I think if an image is, let's say, 10 MB, then I'd resize it rather than try to compress it.

the processing will be faster due to being 25% or less of the original image size in pixels

I'm with you. I think by saying 'compress the image' I had in mind both the pixel count and its weight. Bad choice of terms on my part.

Warning about imgix [...] images would sometimes lose tone mapping

I think the overall benefit of moving image processing to an external service would outweigh an occasional bug such as this. For them it's their bread and butter, so I hope I'm not naive in thinking that imgix would be on top of such issues.

Not sure what performance issues you're referring to

What I had in mind is caching and load speed. However, this may well be unfounded. I would have to spike something like this to understand the upsides and downsides of this approach.

Running a local service not unlike your gatsby develop process which can be connected to via it's own port and provide the API and it's own cache would work well no?

That's certainly something to investigate, but I wonder if the overhead of this sort of implementation would be comparable to other alternatives, such as imgix or serverless functions for image sourcing and transformation.

Seems like a good approach to me, but I'm not sure you can just return the src. What about base64 placeholders (or others, like SVG?), image sizes (srcSetBreakpoints), or image variants (art direction)? Perhaps the latter two don't cause any issues with this approach; I haven't thought about it enough.

I haven't considered this for every use case. I went through the main script file, which does a lot, and I think one would have to choose the if/else escape hatch carefully, so as not to have to place it in 20 locations. I feel that this flag should just return something like:

<picture>
    <img src="../path/to/image.jpg" alt="Some image" />
</picture>

And that's it. So no fancy loading or anything like that; it's just a way to have your application spin up fast. It doesn't seem like a big price to pay compared to the alternative, which would need a bunch of GraphQL query changes. You kind of want this flag to do the magic for you and work out of the box. And I certainly wouldn't recommend this as a default, as this only becomes a problem on larger-scale projects.

Might want to choose a better name, like bypass or skipProcessing?

Naming is hard, and something like skipProcessing sounds much better 👍

Another alternative might be to specify a cache directory outside the Gatsby-controlled cache location (although that'd still require an initial pass to fill the cache).

This is definitely a more elegant suggestion: a totally different cache for images. I like this even better than the skipProcessing flag.

I think this issue is pretty much the same problem discussed in #24822. Perhaps chime in there with your suggestion to just return the image src and skip processing.

Thanks for the link, I'll do that 👍

@polarathene
Contributor

Compressing it at the source would mean that every time the cache is flushed, it takes less time to get the right image size as there's simply less data going through the pipeline.

Yes, but this would suffer the same issue sharp is causing for you in the Gatsby system: the build is meant to be reproducible, so the cache would likely still be used, or you'd pay the same processing time.

What you want to do is pre-process your images; there is a discussion about documenting this as a guide for the community here, although it seems to have stalled.

I have an example script here, which has my source images committed to the repo via git-lfs; I can tweak the script and always have the original source available in case I want to change things such as crop or dimensions. The output I commit, also via git-lfs, to another directory that my Gatsby project has sharp operate on. The script is fairly small with a few features demonstrated; it's tailored to my project but should be an adequate reference.
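Roughly, that kind of pre-processing script looks something like this (the paths and quality settings here are simplified placeholders, not the exact values from my script, and it assumes JPEG sources):

// pre-process.js: resize oversized originals once, before Gatsby/sharp sees them.
// Run with: node pre-process.js
const fs = require('fs');
const path = require('path');
const sharp = require('sharp');

const SRC = 'src-images'; // originals (could be tracked via git-lfs)
const OUT = 'images';     // what the Gatsby project actually consumes
const MAX_WIDTH = 2048;   // cap the pixel count

fs.mkdirSync(OUT, { recursive: true });

for (const file of fs.readdirSync(SRC)) {
  sharp(path.join(SRC, file))
    .resize({ width: MAX_WIDTH, withoutEnlargement: true })
    .jpeg({ quality: 80, mozjpeg: true })
    .toFile(path.join(OUT, file))
    .then((info) => console.log(`${file}: ${info.width}px wide, ${info.size} bytes`))
    .catch((err) => console.error(`${file}: ${err.message}`));
}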

That reduced ~150MB of image data down to 6MB. Netlify and Vercel cannot use git-lfs, and afaik Netlify's alternative Large Media feature isn't available during builds, so I have Cloudinary as an image host that a separate branch of that project uses with a remote-images plugin.

Absolutely, you should pre-process the original images to optimize for sharp processing. If you use a CMS, have those images served from the CMS, or from wherever you'd normally have your images pulled.

I think the bigger issue you're wanting to work around though is the unnecessary cache flushing which leads to sharp wasting time processing images that didn't need to be re-processed in the first place.


What I had in mind is caching and load speed. However, this may well be unfounded.

They're problems that need improvements, I agree :)

That's certainly something to investigate, but I wonder if the overhead of this sort of implementation would be comparable to other alternatives, such as imgix or serverless functions for image sourcing and transformation.

That depends on where it's being used. On a local machine for development you don't have to worry about network traffic, which can be problematic for some due to poor connections: transfers can not only be slow but also fail outright. With Gatsby's current handling, that seems to result in clearing the cache and starting again just because one of N images failed yet was considered successful, since the partial download was still a processable image, even if it's missing the bottom half of its pixels.

Serverless functions are nice, but in some situations they may run over free-tier limits and incur costs; if there's no cache layer, you also end up paying for processing you shouldn't need to. Image API services are good if the network connection is reliable and the service meets your needs, but they have similar drawbacks to serverless functions. Few offer compatibility with gatsby-image, requiring you to do more DIY work or collaborate on a new plugin to build the fluid/fixed data objects that gatsby-image expects; otherwise you're just downloading remote images and still processing them with sharp.


I feel that this flag should just return something like

Well, if it's effectively an img element instead of leveraging the proper gatsby-image component, that's going to create potential surprises during development/deploy unless another step is used to test that; users already raise issues where the disparity between development mode and SSR at deploy causes React hydration surprises.

I would suggest instead resolving the cache issue properly; then everything works well, provided an initial processing stage is acceptable. In the linked related issue that isn't acceptable for some users, so a better solution for them would probably be to make the image-handling difference clearer by providing some mock image data, or, if that is a problem, by supplying the same source-image URI for each size along with some generic base64 placeholder (or omitting it entirely).
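Something like this, based on the fluid object shape gatsby-image consumes (the breakpoints, sizes and placeholder values here are illustrative guesses):

// Hypothetical mock: reuse the original file's URI at every size so
// gatsby-image still renders, just without real resized variants.
const mockFluid = (src, aspectRatio = 1.5) => ({
  aspectRatio,
  src,
  srcSet: `${src} 800w, ${src} 1200w`, // same file at every breakpoint
  sizes: '(max-width: 1200px) 100vw, 1200px',
  // generic 1x1 transparent GIF instead of a real base64 preview
  base64: 'data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7',
});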


You kind of want this flag to do the magic for you and work out of the box.

I don't see gatsby-image supporting that; it'd create potential maintenance burdens in user support (not everyone reads docs properly), and possibly PRs trying to make it more similar to normal usage, blurring the lines further. It's less of an issue with mock data, since there's no additional variant in the component to support for such a feature.

Alternatively, you could wrap the gatsby-image component and make it conditional which one is returned based on an ENV var, though that's not likely to be a great solution. You'd still have to manage the build stage with graphql/sharp, so it's best to just return mock data if the image content doesn't matter too much (since skipping processing/transforms makes its output less reliable).

@LekoArts
Contributor

Thanks for the issue and discussion so far, but this is a duplicate of #24822 and thus I'll close this. Please leave your comments there, thanks!

@LekoArts LekoArts removed the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Jul 20, 2020
@josephmarkus
Contributor Author

I have an example script here, which has my source images committed to the repo via git-lfs; I can tweak the script and always have the original source available in case I want to change things such as crop or dimensions. The output I commit, also via git-lfs, to another directory that my Gatsby project has sharp operate on. The script is fairly small with a few features demonstrated; it's tailored to my project but should be an adequate reference.

This is pretty interesting. I wasn't even aware of Git Large File Storage.

I think the bigger issue you're wanting to work around though is the unnecessary cache flushing which leads to sharp wasting time processing images that didn't need to be re-processed in the first place.

Absolutely agree with you 👍

that's going to create potential surprises during development/deploy unless another step is used to test that

That's certainly a drawback. This solution would create divergence between production and development.

I would suggest instead resolving the cache issue properly

This would be ideal: a separate caching layer for images that are so expensive to generate. Having said that, people who have tens of thousands of images that take hours to resize would not benefit from this, as on occasion the cache would still need to be flushed, and waiting for hours on end is a non-starter.

Alternatively, you could wrap the gatsby-image component and make it conditional

Not only that, but this would require changing GraphQL queries as well.

@LpmRaven

I'm looking for ways to improve the build times of my e-commerce store. It currently takes around 35 minutes for my build to complete on an AWS general1.medium (7 GB memory, 4 vCPUs). The largest chunk of time is success Generating image thumbnails - 1776.988s - 33894/33894 19.07/s. My site builds once a day but images rarely change; is there any way I can externalise this image processing to a separate build that doesn't run daily, or otherwise improve the time it takes?
