Add gatsby-parallel-runner package #21733
This adds a new plugin called `gatsby-parallel-runner` that brings support for running external jobs in parallel cloud functions. Out of the box it adds support for the gatsby-plugin-sharp processor and can significantly speed up image-heavy Gatsby sites. It comes with an extensible model that any Gatsby plugin can use to take advantage of this type of cloud-function-based parallelization.
topic: () => {
  return {
    publish: async msg => {
      expect(msg).toBe(msg)
Tautology?
Good catch, fixed the shadowing that caused this
Just out of curiosity, what does the plugin do when you're running a build locally while offline? Does it fall back to the old behavior of running everything in a single thread, or does it try and fail to access Google Cloud?
Great work!!! 😸
await file.download({ destination: `/tmp/result-${id}` })
const data = (await fs.readFile(`/tmp/result-${id}`)).toString()
const payload = JSON.parse(data)
await fs.remove(`/tmp/result-${id}`)
If I understand this comment correctly (googleapis/nodejs-storage#676 (comment)), you can read the file content directly without saving the file to disk.
Yeah, I was expecting that to work and had `const data = await file.download({})` in an earlier version, but consistently got an empty response that way. It would be much cleaner though. `/tmp` is an in-memory volume, so performance-wise one approach or the other probably won't make a difference, but it does add some annoying boilerplate code.
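For reference, here is a minimal sketch of the in-memory variant discussed above. It assumes `file` is a `File` object from `@google-cloud/storage`, where calling `download()` without a `destination` resolves to an array whose first element is a `Buffer`, so the result has to be destructured. The helper name `readJsonResult` is hypothetical, and this has not been checked against the empty-response behavior reported in this thread.

```javascript
// Hypothetical helper: reads a JSON result file from Cloud Storage into
// memory. `file` is assumed to be a @google-cloud/storage File object.
async function readJsonResult(file) {
  // With no `destination` option, download() resolves to [Buffer],
  // so the contents must be destructured out of the returned array.
  const [contents] = await file.download()
  return JSON.parse(contents.toString("utf8"))
}
```

If this worked reliably in the deployed function, it would replace the download, read, parse, and remove steps with a single call and no temporary file.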
packages/gatsby-parallel-runner/src/processor-queue/implementations/google-functions/index.js
Apparently some Google accounts allow only lowercase letters and dashes in function names. This makes sure functions deployed via gatsby-parallel-runner follow that convention.
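A rough illustration of that kind of normalization follows. The helper name `toFunctionName` is hypothetical and this is not the code from the commit; keeping digits is also an assumption, since the note above only mentions lowercase letters and dashes.

```javascript
// Hypothetical helper, not taken from the PR: turns an arbitrary label
// into a lowercase, dash-separated name.
function toFunctionName(name) {
  return name
    .toLowerCase()
    // Assumption: digits are allowed; every other run of characters
    // is collapsed into a single dash.
    .replace(/[^a-z0-9]+/g, "-")
    // Drop dashes left over at either end.
    .replace(/^-+|-+$/g, "")
}
```

For example, `toFunctionName("IMAGE_PROCESSING")` yields `image-processing`.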
Hey Matt! This is super exciting! We were hoping people would jump in with more implementations of the coming Jobs API so very glad y'all are doing deep dives on this 🙏

So a little background for everyone on the status of the Jobs API. As you probably have guessed, it's super early (and undocumented haha so props on reverse engineering it!). We've switched one plugin ( We'll be moving more plugins / internal functions over to the API & testing how that goes. We'll also be adding to core a "local build" implementation (on by default) that'll distribute jobs to our local worker processes (which we use already for doing HTML SSR). We're pretty certain the API will change as we do this so are intentionally leaving it undocumented while we experiment & learn.

One of our TODOs is to create a job metadata spec so that our cloud job runner (and other implementations) have all the info needed to create the functions for running jobs. That way plugins don't have to directly do anything to get jobs running anywhere: they just write to the API. This would enable an e.g.

So long story short: we're really excited to see your implementation but it seems a bit early as we're probably going to be moving APIs around as we learn things & do more implementations. Also the extra framework bits you add won't be necessary. We're happy to have you maintain it in the meantime & we'd love to share learnings as we go (there's probably going to be a fair bit of nuance around how and when to distribute & cache jobs) but adding it to the Gatsby monorepo wouldn't help us move quickly in the short term.

As things settle we'll be writing an RFC with more details about what will be the final API but pinning versions & syncing with changes should work for now. Excited to be working with y'all on this!
Excited about the job metadata spec and the RFC! Those will be great steps forward.

I’ve open sourced the https://github.com/netlify/gatsby-parallel-runner

It’s perhaps a shame that the parallel capabilities of Gatsby will be developed in private for now, outside of the open-source community, but I can obviously understand the business reasons behind it!

Since we’re already seeing great results from the parallel runner approach, we’ll keep building on this in the open from our side. Hopefully plugin authors outside of Gatsby, Inc can benefit a lot from having a framework that allows them to experiment with parallelization and contribute potential improvements and capabilities to the core parallel runtime, even if the final implementation will still be very much in flux.

Looking forward to continuing to work together on a faster web!
Description
`gatsby-parallel-runner` is a Gatsby build runtime that allows plugins and core parts of Gatsby to take advantage of the concept of external jobs with IPC introduced in #20835.

When Gatsby is executed with `gatsby-parallel-runner` instead of with `gatsby build`, it will be wrapped in a parent runtime that can process external jobs.

The plugin currently includes support for the `IMAGE_PROCESSING` job emitted by `gatsby-plugin-sharp` and includes a Google Cloud based runner that can parallelize this task to Google Cloud Functions and offers clear performance benefits for image-heavy sites. On the default image benchmark site, initial tests show a more than 4x speedup for the image processing part of the build.

The plugin is built around a set of abstractions that make it viable to add additional parallel runtimes (an AWS Lambda based runtime would be a great example), and it is structured to make it easy for plugin authors to add additional cloud processors for new tasks.
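To make the idea of pluggable processors concrete, here is a minimal sketch of a registry that routes jobs by name. All names here (`ProcessorRegistry`, `register`, `dispatch`, and the `{ name, args }` job shape) are hypothetical illustrations and are not the plugin's actual interface or the Jobs API message format.

```javascript
// Hypothetical sketch: none of these names come from gatsby-parallel-runner.
class ProcessorRegistry {
  constructor() {
    this.processors = new Map()
  }

  // Associate a job name (e.g. "IMAGE_PROCESSING") with an async handler.
  register(jobName, processor) {
    this.processors.set(jobName, processor)
  }

  // Route a job to its processor. Unknown job names are rejected so the
  // caller can decide what to do with jobs nobody handles.
  async dispatch(job) {
    const processor = this.processors.get(job.name)
    if (!processor) {
      throw new Error(`No processor registered for job "${job.name}"`)
    }
    return processor(job)
  }
}
```

Under this assumed design, a new cloud backend or a new job type would only need to call `register` with its own handler; the parent runtime's dispatch path stays unchanged.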
I believe having a plugin like this in Gatsby core (rather than purely in an external plugin registry) will be important to allow plugin authors from the open source ecosystem to develop plugins that can take advantage of the external job system. Allowing plugin authors or developers working with site specific local plugins to build and test jobs that can be trivially parallelized via serverless functions will greatly benefit the whole community.
Documentation
Install in your gatsby project:
To use with Google Cloud, set relevant env variables in your shell:
Deploy the cloud function:
Then run your Gatsby build with the parallel runner instead of the default `gatsby build` command.
Related Issues
#19831