[WIP] feat(gatsby): worker-pool for custom GraphQL field resolvers #10938
Conversation
I had a great chat with @KyleAMathews yesterday about some alternative approaches to this. The core idea is to give plugin resolvers a "worker API" that they can explicitly call to offload work to another process, rather than running ALL resolvers on workers. I'm going to close this PR while I think about this idea.
An interesting finding is that when using workers, it's important to bump up the
Ok, I'm re-opening this. I've done some more research on parallel resolvers and looked at alternative implementations. I can break my findings into two areas:

1. We should feature flag it

The upside of parallel resolvers is clear. Resolvers that operate on small nodes, perform lots of CPU work, and return small responses will see a near-linear increase in query performance per core. The reality, however, is that with the exception of image processing (which already uses all cores), there are no resolvers that meet all of these criteria. The closest is transformer-remark.

Still, we do see a speed up. As mentioned in the PR description, there's a 2x speed up in query performance on the markdown benchmark. But in comparison, when running the test on real websites, the gains were much smaller.

So my recommendation is that we feature flag this. That way, if someone knows that their site might benefit, or they're running on a 32-core rig, they can run builds with parallelism, but we won't be accidentally making some sites slower.

2. We shouldn't use a "worker API"

The implementation in this PR runs entire resolvers on a worker farm; all the plugin author has to do is set the workerPlugin property. The alternative is a "worker API" that resolvers call explicitly to offload specific work to another process. At a glance, that approach makes more sense, since plugin code might know that certain nodes would benefit from being run on a worker farm while others may not. But there are a number of downsides.
Next Steps/Asks

See original PR asks. This PR isn't ready to be merged, but hopefully I can get some feedback on the rough approach.
@@ -403,12 +408,14 @@ module.exports = (
      return resolve({
        html: {
          type: GraphQLString,
          workerPlugin,
Not sure I like that. This will break graphql-js/graphql-compose typings for people who use TypeScript. Could we do something like this:
const workerPlugin = new GatsbyWorkerPlugin(`gatsby-transformer-remark`)

{
  htmlAst: {
    type: GraphQLJSON,
    resolve: workerPlugin.wrapResolver((markdownNode) => {
      // ...
    }),
  },
}
Love this idea. It's way more explicit than flagging the field and picking it up later in the process. Thanks!
Hmm, the implementation to get something like wrapResolver is going to be pretty hacky, but it will work. Does GraphQL have any other way of specifying "tags" on Field objects? I essentially want to attach metadata to a field.
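One possible answer (a hypothetical sketch, not an existing Gatsby API: GatsbyWorkerPlugin and the symbol are made up here) is to tag the wrapped resolver function itself, so the schema builder can detect worker fields without changing the field config shape:

// Tag the resolver function so the schema builder can later find worker fields.
const WORKER_PLUGIN = Symbol(`gatsby.workerPlugin`)

class GatsbyWorkerPlugin {
  constructor(pluginName) {
    this.pluginName = pluginName
  }

  wrapResolver(resolver) {
    // Return the resolver unchanged, but mark it with the owning plugin's name.
    resolver[WORKER_PLUGIN] = this.pluginName
    return resolver
  }
}

// Later, while walking schema fields, detect which resolvers were wrapped.
const isWorkerField = field =>
  typeof field.resolve === `function` && field.resolve[WORKER_PLUGIN] !== undefined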
  return o
}

/**
My branch makes it so that all resolvers use what's passed in via context instead of globals. So here you could modify context to carry the worker resolver machinery.
@freiksenet This isn't a breaking change, right? I.e. will field resolvers still be able to use the closed-over getNode etc. supplied by setFieldsOnGraphQLNodeType, in addition to those supplied in the context?
I just had a read through of the branch. It looks like the only API object injected into context is nodeModel, which will supply getNode etc. Is there also a plan to include things like getCache, reporter, createContentDigest etc. as well?
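For reference, the two styles being compared look roughly like this (a sketch; the parentSlug field is made up, and it assumes the branch's nodeModel ends up exposing a getNodeById-style method):

const { GraphQLString } = require(`gatsby/graphql`)

// Current style: setFieldsOnGraphQLNodeType closes over helpers like getNode.
exports.setFieldsOnGraphQLNodeType = ({ type, getNode }) => {
  if (type.name !== `MarkdownRemark`) return {}
  return {
    parentSlug: {
      type: GraphQLString,
      resolve: node => {
        const parent = getNode(node.parent) // closed-over helper
        return parent && parent.fields && parent.fields.slug
      },
    },
  }
}

// Context-based style: the same resolver reads helpers off the GraphQL context.
const parentSlugResolver = (node, args, context) => {
  const parent = context.nodeModel.getNodeById({ id: node.parent })
  return parent && parent.fields && parent.fields.slug
}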
Pausing this while I figure out how to make it work with the latest plugin-sharp changes: #10964
WIP: Don't merge
I'd love some feedback on this PR before I finish it off. Read on for an overview of what's going on.
Summary
This PR adds the ability for GraphQL field resolvers to be executed in a pool of child Node.js processes, thus offloading work to more than one core.
This is a partial answer to #8400 (this PR parallelizes field resolvers rather than entire queries).
Motivation
For larger sites, the execution of GraphQL queries takes up the majority of build time. GraphQL first filters down to the set of nodes that satisfy the query, then calls the resolve function on the fields in the body of the query.

Most of these resolve functions simply return a property of the node in question. Since this operation is so fast and relies on the ability to query a large set of nodes in memory, it doesn't make sense to run these resolvers in another process.
Resolvers that are declared by plugins via the setFieldsOnGraphQLNodeType node API often end up performing CPU-intensive tasks like generating images or parsing markdown. By offloading these functions to other cores, we can potentially speed up the build.
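To make that distinction concrete, here's a rough sketch (the field names and the remark usage are illustrative, not code from this PR) of a cheap property resolver next to a CPU-heavy one:

const remark = require(`remark`)

// Cheap resolver: just reads a property off the in-memory node.
// Moving this to a worker only adds IPC and serialization overhead.
const titleResolver = markdownNode => markdownNode.frontmatter.title

// Expensive resolver: CPU-bound parsing work per node. Resolvers like this
// are the ones a worker pool is aimed at.
const htmlAstResolver = markdownNode =>
  remark().parse(markdownNode.internal.content)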
How it works

1. Plugin field configs can include a workerPlugin property. This signifies that the field resolver is fully async and is safe to be executed in a worker.
2. When running setFieldsOnGraphQLNodeType, we find all workerPlugin fields and store them.
3. Each worker loads the plugin's gatsby-node.js and calls setFieldsOnGraphQLNodeType() with an API matching what would normally be passed from api-runner-node (e.g. getNode), except that each API function is reimplemented as an RPC that communicates over IPC back to the master process, where it will be fulfilled (see the sketch after this list).
4. Each worker exposes an execResolver function that will call the resolver returned in the previous step.
5. During query execution, worker-enabled fields are resolved by calling the pool's execResolver function.
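The RPC-over-IPC part of step 3 could look roughly like the sketch below. This is a simplified illustration using a bare child_process fork channel rather than the jest-worker wiring this PR actually uses; the message shapes and the nodeStore stand-in are assumptions for the example.

// master.js — fulfil worker RPCs from the parent process.
const { fork } = require(`child_process`)

const nodeStore = new Map() // stand-in for Gatsby's in-memory node store

const child = fork(require.resolve(`./worker.js`))

// Answer getNode RPCs coming back from the worker with data the master owns.
child.on(`message`, msg => {
  if (msg.type === `rpc` && msg.name === `getNode`) {
    child.send({ type: `rpc-response`, id: msg.id, result: nodeStore.get(msg.args[0]) })
  }
})

// worker.js — getNode is reimplemented as an async RPC back to the master.
let nextRpcId = 0
const pending = new Map()

process.on(`message`, msg => {
  if (msg.type === `rpc-response`) {
    pending.get(msg.id)(msg.result)
    pending.delete(msg.id)
  }
})

const getNode = id =>
  new Promise(resolve => {
    const rpcId = nextRpcId++
    pending.set(rpcId, resolve)
    process.send({ type: `rpc`, name: `getNode`, args: [id], id: rpcId })
  })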
Considerations/Things to Know

- sharp uses libvips, which can already use all cores. So even though sharp is highly CPU intensive and writes to disk rather than returning a large response, there's no point in running it as a resolver worker.
- I've ported transformer-remark, since it's a good candidate. On the markdown benchmark, I see a 2x performance increase in the query phase on my 4-core laptop.
- This relies on jest-worker changes that aren't yet available in jest. In the meantime, I've published my own @moocar/jest-worker.
TODO/Questions/Specific Feedback requests

- Plugins currently have to identify themselves via the workerPlugin property. Other ideas on how to automatically figure out the plugin name would be appreciated.
- We probably need to turn actions (e.g. createNode) into RPCs as well. I've re-implemented them to throw an unsupported error right now.
- We need to call initPool() again after updating the schema in bootstrap (since the schema is built twice).
- The cache needs some thinking. I've used jest-worker's computeWorkerKey to ensure that nodes are always sent to the same worker, thus allowing them to take advantage of local caches (see the sketch after this list). But open to better solutions.
- I've only benchmarked the markdown benchmark. I see a 2x speed up on my 4-core machine.
- gatsby-plugin-sharp cheats and loads redux actions directly as opposed to having them passed. This means it will be using its own redux store per process. Need to fix.
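As a reference for the computeWorkerKey point above, routing every node to a consistent worker with jest-worker looks roughly like this (a sketch; the worker module path, exposed method name, and keying on the node id are assumptions):

const Worker = require(`jest-worker`).default

// Create the pool once, exposing the execResolver entry point described above.
const pool = new Worker(require.resolve(`./resolver-worker.js`), {
  exposedMethods: [`execResolver`],
  // Route calls for the same node to the same child process so per-worker
  // caches (e.g. parsed markdown) can be reused across queries.
  computeWorkerKey: (method, typeName, fieldName, nodeId) => String(nodeId),
})

// Exposed methods return promises that resolve in a child process, e.g.:
// const html = await pool.execResolver(`MarkdownRemark`, `html`, node.id)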
Future

I'm excited to see if this work can enable other parts of Gatsby to run in parallel. It will require a bunch more of Gatsby to become asynchronous. Fun times.