
[WIP] feat(gatsby): worker-pool for custom GraphQL field resolvers #10938

Closed · wants to merge 29 commits

Conversation

@Moocar (Contributor) commented Jan 9, 2019

WIP: Don't merge

I'd love some feedback on this PR before I finish it off. Read on for an overview of what's going on.

Summary

This PR adds the ability for GraphQL field resolvers to be executed in a pool of child Node.js processes, offloading work across more than one core.

This is a partial answer to #8400 (this PR parallelizes field resolvers rather than entire queries).

Motivation

For larger sites, executing GraphQL queries takes up the majority of build time. GraphQL first filters down to the set of nodes that satisfy the query, then calls the resolve function for each field in the query body.

Most of these resolve functions simply return a property of the node in question. Since this operation is so fast and relies on querying a large set of nodes held in memory, it doesn't make sense to run these resolvers in another process.

Resolvers declared by plugins via the setFieldsOnGraphQLNodeType node API, however, often perform CPU-intensive tasks like generating images or parsing markdown. By offloading these functions to other cores, we can potentially speed up the build.
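For context, here's the shape of resolver we're talking about — a minimal sketch in the style of gatsby-transformer-remark (the field and the exact remark pipeline are illustrative, not the plugin's real code):

```js
const { GraphQLString } = require(`graphql`)
const remark = require(`remark`)
const remarkHtml = require(`remark-html`)

exports.setFieldsOnGraphQLNodeType = ({ type }) => {
  if (type.name !== `MarkdownRemark`) return {}
  return {
    html: {
      type: GraphQLString,
      // CPU-intensive: parses and renders markdown for every queried node
      resolve: async markdownNode => {
        const file = await remark()
          .use(remarkHtml)
          .process(markdownNode.internal.content)
        return String(file)
      },
    },
  }
}
```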

How it works

  1. To specify that a field resolver should be run in parallel, plugin authors add a workerPlugin property to the field config (see the sketch after this list). This signifies that the field resolver is fully async and is safe to execute in a worker.
  2. During schema generation, in setFieldsOnGraphQLNodeType, we find all workerPlugin fields and store them.
  3. Before running queries, we create a pool of jest-workers. Each worker is supplied with the list of workerPlugin fields. Each worker will:
    1. load each field's plugin's gatsby-node.js and call setFieldsOnGraphQLNodeType() with an API matching what would normally be passed from api-runner-node (e.g. getNode), except that each API function is reimplemented as an RPC that communicates over IPC back to the master process, where it is fulfilled.
    2. expose a function execResolver that calls the resolver returned in the previous step.
  4. Redefine the field resolvers to call the worker pool's execResolver function.
  5. When Gatsby runs the query phase, GraphQL calls the resolver, which calls the worker, which executes the real resolver on its core and returns the response.
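A simplified sketch of steps 3–5. All names (makeRpc, setup, the message shapes) are illustrative rather than the PR's actual implementation, and it assumes the @moocar/jest-worker fork that lets the child use the IPC channel:

```js
// worker.js — runs in each jest-worker child process (illustrative sketch)
let nextRpcId = 0
const pendingRpcs = new Map()

// Reimplement a Gatsby API function (e.g. getNode) as an RPC: send the call
// over IPC to the master process, which fulfils it and replies with the result
const makeRpc = fnName => (...args) =>
  new Promise(resolve => {
    const id = nextRpcId++
    pendingRpcs.set(id, resolve)
    process.send({ type: `rpc`, id, fnName, args })
  })

process.on(`message`, msg => {
  if (msg && msg.type === `rpcResponse`) {
    pendingRpcs.get(msg.id)(msg.result)
    pendingRpcs.delete(msg.id)
  }
})

const resolvers = new Map()

// Step 3: called once per worker with the list of workerPlugin fields. Loads
// each plugin's gatsby-node.js and keeps the resolvers it returns.
async function setup(workerFields) {
  for (const { pluginName, typeName, fieldName } of workerFields) {
    const gatsbyNode = require(`${pluginName}/gatsby-node`)
    const fields = await gatsbyNode.setFieldsOnGraphQLNodeType({
      type: { name: typeName },
      getNode: makeRpc(`getNode`), // fulfilled back in the master process
      getNodesByType: makeRpc(`getNodesByType`),
    })
    resolvers.set(`${typeName}.${fieldName}`, fields[fieldName].resolve)
  }
}

// Steps 4–5: the master redefines each workerPlugin field's resolver to call
// this via the pool, roughly:
//   resolve: (node, args) => pool.execResolver(`${typeName}.${fieldName}`, node, args)
function execResolver(key, node, args) {
  return resolvers.get(key)(node, args)
}

module.exports = { setup, execResolver }
```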

Considerations/Things to Know

  • This works very effectively on resolvers that perform CPU-intensive tasks.
  • Inter-process communication (IPC) between processes can be expensive, since messages must be serialized to and from JSON before being sent over the pipe. So resolvers that take large nodes or return large responses benefit less from workers. In fact, if CPU usage is low and the node/response are large, offloading the work to another core can actually be slower.
  • sharp uses libvips, which can already use all cores. So even though sharp is highly CPU-intensive and writes to disk rather than returning a large response, there's no point in running it as a resolver worker.
  • I've only enabled worker resolvers on transformer-remark, since it's a good candidate. On the markdown benchmark, I see a 2x performance increase in the query phase on my 4-core laptop.
  • I've used jest-worker since it includes queueing, pooling and sharding. However, I had to make a change to enable IPC communication (you can see the changes here). Assuming we decide to go ahead with the above, I'll submit a PR to jest. In the meantime, I've published my own @moocar/jest-worker.
  • I looked briefly at memory-mapping (e.g. mmap-io) to share immutable data across workers, but it did my head in.

TODO/Questions/Specific Feedback requests

  • still need to write a docs guide
  • I'm not happy with the workerPlugin property. Other ideas on how to automatically figure out the plugin name would be appreciated.
  • Handle timeouts in RPCs. It shouldn't really happen, since these are only pipes rather than network boundaries, but we should still handle it.
  • Potentially make actions (e.g. createNode) into RPCs as well. For now, I've re-implemented them to throw an "unsupported" error.
  • Figure out whether we need to call initPool() again after updating the schema in bootstrap (since the schema is built twice).
  • the cache needs some thinking. I've used jest-worker's computeWorkerKey to ensure that nodes are always sent to the same worker, allowing workers to take advantage of local caches (sketched below). But I'm open to better solutions.
  • Tests! (yay)
  • please run this on your machine, especially if you have > 4 cores. A great test is the markdown benchmark (commands below); I see a 2x speed up on my 4-core machine.
  • gatsby-plugin-sharp cheats and loads redux actions directly, as opposed to having them passed in. This means it will use its own redux store per process. Need to fix.
```sh
markdown $ pwd
/Users/amarcar/dev/gatsby/gatsby/benchmarks/markdown
markdown $ rm -rf .cache && NUM_PAGES=1000 NUM_ROWS=1000 gatsby build
```
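On the caching point above, here's roughly how work can be pinned to a single worker with jest-worker's computeWorkerKey option; the method names match the sketches earlier, not the actual code:

```js
// Sketch: route every call for a given node to the same child process so
// that per-worker caches stay warm. Assumes the @moocar/jest-worker fork.
const Worker = require(`@moocar/jest-worker`).default

const pool = new Worker(require.resolve(`./worker`), {
  exposedMethods: [`setup`, `execResolver`],
  // execResolver is called as (key, node, args); keying on node.id shards
  // nodes deterministically across the pool
  computeWorkerKey: (method, key, node) =>
    method === `execResolver` ? String(node.id) : null,
})
```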

Future

I'm excited to see whether this work can enable other parts of Gatsby to run in parallel. It will require a lot more of Gatsby to become asynchronous. Fun times.

@Moocar requested a review from a team as a code owner on January 9, 2019 07:47
@Moocar (Contributor, Author) commented Jan 10, 2019

I had a great chat with @KyleAMathews yesterday about some alternative approaches to this. The core idea is to give plugin resolvers a "worker API" that they can explicitly call to offload work to another process, rather than running ALL resolvers on workers. I'm going to close this PR while I think about this idea.

@Moocar closed this on Jan 10, 2019
@Moocar (Contributor, Author) commented Jan 11, 2019

An interesting finding: when using workers, it's important to bump up the query-queue.js concurrency so that more jobs can be queued up on the workers. When concurrency == num cores, workers sit idle while the main thread processes each response before sending out the next job. Illustrated below (and sketched after the screenshots):

concurrency == 4:

[screenshot: query timeline, concurrency == 4]

concurrency == 32:

[screenshot: query timeline, concurrency == 32]
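A sketch of the tweak, assuming query-queue.js keeps using a better-queue instance; handleQuery and run-query-on-worker-pool below are hypothetical stand-ins:

```js
const Queue = require(`better-queue`)
const os = require(`os`)

// hypothetical module that ships a query job's resolvers to the worker pool
const runQueryOnWorkerPool = require(`./run-query-on-worker-pool`)

const handleQuery = (queryJob, cb) =>
  runQueryOnWorkerPool(queryJob).then(result => cb(null, result), cb)

const queue = new Queue(handleQuery, {
  // concurrent == num cores leaves workers idle while the main thread
  // processes each response; oversubscribing keeps their queues full
  concurrent: os.cpus().length * 8, // e.g. 32 on a 4-core machine
})
```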

@Moocar (Contributor, Author) commented Feb 7, 2019

Ok, I'm re-opening this. I've done some more research on parallel resolvers and looked at alternative implementations. I can break my findings into two areas:

1. We should feature flag it

The upside of parallel resolvers is clear: resolvers that operate on small nodes, perform lots of CPU work, and return small responses will see a near-linear increase in query performance per core.

The reality, however, is that with the exception of image processing (which already uses all cores), there are no resolvers that meet these criteria. The closest is transformer-remark. But if a markdown node needs lots of CPU processing, then the node and the result will both be large, so we're taxed by the to/from-JSON serialization that IPC requires.

Still, we do see a speed up. As mentioned in the PR description, there's a 2x speed up in query performance on benchmarks/markdown on a 4-core machine. I ran a test on a 32-core machine and saw an 8x speed up. That's a lot of CPU wasted on serialization, but it's still a decent speed up.

In comparison, though, when running the test on real websites like www or gatsbygram, I don't see any noticeable speed up on my 4-core machine.

So my recommendation is that we feature flag this. That way, if someone knows their site might benefit, or they're running on a 32-core rig, they can run builds with parallelism, but we won't accidentally make some sites slower.

2. We shouldn't use a "worker API"

The implementation in this PR runs entire resolvers on a worker farm; all the plugin author has to do is set the workerPlugin field on the resolver. After a chat with @KyleAMathews, we came up with an alternative approach where resolvers would explicitly offload high-CPU work to a worker farm via an API supplied by Gatsby (sketched below). I've created a PoC of this that you can see at Moocar/worker-api (comparison).
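The rough shape of that alternative, as I understand it — offloadToWorker is an illustrative name, not the PoC's actual API:

```js
const { GraphQLString } = require(`graphql`)

exports.setFieldsOnGraphQLNodeType = ({ type, offloadToWorker }) => {
  if (type.name !== `MarkdownRemark`) return {}
  return {
    html: {
      type: GraphQLString,
      // The plugin explicitly ships the heavy call to the farm; the heavy
      // function must live in its own file so jest-worker can load it
      resolve: markdownNode =>
        offloadToWorker(require.resolve(`./render-html.worker`), markdownNode),
    },
  }
}
```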

At a glance, this approach makes more sense, since plugin code might know that certain nodes would benefit from being run on a worker farm, while others may not. But there are a number of downsides:

  • In reality, it's hard for many plugins to know whether a node is better off being run on a worker. E.g. for transformer-remark, is it a node with lots of content? Or with lots of code blocks? Or does the list of sub-plugins (e.g. remark-code-titles) affect it? Writing ad hoc code to answer this is hard. It's really a job for machine learning to figure out at scale.
  • This PR uses jest-worker's computeWorkerKey to always send work for a particular node to the same worker. This won't be possible with a worker API, since we no longer have context about the node.
  • In setFieldsOnGraphQLNodeType, we close over Gatsby API functions (e.g. getNode). But with a worker API, we'd need to pass these in a context to each function instead, which would mean rewriting most plugin code (see remark/worker.js for an example, and the sketch after this list).
  • Plugin code has to be rewritten so that high-CPU functions live in a separate file (which will be loaded by jest-worker), whereas the solution in this PR simply requires a workerPlugin field on the graphql field config.
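To illustrate the closure problem from the list above — a sketch of the contrast, not actual plugin code:

```js
const { GraphQLString } = require(`graphql`)

// Today: the resolver closes over getNode from setFieldsOnGraphQLNodeType
exports.setFieldsOnGraphQLNodeType = ({ getNode }) => ({
  parentBody: {
    type: GraphQLString,
    resolve: node => getNode(node.parent).internal.content,
  },
})

// With a worker API, that closure can't cross the process boundary, so the
// same helper would have to be threaded through the resolver's context:
//   resolve: (node, args, context) =>
//     context.getNode(node.parent).internal.content
```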

Next Steps/Asks

See the original PR asks. This PR isn't ready to be merged, but hopefully I can get some feedback on the rough approach.

@Moocar reopened this on Feb 7, 2019
```diff
@@ -403,12 +408,14 @@ module.exports = (
   return resolve({
     html: {
       type: GraphQLString,
+      workerPlugin,
```
A reviewer (Contributor) commented:
Not sure I like that. This will break graphql-js/graphql-compose typings for people who use TypeScript. Could we do something like:

```js
const workerPlugin = new GatsbyWorkerPlugin(`gatsby-transformer-remark`)

{
  htmlAst: {
    type: GraphQLJSON,
    resolve: workerPlugin.wrapResolver(markdownNode => {
      // ...
    }),
  },
}
```

@Moocar (Contributor, Author) replied:
Love this idea. It's way more explicit, rather than flagging the field and picking it up later in the process. Thanks!

@Moocar (Contributor, Author) added:
Hmm, the implementation to get something like wrapResolver is gonna be pretty hacky, but it will work. Does GraphQL have any other way of specifying "tags" on Field objects? I essentially want to attach metadata to a field.

```js
  return o
}

/**
```
@freiksenet (Contributor) commented:
My branch makes it so that all resolvers use functions passed in via context instead of globals. So here you could modify context to include the worker-resolver machinery.

@Moocar (Contributor, Author) replied on Feb 7, 2019:
@freiksenet This isn't a breaking change, right? I.e., will field resolvers still be able to use the closed-over getNode etc. supplied by setFieldsOnGraphQLNodeType, in addition to those supplied in the context?

@Moocar (Contributor, Author) added:
I just had a read through of the branch. It looks like the only API functions injected into context are on nodeModel, which supplies getNode etc. Is there also a plan to include things like getCache, reporter, createContentDigest etc. as well?

@Moocar (Contributor, Author) commented Feb 17, 2019

Pausing this while I figure out how to make it work with the latest plugin-sharp changes: #10964
