
feat(gatsby): Schema rebuilding #19092

Merged — 54 commits merged into master from vladar-rebuild-schema on Nov 19, 2019

Conversation

vladar (Contributor) commented Oct 28, 2019

Description

Enables schema rebuilding so that restarting gatsby develop is not required when nodes change. It should also help with incremental builds.

This whole feature applies to types and fields created via inference. Types and fields created via schema customization are considered static and are not rebuilt.

TODO

  • Initial PoC
  • Remove stale inferred fields
  • Handle ADD_FIELD_TO_NODE in the reducer
  • Rework existing tests to use new APIs (utilizing metadata vs. list of nodes for inferring)
  • Derived types (filter/sort types, nested inferred types, etc.)
  • Root fields / arguments
  • Handle implicit parent-child relations and ADD_CHILD_NODE_TO_PARENT_NODE
  • Updating extensions for node and derived types
  • New tests for this specific feature (relied on existing tests so far which are green)
  • Make sure types and fields created via schema customization are not rebuilt
  • Move schema customization before node sourcing (to skip inference for types marked with @dontInfer)
  • Parse type defs before dispatching CREATE_TYPES action
  • Fix eslint to pick up new schema after rebuild
  • Re-run queries after rebuild (as they may be broken after schema change)
  • Improve dirty-checks to minimize false-positives
  • Test with multiple medium-to-large projects

Use cases supported in develop

  1. Adding new node types
  2. Adding nodes with new or structurally modified fields
  3. Modifying nodes (adding new fields, changing field types, etc.)
  4. Deleting fields from node types
  5. Deleting a type when no nodes or fields remain that could produce it in the first place

In these cases, the schema rebuilds without a restart. Non-structural node changes do not trigger a rebuild (i.e., nodes added with the same structure as existing ones).

Inference Refactoring

These changes required refactoring the inference process (more specifically, getExampleValue).

Before this change, getExampleValue looped through all the nodes of a given type to construct an example value, which was later used by type inference. This approach doesn't play well with incremental schema rebuilding because it takes O(N*M) time to create the example value (where N is the number of node fields, including nested fields, and M is the number of nodes of the given type).

After this PR, getExampleValue uses node metadata stored in the redux store and updated on every node-related action. Initial metadata building during bootstrap is still O(N*M), but getExampleValue is now O(N), which makes it reasonably fast for incremental rebuilding of the example value.
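The shift from node scanning to metadata counting can be sketched roughly like this (illustrative names and simplified value typing — this is not Gatsby's actual implementation):

```javascript
// Illustrative sketch (not Gatsby's internals): per-field value-type
// counters are updated on every node action, so building an example
// value no longer requires scanning all nodes of a type.
const addNode = (meta, node) => {
  for (const [field, value] of Object.entries(node)) {
    const valueType = Array.isArray(value) ? `array` : typeof value
    const fieldMeta = meta[field] || (meta[field] = {})
    const typeMeta =
      fieldMeta[valueType] ||
      (fieldMeta[valueType] = { total: 0, example: value })
    typeMeta.total++
  }
  return meta
}

// O(number of fields): derive an example value from the counters alone
const getExampleValue = meta => {
  const example = {}
  for (const [field, types] of Object.entries(meta)) {
    const typeNames = Object.keys(types)
    if (typeNames.length === 1) {
      // unambiguous field: a single observed value type
      example[field] = types[typeNames[0]].example
    }
    // multiple observed value types mean a conflict; the field is omitted
  }
  return example
}
```

In this sketch, deleting a node would decrement the same counters, which is what makes the example value cheap to keep up to date between builds.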

Caveat: conflict tracking

Conflict tracking for arrays is tricky: { a: [5, "foo"] } in one node versus { a: [5] } and { a: ["foo"] } in two separate nodes are represented identically in metadata and so would be reported identically. To work around this, we additionally track the first node id for each value type:

{ a: { array: { item: { int: { total: 1, first: "1" }, string: { total: 1, first: "1" } } } } }
{ a: { array: { item: { int: { total: 1, first: "1" }, string: { total: 1, first: "2" } } } } }

This helps produce more useful conflict reports (rare edge cases where the report may be confusing are still possible, e.g. when a node is deleted).
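Given that representation, telling the two situations apart only requires comparing the tracked first ids. A minimal sketch (function name is illustrative, not the actual reporting code):

```javascript
// Illustrative sketch: given per-item-type metadata like
// { int: { total: 1, first: "1" }, string: { total: 1, first: "2" } },
// the tracked `first` node ids distinguish a mixed array inside one
// node from a type conflict between two different nodes.
const describeArrayConflict = itemMeta => {
  const firstIds = [...new Set(Object.values(itemMeta).map(t => t.first))]
  return firstIds.length === 1
    ? `mixed array within node ${firstIds[0]}`
    : `conflicting item types across nodes ${firstIds.join(`, `)}`
}
```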

Caveat: dirty checking

Some plugins delete nodes and then re-create them on any change, so even if the final metadata is identical, it will still be marked as dirty. To handle this, we additionally compare dirty metadata between calls and skip rebuilding if the inferred structure is the same.
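The extra dirty check can be sketched like so (illustrative; the real comparison is more granular than a JSON snapshot):

```javascript
// Illustrative sketch: summarize only the *structure* of the metadata
// (field name -> set of observed value types), ignoring counters, so
// that delete-then-recreate cycles with identical shapes don't force
// a schema rebuild.
const structureOf = meta =>
  JSON.stringify(
    Object.fromEntries(
      Object.entries(meta).map(([field, types]) => [
        field,
        Object.keys(types).sort(),
      ])
    )
  )

let lastStructure = null

// returns true only when the inferred structure actually changed
const shouldRebuild = meta => {
  const structure = structureOf(meta)
  if (structure === lastStructure) return false
  lastStructure = structure
  return true
}
```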

Caveat: rebuild granularity

Currently, we rebuild the full schema instance from scratch (and only when there are structural changes). Granular updates are complicated because graphql-compose wasn't designed for mutations like this: it is great for complex schema builds but provides little help when you delete or modify types, fields, or arguments.

Possible follow-ups

  • Rebuild granularity (see the caveat above). This will likely require coordination with the graphql-compose author or our own type/field/argument dependency tracking, but the complexity involved might not be worth the effort.

  • We could use metadata directly for inference (vs. the example value). Metadata carries more information useful for inference or granular rebuilds. For example, we could handle data conflicts more gracefully: say, when < 1% of nodes have a conflicting field type, add the majority field type and warn about the specific conflicting node ids (currently we remove the field on conflict and report both nodes).

  • Investigate webpack invalidation on schema change (see @wardpeet's review comment below)

Related Issues

Fixes #18939

freiksenet (Contributor) left a comment

This is so cool! 👍

I think you are missing child/parent relationship cases, i.e. when a child/parent relationship is added either through an action or by setting the 'parent' field.

We usually test against .org (the www folder in the monorepo) using gatsby-dev-cli. You can test against multiple sites using develop-runner. You need to modify the source code a bit, but it's still very handy.

We often publish a pre-release version for packages with big changes, which you can then use to test with develop-runner. E.g. you can reuse the @schema-customization tag (to do so, go to gatsby/packages/gatsby and run yarn publish --tag schema-customization).

@@ -0,0 +1,346 @@
/*
Contributor:
Nice!

const report = require(`gatsby-cli/lib/reporter`)

// API_RUNNING_QUEUE_EMPTY could be emitted multiple times
// in a short period of time, so debounce seems reasonable
Contributor:
This is true, and probably something we should look into: debouncing the API_RUNNING_QUEUE_EMPTY event elsewhere (most systems that listen to it are also potentially expensive operations, so a global debounce would help)
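A global debounce along those lines could be sketched as follows (illustrative; the emitter wiring and event names are assumptions, not the actual Gatsby code):

```javascript
// Illustrative sketch of a shared trailing-edge debounce for expensive
// event listeners: a burst of emissions triggers a single run.
const debounce = (fn, delayMs) => {
  let timer = null
  return (...args) => {
    clearTimeout(timer)
    timer = setTimeout(() => fn(...args), delayMs)
  }
}
```

A hypothetical emitter could then register `emitter.on(\`API_RUNNING_QUEUE_EMPTY\`, debounce(runExpensiveWork, 100))` once, instead of each listener debouncing on its own.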

vladar (Contributor, Author):
Oops, missed this comment, sorry. It makes sense, but I suggest doing this in a separate PR.

// bar: {
// string: { total: 1, example: 'str' },
// },
// }
Contributor:
How are nested objects represented here? How are arrays represented?

I.e., what would

const node2 = { id: '1', nested: { foo: 'bar' }, array: [1, 2, 3, "string"] }

produce? (If this is handled already.)

Contributor:
Ah there are types below - ignore above ;)

vladar (Contributor, Author):
Yeah, those are just simple usage examples; I didn't want to distract with all the recursion stuff here %)

vladar force-pushed the vladar-rebuild-schema branch from 0149d75 to 0607c9e on November 12, 2019 13:51
vladar changed the title from "WIP: schema rebuilding" to "feat(gatsby): Schema rebuilding" on Nov 13, 2019
vladar marked this pull request as ready for review on November 13, 2019 07:43
vladar requested review from a team as code owners on November 13, 2019 07:43
wardpeet (Contributor) left a comment

I've only looked at the webpack plugin and it looks great! 👏 (well done!)
If we want to invalidate the webpack config when the schema changes, we need to do some more hackery 😂

  app.use(
    require(`webpack-dev-middleware`)(compiler, {
      logLevel: `silent`,
      publicPath: devConfig.output.publicPath,
      stats: `errors-only`,
    })
  )

The webpack-dev-middleware returns an instance that has an invalidate method. This method tells webpack to invalidate itself, which reruns eslint on all watched files.

I'm thinking we should do something like

  const webpackDevMiddleware = require(`webpack-dev-middleware`)(compiler, {
    logLevel: `silent`,
    publicPath: devConfig.output.publicPath,
    stats: `errors-only`,
  });

  app.use(webpackDevMiddleware)
  
  // this should only be triggered when it actually has changed.
  emitter.on('SCHEMA_REBUILD', () => {
    webpackDevMiddleware.invalidate()
  })

Two resolved review threads on packages/gatsby/src/utils/webpack-utils.js (outdated)
freiksenet previously approved these changes Nov 13, 2019

freiksenet (Contributor) left a comment

LGTM 👍

vladar (Contributor, Author) commented Nov 14, 2019

@wardpeet I tried doing webpack invalidation, but for some reason eslint-loader is not running after it. For now, we re-run queries here:

await rebuild({ parentSpan: activity })
await updateStateAndRunQueries(false, { parentSpan: activity })

which will fail with an error anyway. I will add invalidation to the possible follow-ups and will probably need your help to debug why it is not working as expected.

wardpeet (Contributor):

Sweet! Sounds great. I'm not 100% sure my pseudo-code was the right call. Happy to debug later :)

vladar added the bot: merge on green label (Gatsbot will merge these PRs automatically when all tests pass) on Nov 19, 2019
gatsbybot merged commit e4dae4d into master on Nov 19, 2019
The delete-merged-branch bot deleted the vladar-rebuild-schema branch on November 19, 2019 08:12
vladar (Contributor, Author) commented Nov 19, 2019

Published in gatsby 2.18.0

aileen (Contributor) commented Nov 25, 2019

Hey @vladar 👋🏼
I have a question re this task:

Test with multiple medium-to-large projects

Has this actually been done? I'm asking because the jump to Gatsby@2.18.0 significantly slowed down the development process for us:

With Gatsby@2.18.0, after changing some markup in a page, createPages takes almost 30s:

success Building development bundle - 22.864s
info added file at /Users/aileen/code/G3/src/pages/members.js
success write out requires - 0.007s
success createPages - 28.390s
success run queries - 0.104s - 0/1 9.58/s
success extract queries from components - 2.147s
success write out requires - 0.007s
success Re-building development bundle - 3.067s
success run queries - 4.639s - 385/385 82.99/s

With Gatsby@2.17.17, after changing some markup in a page, createPages takes 4s:

success Building development bundle - 21.763s
info added file at /Users/aileen/code/G3/src/pages/members.js
success write out requires - 0.006s
success createPages - 3.962s
success run queries - 0.105s - 0/1 9.54/s
success extract queries from components - 2.090s
success write out requires - 0.004s
success Re-building development bundle - 3.149s
success run queries - 4.879s - 385/385 78.90/s
success run queries - 0.078s - 10/10 127.74/s

Is this known? Do you want me to open an issue for it? For now, we'll have to stick with the pre-2.18 version, as developing is kind of unbearable otherwise.

vladar (Contributor, Author) commented Nov 25, 2019

There are two known performance regressions: one in 2.18.0 after schema rebuilding and another in 2.18.1 after #17681.

I guess we need to figure out which one affects you the most. Could you try the exact versions 2.18.0 and 2.18.1 and compare the results?

aileen (Contributor) commented Nov 26, 2019

Oh!! Good call! Sorry for not being accurate enough 😬

Seems like the longer rebuilding time is actually caused by 2.18.1 👇🏼

With Gatsby@2.18.0 (~3.2s):

info added file at /Users/aileen/code/G3/src/pages/members.js
success write out requires - 0.003s
success createPages - 3.158s
success run queries - 0.103s - 1/1 9.69/s
success extract queries from components - 1.753s
success write out requires - 0.006s
success Re-building development bundle - 2.537s
success run queries - 4.605s - 385/385 83.60/s
success run queries - 0.060s - 10/10 165.95/s

With Gatsby@2.18.1 (~21.8s):

info added file at /Users/aileen/code/G3/src/pages/members.js
success extract queries from components - 0.201s
success write out requires - 0.006s
success createPages - 21.790s
success run queries - 0.118s - 1/1 8.46/s
success write out requires - 0.004s
success Re-building development bundle - 22.950s
success run queries - 4.647s - 385/385 82.84/s

vladar (Contributor, Author) commented Nov 27, 2019

@AileenCGN A potential fix for this regression is published in gatsby 2.18.4 (see PR #19774). Could you try again and post here whether it improves things for you?

aileen (Contributor) commented Nov 28, 2019

@vladar Awesome!! It's fixed now 🎉 Back to old rebuild times now 🤗

tu4mo commented Dec 11, 2019

Hey @vladar, this seems to have somehow broken gatsby-transformer-react-docgen, see #20043.
