
feat(gatsby): Schema rebuilding #19092

Merged — 54 commits merged into master from vladar-rebuild-schema on Nov 19, 2019

Conversation

vladar (Contributor) commented Oct 28, 2019

Description

Enables schema rebuilding so that restarting gatsby develop is not required when nodes change. It should also help with incremental builds.

This whole feature applies to types and fields created via inference. Types and fields created via schema customization are considered static and are not rebuilt.

TODO

  • Initial PoC
  • Remove stale inferred fields
  • Handle ADD_FIELD_TO_NODE in the reducer
  • Rework existing tests to use new APIs (utilizing metadata vs. list of nodes for inferring)
  • Derived types (filter/sort types, nested inferred types, etc.)
  • Root fields / arguments
  • Handle implicit parent-child relations and ADD_CHILD_NODE_TO_PARENT_NODE
  • Updating extensions for node and derived types
  • New tests for this specific feature (relied on existing tests so far which are green)
  • Make sure types and fields created via schema customization are not rebuilt
  • Move schema customization before node sourcing (to skip inference for types marked with @dontInfer)
  • Parse type defs before dispatching CREATE_TYPES action
  • Fix eslint to pick up new schema after rebuild
  • Re-run queries after rebuild (as they may be broken after schema change)
  • Improve dirty-checks to minimize false-positives
  • Test with multiple medium-to-large projects

Use cases supported in develop

  1. Adding new node types
  2. Adding nodes with new or structurally modified fields
  3. Modifying nodes (adding new fields, changing field types, etc.)
  4. Deleting fields from node types
  5. Deleting a type when no nodes or fields remain that could produce it in the first place

In these cases, the schema rebuilds without a restart. Non-structural node changes do not trigger a rebuild (i.e., nodes added with the same structure as existing ones).

Inference Refactoring

These changes required refactoring the inference process (more specifically, getExampleValue).

Before this change, getExampleValue looped through all the nodes of a given type to construct an example value, which was later used by type inference. This approach doesn't play well with incremental schema rebuilding because it takes O(N*M) time to create the example value (where N is the number of node fields, including nested fields, and M is the number of nodes of the given type).

After this PR, getExampleValue uses node metadata stored in the redux store and updated on every node-related action. Initial metadata building during bootstrap is still O(N*M), but getExampleValue is now O(N), which makes it reasonably fast for incremental rebuilding of the example value.
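The shift from node scanning to metadata counting can be sketched roughly like this (illustrative names and simplified value typing — this is not Gatsby's actual implementation):

```javascript
// Illustrative sketch (not Gatsby's internals): per-field value-type
// counters are updated on every node action, so building an example
// value no longer requires scanning all nodes of a type.
const addNode = (meta, node) => {
  for (const [field, value] of Object.entries(node)) {
    const valueType = Array.isArray(value) ? `array` : typeof value
    const fieldMeta = meta[field] || (meta[field] = {})
    const typeMeta =
      fieldMeta[valueType] ||
      (fieldMeta[valueType] = { total: 0, example: value })
    typeMeta.total++
  }
  return meta
}

// O(number of fields): derive an example value from the counters alone
const getExampleValue = meta => {
  const example = {}
  for (const [field, types] of Object.entries(meta)) {
    const typeNames = Object.keys(types)
    if (typeNames.length === 1) {
      // unambiguous field: a single observed value type
      example[field] = types[typeNames[0]].example
    }
    // multiple observed value types mean a conflict; the field is omitted
  }
  return example
}
```

In this sketch, deleting a node would decrement the same counters, which is what makes the example value cheap to keep up to date between builds.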

Caveat: conflict tracking

Conflict tracking for arrays is tricky: { a: [5, "foo"] } in one node versus { a: [5] } and { a: ["foo"] } in two separate nodes are represented identically in metadata and so would be reported identically. To work around this, we additionally track the first node id for each value type:

{ a: { array: { item: { int: { total: 1, first: "1" }, string: { total: 1, first: "1" } } } } }
{ a: { array: { item: { int: { total: 1, first: "1" }, string: { total: 1, first: "2" } } } } }

This helps produce more useful conflict reports (rare edge cases where the report may be confusing are still possible, e.g. when a node is deleted).
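Given that representation, telling the two situations apart only requires comparing the tracked first ids. A minimal sketch (function name is illustrative, not the actual reporting code):

```javascript
// Illustrative sketch: given per-item-type metadata like
// { int: { total: 1, first: "1" }, string: { total: 1, first: "2" } },
// the tracked `first` node ids distinguish a mixed array inside one
// node from a type conflict between two different nodes.
const describeArrayConflict = itemMeta => {
  const firstIds = [...new Set(Object.values(itemMeta).map(t => t.first))]
  return firstIds.length === 1
    ? `mixed array within node ${firstIds[0]}`
    : `conflicting item types across nodes ${firstIds.join(`, `)}`
}
```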

Caveat: dirty checking

Some plugins delete nodes and then re-create them on any change, so even if the final metadata is identical, it will still be marked as dirty. To handle this, we additionally compare dirty metadata between calls and skip rebuilding if the inferred structure is the same.
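The extra dirty check can be sketched like so (illustrative; the real comparison is more granular than a JSON snapshot):

```javascript
// Illustrative sketch: summarize only the *structure* of the metadata
// (field name -> set of observed value types), ignoring counters, so
// that delete-then-recreate cycles with identical shapes don't force
// a schema rebuild.
const structureOf = meta =>
  JSON.stringify(
    Object.fromEntries(
      Object.entries(meta).map(([field, types]) => [
        field,
        Object.keys(types).sort(),
      ])
    )
  )

let lastStructure = null

// returns true only when the inferred structure actually changed
const shouldRebuild = meta => {
  const structure = structureOf(meta)
  if (structure === lastStructure) return false
  lastStructure = structure
  return true
}
```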

Caveat: rebuild granularity

Currently, we rebuild the full schema instance from scratch (and only when there are structural changes). Granular updates are complicated because graphql-compose wasn't designed for mutations like this: it is great for complex schema builds but provides little help when you delete or modify types, fields, or arguments.

Possible follow-ups

  • Rebuild granularity (see the caveat above). This will likely require coordination with the graphql-compose author or our own type/field/argument dependency tracking, but the complexity involved might not be worth the effort.

  • We could use metadata directly for inference (vs. the example value). Metadata carries more information useful for inference or granular rebuilds. For example, we could handle data conflicts more gracefully: say, when < 1% of nodes have a conflicting field type, add the majority field type and warn about the specific conflicting node ids (currently we remove the field on conflict and report both nodes).

  • Investigate webpack invalidation on schema change (see @wardpeet's review comment below)

Related Issues

Fixes #18939

freiksenet (Contributor) left a comment

This is so cool! 👍

I think you are missing child/parent relationship cases, i.e. when a child/parent relationship is added either through an action or by setting the 'parent' field.

We usually test against .org (the www folder in the monorepo) using gatsby-dev-cli. You can test against multiple sites using develop-runner. You need to modify the source code a bit, but it's still very handy.

We often publish a pre-release version for packages with big changes, which you can then use to test with develop-runner. E.g. you can reuse the @schema-customization tag (to do so, go to gatsby/packages/gatsby and run yarn publish --tag schema-customization).

@@ -0,0 +1,346 @@
/*
Contributor:
Nice!

const report = require(`gatsby-cli/lib/reporter`)

// API_RUNNING_QUEUE_EMPTY could be emitted multiple times
// in a short period of time, so debounce seems reasonable
Contributor:
This is true, and probably something we should look into: debouncing the API_RUNNING_QUEUE_EMPTY event elsewhere (most systems that listen to it are also potentially expensive operations, so a global debounce would help)
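A global debounce along those lines could be sketched as follows (illustrative; the emitter wiring and event names are assumptions, not the actual Gatsby code):

```javascript
// Illustrative sketch of a shared trailing-edge debounce for expensive
// event listeners: a burst of emissions triggers a single run.
const debounce = (fn, delayMs) => {
  let timer = null
  return (...args) => {
    clearTimeout(timer)
    timer = setTimeout(() => fn(...args), delayMs)
  }
}
```

A hypothetical emitter could then register `emitter.on(\`API_RUNNING_QUEUE_EMPTY\`, debounce(runExpensiveWork, 100))` once, instead of each listener debouncing on its own.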

vladar (Contributor, Author):
Oops, missed this comment, sorry. It makes sense, but I suggest doing this in a separate PR.

// bar: {
// string: { total: 1, example: 'str' },
// },
// }
Contributor:
How are nested objects represented here? How are arrays represented?

I.e., what would

const node2 = { id: '1', nested: { foo: 'bar' }, array: [1, 2, 3, "string"] }

produce? (If this is handled already.)

Contributor:
Ah there are types below - ignore above ;)

vladar (Contributor, Author):
Yeah, those are just simple usage examples; I didn't want to distract with all the recursion stuff here %)

vladar force-pushed the vladar-rebuild-schema branch from 0149d75 to 0607c9e on November 12, 2019 13:51
vladar changed the title from "WIP: schema rebuilding" to "feat(gatsby): Schema rebuilding" on Nov 13, 2019
vladar marked this pull request as ready for review on November 13, 2019 07:43
vladar requested review from a team as code owners on November 13, 2019 07:43
wardpeet (Contributor) left a comment

I've only looked at the webpack plugin and it looks great! 👏 (well done!)
If we want to invalidate the webpack config when the schema changes, we need to do some more hackery 😂

  app.use(
    require(`webpack-dev-middleware`)(compiler, {
      logLevel: `silent`,
      publicPath: devConfig.output.publicPath,
      stats: `errors-only`,
    })
  )

The webpack-dev-middleware returns an instance that has an invalidate method. This method tells webpack to invalidate itself, which reruns eslint on all watched files.

I'm thinking we should do something like

  const webpackDevMiddleware = require(`webpack-dev-middleware`)(compiler, {
    logLevel: `silent`,
    publicPath: devConfig.output.publicPath,
    stats: `errors-only`,
  });

  app.use(webpackDevMiddleware)
  
  // this should only be triggered when it actually has changed.
  emitter.on('SCHEMA_REBUILD', () => {
    webpackDevMiddleware.invalidate()
  })

Two resolved review threads on packages/gatsby/src/utils/webpack-utils.js (outdated)
freiksenet previously approved these changes Nov 13, 2019

freiksenet (Contributor) left a comment

LGTM 👍

vladar (Contributor, Author) commented Nov 14, 2019

@wardpeet I tried doing webpack invalidation, but for some reason eslint-loader is not running after it. For now, we re-run queries here:

await rebuild({ parentSpan: activity })
await updateStateAndRunQueries(false, { parentSpan: activity })

which will fail with an error anyway. I will add invalidation to the possible follow-ups and will probably need your help to debug why it is not working as expected.

wardpeet (Contributor):

Sweet! Sounds great. I'm not 100% sure my pseudo-code was the right call. Happy to debug later :)

vladar added the bot: merge on green label (Gatsbot will merge these PRs automatically when all tests pass) on Nov 19, 2019
gatsbybot merged commit e4dae4d into master on Nov 19, 2019
The delete-merged-branch bot deleted the vladar-rebuild-schema branch on November 19, 2019 08:12
vladar (Contributor, Author) commented Nov 19, 2019

Published in gatsby 2.18.0

aileen (Contributor) commented Nov 25, 2019

Hey @vladar 👋🏼
I have a question re this task:

Test with multiple medium-to-large projects

Has this actually been done? I'm asking because the jump to Gatsby@2.18.0 significantly slowed down the development process for us:

With Gatsby@2.18.0, after changing some markup in a page, createPages takes almost 30s:

success Building development bundle - 22.864s
info added file at /Users/aileen/code/G3/src/pages/members.js
success write out requires - 0.007s
success createPages - 28.390s
success run queries - 0.104s - 0/1 9.58/s
success extract queries from components - 2.147s
success write out requires - 0.007s
success Re-building development bundle - 3.067s
success run queries - 4.639s - 385/385 82.99/s

With Gatsby@2.17.17, after changing some markup in a page, createPages takes 4s:

success Building development bundle - 21.763s
info added file at /Users/aileen/code/G3/src/pages/members.js
success write out requires - 0.006s
success createPages - 3.962s
success run queries - 0.105s - 0/1 9.54/s
success extract queries from components - 2.090s
success write out requires - 0.004s
success Re-building development bundle - 3.149s
success run queries - 4.879s - 385/385 78.90/s
success run queries - 0.078s - 10/10 127.74/s

Is this known? Do you want me to open an issue for it? For now, we'll have to stick with the pre-2.18 version, as developing is kind of unbearable otherwise.

vladar (Contributor, Author) commented Nov 25, 2019

There are two known performance regressions: one in 2.18.0 after schema rebuilding and another in 2.18.1 after #17681.

I guess we need to figure out which one affects you the most. Could you try the exact versions 2.18.0 and 2.18.1 and compare the results?

aileen (Contributor) commented Nov 26, 2019

Oh!! Good call! Sorry for not being accurate enough 😬

Seems like the longer rebuilding time is actually caused by 2.18.1 👇🏼

With Gatsby@2.18.0 (~3.2s):

info added file at /Users/aileen/code/G3/src/pages/members.js
success write out requires - 0.003s
success createPages - 3.158s
success run queries - 0.103s - 1/1 9.69/s
success extract queries from components - 1.753s
success write out requires - 0.006s
success Re-building development bundle - 2.537s
success run queries - 4.605s - 385/385 83.60/s
success run queries - 0.060s - 10/10 165.95/s

With Gatsby@2.18.1 (~21.8s):

info added file at /Users/aileen/code/G3/src/pages/members.js
success extract queries from components - 0.201s
success write out requires - 0.006s
success createPages - 21.790s
success run queries - 0.118s - 1/1 8.46/s
success write out requires - 0.004s
success Re-building development bundle - 22.950s
success run queries - 4.647s - 385/385 82.84/s

vladar (Contributor, Author) commented Nov 27, 2019

@AileenCGN A potential fix for this regression is published in gatsby 2.18.4 (see PR #19774). Could you try again and post here whether it improves things for you?

aileen (Contributor) commented Nov 28, 2019

@vladar Awesome!! It's fixed now 🎉 Back to old rebuild times now 🤗

tu4mo commented Dec 11, 2019

Hey @vladar, this seems to have somehow broken gatsby-transformer-react-docgen, see #20043.
