perf(gatsby): Create index on the fly for non-id index #20729

pvdz · 2020-01-20T21:25:31Z

This takes a more generic approach to short-circuiting eq filters on exactly one property, including a chain of properties.

To be clear: this can make a HUGE perf difference at scale. It is a follow-up to #20609, which only applied to the index by id and dropped one site's runtime from 4.5h to 5 minutes.

This PR seems to improve perf the same for that site, but now for slug or any other property!

This means a filter like allISomeSource(filter: { fields: { slug: { eq: $slug } } }) { can now be as fast as using id, which before was the only thing being indexed.
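To make the idea concrete, here is a minimal sketch of an index that is built on first use for an arbitrary property chain. It assumes a plain array of nodes that carry `internal.type`; the names `getValueAt`, `ensureIndexByChain`, and `lookupEq` are hypothetical and are not the functions added in this PR.

```javascript
// Hypothetical sketch, not the PR's actual code: index nodes by the value
// found at a property chain, e.g. ["fields", "slug"].
const indexes = new Map() // cache key -> Map(value -> Set of nodes)

// Walk a chain of property names; returns undefined if any step is missing.
function getValueAt(node, chain) {
  let value = node
  for (const key of chain) {
    if (value === null || typeof value !== "object") return undefined
    value = value[key]
  }
  return value
}

// Build the index for (type, chain) once; later calls reuse the cached Map.
function ensureIndexByChain(nodes, type, chain) {
  const cacheKey = type + "/" + chain.join("+")
  let index = indexes.get(cacheKey)
  if (index) return index

  index = new Map()
  for (const node of nodes) {
    if (node.internal.type !== type) continue
    const value = getValueAt(node, chain)
    if (value === undefined) continue
    let bucket = index.get(value)
    if (!bucket) index.set(value, (bucket = new Set()))
    bucket.add(node)
  }
  indexes.set(cacheKey, index)
  return index
}

// Answer an `eq` filter with one Map lookup instead of scanning every node.
function lookupEq(nodes, type, chain, value) {
  const bucket = ensureIndexByChain(nodes, type, chain).get(value)
  return bucket ? [...bucket] : undefined
}

const nodes = [
  { id: "1", internal: { type: "Post" }, fields: { slug: "/a/" } },
  { id: "2", internal: { type: "Post" }, fields: { slug: "/b/" } },
  { id: "3", internal: { type: "Page" }, fields: { slug: "/a/" } },
]
const hits = lookupEq(nodes, "Post", ["fields", "slug"], "/b/")
console.log(hits.map(n => n.id))
```

The first lookup pays for one pass over the nodes; every later lookup on the same type and chain is a single Map access, which is where the gain at scale would come from.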

This PR is not finished (hence the draft state). The GraphQL gurus need to take a closer look at this, and there are some (to me) obvious points that need to be addressed.

While an improvement for many, this adds a little overhead to filters that are not flat (one leaf), filters that use elemMatch, and even more regression for schemas with @proxy where the filter misses (since that still first goes through the new index). But probably within acceptable bounds.
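As a rough illustration of what "flat (one leaf)" means here, the sketch below checks whether a filter object is a single path ending in `eq`. The helper name `getFlatEqChain` is hypothetical, and the sketch assumes the filter is a plain nested object like the one in the query above; it is not the check used in this PR.

```javascript
// Hypothetical helper: return the property chain and target value when the
// filter is a single path ending in { eq: value }; otherwise return undefined
// so the caller can fall back to the regular (slower) filtering path.
function getFlatEqChain(filter) {
  const chain = []
  let current = filter
  while (current !== null && typeof current === "object" && !Array.isArray(current)) {
    const keys = Object.keys(current)
    if (keys.length !== 1) return undefined // zero or several leaves: not flat
    const key = keys[0]
    if (key === "elemMatch") return undefined // array matching is not covered
    if (key === "eq") {
      return chain.length > 0 ? { chain, value: current.eq } : undefined
    }
    chain.push(key)
    current = current[key]
  }
  return undefined // ended on a non-eq operator such as `ne` or `in`
}

const flat = getFlatEqChain({ fields: { slug: { eq: "/hello/" } } })
const twoLeaves = getFlatEqChain({ a: { eq: 1 }, b: { eq: 2 } })
const nested = getFlatEqChain({ tags: { elemMatch: { name: { eq: "x" } } } })
console.log(flat, twoLeaves, nested)
```

Filters that fail this kind of check still pay for the check itself before taking the slow path, which is the small overhead mentioned above.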

freiksenet

How should we deal with the mappedByKey cache? See TODO desc inline for details.

Added inline comment

Is the __gatsby_resolved bit properly handled?
Can we skip the __gatsby_resolved bit? My benchmarks complete without error if I omit it, but I'm worried that it just means something broke but didn't lead to a syntax error.

See inline comment for both. We can't skip it, but it needs to be moved.

If the cache is missed, can we just return undefined instead of going through sift anyways? I think it will result in the same result.

As we always ensure cache, I think yes.

I've moved the ./nodes requires to the top. Was there a particular reason they were inline to the function?

There used to be a circular dependency. Are tests passing?

Verify the anomaly where the bench-md-id benchmark (186 before/after) now completes slower than the bench-md-slug benchmark (208->178)

Could it just be log growth of map lookup? You do by-id lookup on map of all nodes vs by-index lookup of a much smaller subset.

packages/gatsby/src/redux/nodes.js

packages/gatsby/src/redux/run-sift.js

packages/gatsby/src/redux/nodes.js

pvdz · 2020-01-21T09:00:47Z

Could it just be log growth of map lookup?

At first I thought so as well but then I realized that slugs have it worse; they are unique (supposedly) but are also adding an extra Set object for each position in the Map. There is a catch here, though; we generate way more nodes than we have pages. So perhaps that's a reason. I will make a note to look into doing the getPage lookup through type specific maps instead, see if that makes a difference at scale.

pvdz · 2020-01-21T09:43:30Z

There used to be a circular dependency. Are tests passing?

Tests are passing and I quickly checked but do not spot a circle now. I guess it's resolved so let's keep it at the top.

packages/gatsby/src/redux/run-sift.js

pvdz · 2020-01-21T12:33:39Z

Could it just be log growth of map lookup?

It was. See the updated code and how id gets special-cased now. An index is still created for it (just sans the Sets), leading to smaller Maps and therefore faster lookups.
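One way to picture that special case, as a sketch only: since ids are unique, a per-type id index can map straight to the node, while other properties need a Set per value. The function name `buildIndex` and the flat node shape are assumptions made for this example.

```javascript
// Hypothetical sketch: for `id`, values are unique, so each Map entry can hold
// the node itself; other properties get a Set because values may repeat.
function buildIndex(nodes, type, key) {
  const index = new Map()
  for (const node of nodes) {
    if (node.internal.type !== type) continue
    if (key === "id") {
      index.set(node.id, node) // no Set wrapper needed
    } else {
      const value = node[key]
      if (!index.has(value)) index.set(value, new Set())
      index.get(value).add(node)
    }
  }
  return index
}

const allNodes = [
  { id: "1", author: "kim", internal: { type: "Post" } },
  { id: "2", author: "kim", internal: { type: "Post" } },
  { id: "3", author: "lee", internal: { type: "Page" } },
]
const idIndex = buildIndex(allNodes, "Post", "id")
const authorIndex = buildIndex(allNodes, "Post", "author")
console.log(idIndex.size, authorIndex.get("kim").size)
```

Because the index only holds nodes of one type, its Map is smaller than a Map of all nodes, which matches the explanation given for the benchmark anomaly.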

packages/gatsby/src/redux/nodes.js

freiksenet

This is quite brilliant! 👍 Some minor comments.

packages/gatsby/src/redux/run-sift.js

packages/gatsby/src/schema/node-model.js

packages/gatsby/src/schema/__tests__/node-model.js

packages/gatsby/src/schema/node-model.js

freiksenet · 2020-01-23T08:28:29Z

How can I verify (where are these tests?) that this cache is properly torn down when any node gets changed/added/deleted? I don't see existing tests and I'm not sure how to run this e2e in a way that tests this. I would want to start with some nodes, run a filter, verify the cache was updated, and then for each of the three mutations apply the mutation and verify the cache was reset.

There are currently no cache tests. The problem is that it also tears down the cache at the wrong level. Ideally graphql-runner should take care of monitoring changes to schema and nodes, but for some reason I implemented it above graphql-runner.
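The shape of the test described in the question could look roughly like the sketch below. `NodeStore` is a hypothetical stand-in written for this example; it does not reflect how Gatsby's redux store or graphql-runner actually invalidate caches.

```javascript
// Hypothetical store: any node mutation drops all derived indexes.
class NodeStore {
  constructor() {
    this.nodes = new Map() // id -> node
    this.indexes = new Map() // property name -> Map(value -> node)
  }
  _invalidate() {
    this.indexes.clear()
  }
  createNode(node) {
    this.nodes.set(node.id, node)
    this._invalidate()
  }
  updateNode(node) {
    this.nodes.set(node.id, node)
    this._invalidate()
  }
  deleteNode(id) {
    this.nodes.delete(id)
    this._invalidate()
  }
  // Builds the slug index on first use, then answers from it.
  findBySlug(slug) {
    let index = this.indexes.get("slug")
    if (!index) {
      index = new Map()
      for (const node of this.nodes.values()) index.set(node.slug, node)
      this.indexes.set("slug", index)
    }
    return index.get(slug)
  }
}

const store = new NodeStore()
store.createNode({ id: "1", slug: "/a/" })

const mutations = {
  create: s => s.createNode({ id: "2", slug: "/b/" }),
  update: s => s.updateNode({ id: "1", slug: "/a2/" }),
  delete: s => s.deleteNode("1"),
}

// For each mutation: warm the cache, check it exists, mutate, check it is gone.
const results = {}
for (const [name, mutate] of Object.entries(mutations)) {
  store.findBySlug("/a/")
  const warmed = store.indexes.size === 1
  mutate(store)
  results[name] = warmed && store.indexes.size === 0
}
console.log(results)
```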

Am I shortcutting too much? Too little? Since I'm not running sift, I'm a liiiitle worried that I'm missing something to the filtering.

I'd say we can shortcut a bit more. Eg should we rerun with sift if we return undefined?

Confirm the __gatsby_resolved property is not set "too early". Or is that state basically invariant after the bootstrap? (Before it would be set on each call of the filter, now we're just setting it once when creating the cache, technically there's time where the state could change, but I don't think that's the case here)

__gatsby_resolved is guaranteed to be available once you run prepareNodes in node-model, so it should always be there when it is needed. If the test that proxies "slug" to "originalSlug" works correctly, then it's all done correctly.

pvdz · 2020-01-23T16:53:12Z

Apart from the one TODO about the nodes loop in ensureIndexByTypedChain, this is nearing completion. Comments are welcome. I'll remove the draft tags once I'm done with profiling.

packages/gatsby/src/schema/__tests__/node-model.js

packages/gatsby/src/redux/run-sift.js

packages/gatsby/src/redux/nodes.js

Make. It. Fast. Er

freiksenet

Looking good! Let's wait for @vladar 's review and ship it.

packages/gatsby/src/redux/nodes.js

vladar

Fantastic work! Looks good to me. Left one small nit that shouldn't block this PR in any way.

packages/gatsby/src/redux/run-sift.js

pvdz · 2020-01-28T11:57:14Z

Pushed a fix by @freiksenet to make sure the loki path also works (tests were failing because of that, thankfully). So now GATSBY_DB_NODES=loki yarn test also passes.

packages/gatsby/src/redux/__tests__/run-sift.js

blainekasten · 2020-01-30T16:12:22Z

packages/gatsby/src/schema/node-model.js

+   *   cached instead of a Set of Nodes.
+   */
+  replaceTypeKeyValueCache(map = new Map()) {
+    this._typedKeyValueIndexes = new Map() // See redux/nodes.js for usage


Is this a mistake, not using the argument to assign?

pvdz · 2020-02-11

The ... Eh ... yes? :/
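The follow-up commit is titled "Use argument for updating cache". A sketch of what that change presumably amounts to is shown below; only the method name and the field name come from the diff quoted above, while the surrounding class `NodeModelSketch` is invented for the example.

```javascript
// Hypothetical stand-in class; only replaceTypeKeyValueCache and
// _typedKeyValueIndexes are taken from the quoted diff.
class NodeModelSketch {
  constructor() {
    this._typedKeyValueIndexes = new Map()
  }

  replaceTypeKeyValueCache(map = new Map()) {
    // Assign the argument instead of always creating a fresh Map, so a caller
    // can hand in a shared cache; the default still resets to an empty one.
    this._typedKeyValueIndexes = map
  }
}

const model = new NodeModelSketch()
const shared = new Map([["Post/slug", new Map()]])
model.replaceTypeKeyValueCache(shared)
console.log(model._typedKeyValueIndexes === shared)
```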

pvdz added the status: WIP label Jan 20, 2020

freiksenet reviewed Jan 21, 2020

packages/gatsby/src/redux/nodes.js (outdated)

packages/gatsby/src/redux/nodes.js

packages/gatsby/src/redux/run-sift.js (outdated)

packages/gatsby/src/redux/nodes.js (outdated)

pvdz force-pushed the generic-anti-sift branch from 97f0a80 to a4ab520 Compare January 21, 2020 09:31

pvdz commented Jan 21, 2020

packages/gatsby/src/redux/run-sift.js (outdated)

pvdz force-pushed the generic-anti-sift branch from a4ab520 to c04d5bf Compare January 21, 2020 12:35

pvdz commented Jan 21, 2020

packages/gatsby/src/redux/nodes.js (outdated)

pvdz force-pushed the generic-anti-sift branch 6 times, most recently from 9727b7f to 1977ecf Compare January 22, 2020 19:22

freiksenet suggested changes Jan 23, 2020

ascorbic mentioned this pull request Jan 23, 2020

Explore page build optimisations #20785

Closed

pvdz force-pushed the generic-anti-sift branch 5 times, most recently from 6d9cd88 to 1d1bb2e Compare January 23, 2020 16:49

freiksenet reviewed Jan 24, 2020

packages/gatsby/src/schema/__tests__/node-model.js (outdated)

pvdz force-pushed the generic-anti-sift branch 2 times, most recently from 60793c7 to 8dd7989 Compare January 24, 2020 09:30

pvdz removed the status: WIP label Jan 27, 2020

pvdz marked this pull request as ready for review January 27, 2020 11:50

pvdz requested a review from a team as a code owner January 27, 2020 11:50

pvdz force-pushed the generic-anti-sift branch from 668bb23 to 95444b2 Compare January 27, 2020 15:12

freiksenet requested a review from vladar January 27, 2020 15:41

muescha reviewed Jan 27, 2020

packages/gatsby/src/redux/run-sift.js (outdated)

muescha reviewed Jan 27, 2020

packages/gatsby/src/redux/nodes.js

pvdz added 3 commits January 28, 2020 09:03

perf(gatsby): Create index on the fly for non-id index

fe499bf

Make. It. Fast. Er

elemMatch test by @freiksenet

9315c1f

Drop unused arg

cb83b92

pvdz force-pushed the generic-anti-sift branch from 95444b2 to cb83b92 Compare January 28, 2020 08:10

freiksenet previously approved these changes Jan 28, 2020

packages/gatsby/src/redux/nodes.js

vladar previously approved these changes Jan 28, 2020

packages/gatsby/src/redux/run-sift.js

Fix api for loki (by @freiksenet)

658d07e

pvdz dismissed stale reviews from vladar and freiksenet via 658d07e January 28, 2020 11:56

freiksenet approved these changes Jan 28, 2020

pvdz merged commit 115d5c4 into master Jan 28, 2020

delete-merged-branch bot deleted the generic-anti-sift branch January 28, 2020 13:20

muescha reviewed Jan 28, 2020

packages/gatsby/src/redux/__tests__/run-sift.js

pvdz mentioned this pull request Jan 28, 2020

[Request] Real-world Gatsby sites (50k+ pages) #19512

Closed

tmilewski mentioned this pull request Jan 29, 2020

Gatsby Image - Unknown fragment ... #20984

Closed

blainekasten reviewed Jan 30, 2020

pvdz added a commit that referenced this pull request Feb 11, 2020

chore(gatsby): Use argument for updating cache

5f41562

As suggested by @blainekasten in #20729 (review)

pvdz mentioned this pull request Feb 11, 2020

fix(gatsby): Use argument for updating cache #21365

Merged

gatsbybot pushed a commit that referenced this pull request Feb 11, 2020

fix(gatsby): Use argument for updating cache (#21365)

fb36368

As suggested by @blainekasten in #20729 (review)

gatsbybot added a commit to gatsbyjs/gatsby-starter-blog that referenced this pull request Feb 11, 2020

fix(gatsby): Use argument for updating cache (#21365)

e7fa3f1

As suggested by @blainekasten in gatsbyjs/gatsby#20729 (review)

leonhiat added a commit to leonhiat/gatsby-starter-blog that referenced this pull request Oct 31, 2023

fix(gatsby): Use argument for updating cache (#21365)

1f46a9d

As suggested by @blainekasten in gatsbyjs/gatsby#20729 (review)
