perf(gatsby): Create index on the fly for non-id index #20729
How should we deal with the mappedByKey cache? See TODO desc inline for details.
Added inline comment
Is the __gatsby_resolved bit properly handled?
Can we skip the __gatsby_resolved bit? My benchmarks complete without error if I omit them, but I'm worried that it just means something broke but didn't lead to a syntax error.
See inline comment for both. We can't skip it, but it needs to be moved.
If the cache is missed, can we just return undefined instead of going through sift anyways? I think it will result in the same result.
As we always ensure cache, I think yes.
I've moved the ./nodes requires to the top. Was there a particular reason they were inline to the function?
There used to be a circular dependency. Are tests passing?
Verify the anomaly where the bench-md-id benchmark (186 before/after) now completes slower than the bench-md-slug benchmark (208->178)
Could it just be log growth of map lookup? You do by-id lookup on map of all nodes vs by-index lookup of a much smaller subset.
At first I thought so as well, but then I realized that slugs have it worse: they are unique (supposedly) but are also adding an extra Set object for each position in the Map. There is a catch here, though: we generate way more nodes than we have pages. So perhaps that's a reason. I will make a note to look into doing the […]
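The two index shapes compared in this thread can be sketched roughly as follows. This is only an illustration under assumed names (`buildSetIndex`, `buildUniqueIndex`, and `getKey` are hypothetical and not taken from the PR): a general index needs a Set per value because several nodes may share it, while an index on a unique key such as id could store the node directly and skip the extra Set allocation.

```javascript
// Illustrative sketch only; function names are hypothetical, not Gatsby internals.

// General case: several nodes may share a value, so each bucket is a Set.
function buildSetIndex(nodes, getKey) {
  const index = new Map()
  for (const node of nodes) {
    const key = getKey(node)
    let bucket = index.get(key)
    if (!bucket) {
      bucket = new Set()
      index.set(key, bucket)
    }
    bucket.add(node)
  }
  return index
}

// Unique-key case (assumption: keys never repeat): store the node directly,
// which avoids allocating one Set per entry.
function buildUniqueIndex(nodes, getKey) {
  const index = new Map()
  for (const node of nodes) {
    index.set(getKey(node), node)
  }
  return index
}
```

Both shapes give a single Map lookup per query; the difference is one extra object per distinct value in the general case.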
Tests are passing and I quickly checked but do not spot a circle now. I guess it's resolved so let's keep it at the top.
It was. See the updated code and how […]
This is quite brilliant! 👍 Some minor comments.
There are currently no cache tests; the problem is that it also deconstructs the cache at the wrong level. Ideally graphql-runner should take care of monitoring changes to schema and nodes; however, for some reason I implemented it above graphql-runner.
I'd say we can shortcut a bit more. E.g. should we rerun with sift if we return undefined?
__gatsby_resolved is guaranteed to be available once you run prepareNodes in node-model. Thus it should always be there if it is needed. If the test that proxies "slug" to "originalSlug" works correctly then it's all done correctly.
Apart from the one TODO about the nodes loop in ensureIndexByTypedChain, this is nearing completion. Comments are welcome. I'll remove the draft tags once I'm done with profiling.
Looking good! Let's wait for @vladar's review and ship it.
Fantastic work! Looks good to me. Left one small nit that shouldn't block this PR in any way.
Pushed a fix by @freiksenet to make sure the loki path also works (tests were failing because of that, thankfully). So now […]
 * cached instead of a Set of Nodes.
 */
replaceTypeKeyValueCache(map = new Map()) {
  this._typedKeyValueIndexes = new Map() // See redux/nodes.js for usage
is this a mistake not using the argument to assign?
The ... Eh ... yes? :/
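For illustration, a version that honours the argument might look like the sketch below. Only the method name and the `_typedKeyValueIndexes` field come from the snippet under review; the class name `NodeModelSketch` is hypothetical, and assigning the passed Map directly (rather than copying it) is an assumption about the intended behaviour.

```javascript
// Hypothetical stand-in for the class that owns the cache; not the real class.
class NodeModelSketch {
  constructor() {
    this._typedKeyValueIndexes = new Map()
  }

  // Use the provided Map as the new cache; fall back to an empty Map
  // when the caller passes nothing (assumed intent of the default value).
  replaceTypeKeyValueCache(map = new Map()) {
    this._typedKeyValueIndexes = map
  }
}
```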
As suggested by @blainekasten in #20729 (review)
This takes a more generic approach to short-circuiting eq filters on exactly one property, including a chain of properties.
To be clear: this can make a HUGE perf difference at scale. It is a follow-up of #20609, which only applied to the index by id and dropped a site's runtime from 4.5h to 5 minutes.
This PR seems to improve perf the same way for that site, but now for slug or any other property!
This means a filter like
allISomeSource(filter: { fields: { slug: { eq: $slug } } }) {
can now be as fast as using id, which before was the only thing being indexed.
This PR is not finished (hence in a draft state). The GraphQL gurus need to take a closer look at this, and then there are some (for me) obvious points that need to be addressed:
How should we deal with the mappedByKey cache?
Is the __gatsby_resolved bit properly handled?
Can we skip the __gatsby_resolved bit? My benchmarks complete without error if I omit them, but I'm worried that it just means something broke but didn't lead to a syntax error.
[…] __gatsby_resolved property is not set "too early". Or is that state basically invariant after the bootstrap? (Before, it would be set on each call of the filter; now we're just setting it once when creating the cache. Technically there's time where the state could change, but I don't think that's the case here.)
If the cache is missed, can we just return undefined instead of going through sift anyways? I think it will result in the same result.
That seems to be correct.
[…] slug: String @proxy(from: "slugInternal") and we don't know about that, so a miss should still go through sift just in case.
[…] filter is empty. Other parts are already kind of doing this, but after the heavy step.
I've moved the ./nodes requires to the top. Was there a particular reason they were inline to the function?
[…] filter is a single property path to eq? The current (initial) approach seems to perform fine, but I wouldn't be surprised if there were faster ways.
[…] id the similar treatment (creating an index based on types+id) a smaller map is created and the lookup is faster. One shortcut we can still apply for id is not creating the Sets in the index, since each bucket can only have one node (since id must be unique). This now drops the bench-md-id benchmark from 186 to 173. Woohoo
[…] elemMatch
[…] elemMatch (--> @freiksenet will add a test)
[…] ensureIndexByTypedChain. Let's say there are three types and they have an equal share of nodes; then it might only loop a third of the total number of nodes. And these sets are already present anyways. But it might not really make a difference, so I need to check this.
While an improvement for many, this adds a little overhead to filters that are not flat (one leaf), filters that use elemMatch, and even more regression for schemas with @proxy where the filter misses (since that still first goes through the new index). But probably within acceptable bounds.
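The overall idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `getValueByChain`, `getFlatEqFilter`, `ensureIndex`, `findByEq`, and the `slowPath` callback (standing in for the existing sift-based filtering) are all hypothetical names. It assumes nodes carry their type at `node.internal.type`, builds an index lazily per type and property chain, and falls back to the slow path when the filter is not a single-leaf eq or when the index misses (since a proxied field could still match).

```javascript
// Illustrative sketch only; names are hypothetical, not Gatsby internals.

// Walk a chain of property names, e.g. ["fields", "slug"], on a node.
function getValueByChain(node, chain) {
  let value = node
  for (const key of chain) {
    if (value === null || typeof value !== "object") return undefined
    value = value[key]
  }
  return value
}

// Turn { fields: { slug: { eq: "x" } } } into { chain, value }.
// Returns undefined when the filter is not a single path ending in eq.
function getFlatEqFilter(filter) {
  const chain = []
  let current = filter
  while (current !== null && typeof current === "object") {
    const keys = Object.keys(current)
    if (keys.length !== 1) return undefined
    const key = keys[0]
    if (key === "elemMatch") return undefined // not handled by this sketch
    if (key === "eq") {
      return chain.length > 0 ? { chain, value: current.eq } : undefined
    }
    chain.push(key)
    current = current[key]
  }
  return undefined
}

// Cache of indexes keyed by "type/chain"; each index maps value -> Set of nodes.
const indexes = new Map()

// Build the index for (type, chain) on first use, then reuse it.
function ensureIndex(nodes, type, chain) {
  const cacheKey = `${type}/${chain.join(".")}`
  let index = indexes.get(cacheKey)
  if (index) return index
  index = new Map()
  for (const node of nodes) {
    if (node.internal.type !== type) continue
    const value = getValueByChain(node, chain)
    let bucket = index.get(value)
    if (!bucket) {
      bucket = new Set()
      index.set(value, bucket)
    }
    bucket.add(node)
  }
  indexes.set(cacheKey, index)
  return index
}

// Fast path for flat eq filters; everything else goes to slowPath.
function findByEq(nodes, type, filter, slowPath) {
  const flat = getFlatEqFilter(filter)
  if (!flat) return slowPath(nodes, type, filter)
  const bucket = ensureIndex(nodes, type, flat.chain).get(flat.value)
  // On a miss, still try the slow path, since a field may be proxied.
  if (!bucket) return slowPath(nodes, type, filter)
  return [...bucket]
}
```

In this sketch the index is built once per type and chain, so the first query pays for a full pass over the nodes and later queries cost a single Map lookup. Invalidation when nodes change is deliberately left out.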