perf(gatsby): fix performance regression with query dependency cleaning #28032
Great :D Thanks!
const nodeQueries = state.byNode.get(nodeId)
if (nodeQueries) {
  nodeQueries.delete(queryId)
}
Suggested change:
- const nodeQueries = state.byNode.get(nodeId)
- if (nodeQueries) {
-   nodeQueries.delete(queryId)
- }
+ state.byNode.get(nodeId)?.delete(queryId)
This was my first attempt, but it gives an ESLint error.
Some Context First
A recent refactor of the dirty query implementation (#27504) caused a performance regression in the query running step. The implementation before that PR could wipe the data dependencies of all queries at once, after all of them had been executed.
With queries on demand, we will run queries individually, so we can't remove data dependencies in a single batch; we have to wipe them per query.
Doing that quickly requires slightly different data structures.
Description
Before this PR we were tracking data dependencies per node/connection, i.e. as a map from each node (or connection) to the queries that depend on it.
This is enough to efficiently mark all queries related to a given node as dirty. But if you want to remove a single query from node dependencies you have to loop through all nodes in this map.
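To make the cost concrete, here is a minimal sketch of the by-node layout. The `byNode` name and its `Map`/`Set` shape are inferred from the review snippet above; `NodeId`, `QueryId`, `queriesForNode`, and `removeQuerySlow` are hypothetical names used only for illustration, not Gatsby's actual code.

```typescript
type NodeId = string
type QueryId = string

// Assumed shape: each node maps to the set of queries that depend on it.
const byNode = new Map<NodeId, Set<QueryId>>([
  [`node-1`, new Set([`query-a`, `query-b`])],
  [`node-2`, new Set([`query-b`])],
  [`node-3`, new Set([`query-c`])],
])

// Finding the queries to mark dirty for a changed node is a single lookup.
function queriesForNode(nodeId: NodeId): QueryId[] {
  return Array.from(byNode.get(nodeId) ?? [])
}

// Removing one query has to visit every node, including unrelated ones.
// Returns the number of nodes visited to make the cost visible.
function removeQuerySlow(queryId: QueryId): number {
  let visited = 0
  byNode.forEach(queries => {
    visited++
    queries.delete(queryId)
  })
  return visited
}
```

In this sketch, removing `query-c` touches all three nodes even though only `node-3` references it; with tens of thousands of nodes and one removal per query, that cost is paid on every query run.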
This PR introduces an inverse map for tracked data dependencies to solve this.
Now we can first get the IDs of all nodes associated with a query, and then remove the query <-> node dependency for just those nodes, with no need to loop through all nodes.
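The two-step removal described here could look roughly like the sketch below. It assumes the inverse map is another `Map` of `Set`s; the field name `queryNodes` and the helpers `createState`, `addDependency`, and `clearQueryDependencies` are hypothetical names chosen for this example, and the real reducer may be organized differently.

```typescript
type NodeId = string
type QueryId = string

interface DependencyState {
  // Forward map: node -> queries that depend on it.
  byNode: Map<NodeId, Set<QueryId>>
  // Inverse map (hypothetical name): query -> nodes it depends on.
  queryNodes: Map<QueryId, Set<NodeId>>
}

function createState(): DependencyState {
  return { byNode: new Map(), queryNodes: new Map() }
}

// Record a dependency in both directions so the two maps stay in sync.
function addDependency(
  state: DependencyState,
  queryId: QueryId,
  nodeId: NodeId
): void {
  let queries = state.byNode.get(nodeId)
  if (!queries) {
    queries = new Set()
    state.byNode.set(nodeId, queries)
  }
  queries.add(queryId)

  let nodes = state.queryNodes.get(queryId)
  if (!nodes) {
    nodes = new Set()
    state.queryNodes.set(queryId, nodes)
  }
  nodes.add(nodeId)
}

// Remove a single query by visiting only the nodes it actually depends on.
function clearQueryDependencies(
  state: DependencyState,
  queryId: QueryId
): void {
  const nodeIds = state.queryNodes.get(queryId)
  if (!nodeIds) {
    return
  }
  nodeIds.forEach(nodeId => {
    const nodeQueries = state.byNode.get(nodeId)
    if (nodeQueries) {
      nodeQueries.delete(queryId)
    }
  })
  state.queryNodes.delete(queryId)
}
```

The trade-off is extra bookkeeping on every write (two maps to update instead of one) in exchange for removal work proportional to the number of nodes a query depends on rather than the total number of nodes.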
Note: we do not use an inverse map for connection tracking because there is a limited number of connections and, in my tests, the overhead of maintaining an inverse map for connections made performance worse. We can revisit this later if we find it necessary.
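For connections, a plain loop over the forward map is the simpler option this note describes. The sketch below assumes connections are tracked in a `Map` of `Set`s analogous to the node map; `byConnection`, `ConnectionName`, and `clearQueryFromConnections` are illustrative names, not confirmed identifiers from the codebase.

```typescript
type ConnectionName = string
type QueryId = string

// Assumed shape: connection -> queries that depend on it.
type ConnectionDependencies = Map<ConnectionName, Set<QueryId>>

// With only a limited number of connections, visiting all of them per query
// avoids the cost of keeping a second, inverse map up to date.
function clearQueryFromConnections(
  byConnection: ConnectionDependencies,
  queryId: QueryId
): void {
  byConnection.forEach(queries => {
    queries.delete(queryId)
  })
}
```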
Results
32k pages on my laptop before/after: