Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(gatsby): fix performance regression with query dependency cleaning #28032

Merged
merged 3 commits into from
Nov 16, 2020

Conversation

vladar
Copy link
Contributor

@vladar vladar commented Nov 13, 2020

Some Context First

A recent refactor of dirty query implementation #27504 caused a performance regression in the query running step. The old implementation before that PR could wipe all data dependencies for queries at once after all of them were executed.

With queries on demand, we will run queries individually and so can't remove data dependencies in a single batch - we have to wipe them for individual queries.

It requires slightly different data structures to make it fast.

Description

Before this PR we were tracking data dependencies per node/connection:

const queriesByNode. = new Map<string, Set<string>>()
const queryIds: Set<string> = queriesByNode.get(nodeId)

This is enough to efficiently mark all queries related to a given node as dirty. But if you want to remove a single query from node dependencies you have to loop through all nodes in this map.

This PR introduces an inverse map for tracked data dependencies to solve this.

const queriesByNode = new Map<string, Set<string>>()
const queryNodes = new Map<string, Set<string>>()
const queries = queriesByNode.get(nodeId)
const nodes = queryNodes.get(queryId)

Now we can first get IDs of all nodes associated with a query. And then remove query <-> node dependency for those individual nodes (so no need to loop through all nodes).

Note: we do not use an inverse map for connections tracking because there is a limited number of connections and in my tests, the overhead associated with inverse maps for connections makes performance worse. We can revisit this later if we find it necessary.

Results

32k pages on my laptop before/after:

success run page queries - 73.472s - 31997/31997 435.50/s
success run page queries - 52.695s - 31994/31994 607.16/s

@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Nov 13, 2020
@vladar vladar requested a review from pvdz November 13, 2020 12:42
@vladar vladar removed the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Nov 13, 2020
pvdz
pvdz previously approved these changes Nov 16, 2020
Copy link
Contributor

@pvdz pvdz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great :D Thanks!

packages/gatsby/src/redux/reducers/queries.ts Outdated Show resolved Hide resolved
Comment on lines +246 to +249
const nodeQueries = state.byNode.get(nodeId)
if (nodeQueries) {
nodeQueries.delete(queryId)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const nodeQueries = state.byNode.get(nodeId)
if (nodeQueries) {
nodeQueries.delete(queryId)
}
state.byNode.get(nodeId)?.delete(queryId)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my first attempt. But it gives an Eslint error.

packages/gatsby/src/redux/reducers/queries.ts Outdated Show resolved Hide resolved
@pvdz pvdz added topic: performance Related to runtime & build performance topic: scaling builds labels Nov 16, 2020
@vladar vladar added the bot: merge on green Gatsbot will merge these PRs automatically when all tests passes label Nov 16, 2020
@gatsbybot gatsbybot merged commit de5517b into master Nov 16, 2020
@delete-merged-branch delete-merged-branch bot deleted the vladar/query-perf-fix branch November 16, 2020 10:13
vladar added a commit that referenced this pull request Nov 16, 2020
…ng (#28032)

* perf(gatsby): fix performance regression with query dependency cleaning

* update snapshot

* More consistent codestyle

(cherry picked from commit de5517b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bot: merge on green Gatsbot will merge these PRs automatically when all tests passes topic: performance Related to runtime & build performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants