forked from gatsbyjs/gatsby
Page build optimisations for incremental data changes #3
Closed
dominicfallows changed the title from "Improve page build on data change" to "Page build optimisations for incremental data changes" on Feb 7, 2020
…ctive-investor/gatsby into improve-page-build-on-data-change
This reverts commit a9b2b68.
…s.md Co-Authored-By: LB <laurie@gatsbyjs.com>
…ctive-investor/gatsby into improve-page-build-on-data-change
…s.md Co-Authored-By: LB <laurie@gatsbyjs.com>
…s.md Co-Authored-By: LB <laurie@gatsbyjs.com>
…s.md Co-Authored-By: LB <laurie@gatsbyjs.com>
…s.md Co-Authored-By: Michal Piechowiak <misiek.piechowiak@gmail.com>
Co-Authored-By: Michal Piechowiak <misiek.piechowiak@gmail.com>
…ctive-investor/gatsby into improve-page-build-on-data-change
Co-Authored-By: LB <laurie@gatsbyjs.com>
Closing, as this PR was a draft mirror for the actual Gatsby repo
Description
Gatsby sources data from multiple sources (CMS, static files such as Markdown, databases, APIs, etc.) and creates an aggregated dataset in GraphQL. Currently, each `gatsby build` uses the GraphQL dataset and queries to do a complete rebuild of the whole app, ready for deployment, including static assets like HTML, JavaScript, JSON, media files, etc.
For projects that have a small (10s to 100s) to medium (100s to 1000s) amount of content, deploying these sites doesn't present a problem.
Building sites with large amounts of content (10,000s upwards) is already relatively fast with Gatsby. However, some projects might start to experience issues when adopting CI/CD principles, that is, continuously building and deploying. Gatsby rebuilds the complete app, which means the complete app also needs to be deployed. Doing this each time a small data change occurs unnecessarily increases demand on CPU, memory, and bandwidth.
One solution to these problems might be to use Gatsby Cloud's Build features.
For projects that require self-hosted environments, where Gatsby Cloud would not be an option, being able to only deploy the content that has changed or is new (incremental data changes, you might say) would help reduce build times, deployment times and demand on resources.
This PR is to introduce an experimental enhancement to only build pages with data changes.
How to use
To enable this enhancement, use the environment variable `GATSBY_PAGE_BUILD_ON_DATA_CHANGES=true` in your `gatsby build` command, for example:
`GATSBY_PAGE_BUILD_ON_DATA_CHANGES=true node ./node_modules/.bin/gatsby build`
This will run the Gatsby build process, but only build pages that have data changes since your last build. If there are any changes to code (JS, CSS), the bundling process returns a new webpack compilation hash, which causes all pages to be rebuilt.
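As a rough illustration, the same flag could also be set from a small Node wrapper script rather than inline in the shell command. This is only a sketch: `withIncrementalFlag` and `runBuild` are hypothetical helper names, not part of Gatsby, and the CLI path is simply the one used in the command above.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper: returns a copy of the given environment with the
// experimental flag enabled, leaving the original object untouched.
function withIncrementalFlag(env) {
  return { ...env, GATSBY_PAGE_BUILD_ON_DATA_CHANGES: "true" };
}

// Hypothetical helper: runs the locally installed Gatsby CLI with the flag set.
// Assumes the same CLI path as the command shown above.
function runBuild(extraArgs = []) {
  return spawnSync(
    process.execPath, // the current `node` binary
    ["./node_modules/.bin/gatsby", "build", ...extraArgs],
    { stdio: "inherit", env: withIncrementalFlag(process.env) }
  );
}
```

A pipeline script might then call `runBuild(process.argv.slice(2))` and exit with the returned `status`. Keeping the environment change in a pure function means it can be checked without starting a build.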
Reporting what has been built
You might need to get a list of the pages that have been built, for example if you want to perform a sync action in your CI/CD pipeline.
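As a sketch of such a sync step, the following reads the `newPages.txt` and `deletedPages.txt` files that the `--write-to-file` argument writes to the `.cache` folder. The file format is not specified in this PR description, so one path per line is an assumption here, and `readPathList` and `readBuildReport` are hypothetical names used only for illustration.

```javascript
const fs = require("fs");
const path = require("path");

// Reads one list file and returns its non-empty lines.
// ASSUMPTION: one path per line. A missing file is treated as "no paths",
// since the files are only created when there is something to report.
function readPathList(filePath) {
  if (!fs.existsSync(filePath)) return [];
  return fs
    .readFileSync(filePath, "utf8")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);
}

// Collects both lists from the cache folder (".cache" by default).
function readBuildReport(cacheDir = ".cache") {
  return {
    changed: readPathList(path.join(cacheDir, "newPages.txt")),
    deleted: readPathList(path.join(cacheDir, "deletedPages.txt")),
  };
}
```

A deploy script could upload the directories listed in `changed` and remove those listed in `deleted`, rather than syncing the whole `public` folder.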
To list the paths in the build assets (`public`) folder, you can use one (or both) of the following arguments in your `build` command.
- `--log-pages` outputs the updated paths to the console at the end of the build
- `--write-to-file` creates two files in the `.cache` folder, with lists of the changed paths in the build assets (`public`) folder:
  - `newPages.txt` will contain a list of paths that have changed or are new
  - `deletedPages.txt` will contain a list of paths that have been deleted
If there are no changed or deleted paths, then the relevant files will not be created in the `.cache` folder.
Approach
The enhancement works by comparing the previous page data from cache (returned by `readState()`) to the newly created page data in GraphQL, which can be accessed by `store.getState()`. By comparing these two data sets, we can determine which pages have been updated, newly created or deleted.
There are two new functions, `getChangedPageDataKeys` and `removePreviousPageData`, in `utils/page-data.js`:
- `getChangedPageDataKeys` loops through each page's "content" (this includes the data and context), comparing it to the previous content. If there is a difference, or the key does not exist (new page), the key is added to this function's returned array.
- `removePreviousPageData` loops through each key; if the key is not present in the new data, the page will be removed and the key added to this function's returned array.
The array of changed path keys is then used as the `pagePaths` value for the `buildHTML.buildPages` process.
At the end of the build process, the `removePreviousPageData` function uses each deleted page key to remove a matching directory from the public folder. This is instead of deleting all HTML from the public directory at the beginning of the build process.
Performance improvement
We have run various performance tests on our projects. For context, we use AWS CodePipeline to build and deploy our Gatsby projects, one of which is approaching 30k pages.
On our ~30k page project, when we run a full build versus a content change build, we are seeing vastly improved deploy times, alongside reduced CPU and memory spikes.
For example, a full build and deploy takes an average of 10-11 minutes. For a content change build, this is reduced to an average of 5-6 minutes 🚀
Further considerations
To enable this build option you will need to set an environment variable, so you will need access to set variables in your build environment.
You will need to persist the `.cache/redux.state` file between builds, allowing for comparison. If there is no `redux.state` file located in the `/.cache` folder, then a full build will be triggered.
Any code or static query changes (templates, components, source handling, new plugins etc) create a new webpack compilation hash and trigger a full build.
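To tie the Approach section to these considerations, here is a minimal sketch of the comparison and of the fallback to a full build. It is not the code in `utils/page-data.js`: `findChangedKeys`, `findDeletedKeys` and `planBuild` are hypothetical stand-ins. It assumes page data is modelled as a `Map` of path to content and that content is compared via `JSON.stringify`, which is sensitive to property order.

```javascript
// Hypothetical stand-in: keys whose content is new or differs from the previous build.
function findChangedKeys(previous, current) {
  const changed = [];
  for (const [key, content] of current) {
    const isNew = !previous.has(key);
    // ASSUMPTION: a JSON string comparison is good enough for this sketch.
    if (isNew || JSON.stringify(previous.get(key)) !== JSON.stringify(content)) {
      changed.push(key);
    }
  }
  return changed;
}

// Hypothetical stand-in: keys present in the previous build but not in the new data.
function findDeletedKeys(previous, current) {
  return [...previous.keys()].filter((key) => !current.has(key));
}

// Decides what to build. `previous` is null when no cached state was persisted.
function planBuild({ previous, current, previousHash, currentHash }) {
  const remove = previous ? findDeletedKeys(previous, current) : [];
  if (!previous || previousHash !== currentHash) {
    // No cached state, or the webpack compilation hash changed: rebuild everything.
    return { fullBuild: true, build: [...current.keys()], remove };
  }
  return { fullBuild: false, build: findChangedKeys(previous, current), remove };
}
```

In this sketch, `build` plays the role of the `pagePaths` value and `remove` lists the directories to delete from the public folder at the end of the build.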
Related Issues
Related to PR #20785
Related to Issue #5002