Explore page build optimisations #20785

Closed
66 commits
4c92e78
TECS-84 Spike initial work into incremental builds gatsby
StuartRayson Sep 27, 2019
01e2907
TECS-84 add Page to state
StuartRayson Sep 30, 2019
f2cafef
TECS-84 Fix syntax error
StuartRayson Sep 30, 2019
9cbfaf9
TECS-84 add workpool
StuartRayson Sep 30, 2019
e88ca4d
TECS-84 fix hashing issues
StuartRayson Oct 4, 2019
cfe547f
TECS-84 missing fs require
StuartRayson Oct 8, 2019
020c46d
refactor
StuartRayson Oct 8, 2019
9b88bc5
TECS-84 update redux.state to be json file
StuartRayson Oct 8, 2019
89e64d3
remove pages
StuartRayson Oct 9, 2019
376a6e6
TECS-84 remove old page data
StuartRayson Oct 9, 2019
180d571
TECS-84 clean up terminal output
StuartRayson Oct 9, 2019
1b6cab8
TECS-84 code clean up
StuartRayson Oct 9, 2019
2e20c66
TECS-84 refactor
StuartRayson Oct 9, 2019
0b3a5cd
TECS-84 refactor code and added content hashes
StuartRayson Oct 10, 2019
66f80c8
TECS-84 refactoring
StuartRayson Oct 11, 2019
827c922
TECS-84 refactor
StuartRayson Oct 11, 2019
a64ce4d
TECS-84 remove copy logic
StuartRayson Oct 11, 2019
0cec9fa
TECS-84 code clean up
StuartRayson Oct 14, 2019
ce6058c
TECS-84 code clean up
StuartRayson Oct 14, 2019
451f5a7
TECS-84 code refactor
StuartRayson Oct 14, 2019
9683c3c
TECS-84 remove unused pageDataUtil import
StuartRayson Oct 14, 2019
930bb8e
TECS-84 Fix page state being pre populated
StuartRayson Oct 14, 2019
81b98d3
TECS-84 added comments
StuartRayson Oct 14, 2019
c7288a7
TECS-84 code clean up
StuartRayson Oct 14, 2019
5ccf46f
TECS-84 add env flags
StuartRayson Oct 14, 2019
6ff78a8
TECS-84 keep redux state if incremental build
StuartRayson Oct 14, 2019
3d17ade
TECS-84 updated comments
StuartRayson Oct 14, 2019
104ff39
TECS-84 spelling
StuartRayson Oct 14, 2019
02305b2
TECS-84 comment
StuartRayson Oct 14, 2019
a4d4959
TECS-84 spelling
StuartRayson Oct 14, 2019
ea2310f
TECS-84 PR updates
StuartRayson Oct 15, 2019
a26b81c
TECS-84 Use map not objects
StuartRayson Oct 15, 2019
f0d1525
TECS-84 removed white space
StuartRayson Oct 15, 2019
1b04707
Update spelling
StuartRayson Oct 15, 2019
29d0532
TECS-84 refactor
StuartRayson Oct 16, 2019
16d091d
TECS-84 add delete file
StuartRayson Oct 16, 2019
676a6c0
Merge branch 'TECS-84-spike-incremental-builds-gatsby' of github.com:…
StuartRayson Oct 16, 2019
8e9ddd4
TECS-84 Update snapshots
StuartRayson Oct 16, 2019
c9867dd
TECS-84 Refeactor
StuartRayson Oct 17, 2019
09b914b
TECS-84 Refeactor
StuartRayson Oct 17, 2019
961bf13
TECS-84-spike-incremental-builds-gatsby
StuartRayson Oct 17, 2019
fd61506
TECS-84 refactor
StuartRayson Oct 17, 2019
689dafc
TECS-84 omit page data from cache test
StuartRayson Oct 17, 2019
39b4034
TECS-84 refactor
StuartRayson Oct 17, 2019
2bf29c7
move back waitJobsFinished
StuartRayson Oct 17, 2019
79e6cb7
TECS-84 Use read state instead of read cache
StuartRayson Oct 25, 2019
b6aae12
TECS-84 oput directories as pipe sep string
StuartRayson Oct 29, 2019
eb1c205
update incremental logic to 2.16
StuartRayson Nov 19, 2019
0eccca0
fix merge conflicts
StuartRayson Nov 20, 2019
af6566d
Merge branch 'master' of github.com:interactive-investor/gatsby into …
StuartRayson Jan 7, 2020
6464e69
Add --write-to-file logic
StuartRayson Jan 16, 2020
8318020
reslove conflict from version 2.18.23
StuartRayson Jan 16, 2020
d508bbc
refactor getNewPageKeys
StuartRayson Jan 16, 2020
09cf83f
📝 - TECS-84: Documentation for incremental builds
ConorLindsay94 Jan 16, 2020
5dd670d
📝 fix doc formating issues
StuartRayson Jan 16, 2020
d248c32
📝 Fix uppercase on titile
StuartRayson Jan 16, 2020
df512aa
🐛 fix hash rewrite on incremental build
StuartRayson Jan 20, 2020
89e0a38
📝 doc update
StuartRayson Jan 20, 2020
9702159
🐛 Add line ending when creating incremental files
StuartRayson Jan 21, 2020
156e3a5
Update solution to support 2.19.1
StuartRayson Jan 22, 2020
d0804a5
Update incremental-builds.md
dominicfallows Jan 22, 2020
c158f55
Update doc-links.yaml
dominicfallows Jan 22, 2020
c5bc1a4
Update scaling-issues.md
dominicfallows Jan 22, 2020
dab1414
Merge branch 'master' into TECS-84-spike-incremental-builds-gatsby
dominicfallows Jan 22, 2020
eeea6a5
Run prettier to fix a test
dominicfallows Jan 22, 2020
d21dbbf
Tighten up check, to prevent `bool` vs `int` error
dominicfallows Jan 22, 2020
61 changes: 61 additions & 0 deletions docs/docs/incremental-builds.md
@@ -0,0 +1,61 @@
---
title: Incremental Builds
---

Gatsby sources data from multiple sources (CMS, static files like Markdown, databases, APIs, etc.) and creates an aggregated data set in GraphQL. Currently, each `gatsby build` uses the GraphQL data set and queries to do a complete rebuild of the whole app, including static assets (such as HTML, JavaScript, JSON, and media files), ready for deployment. For projects that have a small (10s to 100s) to medium (100s to 1000s) amount of content, these full builds don't present a problem.

Sites with large amounts of content (10,000s upwards 😱) start to see increased build times and increased demand on CPU and memory.

Also, one of the principles of modern Continuous Integration/Continuous Deployment is to release change (and therefore risk) in small batches. A full app rebuild may not be in line with that principle for some projects.

One solution to these problems might be to use [Gatsby Cloud](https://www.gatsbyjs.com/cloud/)'s 'Build' features (currently in [Beta](https://www.gatsbyjs.com/builds-beta/)).

For projects that require self-hosted environments, where Gatsby Cloud would not be an option, being able to **incrementally build** only the content that has changed (or is new) would help reduce build times and demand on resources, whilst helping keep in line with CI/CD principles.

For more information on the standard build process, see the [overview of the Gatsby build process](/docs/overview-of-the-gatsby-build-process/).

## How to use

To enable optional incremental builds, use the environment variable `GATSBY_INCREMENTAL_BUILD=true` in your `gatsby build` command, for example:

`GATSBY_INCREMENTAL_BUILD=true gatsby build`

This will run the Gatsby build process, but only build assets that have changed (or are new) since your last build.
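
The inline `VAR=value command` form above is shell syntax and is not understood by every shell (for example, Windows `cmd`). As a minimal sketch, assuming `gatsby` is resolvable on your `PATH` (as it is inside an npm script), a small Node.js wrapper could set the variable instead. The helper names `incrementalEnv` and `runIncrementalBuild` are hypothetical and not part of Gatsby:

```javascript
const { spawnSync } = require(`child_process`)

// Copy the given environment and add the flag the build checks for.
// The value must be the string "true"; other values leave the flag off.
function incrementalEnv(baseEnv = process.env) {
  return { ...baseEnv, GATSBY_INCREMENTAL_BUILD: `true` }
}

// Run `gatsby build` with the flag set and return the exit status.
// Extra CLI arguments (for example `--log-pages`) are passed through.
function runIncrementalBuild(extraArgs = []) {
  const result = spawnSync(`gatsby`, [`build`, ...extraArgs], {
    env: incrementalEnv(),
    stdio: `inherit`,
    shell: true, // lets the shell resolve the platform-specific executable
  })
  return result.status
}

module.exports = { incrementalEnv, runIncrementalBuild }
```

Calling `runIncrementalBuild([`--log-pages`])` from a script would then behave like the command shown above with the extra argument appended.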

### Reporting what has been built

After an incremental build has completed, you might need to get a list of the assets that have been built, for example, if you want to perform a sync action in your CI/CD pipeline.

To list the paths in the build assets (`public`) folder, you can use one (or both) of the following arguments in your `build` command.

- `--log-pages` outputs the updated paths to the console at the end of the build

```
success Building production JavaScript and CSS bundles - 82.198s
success run queries - 82.762s - 4/4 0.05/s
success Building static HTML for pages - 19.386s - 2/2 0.10/s
+ success Delete previous page data - 1.512s
success Update cache for next build - 1.202s
info Done building in 152.084 sec
+ info Incremental build pages:
+ Updated page: /about
+ Updated page: /accounts/example
+ info Incremental build deleted pages:
+ Deleted page: /test

Done in 154.501 sec
```

- `--write-to-file` creates two files in the `.cache` folder, listing the changed paths in the build assets (`public`) folder.

- `newPages.txt` will contain a list of paths that have changed or are new
- `deletedPages.txt` will contain a list of paths that have been deleted

If there are no changed or deleted paths, then the relevant files will not be created in the `.cache` folder.
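
A CI/CD step could consume these files along the following lines. This is a minimal sketch that assumes only what is described above: each file holds one path per line, and a missing file means there were no changes of that kind. The function names are hypothetical:

```javascript
const fs = require(`fs`)
const path = require(`path`)

// Split file contents into page paths, ignoring blank lines
// (the files end with a trailing newline).
function parsePageList(contents) {
  return contents
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0)
}

// Read one list file from `.cache`. A missing file is treated as an
// empty list, because the build only writes a file when it has entries.
function readPageList(projectDir, fileName) {
  const filePath = path.join(projectDir, `.cache`, fileName)
  if (!fs.existsSync(filePath)) {
    return []
  }
  return parsePageList(fs.readFileSync(filePath, `utf8`))
}

// Collect both lists for a project directory.
function readBuildChanges(projectDir) {
  return {
    updated: readPageList(projectDir, `newPages.txt`),
    deleted: readPageList(projectDir, `deletedPages.txt`),
  }
}

module.exports = { parsePageList, readPageList, readBuildChanges }
```

A sync step could upload the `public` output for every entry in `updated` and remove the remote copy of every entry in `deleted`.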

## Further considerations

- To enable incremental builds you need to set an environment variable, so you need permission to set variables in your build environment.
- You need to persist the `.cache/redux.state` file between builds so that an incremental build has something to compare against. If there is no `redux.state` file in the `.cache` folder, a full build is triggered.
- The root JS bundle is still generated on incremental builds, so you need to deploy these JS files as well as the changed paths.
- Any code change (templates, components, source handling, new plugins, etc.) requires a full `gatsby build`.
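
Before running a build, a pipeline may want to know whether the cached state was actually restored. The following is a minimal sketch based only on the rules listed above (flag set and `.cache/redux.state` present). The helper names are hypothetical and the check does not account for code changes, which still need a full build:

```javascript
const fs = require(`fs`)
const path = require(`path`)

// True when a previous build left its state file in `.cache`.
function hasPreviousBuildState(projectDir) {
  return fs.existsSync(path.join(projectDir, `.cache`, `redux.state`))
}

// Predict which kind of build to expect: incremental only when the
// flag is the string "true" and the cached state file is present.
function expectedBuildMode(projectDir, env = process.env) {
  const flagSet = env.GATSBY_INCREMENTAL_BUILD === `true`
  return flagSet && hasPreviousBuildState(projectDir) ? `incremental` : `full`
}

module.exports = { hasPreviousBuildState, expectedBuildMode }
```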
4 changes: 4 additions & 0 deletions docs/docs/scaling-issues.md
@@ -88,3 +88,7 @@ exports.createSchemaCustomization = ({ actions }) => {
`)
}
```

### Incremental builds

Being able to incrementally build only the content that has changed (or is new) would help reduce build times and demand on resources. See [Incremental builds](/docs/incremental-builds/).
12 changes: 10 additions & 2 deletions packages/gatsby/src/bootstrap/index.js
@@ -23,6 +23,9 @@ const removeStaleJobs = require(`./remove-stale-jobs`)
// Add `util.promisify` polyfill for old node versions
require(`util.promisify/shim`)()

const incrementalBuild =
process.env.GATSBY_INCREMENTAL_BUILD === `true` || false

// Show stack trace on unhandled promises.
process.on(`unhandledRejection`, (reason, p) => {
report.panic(reason)
@@ -190,7 +193,7 @@ module.exports = async (args: BootstrapArgs) => {

// During builds, delete html and css files from the public directory as we don't want
// deleted pages and styles from previous builds to stick around.
if (process.env.NODE_ENV === `production`) {
if (!incrementalBuild && process.env.NODE_ENV === `production`) {
activity = report.activityTimer(
`delete html and css files from previous builds`,
{
@@ -252,7 +255,12 @@ module.exports = async (args: BootstrapArgs) => {
try {
// Attempt to empty dir if remove fails,
// like when directory is mount point
await fs.remove(cacheDirectory).catch(() => fs.emptyDir(cacheDirectory))
if (!incrementalBuild) {
await fs.remove(cacheDirectory).catch(() => fs.emptyDir(cacheDirectory))
} else {
// Remove all files except the cache file
await del([`${cacheDirectory}/**/*.{json,js,css}`])
}
} catch (e) {
report.error(`Failed to remove .cache files.`, e)
}
81 changes: 70 additions & 11 deletions packages/gatsby/src/commands/build.js
@@ -1,6 +1,6 @@
/* @flow */

const path = require(`path`)
const fs = require(`fs-extra`)
const report = require(`gatsby-cli/lib/reporter`)
const buildHTML = require(`./build-html`)
const buildProductionBundle = require(`./build-javascript`)
@@ -11,11 +11,12 @@ const { initTracer, stopTracer } = require(`../utils/tracer`)
const db = require(`../db`)
const signalExit = require(`signal-exit`)
const telemetry = require(`gatsby-telemetry`)
const { store, emitter } = require(`../redux`)
const { store, emitter, readState } = require(`../redux`)
const queryUtil = require(`../query`)
const appDataUtil = require(`../utils/app-data`)
const WorkerPool = require(`../utils/worker/pool`)
const { structureWebpackErrors } = require(`../utils/webpack-error-utils`)
const pageDataUtil = require(`../utils/page-data`)
const {
waitUntilAllJobsComplete: waitUntilAllJobsV2Complete,
} = require(`../utils/jobs-manager`)
@@ -48,6 +49,8 @@ const waitUntilAllJobsComplete = () => {

module.exports = async function build(program: BuildArgs) {
const publicDir = path.join(program.directory, `public`)
const incrementalBuild =
process.env.GATSBY_INCREMENTAL_BUILD === `true` || false
initTracer(program.openTracingConfigFile)
const buildActivity = report.phantomActivity(`build`)
buildActivity.start()
@@ -68,7 +71,7 @@ module.exports = async function build(program: BuildArgs) {
const {
processPageQueries,
processStaticQueries,
} = queryUtil.getInitialQueryProcessors({
} = await queryUtil.getInitialQueryProcessors({
parentSpan: buildSpan,
})

@@ -100,7 +103,7 @@ module.exports = async function build(program: BuildArgs) {
const webpackCompilationHash = stats.hash
if (
webpackCompilationHash !== store.getState().webpackCompilationHash ||
!appDataUtil.exists(publicDir)
(!incrementalBuild && !appDataUtil.exists(publicDir))
) {
store.dispatch({
type: `SET_WEBPACK_COMPILATION_HASH`,
@@ -136,15 +139,14 @@ module.exports = async function build(program: BuildArgs) {
require(`../redux/actions`).boundActionCreators.setProgramStatus(
`BOOTSTRAP_QUERY_RUNNING_FINISHED`
)

await db.saveState()

await waitUntilAllJobsComplete()

// we need to save it again to make sure our latest state has been saved
await db.saveState()

const pagePaths = [...store.getState().pages.keys()]
activity = report.activityTimer(`Building static HTML for pages`, {
parentSpan: buildSpan,
})
const pagePaths = incrementalBuild
? await pageDataUtil.getNewPageKeys(store.getState(), readState())
: [...store.getState().pages.keys()]
activity = report.createProgress(
`Building static HTML for pages`,
pagePaths.length,
@@ -184,18 +186,75 @@ module.exports = async function build(program: BuildArgs) {
}
activity.done()

let deletedPageKeys = []
if (incrementalBuild) {
activity = report.activityTimer(`Delete previous page data`)
activity.start()
deletedPageKeys = await pageDataUtil.removePreviousPageData(
program.directory,
store.getState(),
readState()
)
activity.end()
}

await apiRunnerNode(`onPostBuild`, {
graphql: graphqlRunner,
parentSpan: buildSpan,
})

// Make sure we saved the latest state so we have all jobs cached
activity = report.activityTimer(`Update cache for next build`, {
parentSpan: buildSpan,
})
activity.start()
await db.saveState()
activity.end()

report.info(`Done building in ${process.uptime()} sec`)

buildSpan.finish()
await stopTracer()
workerPool.end()
buildActivity.end()
if (incrementalBuild && process.argv.indexOf(`--log-pages`) > -1) {
if (pagePaths.length) {
report.info(
`Incremental build pages:\n${pagePaths.map(
path => `Updated page: ${path}\n`
)}`.replace(/,/g, ``)
)
}
if (deletedPageKeys.length) {
report.info(
`Incremental build deleted pages:\n${deletedPageKeys.map(
path => `Deleted page: ${path}\n`
)}`.replace(/,/g, ``)
)
}
}

if (incrementalBuild && process.argv.indexOf(`--write-to-file`) > -1) {
const createdFilesPath = path.resolve(
`${program.directory}/.cache`,
`newPages.txt`
)
const deletedFilesPath = path.resolve(
`${program.directory}/.cache`,
`deletedPages.txt`
)

if (pagePaths.length) {
fs.writeFileSync(createdFilesPath, `${pagePaths.join(`\n`)}\n`, `utf8`)
report.info(`newPages.txt created`)
}
if (deletedPageKeys.length) {
fs.writeFileSync(
deletedFilesPath,
`${deletedPageKeys.join(`\n`)}\n`,
`utf8`
)
report.info(`deletedPages.txt created`)
}
}
}
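
The build command above relies on `pageDataUtil.getNewPageKeys` and `pageDataUtil.removePreviousPageData`, whose implementations are not shown in this section. As a rough sketch of the kind of comparison they imply, and not the PR's actual code, the following assumes both states expose `pages` as a `Map` keyed by page path (as the diff does) and treats a page as changed when its serialised `context` differs. The name `diffPages` and the comparison rule are assumptions for illustration only:

```javascript
// Hypothetical helper for illustration; not the implementation from this PR.
// Both arguments are Maps keyed by page path.
function diffPages(currentPages, cachedPages) {
  // Assumed change detection: compare the serialised page context.
  // JSON.stringify is sensitive to key order, so this is only a sketch.
  const serialise = page => JSON.stringify(page.context || {})

  // New pages (no cached entry) and pages whose context changed.
  const changed = []
  for (const [pagePath, page] of currentPages) {
    const previous = cachedPages.get(pagePath)
    if (!previous || serialise(previous) !== serialise(page)) {
      changed.push(pagePath)
    }
  }

  // Pages present in the cached state but absent from the current one.
  const deleted = [...cachedPages.keys()].filter(
    pagePath => !currentPages.has(pagePath)
  )

  return { changed, deleted }
}

module.exports = { diffPages }
```

In this sketch, `changed` plays the role of the `pagePaths` passed to the HTML build step and `deleted` plays the role of `deletedPageKeys`.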
30 changes: 21 additions & 9 deletions packages/gatsby/src/query/index.js
@@ -3,11 +3,12 @@
const _ = require(`lodash`)
const Queue = require(`better-queue`)
// const convertHrtime = require(`convert-hrtime`)
const { store, emitter } = require(`../redux`)
const { store, emitter, readState } = require(`../redux`)
const { boundActionCreators } = require(`../redux/actions`)
const report = require(`gatsby-cli/lib/reporter`)
const queryQueue = require(`./queue`)
const GraphQLRunner = require(`./graphql-runner`)
const pageDataUtil = require(`../utils/page-data`)

const seenIdsWithoutDataDependencies = new Set()
let queuedDirtyActions = []
@@ -148,10 +149,21 @@ const calcInitialDirtyQueryIds = state => {
/**
* groups queryIds by whether they are static or page queries.
*/
const groupQueryIds = queryIds => {
const groupQueryIds = async queryIds => {
const incrementalBuild =
process.env.GATSBY_INCREMENTAL_BUILD === `true` || false
const grouped = _.groupBy(queryIds, p =>
p.slice(0, 4) === `sq--` ? `static` : `page`
)

if (incrementalBuild) {
const newPageKeys = await pageDataUtil.getNewPageKeys(
store.getState(),
readState()
)
grouped.page = newPageKeys
}

return {
staticQueryIds: grouped.static || [],
pageQueryIds: grouped.page || [],
@@ -219,10 +231,10 @@ const processPageQueries = async (queryIds, { state, activity }) => {
)
}

const getInitialQueryProcessors = ({ parentSpan } = {}) => {
const getInitialQueryProcessors = async ({ parentSpan } = {}) => {
const state = store.getState()
const queryIds = calcInitialDirtyQueryIds(state)
const { staticQueryIds, pageQueryIds } = groupQueryIds(queryIds)
const { staticQueryIds, pageQueryIds } = await groupQueryIds(queryIds)

const queryjobsCount =
_.filter(pageQueryIds.map(id => state.pages.get(id))).length +
@@ -256,7 +268,7 @@ const initialProcessQueries = async ({ parentSpan } = {}) => {
pageQueryIds,
processPageQueries,
processStaticQueries,
} = getInitialQueryProcessors({ parentSpan })
} = await getInitialQueryProcessors({ parentSpan })

await processStaticQueries()
await processPageQueries()
@@ -290,10 +302,10 @@ let listenerQueue
* Run any dirty queries. See `calcQueries` for what constitutes a
* dirty query
*/
const runQueuedQueries = () => {
const runQueuedQueries = async () => {
if (listenerQueue) {
const state = store.getState()
const { staticQueryIds, pageQueryIds } = groupQueryIds(
const { staticQueryIds, pageQueryIds } = await groupQueryIds(
calcDirtyQueryIds(state)
)
const pages = _.filter(pageQueryIds.map(id => state.pages.get(id)))
@@ -317,7 +329,7 @@ const runQueuedQueries = () => {
* For what constitutes a dirty query, see `calcQueries`
*/

const startListeningToDevelopQueue = () => {
const startListeningToDevelopQueue = async () => {
// We use a queue to process batches of queries so that they are
// processed consecutively
let graphqlRunner = null
@@ -345,7 +357,7 @@ const startListeningToDevelopQueue = () => {
report.pendingActivity({ id: `query-running` })
})

emitter.on(`API_RUNNING_QUEUE_EMPTY`, runQueuedQueries)
emitter.on(`API_RUNNING_QUEUE_EMPTY`, await runQueuedQueries)
;[
`DELETE_CACHE`,
`CREATE_NODE`,
2 changes: 1 addition & 1 deletion packages/gatsby/src/redux/__tests__/index.js
@@ -64,6 +64,6 @@ describe(`redux db`, () => {
expect(data.components).not.toEqual(initialComponentsState)

// yuck - loki and redux will have different shape of redux state (nodes and nodesByType)
expect(_.omit(data, [`nodes`, `nodesByType`])).toMatchSnapshot()
expect(_.omit(data, [`nodes`, `nodesByType`, `pages`])).toMatchSnapshot()
})
})
8 changes: 7 additions & 1 deletion packages/gatsby/src/redux/index.ts
@@ -60,14 +60,20 @@ export const configureStore = (initialState: IReduxState): Store<IReduxState> =>
applyMiddleware(thunk, multi)
)

export const store = configureStore(readState())
const initialState = readState()
// Page data is not required to be in the initial redux store.
// This will enable us to make a comparison of the cached state and new state.
// Allowing us to add and delete pages.
initialState.pages = new Map()
export const store = configureStore(initialState) // Persist state.

// Persist state.
export const saveState = (): void => {
const state = store.getState()
const pickedState = _.pick(state, [
`nodes`,
`status`,
`pages`,
`componentDataDependencies`,
`components`,
`jobsV2`,
1 change: 1 addition & 0 deletions packages/gatsby/src/redux/types.ts
@@ -11,6 +11,7 @@ export interface IReduxState {
developMiddleware: any
proxy: any
}
pages?: any
}

export enum ProgramStatus {