Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Replace global data.json with page-data.json per page #13004

Closed
wants to merge 44 commits into from

Conversation

Moocar
Copy link
Contributor

@Moocar Moocar commented Apr 1, 2019

Description

Removes the global data.json file and instead provides a page-data.json for each page on a gatsby site. See #11982 for background.

This PR is a draft and should not be merged. I'll be providing a series of smaller PRs instead. Reviewers can always reference this PR to see the full context of the changes. That said, if you're reading this PR and spot any problems, then please do leave some feedback!

Benefits/Trade offs

pros

  • no more downloading data.json to the browser. For 10'000 page sites, this file was already approaching 500kb compressed.
  • navigating to prefetched pages should be faster since there's no data.json to search through
  • production-app.js webpack build no longer references pages or their data. Which will make incremental builds much easier to implement in the future

cons

  • Requesting a page from the server now requires 2 requests. First page-data.json, then the component. Though prefetched pages still do this in the background.
  • Since page data isn't immutable anymore, we can't rely on infinite browser caching
  • Some potential breaking changes in loader.getResourceURLsForPathname

Sup-PRs

Breaking up these PRs is hard. To make it a bit easier, I'm going to have gatsby saving two sources of data to disk. data.json and static/d/..., as well as the new page-data.json. This will require duplication of memory and disk, so we won't be able to merge to master. But we will be able to gradually introduce new functionality, and once it's ready, delete the old data sources and merge to master. I'm not a fan of long lived non-master branches, but I think it makes sense here.

The long lived branch is per-page-manifest. The PRs that will merge directly into that are:

TODO

TODO but won't block release

Already reviewed and Merged:

Merged against per-page-manifest

Related Issues

Moocar added 30 commits April 2, 2019 07:50
cleanup bootstrap

moved page-query-runner to index

use finishBootstrap()

fix query-runner type annotation

refactor query-runner/index

doc sections

queryjobs refactor

make query refactor

enqueueQueryID -> enqueueExtractedQueryId
Also moved pages-writer.js and redirects-writer.js to src/bootstrap
@me4502
Copy link
Contributor

me4502 commented May 24, 2019

It was on 96 for performance, but the main win is in the Performance metrics

image

These are now all ticks, and the diagnostics section is empty.

@KyleAMathews
Copy link
Contributor

KyleAMathews commented May 24, 2019 via email

@me4502
Copy link
Contributor

me4502 commented May 24, 2019

Actually yeah, 15 minutes down to 9. And I except it to have massively lowered deploy times due to less changed files. I haven't ran multiple deploys yet to test though.

@KyleAMathews
Copy link
Contributor

@me4502 oh wow! That's kinda surprising actually — where'd the 6 minutes get saved?

Moocar added a commit to Moocar/gatsby that referenced this pull request May 27, 2019
**Note: merges to `per-page-manifest`, not master. See gatsbyjs#13004 for more info**

## Description

This PR saves the `page-data.json` during query running. In order to minimize the size of PRs, my strategy is to save the page-data.json along with the normal query results, and then gradually shift functionality over to use `page-data.json` instead of `data.json`. Once all those PRs are merged, we'll be able to go back and delete the static query results, jsonNames, dataPaths, data.json etc.

It does mean that we'll be storing double the amount of query results on disk. Hopefully that's ok in the interim.

Compilation-hash will be added in future PRs.

## Related Issues

- Sub-PR of gatsbyjs#13004
@Moocar Moocar mentioned this pull request May 27, 2019
@Moocar
Copy link
Contributor Author

Moocar commented May 27, 2019

OK, we're ready to finally merge per-page-manifest into master. I just have one concern before we do. I've changed an official browser API onPostPrefetchPathname -> onPostPrefetch. It's only used by the gatsby-plugin-offline plugin (that we know of). So if we merge per-page-manifest to master, publish, and then a site upgrades gatsby but not gatsby-plugin-offline, they'll get an error:

Your plugins must export known APIs from their gatsby-browser.js.
The following exports aren't APIs. Perhaps you made a typo or your plugin is outdated?

See https://www.gatsbyjs.org/docs/browser-apis/ for the list of Gatsby Browser APIs

- The plugin \"gatsby-plugin-offline@2.1.1\" is exporting a variable named \"onPostPrefetchPathname\" which isn't an API.

Perhaps you meant to export \"onPrefetchPathname\"?
error Command failed with exit code 1

Basically it's a breaking change. And we need users to update both gatsby and gatsby-plugin-offline at the same time. Which to my understanding isn't possible (but happy to be shown otherwise). I'll look at how much work it will be to make it backwards compatible.

@me4502
Copy link
Contributor

me4502 commented May 27, 2019

@KyleAMathews Seems to just be generally all over the build:
Old:

[2019-05-20T03:34:23.683Z] success open and validate gatsby-configs — 0.015 s
[2019-05-20T03:34:23.944Z] success load plugins — 4.015 s
[2019-05-20T03:34:23.944Z] success onPreInit — 0.006 s
[2019-05-20T03:34:23.944Z] success delete html and css files from previous builds — 0.007 s
[2019-05-20T03:34:23.944Z] success initialize cache — 0.009 s
[2019-05-20T03:34:23.944Z] success start nodes db — 0.005 s
[2019-05-20T03:34:24.203Z] success copy gatsby files — 0.042 s
[2019-05-20T03:34:24.203Z] success onPreBootstrap — 0.014 s
[2019-05-20T03:34:24.203Z] Starting to fetch data from Prismic
[2019-05-20T03:34:24.462Z] Fetch Prismic data: 387.842ms
[2019-05-20T03:35:03.976Z] success source and transform nodes — 35.679 s
[2019-05-20T03:35:03.976Z] success building schema — 1.061 s
[2019-05-20T03:38:25.535Z] success createPages — 193.127 s
[2019-05-20T03:38:25.536Z] success createPagesStatefully — 0.075 s
[2019-05-20T03:38:25.536Z] success onPreExtractQueries — 0.027 s
[2019-05-20T03:38:25.536Z] success update schema — 0.000 s
[2019-05-20T03:38:25.536Z] success extract queries from components — 0.389 s
[2019-05-20T03:38:25.536Z] success run static queries — 0.099 s — 3/3 30.93 queries/second
[2019-05-20T03:39:31.986Z] success run page queries — 75.684 s — 55490/55490 736.00 queries/second
[2019-05-20T03:39:33.397Z] success write out page data — 2.811 s
[2019-05-20T03:39:33.397Z] success write out redirect data — 0.001 s
[2019-05-20T03:42:10.295Z] 
[2019-05-20T03:42:10.295Z] success Build manifest and related icons — 0.257 s
[2019-05-20T03:42:10.295Z] success onPostBootstrap — 0.260 s
[2019-05-20T03:42:10.295Z] 
[2019-05-20T03:42:10.295Z] info bootstrap finished - 467.029 s
[2019-05-20T03:42:10.295Z] 
[2019-05-20T03:43:47.136Z] success Building production JavaScript and CSS bundles — 76.443 s
[2019-05-20T03:47:23.755Z] success Building static HTML for pages — 210.809 s — 55490/55490 276.87 pages/second
[2019-05-20T03:47:23.755Z] Generated public/sw.js, which will precache 12 files, totaling 402207 bytes.
[2019-05-20T03:47:23.755Z] info Done building in 782.100299522 sec
[2019-05-20T03:47:23.755Z] Done in 841.10s.
[2019-05-20T03:47:23.755Z] Done in 841.64s.

New:

[2019-05-24T04:15:58.931Z] success open and validate gatsby-configs — 0.010 s
[2019-05-24T04:16:03.113Z] success load plugins — 4.288 s
[2019-05-24T04:16:03.113Z] success onPreInit — 0.005 s
[2019-05-24T04:16:03.113Z] success delete html and css files from previous builds — 0.006 s
[2019-05-24T04:16:03.113Z] success initialize cache — 0.008 s
[2019-05-24T04:16:03.113Z] success start nodes db — 0.004 s
[2019-05-24T04:16:03.113Z] success copy gatsby files — 0.033 s
[2019-05-24T04:16:03.113Z] success onPreBootstrap — 0.014 s
[2019-05-24T04:16:03.113Z] Starting to fetch data from Prismic
[2019-05-24T04:16:03.371Z] Fetch Prismic data: 391.532ms
[2019-05-24T04:16:36.213Z] success source and transform nodes — 29.421 s
[2019-05-24T04:16:36.213Z] warning Deprecation warning - "noDefaultResolvers" is deprecated. In Gatsby 3, defined fields won't get resolvers, unless explicitly added with a directive/extension.
[2019-05-24T04:16:36.213Z] warning Deprecation warning - "noDefaultResolvers" is deprecated. In Gatsby 3, defined fields won't get resolvers, unless explicitly added with a directive/extension.
[2019-05-24T04:16:36.213Z] warning Deprecation warning - "noDefaultResolvers" is deprecated. In Gatsby 3, defined fields won't get resolvers, unless explicitly added with a directive/extension.
[2019-05-24T04:16:36.213Z] warning Deprecation warning - "noDefaultResolvers" is deprecated. In Gatsby 3, defined fields won't get resolvers, unless explicitly added with a directive/extension.
[2019-05-24T04:16:36.213Z] warning Deprecation warning - "noDefaultResolvers" is deprecated. In Gatsby 3, defined fields won't get resolvers, unless explicitly added with a directive/extension.
[2019-05-24T04:16:36.213Z] warning Deprecation warning - "noDefaultResolvers" is deprecated. In Gatsby 3, defined fields won't get resolvers, unless explicitly added with a directive/extension.
[2019-05-24T04:16:36.213Z] success building schema — 0.507 s
[2019-05-24T04:19:27.717Z] success createPages — 163.235 s
[2019-05-24T04:19:27.717Z] success createPagesStatefully — 0.071 s
[2019-05-24T04:19:27.717Z] success onPreExtractQueries — 0.004 s
[2019-05-24T04:19:27.717Z] success update schema — 0.000 s
[2019-05-24T04:19:27.717Z] success extract queries from components — 0.376 s
[2019-05-24T04:19:27.717Z] success write out requires — 0.291 s
[2019-05-24T04:19:27.717Z] success write out redirect data — 0.001 s
[2019-05-24T04:19:27.717Z] success Build manifest and related icons — 0.117 s
[2019-05-24T04:19:27.717Z] success onPostBootstrap — 0.119 s
[2019-05-24T04:19:27.717Z] info bootstrap finished - 200.574 s
[2019-05-24T04:19:27.717Z] success run static queries — 0.011 s — 2/2 194.43 queries/second
[2019-05-24T04:19:35.850Z] success Building production JavaScript and CSS bundles — 17.572 s
[2019-05-24T04:19:36.108Z] success Rewriting compilation hashes — 0.066 s
[2019-05-24T04:21:42.166Z] success run page queries — 122.355 s — 55588/55588 454.93 queries/second
[2019-05-24T04:24:40.233Z] success Building static HTML for pages — 157.017 s — 55588/55588 381.18 pages/second
[2019-05-24T04:24:40.233Z] Generated public/sw.js, which will precache 12 files, totaling 404337 bytes.
[2019-05-24T04:24:40.233Z] info Done building in 515.287176085 sec
[2019-05-24T04:24:40.233Z] Done in 578.07s.
[2019-05-24T04:24:40.233Z] Done in 578.55s.

@KyleAMathews
Copy link
Contributor

KyleAMathews commented May 28, 2019 via email

@Moocar
Copy link
Contributor Author

Moocar commented May 29, 2019

OK, I'm satisfied again that this is ready to merge to master. I've published a new gatsby@per-page-manifest.

I've created the PR to merge per-page-manifest into master. #14359

I'm also running https://github.com/pieh/develop-runner to make sure that a bunch of sites at least build.

Moocar added a commit to Moocar/gatsby that referenced this pull request Jun 2, 2019
**Note: merges to `per-page-manifest`, not master. See gatsbyjs#13004 for more info**

## Description

This PR saves the `page-data.json` during query running. In order to minimize the size of PRs, my strategy is to save the page-data.json along with the normal query results, and then gradually shift functionality over to use `page-data.json` instead of `data.json`. Once all those PRs are merged, we'll be able to go back and delete the static query results, jsonNames, dataPaths, data.json etc.

It does mean that we'll be storing double the amount of query results on disk. Hopefully that's ok in the interim.

Compilation-hash will be added in future PRs.

## Related Issues

- Sub-PR of gatsbyjs#13004
@Moocar
Copy link
Contributor Author

Moocar commented Jun 4, 2019

Hey folks, I'm closing this PR, as the actual code to be merged to master is on #14359 (branch per-page-manifest). Please comment there from now on.

@Moocar Moocar closed this Jun 4, 2019
Moocar added a commit to Moocar/gatsby that referenced this pull request Jun 11, 2019
**Note: merges to `per-page-manifest`, not master. See gatsbyjs#13004 for more info**

## Description

This PR saves the `page-data.json` during query running. In order to minimize the size of PRs, my strategy is to save the page-data.json along with the normal query results, and then gradually shift functionality over to use `page-data.json` instead of `data.json`. Once all those PRs are merged, we'll be able to go back and delete the static query results, jsonNames, dataPaths, data.json etc.

It does mean that we'll be storing double the amount of query results on disk. Hopefully that's ok in the interim.

Compilation-hash will be added in future PRs.

## Related Issues

- Sub-PR of gatsbyjs#13004
KyleAMathews pushed a commit that referenced this pull request Jun 11, 2019
* feat(gatsby): Page data without compilation hash (#13139)

**Note: merges to `per-page-manifest`, not master. See #13004 for more info**

## Description

This PR saves the `page-data.json` during query running. In order to minimize the size of PRs, my strategy is to save the page-data.json along with the normal query results, and then gradually shift functionality over to use `page-data.json` instead of `data.json`. Once all those PRs are merged, we'll be able to go back and delete the static query results, jsonNames, dataPaths, data.json etc.

It does mean that we'll be storing double the amount of query results on disk. Hopefully that's ok in the interim.

Compilation-hash will be added in future PRs.

## Related Issues

- Sub-PR of #13004

* Websocket manager use page data (#13389)

* add utils/page-data.read

* read websocket page data from utils/page-data

* Loader use page data (#13409)

* page-data loader working for production-app

* get new loader.getPage stuff working with gatsby develop

* fix static-entry.js test

* remove loadPageDataSync (will be in next PR)

* use array of matchPaths

* Deprecate various loader methods

* remove console.log

* document why fetchPageData needs to check that response is JSON

* in offline, use prefetchedResources.push(...resourceUrls)

* root.js remove else block

* loader.js make* -> create*

* loader drop else block

* pass correct path/resourceUrls to onPostPrefetch

* s/err => null/() => null/

* Extract loadComponent from loadPage

* remove pageData from window object

* update jest snapshots for static-entry (to use window.pagePath)

* add loadPageOr404

* preload 404 page in background

* normalize page paths

* comment out resource-loading-resilience.js (will fix later)

* Remove data json from guess (#13727)

* remove data.json from plugin-guess

* add test for plugin-guess

* Gatsby serve use page data (#13728)

* use match-paths.json for gatbsy serve

* remove pages.json

* move query/pages-writer to bootstrap/requires-writer (#13729)

And delete data.json

* fix(gatsby): refresh browser if webpack rebuild occurs (#13871)

* move query running into build and develop

* save build compilation hash

* write compilationHash to page data

* reload browser if rebuild occurs in background

* add test to ensure that browser is reloaded if rebuild occurs

* update page-datas when compilation hash changes

* use worker pool to udpate page data compilation hash

* update tests snapshot

* reset plugin offline whitelist if compilation hash changes

* prettier: remove global Cypress

* separate page for testing compilation-hash

* fix case typo

* mock out static entry test webpackCompilationHash field

* consolidate jest-worker calls into a central worker pool

* page-data.json cleanup PR. Remove jsonName and dataPath (#14167)

* remove json-name and don't save /static/d/ page query results

* in loader.cleanAndFindPath, use __BASE_PATH__. Not __PATH_PREFIX__

* loader getPage -> loadPageSync (#14264)

* Page data loading resilience (#14286)

* fetchPageHtml if page resources aren't found

* add page-data to production-runtime/resource-loading-resilience test

Also use cypress tasks for blocking resources instead of npm run chunks

* fetchPageHtml -> doesPageHtmlExist

* remove loadPageOr404Sync

* revert plugin-offline to master (#14385)

* Misc per-page-manifest fixes (#14413)

* use path.join for static-entry page-data path

* add pathContext back to static-entry props (fix regression)

* Remove jsonDataPaths from pre-existing redux state (#14501)

* Add context to sentry xhr errors (#14577)

* Add extra context to Sentry xhr errors

* Don't define Sentry immediately as it'll always be undefined then

* mv cpu-core-count.js test to utils/worker/__tests__ (fix bad merge)

* remove sentry reporting
@KyleAMathews
Copy link
Contributor

For those following along here, this PR got merged via #14359 and released as 2.9.0 🎉 Thanks @Moocar for the great analysis and code and @wardpeet for all your reviews!

@leonfs
Copy link

leonfs commented Jun 11, 2019

Thank-you everyone!! amazing job!!

mxxk pushed a commit to mxxk/gatsby that referenced this pull request Jun 21, 2019
* feat(gatsby): Page data without compilation hash (gatsbyjs#13139)

**Note: merges to `per-page-manifest`, not master. See gatsbyjs#13004 for more info**

## Description

This PR saves the `page-data.json` during query running. In order to minimize the size of PRs, my strategy is to save the page-data.json along with the normal query results, and then gradually shift functionality over to use `page-data.json` instead of `data.json`. Once all those PRs are merged, we'll be able to go back and delete the static query results, jsonNames, dataPaths, data.json etc.

It does mean that we'll be storing double the amount of query results on disk. Hopefully that's ok in the interim.

Compilation-hash will be added in future PRs.

## Related Issues

- Sub-PR of gatsbyjs#13004

* Websocket manager use page data (gatsbyjs#13389)

* add utils/page-data.read

* read websocket page data from utils/page-data

* Loader use page data (gatsbyjs#13409)

* page-data loader working for production-app

* get new loader.getPage stuff working with gatsby develop

* fix static-entry.js test

* remove loadPageDataSync (will be in next PR)

* use array of matchPaths

* Deprecate various loader methods

* remove console.log

* document why fetchPageData needs to check that response is JSON

* in offline, use prefetchedResources.push(...resourceUrls)

* root.js remove else block

* loader.js make* -> create*

* loader drop else block

* pass correct path/resourceUrls to onPostPrefetch

* s/err => null/() => null/

* Extract loadComponent from loadPage

* remove pageData from window object

* update jest snapshots for static-entry (to use window.pagePath)

* add loadPageOr404

* preload 404 page in background

* normalize page paths

* comment out resource-loading-resilience.js (will fix later)

* Remove data json from guess (gatsbyjs#13727)

* remove data.json from plugin-guess

* add test for plugin-guess

* Gatsby serve use page data (gatsbyjs#13728)

* use match-paths.json for gatbsy serve

* remove pages.json

* move query/pages-writer to bootstrap/requires-writer (gatsbyjs#13729)

And delete data.json

* fix(gatsby): refresh browser if webpack rebuild occurs (gatsbyjs#13871)

* move query running into build and develop

* save build compilation hash

* write compilationHash to page data

* reload browser if rebuild occurs in background

* add test to ensure that browser is reloaded if rebuild occurs

* update page-datas when compilation hash changes

* use worker pool to udpate page data compilation hash

* update tests snapshot

* reset plugin offline whitelist if compilation hash changes

* prettier: remove global Cypress

* separate page for testing compilation-hash

* fix case typo

* mock out static entry test webpackCompilationHash field

* consolidate jest-worker calls into a central worker pool

* page-data.json cleanup PR. Remove jsonName and dataPath (gatsbyjs#14167)

* remove json-name and don't save /static/d/ page query results

* in loader.cleanAndFindPath, use __BASE_PATH__. Not __PATH_PREFIX__

* loader getPage -> loadPageSync (gatsbyjs#14264)

* Page data loading resilience (gatsbyjs#14286)

* fetchPageHtml if page resources aren't found

* add page-data to production-runtime/resource-loading-resilience test

Also use cypress tasks for blocking resources instead of npm run chunks

* fetchPageHtml -> doesPageHtmlExist

* remove loadPageOr404Sync

* revert plugin-offline to master (gatsbyjs#14385)

* Misc per-page-manifest fixes (gatsbyjs#14413)

* use path.join for static-entry page-data path

* add pathContext back to static-entry props (fix regression)

* Remove jsonDataPaths from pre-existing redux state (gatsbyjs#14501)

* Add context to sentry xhr errors (gatsbyjs#14577)

* Add extra context to Sentry xhr errors

* Don't define Sentry immediately as it'll always be undefined then

* mv cpu-core-count.js test to utils/worker/__tests__ (fix bad merge)

* remove sentry reporting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Split page metadata so can lazy load it and reduce the initial JS