Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gatsby): refresh browser if webpack rebuild occurs #13871

Merged
merged 14 commits into from
May 17, 2019

Conversation

Moocar
Copy link
Contributor

@Moocar Moocar commented May 5, 2019

Note: merges to per-page-manifest, not master. See #13004 for more info

Description

The is the final new piece of functionality in #13004. It covers the case where a user is browsing the site, and a rebuild occurs in the background. In the old data.json world, the user would never see the new changes until they manually refreshed, since everything is statically linked. But in the de-globalized approach, the user might click a link after a rebuild occurs and get a page-data.json that is incompatible with the browser's component code.

To avoid this, we save the webpack stats.hash which is a hash of the entire webpack build. So whenever any source code in the site changes, there will be a new hash. This is then added to every page-data and html file, so that when the browser loads a new page, it can compare the html's hash to the page's hash, and force a refresh if they're different.

Since the page-datas now rely on the webpack compilation hash, I've moved query running out of bootstrap and into build.js so that the page querys can be run after the javascript bundle has been built.

Related Issues

@Moocar
Copy link
Contributor Author

Moocar commented May 5, 2019

@KyleAMathews @pieh @wardpeet This is the last big PR in the series. Would love a review when you get a chance.

I'm looking into the e2e_tests_production_runtime test. They're all passing on my machine (TM) so not sure what's going on yet.

@Moocar Moocar requested review from KyleAMathews, pieh and wardpeet May 5, 2019 22:56
@Moocar
Copy link
Contributor Author

Moocar commented May 5, 2019

Ahh, I see now, it's the yarn test:offline variant. Thank goodness for these integration tests.

@Moocar
Copy link
Contributor Author

Moocar commented May 7, 2019

ok, tests fixed. I was stuck by the interdependencies between 1-production.js and the rest of the cypress tests in e2e-tests/production-runtime. Should be working now. @davidbailey00 would you mind having a look at the code around resetting the whitelist in navigation.js? Not sure if this code should be here or in the offline plugin. Also, I know you have some WIP to improve the offline plugin, so feel free to rewrite all this stuff after it's merged.

Copy link
Contributor

@DSchau DSchau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work with this functionality and updating the e2e tests ⭐️ 🚀

It looks good to my eyes, but would appreciate a second review from @wardpeet and/or @pieh since there's a fair amount of surface area, here. That being said--the e2e test helps out, and will continue to help out going forward 👌

@DSchau DSchau added the status: awaiting reviewer response A pull request that is currently awaiting a reviewer's response label May 14, 2019
Copy link
Contributor

@KyleAMathews KyleAMathews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic!

This is a huge improvement over what we have now as people will now always be loading the latest data and code where currently people can browse around a site seeing old data and running out-dated code. <3

pagePaths,
webpackCompilationHash
) => {
const workerPool = new Worker(require.resolve(`./page-data-worker`), {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised a big that this makes things faster? What was the difference? Node does use a thread pool for FS. Hmmm looking at this https://stackoverflow.com/questions/20346097/does-node-js-use-threads-thread-pool-internally there is limits so for mass updates I can see the limits on fs threads causing trouble.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I haven't actually run benchmarks on this, but my thinking is that the JSON deserialization/serialization is computationally expensive, so running that work on multiple workers will result in a performance increase. Especially since no data needs to be passed between processes (except the file path). Basically, this is a CPU optimization, not an I/O one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah — seems reasonable — does this reuse the workers for rendering pages? It'd be nice not to spawn extra Node.js processes if we don't have to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's a different worker pool. Interesting idea on sharing them. The challenge is that we'd have to write a new worker.js that references the code for both html rendering and this code, and then export the functions for each. That's totally doable, but might add a bit too much complexity for now. I'd say that if we use jest-worker in any other parts of gatsby, it might make sense to consolidate at that point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds a lot like a "job" api that handles things to a worker pool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Moocar we'd just have to merge the two files right and have different exports for the two? Am I missing something? Doesn't seem too complex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KyleAMathews Yep, and then make sure we start and end the worker pool at the right place. More than happy to code this up for this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, let's do that — this will make things more efficient + it'll make it really easy to add the worker pool to other tasks as they come up as we'll have the basic infra in place + since the workers are already spun up, sending task to them is really cheap.

@KyleAMathews
Copy link
Contributor

@Moocar lemme know about my question but otherwise I'll merge this in later.

@DSchau DSchau changed the title Refresh browser if webpack rebuild occurs fix(gatsby): refresh browser if webpack rebuild occurs May 16, 2019
@Moocar Moocar force-pushed the add-compilation-hash branch from 868fdd3 to ed4a42a Compare May 16, 2019 06:34
@Moocar
Copy link
Contributor Author

Moocar commented May 16, 2019

Hmm, not sure about that production-runtime error. Can't reproduce it locally.

Copy link
Contributor

@wardpeet wardpeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve this! 😄 great work, one small nit about a comment

pagePaths,
webpackCompilationHash
) => {
const workerPool = new Worker(require.resolve(`./page-data-worker`), {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds a lot like a "job" api that handles things to a worker pool.

@wardpeet
Copy link
Contributor

@Moocar maybe just merge master? Didn't want to use the bot to make sure I don't break anything.

Circle ci:

Your plugins must export known APIs from their gatsby-browser.js.
The following exports aren't APIs. Perhaps you made a typo or your plugin is outdated?

See https://www.gatsbyjs.org/docs/browser-apis/ for the list of Gatsby Browser APIs

  • The plugin "gatsby-plugin-offline@2.1.0" is exporting a variable named "onPostPrefetchPathname" which isn't an API.

@KyleAMathews KyleAMathews merged commit efca3e8 into gatsbyjs:per-page-manifest May 17, 2019
@KyleAMathews
Copy link
Contributor

:shipit:

Moocar added a commit to Moocar/gatsby that referenced this pull request May 20, 2019
* move query running into build and develop

* save build compilation hash

* write compilationHash to page data

* reload browser if rebuild occurs in background

* add test to ensure that browser is reloaded if rebuild occurs

* update page-datas when compilation hash changes

* use worker pool to udpate page data compilation hash

* update tests snapshot

* reset plugin offline whitelist if compilation hash changes

* prettier: remove global Cypress

* separate page for testing compilation-hash

* fix case typo

* mock out static entry test webpackCompilationHash field

* consolidate jest-worker calls into a central worker pool
Moocar added a commit to Moocar/gatsby that referenced this pull request May 24, 2019
* move query running into build and develop

* save build compilation hash

* write compilationHash to page data

* reload browser if rebuild occurs in background

* add test to ensure that browser is reloaded if rebuild occurs

* update page-datas when compilation hash changes

* use worker pool to udpate page data compilation hash

* update tests snapshot

* reset plugin offline whitelist if compilation hash changes

* prettier: remove global Cypress

* separate page for testing compilation-hash

* fix case typo

* mock out static entry test webpackCompilationHash field

* consolidate jest-worker calls into a central worker pool
Moocar added a commit to Moocar/gatsby that referenced this pull request May 27, 2019
* move query running into build and develop

* save build compilation hash

* write compilationHash to page data

* reload browser if rebuild occurs in background

* add test to ensure that browser is reloaded if rebuild occurs

* update page-datas when compilation hash changes

* use worker pool to udpate page data compilation hash

* update tests snapshot

* reset plugin offline whitelist if compilation hash changes

* prettier: remove global Cypress

* separate page for testing compilation-hash

* fix case typo

* mock out static entry test webpackCompilationHash field

* consolidate jest-worker calls into a central worker pool
Moocar added a commit to Moocar/gatsby that referenced this pull request Jun 2, 2019
* move query running into build and develop

* save build compilation hash

* write compilationHash to page data

* reload browser if rebuild occurs in background

* add test to ensure that browser is reloaded if rebuild occurs

* update page-datas when compilation hash changes

* use worker pool to udpate page data compilation hash

* update tests snapshot

* reset plugin offline whitelist if compilation hash changes

* prettier: remove global Cypress

* separate page for testing compilation-hash

* fix case typo

* mock out static entry test webpackCompilationHash field

* consolidate jest-worker calls into a central worker pool
Moocar added a commit to Moocar/gatsby that referenced this pull request Jun 11, 2019
* move query running into build and develop

* save build compilation hash

* write compilationHash to page data

* reload browser if rebuild occurs in background

* add test to ensure that browser is reloaded if rebuild occurs

* update page-datas when compilation hash changes

* use worker pool to udpate page data compilation hash

* update tests snapshot

* reset plugin offline whitelist if compilation hash changes

* prettier: remove global Cypress

* separate page for testing compilation-hash

* fix case typo

* mock out static entry test webpackCompilationHash field

* consolidate jest-worker calls into a central worker pool
KyleAMathews pushed a commit that referenced this pull request Jun 11, 2019
* feat(gatsby): Page data without compilation hash (#13139)

**Note: merges to `per-page-manifest`, not master. See #13004 for more info**

## Description

This PR saves the `page-data.json` during query running. In order to minimize the size of PRs, my strategy is to save the page-data.json along with the normal query results, and then gradually shift functionality over to use `page-data.json` instead of `data.json`. Once all those PRs are merged, we'll be able to go back and delete the static query results, jsonNames, dataPaths, data.json etc.

It does mean that we'll be storing double the amount of query results on disk. Hopefully that's ok in the interim.

Compilation-hash will be added in future PRs.

## Related Issues

- Sub-PR of #13004

* Websocket manager use page data (#13389)

* add utils/page-data.read

* read websocket page data from utils/page-data

* Loader use page data (#13409)

* page-data loader working for production-app

* get new loader.getPage stuff working with gatsby develop

* fix static-entry.js test

* remove loadPageDataSync (will be in next PR)

* use array of matchPaths

* Deprecate various loader methods

* remove console.log

* document why fetchPageData needs to check that response is JSON

* in offline, use prefetchedResources.push(...resourceUrls)

* root.js remove else block

* loader.js make* -> create*

* loader drop else block

* pass correct path/resourceUrls to onPostPrefetch

* s/err => null/() => null/

* Extract loadComponent from loadPage

* remove pageData from window object

* update jest snapshots for static-entry (to use window.pagePath)

* add loadPageOr404

* preload 404 page in background

* normalize page paths

* comment out resource-loading-resilience.js (will fix later)

* Remove data json from guess (#13727)

* remove data.json from plugin-guess

* add test for plugin-guess

* Gatsby serve use page data (#13728)

* use match-paths.json for gatbsy serve

* remove pages.json

* move query/pages-writer to bootstrap/requires-writer (#13729)

And delete data.json

* fix(gatsby): refresh browser if webpack rebuild occurs (#13871)

* move query running into build and develop

* save build compilation hash

* write compilationHash to page data

* reload browser if rebuild occurs in background

* add test to ensure that browser is reloaded if rebuild occurs

* update page-datas when compilation hash changes

* use worker pool to udpate page data compilation hash

* update tests snapshot

* reset plugin offline whitelist if compilation hash changes

* prettier: remove global Cypress

* separate page for testing compilation-hash

* fix case typo

* mock out static entry test webpackCompilationHash field

* consolidate jest-worker calls into a central worker pool

* page-data.json cleanup PR. Remove jsonName and dataPath (#14167)

* remove json-name and don't save /static/d/ page query results

* in loader.cleanAndFindPath, use __BASE_PATH__. Not __PATH_PREFIX__

* loader getPage -> loadPageSync (#14264)

* Page data loading resilience (#14286)

* fetchPageHtml if page resources aren't found

* add page-data to production-runtime/resource-loading-resilience test

Also use cypress tasks for blocking resources instead of npm run chunks

* fetchPageHtml -> doesPageHtmlExist

* remove loadPageOr404Sync

* revert plugin-offline to master (#14385)

* Misc per-page-manifest fixes (#14413)

* use path.join for static-entry page-data path

* add pathContext back to static-entry props (fix regression)

* Remove jsonDataPaths from pre-existing redux state (#14501)

* Add context to sentry xhr errors (#14577)

* Add extra context to Sentry xhr errors

* Don't define Sentry immediately as it'll always be undefined then

* mv cpu-core-count.js test to utils/worker/__tests__ (fix bad merge)

* remove sentry reporting
mxxk pushed a commit to mxxk/gatsby that referenced this pull request Jun 21, 2019
* feat(gatsby): Page data without compilation hash (gatsbyjs#13139)

**Note: merges to `per-page-manifest`, not master. See gatsbyjs#13004 for more info**

## Description

This PR saves the `page-data.json` during query running. In order to minimize the size of PRs, my strategy is to save the page-data.json along with the normal query results, and then gradually shift functionality over to use `page-data.json` instead of `data.json`. Once all those PRs are merged, we'll be able to go back and delete the static query results, jsonNames, dataPaths, data.json etc.

It does mean that we'll be storing double the amount of query results on disk. Hopefully that's ok in the interim.

Compilation-hash will be added in future PRs.

## Related Issues

- Sub-PR of gatsbyjs#13004

* Websocket manager use page data (gatsbyjs#13389)

* add utils/page-data.read

* read websocket page data from utils/page-data

* Loader use page data (gatsbyjs#13409)

* page-data loader working for production-app

* get new loader.getPage stuff working with gatsby develop

* fix static-entry.js test

* remove loadPageDataSync (will be in next PR)

* use array of matchPaths

* Deprecate various loader methods

* remove console.log

* document why fetchPageData needs to check that response is JSON

* in offline, use prefetchedResources.push(...resourceUrls)

* root.js remove else block

* loader.js make* -> create*

* loader drop else block

* pass correct path/resourceUrls to onPostPrefetch

* s/err => null/() => null/

* Extract loadComponent from loadPage

* remove pageData from window object

* update jest snapshots for static-entry (to use window.pagePath)

* add loadPageOr404

* preload 404 page in background

* normalize page paths

* comment out resource-loading-resilience.js (will fix later)

* Remove data json from guess (gatsbyjs#13727)

* remove data.json from plugin-guess

* add test for plugin-guess

* Gatsby serve use page data (gatsbyjs#13728)

* use match-paths.json for gatbsy serve

* remove pages.json

* move query/pages-writer to bootstrap/requires-writer (gatsbyjs#13729)

And delete data.json

* fix(gatsby): refresh browser if webpack rebuild occurs (gatsbyjs#13871)

* move query running into build and develop

* save build compilation hash

* write compilationHash to page data

* reload browser if rebuild occurs in background

* add test to ensure that browser is reloaded if rebuild occurs

* update page-datas when compilation hash changes

* use worker pool to udpate page data compilation hash

* update tests snapshot

* reset plugin offline whitelist if compilation hash changes

* prettier: remove global Cypress

* separate page for testing compilation-hash

* fix case typo

* mock out static entry test webpackCompilationHash field

* consolidate jest-worker calls into a central worker pool

* page-data.json cleanup PR. Remove jsonName and dataPath (gatsbyjs#14167)

* remove json-name and don't save /static/d/ page query results

* in loader.cleanAndFindPath, use __BASE_PATH__. Not __PATH_PREFIX__

* loader getPage -> loadPageSync (gatsbyjs#14264)

* Page data loading resilience (gatsbyjs#14286)

* fetchPageHtml if page resources aren't found

* add page-data to production-runtime/resource-loading-resilience test

Also use cypress tasks for blocking resources instead of npm run chunks

* fetchPageHtml -> doesPageHtmlExist

* remove loadPageOr404Sync

* revert plugin-offline to master (gatsbyjs#14385)

* Misc per-page-manifest fixes (gatsbyjs#14413)

* use path.join for static-entry page-data path

* add pathContext back to static-entry props (fix regression)

* Remove jsonDataPaths from pre-existing redux state (gatsbyjs#14501)

* Add context to sentry xhr errors (gatsbyjs#14577)

* Add extra context to Sentry xhr errors

* Don't define Sentry immediately as it'll always be undefined then

* mv cpu-core-count.js test to utils/worker/__tests__ (fix bad merge)

* remove sentry reporting
johno pushed a commit to johno/gatsby that referenced this pull request Jul 17, 2019
* feat(gatsby): Page data without compilation hash (gatsbyjs#13139)

**Note: merges to `per-page-manifest`, not master. See gatsbyjs#13004 for more info**

## Description

This PR saves the `page-data.json` during query running. In order to minimize the size of PRs, my strategy is to save the page-data.json along with the normal query results, and then gradually shift functionality over to use `page-data.json` instead of `data.json`. Once all those PRs are merged, we'll be able to go back and delete the static query results, jsonNames, dataPaths, data.json etc.

It does mean that we'll be storing double the amount of query results on disk. Hopefully that's ok in the interim.

Compilation-hash will be added in future PRs.

## Related Issues

- Sub-PR of gatsbyjs#13004

* Websocket manager use page data (gatsbyjs#13389)

* add utils/page-data.read

* read websocket page data from utils/page-data

* Loader use page data (gatsbyjs#13409)

* page-data loader working for production-app

* get new loader.getPage stuff working with gatsby develop

* fix static-entry.js test

* remove loadPageDataSync (will be in next PR)

* use array of matchPaths

* Deprecate various loader methods

* remove console.log

* document why fetchPageData needs to check that response is JSON

* in offline, use prefetchedResources.push(...resourceUrls)

* root.js remove else block

* loader.js make* -> create*

* loader drop else block

* pass correct path/resourceUrls to onPostPrefetch

* s/err => null/() => null/

* Extract loadComponent from loadPage

* remove pageData from window object

* update jest snapshots for static-entry (to use window.pagePath)

* add loadPageOr404

* preload 404 page in background

* normalize page paths

* comment out resource-loading-resilience.js (will fix later)

* Remove data json from guess (gatsbyjs#13727)

* remove data.json from plugin-guess

* add test for plugin-guess

* Gatsby serve use page data (gatsbyjs#13728)

* use match-paths.json for gatbsy serve

* remove pages.json

* move query/pages-writer to bootstrap/requires-writer (gatsbyjs#13729)

And delete data.json

* fix(gatsby): refresh browser if webpack rebuild occurs (gatsbyjs#13871)

* move query running into build and develop

* save build compilation hash

* write compilationHash to page data

* reload browser if rebuild occurs in background

* add test to ensure that browser is reloaded if rebuild occurs

* update page-datas when compilation hash changes

* use worker pool to udpate page data compilation hash

* update tests snapshot

* reset plugin offline whitelist if compilation hash changes

* prettier: remove global Cypress

* separate page for testing compilation-hash

* fix case typo

* mock out static entry test webpackCompilationHash field

* consolidate jest-worker calls into a central worker pool

* page-data.json cleanup PR. Remove jsonName and dataPath (gatsbyjs#14167)

* remove json-name and don't save /static/d/ page query results

* in loader.cleanAndFindPath, use __BASE_PATH__. Not __PATH_PREFIX__

* loader getPage -> loadPageSync (gatsbyjs#14264)

* Page data loading resilience (gatsbyjs#14286)

* fetchPageHtml if page resources aren't found

* add page-data to production-runtime/resource-loading-resilience test

Also use cypress tasks for blocking resources instead of npm run chunks

* fetchPageHtml -> doesPageHtmlExist

* remove loadPageOr404Sync

* revert plugin-offline to master (gatsbyjs#14385)

* Misc per-page-manifest fixes (gatsbyjs#14413)

* use path.join for static-entry page-data path

* add pathContext back to static-entry props (fix regression)

* Remove jsonDataPaths from pre-existing redux state (gatsbyjs#14501)

* Add context to sentry xhr errors (gatsbyjs#14577)

* Add extra context to Sentry xhr errors

* Don't define Sentry immediately as it'll always be undefined then

* mv cpu-core-count.js test to utils/worker/__tests__ (fix bad merge)

* remove sentry reporting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: awaiting reviewer response A pull request that is currently awaiting a reviewer's response
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants