fix(gatsby): refresh browser if webpack rebuild occurs #13871
Conversation
@KyleAMathews @pieh @wardpeet This is the last big PR in the series. Would love a review when you get a chance. I'm looking into the …

Ahh, I see now, it's the …

OK, tests fixed. I was tripped up by the interdependencies between …
Looks fantastic!
This is a huge improvement over what we have now: people will always be loading the latest data and code, whereas currently people can browse around a site seeing old data and running outdated code. <3
```js
  pagePaths,
  webpackCompilationHash
) => {
  const workerPool = new Worker(require.resolve(`./page-data-worker`), {
```
I'm a bit surprised that this makes things faster? What was the difference? Node does use a thread pool for fs. Hmmm, looking at this https://stackoverflow.com/questions/20346097/does-node-js-use-threads-thread-pool-internally there are limits, so for mass updates I can see the limits on fs threads causing trouble.
So, I haven't actually run benchmarks on this, but my thinking is that the JSON deserialization/serialization is computationally expensive, so running that work on multiple workers will result in a performance increase. Especially since no data needs to be passed between processes (except the file path). Basically, this is a CPU optimization, not an I/O one.
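To make the CPU-bound step concrete, here is a minimal sketch of the kind of per-page work a worker would perform (the function name `updatePageDataHash` and the exact page-data shape are assumptions for illustration, not Gatsby's actual code). Only the file path and new hash need to cross the process boundary; the expensive parse/stringify happens inside the worker:

```javascript
// Sketch of the CPU-bound work offloaded to a worker: deserialize the
// page-data JSON, stamp in the new compilation hash, re-serialize.
function updatePageDataHash(pageDataJson, newHash) {
  const pageData = JSON.parse(pageDataJson)
  pageData.webpackCompilationHash = newHash
  return JSON.stringify(pageData)
}

// Hypothetical page-data payload, for demonstration only.
const input = JSON.stringify({
  path: `/blog/`,
  result: { data: { title: `Hello` } },
  webpackCompilationHash: `old-hash`,
})
const output = updatePageDataHash(input, `new-hash`)
console.log(JSON.parse(output).webpackCompilationHash) // prints "new-hash"
```

Spreading many such parse/update/stringify calls across a pool of processes is what makes this a CPU optimization rather than an I/O one.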
Yeah — seems reasonable — does this reuse the workers for rendering pages? It'd be nice not to spawn extra Node.js processes if we don't have to.
No, it's a different worker pool. Interesting idea on sharing them. The challenge is that we'd have to write a new worker.js that references the code for both html rendering and this code, and then export the functions for each. That's totally doable, but might add a bit too much complexity for now. I'd say that if we use jest-worker in any other parts of gatsby, it might make sense to consolidate at that point.
This sounds a lot like a "job" API that hands things off to a worker pool.
@Moocar we'd just have to merge the two files, right, and have different exports for the two? Am I missing something? Doesn't seem too complex.
@KyleAMathews Yep, and then make sure we start and end the worker pool at the right place. More than happy to code this up for this PR.
Thanks, let's do that. This will make things more efficient, and it'll make it really easy to add the worker pool to other tasks as they come up since we'll have the basic infra in place. And since the workers are already spun up, sending tasks to them is really cheap.
@Moocar lemme know about my question but otherwise I'll merge this in later.
Force-pushed from 868fdd3 to ed4a42a
Hmm, not sure about that production-runtime error. Can't reproduce it locally.
I approve this! 😄 Great work, one small nit about a comment.
@Moocar maybe just merge master? I didn't want to use the bot, to make sure I don't break anything. CircleCI:
* move query running into build and develop
* save build compilation hash
* write compilationHash to page data
* reload browser if rebuild occurs in background
* add test to ensure that browser is reloaded if rebuild occurs
* update page-datas when compilation hash changes
* use worker pool to update page data compilation hash
* update tests snapshot
* reset plugin offline whitelist if compilation hash changes
* prettier: remove global Cypress
* separate page for testing compilation-hash
* fix case typo
* mock out static entry test webpackCompilationHash field
* consolidate jest-worker calls into a central worker pool
* feat(gatsby): Page data without compilation hash (#13139). Note: merges to `per-page-manifest`, not master; see #13004 for more info. This PR saves the `page-data.json` during query running. In order to minimize the size of PRs, my strategy is to save the page-data.json along with the normal query results, and then gradually shift functionality over to use `page-data.json` instead of `data.json`. Once all those PRs are merged, we'll be able to go back and delete the static query results, jsonNames, dataPaths, data.json etc. It does mean that we'll be storing double the amount of query results on disk. Hopefully that's ok in the interim. Compilation-hash will be added in future PRs. Sub-PR of #13004.
* Websocket manager use page data (#13389)
* add utils/page-data.read
* read websocket page data from utils/page-data
* Loader use page data (#13409)
* page-data loader working for production-app
* get new loader.getPage stuff working with gatsby develop
* fix static-entry.js test
* remove loadPageDataSync (will be in next PR)
* use array of matchPaths
* Deprecate various loader methods
* remove console.log
* document why fetchPageData needs to check that response is JSON
* in offline, use prefetchedResources.push(...resourceUrls)
* root.js remove else block
* loader.js make* -> create*
* loader drop else block
* pass correct path/resourceUrls to onPostPrefetch
* s/err => null/() => null/
* Extract loadComponent from loadPage
* remove pageData from window object
* update jest snapshots for static-entry (to use window.pagePath)
* add loadPageOr404
* preload 404 page in background
* normalize page paths
* comment out resource-loading-resilience.js (will fix later)
* Remove data json from guess (#13727)
* remove data.json from plugin-guess
* add test for plugin-guess
* Gatsby serve use page data (#13728)
* use match-paths.json for gatsby serve
* remove pages.json
* move query/pages-writer to bootstrap/requires-writer (#13729) and delete data.json
* fix(gatsby): refresh browser if webpack rebuild occurs (#13871)
* move query running into build and develop
* save build compilation hash
* write compilationHash to page data
* reload browser if rebuild occurs in background
* add test to ensure that browser is reloaded if rebuild occurs
* update page-datas when compilation hash changes
* use worker pool to update page data compilation hash
* update tests snapshot
* reset plugin offline whitelist if compilation hash changes
* prettier: remove global Cypress
* separate page for testing compilation-hash
* fix case typo
* mock out static entry test webpackCompilationHash field
* consolidate jest-worker calls into a central worker pool
* page-data.json cleanup PR. Remove jsonName and dataPath (#14167)
* remove json-name and don't save /static/d/ page query results
* in loader.cleanAndFindPath, use __BASE_PATH__, not __PATH_PREFIX__
* loader getPage -> loadPageSync (#14264)
* Page data loading resilience (#14286)
* fetchPageHtml if page resources aren't found
* add page-data to production-runtime/resource-loading-resilience test; also use cypress tasks for blocking resources instead of npm run chunks
* fetchPageHtml -> doesPageHtmlExist
* remove loadPageOr404Sync
* revert plugin-offline to master (#14385)
* Misc per-page-manifest fixes (#14413)
* use path.join for static-entry page-data path
* add pathContext back to static-entry props (fix regression)
* Remove jsonDataPaths from pre-existing redux state (#14501)
* Add context to sentry xhr errors (#14577)
* Add extra context to Sentry xhr errors
* Don't define Sentry immediately as it'll always be undefined then
* mv cpu-core-count.js test to utils/worker/__tests__ (fix bad merge)
* remove sentry reporting
Note: merges to `per-page-manifest`, not master. See #13004 for more info.

Description
This is the final new piece of functionality in #13004. It covers the case where a user is browsing the site and a rebuild occurs in the background. In the old `data.json` world, the user would never see the new changes until they manually refreshed, since everything is statically linked. But in the de-globalized approach, the user might click a link after a rebuild occurs and get a page-data.json that is incompatible with the browser's component code.

To avoid this, we save the webpack `stats.hash`, which is a hash of the entire webpack build. Whenever any source code in the site changes, there will be a new hash. This hash is added to every page-data and html file, so that when the browser loads a new page, it can compare the html's hash to the page's hash and force a refresh if they're different.

Since the page-datas now rely on the webpack compilation hash, I've moved query running out of bootstrap and into `build.js` so that the page queries can be run after the javascript bundle has been built.

Related Issues