Per page manifest #14359
Conversation
RE the 2 test failures:
I want to preface this by saying that I don't know much about this issue; I stumbled here while googling how to fix the performance of my 3000+ page Gatsby project, which produces a 200+ KB pages-manifest.js file. Everything in my application feels laggy due to the sheer size of the pages-manifest.js file. However, when I tried to install your temp fix packages, the build seems to fail at pagination creation.
Here is my gatsby-node:
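(For illustration, here is a hedged sketch of a typical paginated `createPages` setup of this kind; the template path, slug field, and previous/next context are assumptions, not the reporter's actual code.)

```js
// gatsby-node.js — illustrative reconstruction only, not the reporter's
// real file. Template path, slug field, and pagination context are assumed.
const path = require(`path`)

exports.createPages = async ({ graphql, actions }) => {
  const { createPage } = actions

  const result = await graphql(`
    {
      allMarkdownRemark(sort: { fields: [frontmatter___date], order: DESC }) {
        edges {
          node {
            fields {
              slug
            }
          }
        }
      }
    }
  `)
  if (result.errors) {
    throw result.errors
  }

  const posts = result.data.allMarkdownRemark.edges

  posts.forEach(({ node }, index) => {
    createPage({
      path: node.fields.slug,
      component: path.resolve(`./src/templates/post-template.js`),
      context: {
        slug: node.fields.slug,
        // Pagination context of this kind is where builds like the one
        // reported above tend to fail.
        previous: index === posts.length - 1 ? null : posts[index + 1].node,
        next: index === 0 ? null : posts[index - 1].node,
      },
    })
  })
}
```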
Here is my post template, where the error is occurring:
I am honestly not even sure why or how this is happening, but I thought that if there's a chance my issue could help you, it would be worth sharing. If I revert Gatsby to one of the regular versions, everything builds correctly. If there is anything more I can do to help, feel free to ask.
@ArthurHwang You found a bug! Thanks so much for the report. I'm working on a fix now. It's actually unrelated to Windows. Also, just FYI,
@Moocar nicely done! I think others have already tested on www (gatsbyjs.org), but I wanted to try it out for myself! As such, I deployed it to Netlify here -> https://gatsby-per-page-manifest.netlify.com/ (note: some of the screenshots on the showcase, and probably the starters, are broken; I presume that's unrelated to this PR and down to network issues on my machine). It's fun to see the improvement with e.g. WebPageTest, because we're no longer loading a super-large page manifest: this PR on WebPageTest vs. the latest gatsbyjs.org. Some important callouts:
Super excited to see this land! @pieh @freiksenet maybe we should run the build-runner tool (if we haven't already!) to identify any issues with sites "in the wild."
@DSchau Thanks for trying it out! I've already run develop-runner (which I presume you're referring to) and didn't find any issues, although that only covers the build step. @KyleAMathews and I had a chat yesterday and decided that the next step should be deploying gatsby
@DSchau very cool! I ran some more analysis on WebPageTest using their comparison tool: https://www.webpagetest.org/video/compare.php?tests=190530_AB_69f522b65151feb6c39a369a50ff7a14%2C190530_8Q_07506324ce40c1917f0d1b37b6c5bdfc&thumbSize=200&ival=100&end=visual The new version starts rendering slightly earlier, I'm guessing due to less resource contention from downloading less JS. It's even more dramatic looking at the network graph, as the current version takes a lot longer to download everything, due again to having more JS to download. One possible bug: this PR requests the same resource multiple times, which results in 304s. Is that by design?
Here are standalone versions as well for the simple 3G test config. New: https://www.webpagetest.org/result/190530_1R_3b04582c9a3d6632b3eb1b8b745a2269/2/details/#waterfall_view_step1
It seems like with page-specific data we could leverage this. Does this sound like the right train of thought, and if so, is there any plan to expose page-data modification capabilities to plugins?
@ChristopherBiscardi Could you elaborate some more? I don't have enough MDX knowledge to answer this. But here are some comments that may or may not be applicable:
pageData does include the results of GraphQL queries, but those query results must be pure data, since they have to be serialized to JSON and sent to the browser. I guess components could be included in GraphQL query results, but they'd have to be in a pure-data AST form that could be loaded by the Gatsby app in the browser.
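For context, the per-page file written by this branch looks roughly like the object below (a hedged sketch; the field contents are illustrative, and `webpackCompilationHash` only arrives in a later commit on this branch):

```js
// Approximate shape of public/page-data/<page-path>/page-data.json.
// Values are illustrative; only JSON-serializable data can appear here.
const examplePageData = {
  componentChunkName: `component---src-templates-post-template-js`,
  path: `/blog/my-first-post/`,
  result: {
    data: {
      /* the page's GraphQL query result, as pure JSON */
    },
    pageContext: { slug: `my-first-post` },
  },
  // Added by a later commit in this branch:
  webpackCompilationHash: `4b7b2f06e1fd32c2f6e0`,
}
```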
It's not something I've ever considered. At the moment,
@ChristopherBiscardi adding support for prefetching other page resources is unrelated to this PR — we'd need a new action where plugins could add additional resources to be prefetched by Gatsby e.g. as discussed here: gatsbyjs/rfcs#35
@Moocar it's feeling like this is ready to go, as the only issues we've seen are long-standing ones unrelated to this PR (e.g. not handling status 0 responses). It's been running like a champ on gatsbyjs.org and other sites that have tested it, so I suggest we get this in and then work on the other issues in other PRs.
@KyleAMathews Great! I have some follow-up code to address the status 0 stuff, but it will need its own thorough review. I just noticed some conflicts with master; I'll create a PR to address them now.
**Note: merges to `per-page-manifest`, not master. See #13004 for more info**

## Description

This PR saves `page-data.json` during query running. In order to minimize the size of PRs, my strategy is to save the page-data.json alongside the normal query results, and then gradually shift functionality over to use `page-data.json` instead of `data.json`. Once all those PRs are merged, we'll be able to go back and delete the static query results, jsonNames, dataPaths, data.json, etc. It does mean that we'll be storing double the amount of query results on disk. Hopefully that's OK in the interim. The compilation hash will be added in future PRs.

## Related Issues

- Sub-PR of #13004
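A minimal sketch of the write step described above (simplified; the real `utils/page-data` module in this branch differs in details such as error handling):

```js
// Simplified sketch of writing a page's query result to
// public/page-data/<path>/page-data.json. `fixedPagePath` maps the
// root path "/" to the directory name "index".
const fs = require(`fs-extra`)
const path = require(`path`)

const fixedPagePath = pagePath => (pagePath === `/` ? `index` : pagePath)

const writePageData = async (publicDir, page, result) => {
  const outputDir = path.join(publicDir, `page-data`, fixedPagePath(page.path))
  const body = {
    componentChunkName: page.componentChunkName,
    path: page.path,
    result,
  }
  // outputFile creates intermediate directories as needed.
  await fs.outputFile(
    path.join(outputDir, `page-data.json`),
    JSON.stringify(body)
  )
}
```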
* add utils/page-data.read
* read websocket page data from utils/page-data
* page-data loader working for production-app
* get new loader.getPage stuff working with gatsby develop
* fix static-entry.js test
* remove loadPageDataSync (will be in next PR)
* use array of matchPaths
* Deprecate various loader methods
* remove console.log
* document why fetchPageData needs to check that response is JSON
* in offline, use prefetchedResources.push(...resourceUrls)
* root.js: remove else block
* loader.js: make* -> create*
* loader: drop else block
* pass correct path/resourceUrls to onPostPrefetch
* s/err => null/() => null/
* Extract loadComponent from loadPage
* remove pageData from window object
* update jest snapshots for static-entry (to use window.pagePath)
* add loadPageOr404
* preload 404 page in background
* normalize page paths
* comment out resource-loading-resilience.js (will fix later)
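One of the bullets above notes that `fetchPageData` must check that the response is actually JSON; presumably this is because some hosts and service workers answer a missing .json URL with an HTML fallback page and a 200 status, so the status code alone can't be trusted. A hedged sketch of such a guard (names simplified, not the branch's exact code):

```js
// Simplified sketch: fetch a page's data and verify the body really is
// JSON before trusting it. Some servers return index.html with a 200
// status for unknown URLs, which would otherwise poison the loader.
const fetchPageData = url =>
  fetch(url)
    .then(response => response.text())
    .then(text => {
      try {
        return JSON.parse(text)
      } catch (err) {
        // Not JSON (likely an HTML fallback): signal a failed load.
        return null
      }
    })
```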
* remove data.json from plugin-guess
* add test for plugin-guess
* use match-paths.json for gatsby serve
* remove pages.json
* move query/pages-writer to bootstrap/requires-writer, and delete data.json
* move query running into build and develop
* save build compilation hash
* write compilationHash to page data
* reload browser if rebuild occurs in background
* add test to ensure that browser is reloaded if rebuild occurs
* update page-datas when compilation hash changes
* use worker pool to update page data compilation hash
* update tests snapshot
* reset plugin-offline whitelist if compilation hash changes
* prettier: remove global Cypress
* separate page for testing compilation-hash
* fix case typo
* mock out static-entry test webpackCompilationHash field
* consolidate jest-worker calls into a central worker pool
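The refresh-on-rebuild behavior listed above boils down to a hash comparison at runtime. A hedged sketch (names are illustrative; the branch's real code lives in the production runtime):

```js
// When freshly fetched page data carries a different webpack compilation
// hash than the one baked into this HTML at build time, the in-memory JS
// bundles may no longer match the data, so force a full reload.
const handleFreshPageData = pageData => {
  const buildHash = window.___webpackCompilationHash // injected by static-entry
  if (
    pageData.webpackCompilationHash &&
    pageData.webpackCompilationHash !== buildHash
  ) {
    window.location.reload()
  }
}
```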
* remove json-name and don't save /static/d/ page query results
* in loader.cleanAndFindPath, use __BASE_PATH__, not __PATH_PREFIX__
Ah, OK. Thanks very much for the explanation. Yeah, we do have CloudFront in front of S3, but we're not invalidating the cache when we push changes. We just sync the files to S3 and let CloudFront respect the cache headers we set on the files. So we don't set a Minimum TTL on it, but perhaps we should (as outlined in https://github.com/jariz/gatsby-plugin-s3/blob/master/recipes/with-cloudfront.md).
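One possible policy along the lines of that recipe, sketched below with gatsby-plugin-s3's `params` option (the globs, lifetimes, and bucket name are assumptions for illustration, not settings from this thread):

```js
// gatsby-config.js — hedged sketch of per-file cache headers for S3.
// Hashed bundles can be cached forever; page-data.json and HTML cannot,
// since their filenames never change between builds.
module.exports = {
  plugins: [
    {
      resolve: `gatsby-plugin-s3`,
      options: {
        bucketName: `my-gatsby-site`, // assumption: your bucket name
        params: {
          // Content-hashed bundles are safe to cache forever.
          "**/*.js": {
            CacheControl: `public, max-age=31536000, immutable`,
          },
          // Unhashed files must always be revalidated at the origin.
          "**/page-data.json": {
            CacheControl: `public, max-age=0, must-revalidate`,
          },
          "**/*.html": {
            CacheControl: `public, max-age=0, must-revalidate`,
          },
        },
      },
    },
  ],
}
```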
Thank you @KyleAMathews, for the explanation, and for all the hard work you and the rest of the community have been putting into it. Unfortunately, I can't stop feeling that the CloudFront caching strategy from the documentation looks a bit like a hack. Invalidating CloudFront (or any CDN, for that matter) has a cost and carries a propagation delay. Doing so on every build doesn't feel like a good engineering practice, even less so when your software development practices allow you to push to production more than 50 times a day. We've been following and waiting for this fix for a long time, eager to get it working on our site, but the caching solution for the page-data.json files is holding us back. Is there a possibility to add a hash suffix to the page-data.json? Or do some internal mechanisms in Gatsby (naming conventions?) prevent this from happening? At least this way, people who use Gatsby would have the option to decide how to handle their caching strategy. Thanks again for all the hard and good work.
We won't be able to use hashes inside the filename, but we're thinking of adding a query-string parameter to these files. Would this help your use case?
Thanks @wardpeet for the prompt reply. I would need to run some tests before answering that, and try a few edge cases that come to the top of my mind. Although, why not add the webpack hash as a suffix of the file name? It's not the perfect solution, as you still get a cache miss on each new build, but at least it's possible to cache the file forever, without having to worry about hammering the origin with requests, and while benefiting from a good cache policy. References in old .html files will still point to files that exist at the origin, and most likely at the CDN too. Would that work?
Agree with @leonfs - adding the webpack hash to the filename would be perfect. Better than the query-string method, as then multiple copies of the file could be statically served during an upgrade process. It would be a great workaround for this issue if the md5 hash is tricky.
Adding the webpack hash doesn't actually help, as you can update a page's data without re-running webpack. Is it really that expensive for CloudFront to revalidate files against an S3 bucket? Our expectation is that revalidating a file, whether it's changed or not, is cheap. E.g. with Fastly you can mark a single file as invalidated: https://docs.fastly.com/guides/purging/single-purges
(this is me not knowing very much about cloudfront)
CloudFront also allows you to purge a single file. This can be achieved in various ways, including on-demand and CI/CD style.
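For example, a single-file purge from a deploy script might look like the hedged sketch below, using the AWS SDK for JavaScript (the distribution ID and path are hypothetical):

```js
// Invalidate just the changed page-data file in CloudFront after a deploy.
const AWS = require(`aws-sdk`)

const cloudfront = new AWS.CloudFront()

cloudfront
  .createInvalidation({
    DistributionId: `E1ABCDEF2GHIJ3`, // hypothetical distribution ID
    InvalidationBatch: {
      // Must be unique per invalidation request.
      CallerReference: `deploy-${Date.now()}`,
      Paths: {
        Quantity: 1,
        Items: [`/page-data/index/page-data.json`],
      },
    },
  })
  .promise()
  .then(res => console.log(`invalidation started:`, res.Invalidation.Id))
  .catch(err => console.error(err))
```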
The issue here is timing. If it is important that the HTML and the JSON files are distributed in lock-step, clearing caches does not provide that. The distributed nature of CDNs means, by design, that a user could get an older (or newer) version of the HTML than the distributed JSON. The only way to guarantee consistency is some form of versioning of the JSON file, by hash or otherwise.
@KyleAMathews - If the webpack hash doesn't work, wouldn't a build hash (unique per build) work?
This change also broke my S3 + CloudFront setup, resulting in the browser reloading the page. Reverted to 2.8 for now.
* feat(gatsby): Page data without compilation hash (gatsbyjs#13139)
* Websocket manager use page data (gatsbyjs#13389)
* Loader use page data (gatsbyjs#13409)
* Remove data json from guess (gatsbyjs#13727)
* Gatsby serve use page data (gatsbyjs#13728)
* move query/pages-writer to bootstrap/requires-writer (gatsbyjs#13729)
* fix(gatsby): refresh browser if webpack rebuild occurs (gatsbyjs#13871)
* page-data.json cleanup PR. Remove jsonName and dataPath (gatsbyjs#14167)
* loader getPage -> loadPageSync (gatsbyjs#14264)
* Page data loading resilience (gatsbyjs#14286)
  * fetchPageHtml if page resources aren't found
  * add page-data to production-runtime/resource-loading-resilience test; also use Cypress tasks for blocking resources instead of npm run chunks
  * fetchPageHtml -> doesPageHtmlExist
  * remove loadPageOr404Sync
* revert plugin-offline to master (gatsbyjs#14385)
* Misc per-page-manifest fixes (gatsbyjs#14413)
  * use path.join for static-entry page-data path
  * add pathContext back to static-entry props (fix regression)
* Remove jsonDataPaths from pre-existing redux state (gatsbyjs#14501)
* Add context to sentry xhr errors (gatsbyjs#14577)
  * Add extra context to Sentry xhr errors
  * Don't define Sentry immediately as it'll always be undefined then
* mv cpu-core-count.js test to utils/worker/__tests__ (fix bad merge)
* remove sentry reporting
This PR also "broke" a workaround to use gatsby as a static page generator instead of a static site generator: #4337 (comment) By broke I mean that the workaround is no longer working, because before we could change |
@Moocar I have an example use case. We have a couple hundred pages translated to 20+ languages. The translations are managed in a separate web app and are consumed by our Gatsby app as
If we could somehow hook into the
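In the meantime, one hedged approach with current APIs is to recreate each page with its translations in `context`, so the strings are serialized into that page's own page-data.json rather than a shared bundle. The `loadTranslations` helper below is hypothetical, standing in for a call to the external translation service:

```js
// gatsby-node.js — hedged sketch. Attach per-page translated strings to
// the page context so they end up in that page's page-data.json.

// Hypothetical helper: fetch strings for a given page path.
const loadTranslations = async pagePath => {
  // e.g. call the translation web app's API here
  return {}
}

exports.onCreatePage = async ({ page, actions }) => {
  const { createPage, deletePage } = actions

  const translations = await loadTranslations(page.path)

  // Recreate the page with translations merged into its context.
  deletePage(page)
  createPage({
    ...page,
    context: {
      ...page.context,
      translations,
    },
  })
}
```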
The work for this PR was being tracked at #13004, but now that it's ready to be merged, I've opened this PR.

For months, I've been merging work into the `per-page-manifest` branch, which changes Gatsby so that it has a manifest/query-result file per page instead of a global list of all pages. See #13004 for details. This PR finally merges that branch into master. All work on the `per-page-manifest` branch has already been reviewed; however, one final look through wouldn't hurt.