Why are page-data.json files not content hashed? #15080
@mikheevm The Gatsby Team describes in this blog post why they are no longer hashing the page-data.json files. I am also running into some issues because of this. Is there a way to enable hashing of the page-data.json files again?
If the new version of a .md file is incompatible with the previous version (i.e. the frontmatter schema changed). Which leads to:
Gatsby should have a config option to let users toggle the page-data.json hashing feature for cache-friendly environments. Once the website's development is stable (only content updates, no React code changes), the user can switch the feature off for the performance gain. Currently, I have to downgrade to 2.8.8 for rapid agile development.
@gaplo917 how do you know you're seeing this error? We include a build hash in each page-data.json, so if a client loads a JSON file and sees it came from a newer build, it'll force a full refresh to get the new component code. If that's not working, that's a bug we need to fix. More details about what's happening and reproduction instructions would be great!
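The safety check described here can be sketched roughly as follows. This is a hypothetical illustration, not Gatsby's actual internals: the names `buildId`, `needsFullRefresh`, and `handlePageData` are made up. The idea is simply that each page-data.json carries the id of the build that produced it, and the client forces a full reload when the data comes from a different build than the running app bundle.

```javascript
// Hypothetical sketch of the build-hash safety check (not Gatsby's real code).
// page-data.json is assumed to carry the id of the build that produced it.
function needsFullRefresh(appBuildId, pageDataBuildId) {
  return pageDataBuildId !== appBuildId;
}

function handlePageData(appBuildId, pageData, reload) {
  if (needsFullRefresh(appBuildId, pageData.buildId)) {
    reload(); // in a browser this would be window.location.reload()
    return null;
  }
  return pageData.result;
}
```

If that reload path ever fails to fire, the client keeps rendering stale data, which is exactly the class of bug being asked about in this thread.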
Also, in general you should only be caching files that are in the static folder and the .js files, as described in the caching docs: https://www.gatsbyjs.org/docs/caching/#static-files The new JSON files shouldn't be cached. If you're running into trouble, check your cache-control settings.
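The header values recommended in the Gatsby caching docs can be expressed as a small lookup. This is a sketch; `cacheControlFor` is a made-up helper and the path matching is simplified, since the real routing is done by your server or CDN:

```javascript
// Recommended Cache-Control values per the Gatsby caching docs:
// hash-named assets are cached forever, everything else must revalidate.
function cacheControlFor(urlPath) {
  const isHashedAsset =
    urlPath.startsWith("/static/") ||
    (/\.(js|css)$/.test(urlPath) && !urlPath.endsWith("/sw.js"));
  if (isHashedAsset) {
    return "public, max-age=31536000, immutable";
  }
  // HTML, page-data.json, app-data.json, sw.js
  return "public, max-age=0, must-revalidate";
}
```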
Thanks for your advice! I had assumed that everything could be cached, because Gatsby 2.8.8 generated a hash for every page-data.json, which guaranteed that new/old HTML would load the matching new/old/same JSON. Will try your advice and see the result.
Thank you, everybody, for bringing this problem to notice. It has also been debated in the original PR thread (#14359), with some great contributions on how the manual cache-invalidation technique is not really a great solution. Our team has been working on a small script (that runs on Gatsby's onPostBuild hook). If anyone is interested, we could share our approach.
@leonfs Would be really great if you share it :)
@mikheevm something along the lines of..

```javascript
const fs = require("fs").promises;
const glob = require("glob");
const md5 = require("md5");
const path = require("path");

exports.onPostBuild = async () => {
  const publicPath = path.join(__dirname, "public");
  const hash = md5("replace-with-your-own-hash");

  // Rename each page-data.json to page-data.<hash>.json
  const jsonFiles = glob.sync(`${publicPath}/page-data/**/page-data.json`);
  console.log("[onPostBuild] Renaming the following files:");
  for (const file of jsonFiles) {
    console.log(file);
    const newFilename = file.replace(`page-data.json`, `page-data.${hash}.json`);
    await fs.rename(file, newFilename);
  }

  // Rewrite references to the renamed files in the generated HTML and JS
  const htmlAndJSFiles = glob.sync(`${publicPath}/**/*.{html,js}`);
  console.log("[onPostBuild] Replacing page-data.json references in the following files:");
  for (const file of htmlAndJSFiles) {
    const stats = await fs.stat(file);
    if (!stats.isFile()) continue;
    console.log(file);
    const content = await fs.readFile(file, "utf8");
    const result = content.replace(/page-data\.json/g, `page-data.${hash}.json`);
    await fs.writeFile(file, result, "utf8");
  }
};
```

Disclaimer! As @leonfs points out, this seems to work for us - but it's obviously a hack to suit our caching implementation. No guarantee it'll work for anyone else.
If anyone else tries this approach, it would be great to add comments on the results.
I have been hitting this issue with a Gatsby site that is deployed to GitHub Pages, so I unfortunately have no control over how the server sets caching headers on page-data.json. In my case I find that after a new deploy, stale page-data.json data gets used due to caching, and you see the page flicker from the new data back to the old once the page-data.json request returns.
I added the onPostBuild that @harrygreen posted (thanks for that). But I am also finding that the root If I:
So since the file name does not update, my browser uses the old cached version, which loads the wrong page-data.json file.
btw @harrygreen, I think it should be
If this works and it's widely used, feel free to create a plugin that changes this behaviour. We're happy to take new plugins 🎉 https://www.gatsbyjs.org/contributing/submit-to-plugin-library/
@city41 Glad it could help you somewhat. We're not using it in production yet because of a different but consistent failure inside As for this:
We do want the inverse of your suggestion though; we only want to rename files, not folders. We want to exit the function early if it's not a file. Is the
@wardpeet Thanks for the tip. My code really is a hack against Gatsby, so I'm reluctant to publish it anywhere.. but maybe one day. I'd much rather
@harrygreen using
I agree this shouldn't be a plugin. I honestly feel like this is a bug in Gatsby.
@city41 good spot 🤜 (it was originally inside a function but I switched to …). The issue is Gatsby's rule not to cache certain assets, e.g. HTML and now this JSON, both of which require a lockstep relationship. Now that the JSON hash has been lost, some flexibility over the possible caching strategies has been lost too. This comment sums up the issue if cache-clearing is required on a CDN.
@city41 it's not a Gatsby bug; page-data.json shouldn't be cached. It gives us the opportunity to build pages separately and only change what's needed. I do agree that each new version of app-560e4b2f43729239ce7d.js should get a new hash if things changed, so that's definitely a bug. Do you mind opening a new one with this information? see #15080 (comment)
I think app-sha.js is built by webpack? From its perspective, it hasn't changed, so that one might be trickier. Maybe bug is too strong a word? Maybe this should be an option the user can opt into? It makes using Gatsby difficult when you don't have access to the server. Gatsby is a great choice when you only have static hosting (like gh-pages), but this caching issue makes it unusable in those scenarios without the
If Gatsby fingerprinted page-data.json files, then app-sha.js would probably naturally get a change, and webpack would re-fingerprint it, I imagine.
I agree with @city41 and many of the other folks who have chimed in here. We had a long discussion internally, and our consensus is that without being able to host everything at the edge cache level, using Gatsby as opposed to something like Next.js serves very little purpose. The entire point of rendering at build time is that we don't need a server and can instead host the built files directly on the edge. If files like
For our purposes, we were already serving index pages from an nginx server and using assetPrefix for various other reasons, so we just modified @harrygreen's script to work for that use case (see below). Going forward, however, we will need to reevaluate the value of a tool like Gatsby compared to a tool like Next, given that build-time rendering introduces a whole host of issues we need to work around and very few benefits over traditional server-side rendering if the files have to be hosted on a server instead of on the edge. Happy to chat more about this and alternative approaches -- we love Gatsby and are very grateful for the amazing work y'all are doing! Here's the
It seems that the number of people using Gatsby in a way that needs hashing (as a much simpler way to manage CDNs and new deployments) is actually pretty big. Release 2.9 was a huge step forward for Gatsby, increasing performance for big sites. Thanks a lot for that amazing work! But the side effects of that release (impossible to fully disable client-side navigation yet, issues with easy deployments to S3 + CloudFront) clearly make 2.9 a non-backwards-compatible release, with quite a lot of people now locked to 2.8.x due to all those issues. Would it make sense to introduce @jaredsilver's solution in core, or via a generic plugin (supported by the Gatsby project itself)? That would make Gatsby 2.9+ closer to a real 2.8-compatible version.
I propose introducing the solution by @jaredsilver into the core, as it would allow easy cache invalidation on the edge and a really streamlined, rapid deployment of the site, without all the waiting on "that cache is still alive" somewhere else. Also, this would allow us to literally cache everything, decreasing the impact on the origin and maximizing the use of the CDN.
Hi, I noticed that my Gatsby website would show a blank screen and become really unresponsive (approximately 20-50 seconds) on some mobile browsers (Chrome and Opera) after performing a hard refresh. Inspecting the network using Chrome DevTools indicates that the delay comes from
On page refresh, the request for
It took about 50 seconds for the blank screen to go away and the page to refresh. I've already mentioned this in #11006 (comment). I'm not sure if adding a hash to
@jaredsilver where do you host your site? Perhaps there's a misunderstanding here? I'm not sure what you mean by "edge host". Any CDN can work with the changes in 2.9 -- you just need to set the
Perhaps the reason we overlooked this is that most CDNs designed to serve static sites handle this correctly already, i.e. they serve a file from the edge until a new build invalidates the file. Netlify/Zeit/Firebase etc. work this way. Most/all CDNs have a purge or invalidation API that can be set up to do this as well. What CDNs are y'all using? Let's put together some docs on setups. This does complicate setup, I agree, and can be error-prone. If you haven't had a chance to read about why we needed to remove the page manifest (and hence the hashed data files), please read the blog post https://www.gatsbyjs.org/blog/2019-06-12-performance-improvements-for-large-sites/ Happy to dive more into the rationale behind it.
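For CDNs with an invalidation API, the purge described above can be wired into `onPostBuild`. A sketch for CloudFront, where the distribution id and the path list are placeholders you would adapt; with aws-sdk v2 the params object would be passed to `CloudFront.createInvalidation`:

```javascript
// Build the params for invalidating the un-hashed files after each deploy.
// The distribution id is a placeholder; the paths cover HTML and page data.
function buildInvalidationParams(distributionId, deployId) {
  return {
    DistributionId: distributionId,
    InvalidationBatch: {
      CallerReference: `gatsby-deploy-${deployId}`, // must be unique per request
      Paths: {
        Quantity: 3,
        Items: ["/page-data/*", "/app-data.json", "/*.html"],
      },
    },
  };
}

// e.g. await new AWS.CloudFront()
//   .createInvalidation(buildInvalidationParams("EXXXXXXXXXXXX", Date.now()))
//   .promise();
```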
I was in the middle of writing out a long response detailing why you're wrong when I realized that I am wrong 😄 If we use a combination of
I have opened a PR to update the caching page of the docs with this information so hopefully other folks won't run into this issue in the future. Feel free to check it out here: #16368 Thanks, @KyleAMathews! And if anyone else in the thread has any questions/concerns, I'd be happy to explore this further.
Hey @jaredsilver, mind taking a look at Cloudflare? We're using a mix of CloudFront and Cloudflare to ensure cached bandwidth and security, though it seems that unlike with CloudFront, Cloudflare needs the cache cleared manually IMO.
@lifehome It looks like Cloudflare does not cache
Note: if you're using a service worker, it looks like Cloudflare does cache that by default since it ends in a
This stale page data makes your website crash if, for example, its object structure changed.
Hey folks, we'd love to see reproductions of ways this can cause crashes. If someone could demonstrate exactly how this happens, that'd be great, e.g. a script which swaps in files in a certain order and causes the frontend to crash. There's a lot of tests to ensure the frontend is robust to different scenarios, so we'd love to see what we're missing.
@KyleAMathews @antoinerousseau The most frustrating part is that the site using GatsbyJS v2.8.8 (
I have mentioned in this thread before:
I would say that "GatsbyJS >= 2.9.0 is not cache friendly". The only way to fix it is to completely disable JSON file caching, because
As a result, using GatsbyJS >= 2.9.0 would fail on the same deployment config that previously worked for a long time (<= 2.8.8). In the video demo, I use a query variable to force-get the latest HTML. In fact, this is normal behaviour; we cannot control query variables appearing in the browser (social platforms will add them, for example). The only thing we can do is sacrifice the caching ability of all potentially useful JSON files, just like the Netlify settings, and spend extra effort testing the configuration on GCP Bucket / CloudFront / Cloudflare. Test Repo: https://github.com/gaplo917/gatsby-starter-blog
Hiya! This issue has gone quiet. Spooky quiet. 👻 We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here. Thanks for being a part of the Gatsby community! 💪💜
Still a problem.
Hey folks, the original question was answered, and continuing the discussion about a bug 20 answers after the initial question isn't a good way to find the relevant information quickly. We'd appreciate a new bug report with a filled-out template and a reproduction to track the bug, if it's not the same as #19618. Otherwise, the bug itself is already tracked there. Thanks!
@gaplo917 Do you want to create the new ticket?
This is still an issue, especially when deployed to GitHub Pages. The default caching for files on GitHub Pages is with
We came across the
Here is one option that kind of works for me: just adding a unique query string at the end of the page-data.json and app-data.json references. Add to gatsby-node.js:

```javascript
const path = require(`path`)
const glob = require('glob')
const md5 = require('md5')
const fs = require('fs-extra')

exports.onPostBuild = async () => {
  const publicPath = path.join(__dirname, 'public')
  const hash = md5(Math.random().toString(36).substring(7))

  // Append the same random query string to every JSON reference so browsers
  // treat the files as new URLs after each build.
  const htmlAndJSFiles = glob.sync(`${publicPath}/**/*.{html,js}`)
  console.log('[onPostBuild] Replacing page-data.json references in the following files:')
  for (const file of htmlAndJSFiles) {
    const stats = await fs.stat(file)
    if (!stats.isFile()) continue
    console.log(file)
    const content = await fs.readFile(file, 'utf8')
    const result = content
      .replace(/page-data\.json/g, `page-data.json?${hash}`)
      .replace(/app-data\.json/g, `app-data.json?${hash}`)
    await fs.writeFile(file, result, 'utf8')
  }
}
```

This seems to be needed as I am using GitHub Pages to host the Gatsby site, and GitHub Pages has a cache policy.
However, the Gatsby documentation on caching (https://www.gatsbyjs.com/docs/caching/) clearly says that
This makes for weird behavior on pages that have been updated but don't seem to change, as well as 404 results for some weird reason, as GitHub Pages caches 404 responses too. YES, 404s are cached by GitHub Pages... so yeah, if you try to visit a page
@city41 Exactly the same problem. With the solution from #15080 (comment), the app-sha.js gets touched, while Gatsby regards it as a valid cache, so it refers to the wrong page-data-sha.json. Any solutions?
Hi Gatsby team :) We've just spent a few hours debugging an issue with content not appearing on the website, and it turned out that it was because page-data.json was cached.
Why does it not use a content hash? Should the cache be handled another way?