Proposal: Deterministic loading of data from path #4626
Comments
AFAIK the only reason hashes need to be there is cache busting. I am currently working on speeding up the build process and am touching this part of the code (but with no real change there - the map to files is still dumped to a single file). I'd be interested to hear ideas on how we could handle that so it would scale better (but not by removing hashes from files :) ).
This would be a nice feature as well. In the docs, it looks like maybe it could override the $path variable in the template, instead using the filesystem path.
@pieh Is your work in a branch somewhere that I could use as a starting point to try and dig into this? I fear that this comes from deep in Gatsby's architecture and so could be difficult to change / refactor. I'm definitely willing to dig into it and see what I can figure out.
@chmac My branch is here https://github.com/pieh/gatsby/tree/json-loader More context about it - together with @m-allanson we are working on speeding up the build and develop process by moving the bundling and loading of JSON data (results of queries) out of webpack and doing that directly in Gatsby. So this doesn't actually focus on reducing app bundle size, but I do small change to If you wish to dive into the code, here are some entry points you might want to check (links to the current master branch, as I don't change too much in this department and my branch is still WIP):
Before making changes in the code, we should probably figure out how to design it so it doesn't increase build time too much and produces more manageable bundles.
@pieh Awesome, thanks for the tips, that's a huge help. I've spent a few days deep diving into this stuff. Here's what I've understood (please correct me if I've misunderstood any of this stuff, it's a real possibility).
Goals
I'd suggest the following goals:
Idea
Here's one idea about how we could move towards those goals.
The async function would allow us to fetch
Very open to any feedback.
@chmac I was thinking about this a little, and for the initial load and mounting of React components we don't need that map in the app bundle or in the HTML - we can delay loading it until after the initial component is mounted. I'm not sure how we could approach chunking that map in the next step - how would we know which chunk to load to get the path to the data for a given page?
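A minimal sketch of the "delay loading the map" idea, not how Gatsby does it. It assumes the map can be served as a single JSON object keyed by page path; `createLazyPathMap`, `PathToDataFile` and the injected `load` function are hypothetical names introduced only for illustration.

```typescript
// Map from page path to the (content-hashed) data file name for that page.
type PathToDataFile = Record<string, string>;

// Hypothetical helper: nothing is loaded until the first lookup, and every
// later lookup reuses the same in-flight or resolved promise.
function createLazyPathMap(load: () => Promise<PathToDataFile>) {
  let pending: Promise<PathToDataFile> | undefined;

  return function getDataFileForPath(path: string): Promise<string | undefined> {
    if (pending === undefined) {
      pending = load(); // first lookup triggers the one and only load
    }
    return pending.then((map) => map[path]);
  };
}
```

In a browser, `load` could be something like `() => fetch("/path-map.json").then((r) => r.json())`, where the URL is an assumption. Because hashed file names stay in the map, cache busting is unaffected.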
Yes, loading it later makes sense. That will make it async anyway, which paves the way for fancier stuff. On sharding, I'm thinking back to my WordPress days and database sharding on MySQL. We used to use a remarkably simple scheme that looked something like this: `const calculateShardNameForId = (shardLength: number, id: string) => md5(id).substr(0, shardLength)` Any hashing algo would work, and the only thing we need to know is the A There are probably lots of potential optimisations, but that was the general approach I was thinking about.
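The scheme above could be sketched roughly as follows. Since "any hashing algo would work", this sketch swaps md5 for a small FNV-1a-style hash (computed over UTF-16 code units) so that it needs no imports; `hashHex` and `buildShards` are hypothetical helpers, not part of Gatsby.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units, as 8 hex characters.
// Stand-in for md5: any stable hash would do for picking a shard.
function hashHex(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// The shard name is the first `shardLength` hex characters of the hash,
// so both the build step and the client can compute it from the id alone.
function calculateShardNameForId(shardLength: number, id: string): string {
  return hashHex(id).slice(0, shardLength);
}

// Group page paths by shard; each group could be written to its own file.
function buildShards(shardLength: number, paths: string[]): Record<string, string[]> {
  const shards: Record<string, string[]> = {};
  for (const path of paths) {
    const name = calculateShardNameForId(shardLength, path);
    if (shards[name] === undefined) {
      shards[name] = [];
    }
    shards[name].push(path);
  }
  return shards;
}
```

With hex characters, a `shardLength` of 1 gives at most 16 shards and 2 gives at most 256, so the client only needs to know `shardLength` to work out which shard holds a given path.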
Lazy loading of the paths to page JSON files is the obvious next step. Sharding would be nice for really large sites. Ideally you'd shard by something like path names so a shard for
OK, sounds like we're reaching consensus around the plan:
How do we move forward? There's currently work being done on switching from webpack to our own JSON pipeline in #4555 (described somewhat in #3575). Do we fold the lazy loading into one of those tickets, or create a new ticket for the lazy loading idea? The original idea I proposed in this ticket doesn't make sense, since we'd break the cache busting / content hashing.
Lazy loading of the paths to JSON file names, and of the map that specifies which components (pages/layouts) are used for which paths, is pretty much done - #4715 (I should probably have pinged here when I posted it).
It's for v2 and it's based on #4555.
@pieh Awesome! v2 is looking better and better! In that case, I'll close this issue, and I'll create a new one about sharding.
Just to give more info - when I run my tests against https://github.com/freeCodeCamp/guides (~2800 pages) - gzipped "webpackified"
@pieh thanks for cracking this! I'm curious if there is a planned/estimated release date for v2?
@vinniejames There's no date but you can track progress over at https://github.com/gatsbyjs/gatsby/projects/2
tl;dr Could we remove the map of `path` to data file in `app-*.js` and instead try to fetch data by converting the link `path` to a data filename, handling 404s if it doesn't exist, etc?
History
I'm experimenting with a Gatsby site that has ~3.5k pages. The bundle sizes are like so:
I haven't fully understood Gatsby's data structure, but checking the network tools shows that `app-*.js` is loaded as soon as the page finishes loading.
It seems like the current architecture uses webpack to build a Map of all paths to their `path` to the relevant file on disk. This means that as the number of pages grows, the site's bundle size grows. I presume this approach will not scale very well for sites with 10k or 100k pages.
Idea
Would it be possible to deterministically map `path` to data file? Further, if we could do that (which I guess we could), would it be possible to skip the "list of pages" and just fetch data by transforming the `path` variable into its data file?
Extra thoughts
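As a rough illustration of the proposal, here is a sketch of a deterministic path-to-filename mapping with 404 handling. The naming convention, the `/data/` root and every function name here are assumptions made for the example, not Gatsby's actual layout. Note that file names derived this way carry no content hash, which is the cache-busting concern raised in the comments.

```typescript
// Hypothetical convention: "/" -> "index.json", "/blog/post/" -> "blog/post.json".
function dataFileForPath(path: string): string {
  const trimmed = path.replace(/^\/+|\/+$/g, ""); // drop leading/trailing slashes
  return (trimmed === "" ? "index" : trimmed) + ".json";
}

// Minimal response shape, so the sketch does not depend on a specific fetch.
interface MinimalResponse {
  status: number;
  json(): Promise<unknown>;
}
type Fetcher = (url: string) => Promise<MinimalResponse>;

// Fetch the data for a page; resolves to null when the file does not exist.
async function loadPageData(
  path: string,
  fetcher: Fetcher,
  dataRoot: string = "/data/" // assumed location of the data files
): Promise<unknown> {
  const response = await fetcher(dataRoot + dataFileForPath(path));
  if (response.status === 404) {
    return null; // no data file, so treat the path as a missing page
  }
  if (response.status < 200 || response.status >= 300) {
    throw new Error("Unexpected status " + response.status + " for " + path);
  }
  return response.json();
}
```

In a browser the global `fetch` satisfies the `Fetcher` shape, so `loadPageData("/about/", fetch)` would request `/data/about.json` under these assumptions.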