Is Gatsby designed to support large (10,000+) article wikis? #175
Comments
It is intended to work with very large sites of 10k+ articles but as you're perhaps the first to try creating a site this large, you're also the first to run into not-yet-resolved scaling problems :) Could you describe in detail the problems you're hitting? |
Awesome - I'm glad to hear this. I'm going to ask @BerkeleyTrue and @SaintPeter to jump in on this since they are more familiar with the issue. |
The issue we're seeing occurs when we run the Here is the output with
It returns to my shell prompt after this. |
subscribing 👍 |
Quick questions — are any html pages being generated? If there are, are all? Also the Gatsby build steps are 1) build the static html pages and then 2) build the bundle.js/styles.css. It looks like your build is failing before it gets to building the bundle.js |
Also to investigate why the error isn't being displayed. |
No, no html pages are being generated at all, neither is NOTE: The webpack build time with ~600 files is about 25 seconds. |
This is odd... perhaps this is a memory issue? I haven't looked at how much memory Gatsby uses — perhaps check that? |
Also with the full ~600 files, does it always quit at the same point? |
The thing is that with the prior revision of Gatsby (before the changes to enable react-router 2.0), we were able to generate the site with no problem. I'll see if I can find a way to determine if there are memory issues. I guess I would not expect webpack (or node) to fail silently if there was an out of memory error. Also, the compiled Yes, the process always quits at the same point - at least from the perspective of the debug output. I added debug statements to a local fork of Gatsby at the webpack.run() callback function and they were never reached. |
Yeah, an OOM error doesn't seem that likely. Yeah, it seems the thing to do now is to narrow down exactly where the failure is happening. |
I just created a test site with ~900 pages and it built just fine. This suggests it's not a size problem but there's something in your code that's throwing an error. It is absolutely a problem that the error is being swallowed somewhere. |
Also note on performance. Running |
Watch the memory usage of the process as you're trying to generate for many, many input files. Node will just fall over and die if memory usage gets too high: http://stackoverflow.com/questions/7193959/memory-limit-in-node-js-and-chrome-v8 More generally, I think we may need an option to turn off bundle.js generation, because, for sure, you don't want a bundle.js that includes 10k articles. Back of the envelope math gets us to multiple megabytes very quickly. My 34-post blog has a 218kb gzipped bundle.js. |
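To watch memory while a large build runs, a minimal sketch along these lines could log the process's usage. `formatMemoryUsage` is a hypothetical helper, not part of Gatsby; `process.memoryUsage()` is the standard Node call.

```javascript
// Hypothetical helper (not a Gatsby API): format a memory-usage snapshot
// so it can be logged at points of interest during a long build.
function formatMemoryUsage(usage = process.memoryUsage()) {
  const toMb = (bytes) => (bytes / 1024 / 1024).toFixed(1);
  return (
    `rss=${toMb(usage.rss)}MB ` +
    `heapUsed=${toMb(usage.heapUsed)}MB ` +
    `heapTotal=${toMb(usage.heapTotal)}MB`
  );
}

// Log a single snapshot of the current process.
console.log(formatMemoryUsage());
```

Calling it before and after the page-generation step would show whether usage climbs toward the V8 heap limit as the number of input files grows.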
@scottnonnenberg yeah, for a very large site you'll want to split the bundle.js into smaller chunks #10 The ability to just turn it off would be a decent idea as well for simple blogs, as most incoming visitors just hit one page and then leave, so loading other pages wouldn't help much. Per-page bundles could be useful as well. Perhaps you could load the js bundle for the page you're on as well as always loading the home page bundle since that's what someone is most likely to visit from a landing page. |
@KyleAMathews After some debugging we've gotten pretty close to where things fail and it seems to fail deep in webpack. It gets to this point, https://github.com/webpack/webpack/blob/72e8dd01475841b5123607efd8b3c7b892c1a32a/lib/Compiler.js#L255, but the callback is never called. |
So weird! DEEP IN THE BOWELS OF WEBPACK --> this would be a terrifying + dark scene for a frontend dev movie :-D Perhaps try a debugger from that point? https://nodejs.org/api/debugger.html |
Alright, following the rabbit hole even further has ended here https://github.com/markdalgleish/static-site-generator-webpack-plugin/blob/master/index.js#L69. The callback passed into the nodeify method is never called. |
@BerkeleyTrue ok I'd try next adding a and then Perhaps one of the render functions is causing troubles. |
I also tried Gatsby with 5000 markdown articles to compare it to Hugo 5000 Posts in 7 seconds, and ended up with the same out of memory error. If you wanna try it out, I have just created a repo that shows the error: https://github.com/jkuetemeier/gatsby-5000 |
@jkuetemeier nice test! I'll play around with this later. You can raise the memory limit for Node pretty easily but I'd need to do some research on how to enable that in a command-line app like Gatsby. If this interests you Jörg this would be a great PR. Expose something like Also to be clear, it doesn't seem like the original issue has anything to do with memory as otherwise they'd see the error you're seeing. But to support 10,000 post sites, we'll definitely need the ability to raise the memory limit. Also for those interested in this issue, check out #151 |
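For reference, Node's V8 flag `--max-old-space-size` (a value in megabytes) raises the old-generation heap ceiling. The sketch below only reports the limit the current process is running with, so the effect of the flag can be checked. `heapLimitMb` is a hypothetical name, and how Gatsby's CLI would expose the flag is left open here.

```javascript
const v8 = require('v8');

// Hypothetical helper: report V8's configured heap ceiling in megabytes.
function heapLimitMb() {
  const { heap_size_limit: limitBytes } = v8.getHeapStatistics();
  return Math.round(limitBytes / 1024 / 1024);
}

console.log(`V8 heap limit: ${heapLimitMb()} MB`);
```

Running it as `node --max-old-space-size=4096 check-heap.js` (the file name is illustrative) should report a limit near 4096 MB.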
I tried it. It seems like the individual promises have no issue, but the Promise.all seems to swallow some error. Does this indicate some issue with our template/markdown files?
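One way to check for a rejection that goes unreported, as a sketch: tag each promise with its index and attach an explicit handler to the `Promise.all` result. The `renderJobs` array is a hypothetical stand-in for per-page render promises, not the plugin's actual internals.

```javascript
// Hypothetical stand-ins for per-page render jobs (illustrative only).
const renderJobs = [
  Promise.resolve('<html>page 1</html>'),
  Promise.reject(new Error('template error in page 2')),
];

// Tag each job with its index so a rejection names the job that failed.
function withIndex(jobs) {
  return jobs.map((job, i) =>
    job.catch((err) => {
      throw new Error(`job ${i} failed: ${err.message}`);
    })
  );
}

// Promise.all rejects with the first failure. If nothing handles that
// rejection, older Node versions may exit without printing anything.
const result = Promise.all(withIndex(renderJobs)).catch((err) => {
  console.error(err.message);
  return err;
});
```

With a handler in place, the first failing job is printed instead of the process ending quietly.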
Have you considered using git to reduce the number of items Gatsby needs to build? Gatsby could check the git diff and rebuild just those files that have changed. This would make memory an issue only on template changes. |
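A rough sketch of the git-diff idea, assuming the build runs inside a git repository with at least two commits. Both function names are hypothetical, and nothing here is an existing Gatsby feature.

```javascript
const { execFileSync } = require('child_process');

// Pure helper: pick Markdown paths out of `git diff --name-only` output.
function changedMarkdownFiles(diffOutput) {
  return diffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => /\.(md|markdown)$/i.test(line));
}

// Assumption: comparing the last two commits is the comparison you want.
function changedSinceLastCommit() {
  const out = execFileSync(
    'git',
    ['diff', '--name-only', 'HEAD~1', 'HEAD'],
    { encoding: 'utf8' }
  );
  return changedMarkdownFiles(out);
}
```

A change to a template or wrapper would not appear in this list as a Markdown file, so it would still require a full rebuild.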
@BerkeleyTrue incremental production builds would be very cool. The problem though isn't rebuilding the underlying content but Did you add the |
@BerkeleyTrue added an issue for incremental production builds @ #179 |
Yup, it was a no go. The |
Incremental builds may be very cool, but sometimes we'll need to rebuild the whole site at once... e.g. when we host our apps on Heroku etc. |
FYI, RE: Memory usage - |
@BerkeleyTrue @SaintPeter this sounds like it might be a bug in the webpack static site generator then. If you post an issue there, tag me on it so I can follow along. |
@jkuetemeier absolutely. Incremental builds would be an optimization not a requirement. @jquense linked to this interesting issue in the Webpack repo about caching intermediate states in Webpack which would solve the problem nicely. #179 (comment) |
@jkuetemeier I got your 5000 page site to build after a bit of work. I had to first increase the memory limit in That solved the OOM problem. But I then ran into this error: The build took 8:18 and turned out a bundle.js of 12 MB, or 2.2 MB after gzipping. So 5000+ pages is definitely doable but there are lots of optimizations that could be done. |
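As a back-of-the-envelope check on those numbers, the sketch below extrapolates bundle size to other page counts. It assumes size grows linearly with page count, which is a simplification, and `estimateBundleMb` is a hypothetical helper.

```javascript
// Linear extrapolation from one measured build (assumption: bundle size
// is proportional to page count, ignoring the fixed framework overhead).
function estimateBundleMb(targetPages, measuredPages, measuredMb) {
  return (measuredMb * targetPages) / measuredPages;
}

// Measured above: 5000 pages gave 12 MB raw and 2.2 MB gzipped.
const rawMb = estimateBundleMb(10000, 5000, 12);
const gzippedMb = estimateBundleMb(10000, 5000, 2.2);
console.log(`10,000 pages: ~${rawMb.toFixed(1)} MB raw, ~${gzippedMb.toFixed(1)} MB gzipped`);
```

Under that assumption a 10,000 page site lands at roughly 24 MB raw and 4.4 MB gzipped, which is why splitting or disabling the bundle comes up for sites this size.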
Much of the build time I'll note is spent in building the bundle.js. Generating the HTML pages is pretty quick. |
Great work. |
@KyleAMathews Why is so much time spent building the |
@SaintPeter Gatsby pulls all the metadata for each page into the bundle.js as well so the frontmatter + html of the body of the page — this is why clicking around a Gatsby site is so fast because all information necessary to render other pages is already loaded into the browser. But in any case, building the bundle.js for production is mostly slow because of the minification step w/ uglify.js. |
@hallaathrad was this from the browser console? |
@KyleAMathews yes. Using Firefox Developer Edition 46.0a2 @ OS X 10.11.4 (15E56a). |
@hallaathrad thanks. Yeah, then it probably doesn't have anything to do with this issue. |
@BerkeleyTrue @SaintPeter what's your status? |
Just ran into this problem. React-Router 2 gets very slow as you add more child routes remix-run/react-router#3215 This is causing problems even for my little ~98 post blog. |
I'm afraid I haven't had the bandwidth to investigate further. @SaintPeter Were you able to make progress with the bundle creation disabled? |
@KyleAMathews Looks like the issue stemmed from an error in our template file. Looks like something unrelated was swallowing the error, though. I was finally able to get an error printout by deleting large chunks of files and trying a build. Looks like it was unrelated to the size of our project, so I think this issue can be closed. |
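The delete-and-rebuild approach can be made systematic by bisecting the file list. This is only a sketch: `buildFails` is a hypothetical predicate that runs a build over a subset of files and reports failure, and the search assumes exactly one offending file whose presence alone causes the failure.

```javascript
// Bisect a list of files to find the single one that breaks the build.
// `buildFails(subset)` is a hypothetical callback returning true when a
// build restricted to `subset` fails.
function findFailingFile(files, buildFails) {
  let candidates = files;
  while (candidates.length > 1) {
    const firstHalf = candidates.slice(0, Math.ceil(candidates.length / 2));
    // Keep whichever half still contains the failure.
    candidates = buildFails(firstHalf)
      ? firstHalf
      : candidates.slice(firstHalf.length);
  }
  return candidates[0];
}
```

For ~600 files this needs about ten builds rather than one per file.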
So what was swallowing the error was in your app? Ok glad you found it! |
@KyleAMathews I don't think you addressed the original question, which was whether Gatsby is designed to handle a large number of pages. I know @QuincyLarson mentioned 10,000 articles, but what about a million pages? That is possible with a large online magazine. Let me know your thoughts. |
That's the direction we're headed. It doesn't yet support a million page site but it will. |
Awesome, thanks for clarifying that. |
@KyleAMathews do you have more information on this? You wrote "It doesn't yet support a million page site but it will," and we are very interested in this feature. |
If you have such a high number of pages, perhaps Hugo may be the right tool. We have moved our site completely to Hugo. |
As far as I've read in an AMA, incremental builds are being planned, so large online magazines wouldn't have to rebuild a million articles anymore. |
@kuetemeier I still want to use Wordpress as the CMS but Gatsby as the client. How does Hugo handle this approach? |
@parkerproject as always... it depends on what you want to achieve. What exactly do you mean by the term "client"? Hugo is a "server" or a static site generator. I guess you want it to use the WordPress database as a source. So you have to write some kind of script that translates WordPress pages and posts into Markdown files (e.g. directly from the database or with a scripted web client). But why would you do that (what is your personal need)? You could use a solid caching solution for WordPress and achieve nearly the same result if you want to keep WordPress in your solution chain. Perhaps you should take a look at Forestry if you are searching for a CMS -> Static Site Generator solution. |
@kuetemeier I like to use Wordpress as a headless CMS and use any front-end I want, in this case, Gatsby helps with that, cos it makes this seamless especially with the GraphQL data layer. But with Hugo, I would need to use Markdown, so that might not work in my use-case |
Is Gatsby intended to allow wikis of 10,000s of articles? Is this a use case you intend to support?
Free Code Camp is having a bit of trouble getting the latest version to work with our current wiki of ~600 articles.
All the example wikis linked to in Gatsby's readme seem comparatively small.
And our wiki is growing quite quickly and may be 10k+ articles by the end of the year.