Out of memory during build #10485
I've started seeing this as well. I had to delete "astro check" from the build command.
I built a site with 10k+ markdown files and have the same error. I use the following command to increase the heap size to 8G.
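The command itself was not captured in this excerpt; the standard way to raise Node's heap limit is V8's `--max-old-space-size` flag. A sketch, assuming the usual npm `build` script:

```shell
# Raise the V8 old-space heap limit for node processes started from this shell.
# The value is in megabytes; 8192 = 8 GB.
export NODE_OPTIONS="--max-old-space-size=8192"

# Then run the build as usual, e.g.:
#   npm run build        # or: astro build

# Sanity check: V8 should now report a heap limit of roughly 8 GB.
node -e "console.log(Math.round(require('v8').getHeapStatistics().heap_size_limit / 1048576) + ' MB')"
```

Setting the flag via `NODE_OPTIONS` reaches child processes too, which matters when the build is launched through npm rather than `node` directly.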
May be a duplicate of #7241
Hello @agamm. Please provide a minimal reproduction using a GitHub repository or StackBlitz. Issues marked with
That's not the same issue. OP's issue happens after the
@ematipico here you go: check the README for the steps to reproduce (the markdown files are 700 MB+ and I didn't want to commit them directly).
UPDATE: Hi @agamm, an error message appeared when I executed it:

    An error occurred: [Error: ENOENT: no such file or directory, open 'src/pages/blog/file1.md'] {
      errno: -2,
      code: 'ENOENT',
      syscall: 'open',
      path: 'src/pages/blog/file1.md'
    }
@mingjunlu did you manage to reproduce it, or do you need any help?
The error is successfully reproduced. Thanks!
I don't understand what the bug is here. Node.js has memory limits; if you build too many pages, the build will use more memory than Node.js has available. The solution is what @mj8327 suggests, setting the
I get that Node.js has its memory limits, which can be a real pain when we're talking about building heaps of pages; it easily leads to out-of-memory errors. The tip from @mj8327 about raising the heap size helps.

But I'm here scratching my head, thinking there's got to be a way to scale up to projects with 100k+ or even 1M+ files without memory shooting through the roof. My experience with Astro's build source code isn't all that deep, yet I can't help but think about possible tweaks. Like, what if we handled the file list as a stream? That could keep memory use more or less steady: chunk the process into digestible bits, save each chunk to the hard drive piece by piece, and wipe the memory before tackling the next chunk. Given how much cheaper hard drive space is compared to RAM, this approach might save a ton on pricier machines, and on headaches, for those gigantic builds. Am I missing anything substantial? Would love to get your take!
Alright, I dove into the code to tackle that pesky memory issue, and it looks like the troublemaker pops up right here. Got me thinking: maybe there's a way to tweak the Vite settings to get some streaming and buffering action going, or, if we're dealing with a memory leak, to pinpoint what's triggering it. Any ideas?
The problem is that every file in an Astro project gets turned into a JavaScript module. All of those modules then need to be bundled by Vite (really Rollup). That's what you're seeing happen. Rollup has to have all of the modules in memory at once; there's no way to buffer them. Our project (really @bluwy) has contributed improvements to Rollup's memory management, but there are limits. The problem you're seeing is architectural.
@matthewp Even after boosting my Node's memory limit to 20GB, it still crashes with just 100k markdown files.
@agamm you're making some really solid points here; I'm learning a lot from your suggestions. Thanks. I trust your contributions can trigger some additions/improvements from team Astro that make the project even more robust. I mean, imagine building 10M+ pages without any out-of-memory errors or slowdowns... not that I personally need that, but who knows; that'd be freaking amazing!
Thanks! I hope it really helps and eventually gets there.
Hey y'all. My script basically copies md files from an input directory in chunks: build, save the output to the side, then remove the old files and copy the next batch, build again, and at the end merge all the build outputs into dist. Do you have an idea of an easier / more Astro-natural way of doing this? @matthewp @bluwy (I don't mind creating a PR if you could give me some starting points / guidance on how you'd do it.)
@agamm No, that idea doesn't work, because we can't assume you don't need all of these files at the same time. The content collections API does not restrict how many content files you can use at the same time; you could very realistically do
@matthewp
There are really no options I can think of when using that much MDX. The content collections cache feature might help, but only on subsequent builds.
@matthewp Can I use something other than MDX? And what if I don't link between pages?
I took a look at this and found some optimizations we can do (will send a PR later), but they still won't solve this, as the build "effort" will still scale proportionally with the number of files.

To support markdown features like image optimization, frontmatter access from JS, etc., each file needs to participate in the Rollup build. And at the bare minimum, Rollup needs to keep all these files in memory to enable advanced processing and output an optimized build. As the number of files increases, you need to use Node's

I don't think there's much we can do, as it's at the limit of JS tooling. Maybe a Rust-based toolchain for Vite (Rolldown) may enable higher-volume processing in the future. You could separate out your files and build part by part, but there might be duplicated code when stitched together.
@bluwy, thanks! What do we lose if we don't use the "advanced processing"? All of my site is static, using a very basic template.
Also, can you explain step by step the last solution you came up with? I'm down to try it.
Like mentioned above: image optimization, etc. If you have layouts for your markdown, the Rollup build can also optimize the bundled code, generate more efficient CSS output, etc. If your site contains only plain markdown, and you don't plan to use MDX or islands or Astro components, personally I'd set up a manual build chain that compiles each Markdown file to HTML directly and keep it really simple. Each strategy has its tradeoffs.
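As an illustration of how small such a manual chain can be, here is a toy converter handling only headings and paragraphs; a real setup would use a proper markdown parser such as `marked` or `remark`:

```javascript
// Convert a trivial subset of Markdown (#-headings and paragraphs)
// straight to HTML, with no bundler in the loop.
function mdToHtml(md) {
  return md
    .split(/\n{2,}/) // blank-line-separated blocks
    .map((block) => {
      const m = block.match(/^(#{1,6})\s+(.*)$/s);
      if (m) {
        const level = m[1].length;
        return `<h${level}>${m[2].trim()}</h${level}>`;
      }
      return `<p>${block.trim()}</p>`;
    })
    .join("\n");
}

module.exports = { mdToHtml };
```

Because each file is converted independently and written out immediately, memory use stays flat regardless of how many files the site has.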
I mean the solution you shared at #10485 (comment), which could work for simple cases but may have caveats down the road.
Hi, I've just encountered this error using data collections with JSON files. It doesn't matter whether they are many small JSON files or a few large ones (artificially oversized for the test); it still gives an error. I'm migrating a site from Gatsby 4.0 to Astro and I don't think I'll be able to :S (40k+ pages).

In Gatsby, I had a similar memory issue with createPages, and what I ended up doing was creating a generator function that would yield the contents of the JSONs one by one instead of loading them all into memory. It's a bit disappointing that I can't migrate my site to Astro, which I love! :S I'm trying to split the process, creating different
Yeah, we still have the problem too, and are considering staying with Gatsby as well (after months of working on Astro the migration will be painful, but probably necessary). @bluwy, is there any creative solution here?
Yep, @bluwy, we're stuck with this too, after leaving Gatsby for Astro. Any chance of some sort of workaround?
@TomGranot @agamm it's not a production-ready solution right now, but it would be very good to know whether content layer would fix it for you. It doesn't support MDX, but for markdown it uses a lot less memory than the current approach. There are details on how to try it here: #11360
MDX is one of the main reasons we moved to Astro, so it's a bit of a deal breaker, at least for us.
@ascorbic I'm on the same front as agamm; content layer is a no-go since we heavily rely on JSX for dynamic content.
MDX is planned; it's just not implemented in the experimental release, as it's a lot more complex.
I've used a method to get around the memory issue: build the entire website part by part, then use a script to merge the results. It's basically a simple copy, but remember to deal with sitemap and pagefind. This method works very well so far. Each time one part of the documents is built, the other documents are temporarily renamed to start with "_" so that they are ignored during the build.
Can you share your code? (This is what I came up with last time.)
Astro Info
If this issue only occurs in one browser, which browser is a problem?
No response
Describe the Bug
I have a website with > 100K markdown files that I need statically built.
When I played with testing the limits of Astro, I got:
What's the expected result?
No memory error. Astro should probably load md files into a buffer one by one, making sure nothing keeps growing in memory per file. At a minimum, give a warning or error about the limits.
To create the files:
Link to Minimal Reproducible Example
It crashed when I tried recreating 500k markdown files.
Participation