Out of memory during build #10485
I've started seeing this as well. I had to delete "astro check" from the build command.
I built a site with 10k+ markdown files and have the same error. I use the following command to increase the heap size to 8G.
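The command itself was not captured in this excerpt; the standard way to raise Node's heap limit is V8's `--max-old-space-size` flag. A sketch, assuming the usual npm `build` script:

```shell
# Raise the V8 old-space heap limit for node processes started from this shell.
# The value is in megabytes; 8192 = 8 GB.
export NODE_OPTIONS="--max-old-space-size=8192"

# Then run the build as usual, e.g.:
#   npm run build        # or: astro build

# Sanity check: V8 should now report a heap limit of roughly 8 GB.
node -e "console.log(Math.round(require('v8').getHeapStatistics().heap_size_limit / 1048576) + ' MB')"
```

Setting the flag via `NODE_OPTIONS` reaches child processes too, which matters when the build is launched through npm rather than `node` directly.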
May be a duplicate of #7241
Hello @agamm. Please provide a minimal reproduction using a GitHub repository or StackBlitz. Issues marked with
That's not the same issue. OP's issue happens after the
@ematipico here you go: check the README for the steps to reproduce (the markdown files are 700 MB+ and I didn't want to commit them directly).
UPDATE: Hi @agamm, an error message appeared when I executed it:

    An error occurred: [Error: ENOENT: no such file or directory, open 'src/pages/blog/file1.md'] {
      errno: -2,
      code: 'ENOENT',
      syscall: 'open',
      path: 'src/pages/blog/file1.md'
    }
@mingjunlu did you manage to reproduce it, or do you need any help?
The error is successfully reproduced. Thanks!
I don't understand what the bug is here. Node.js has memory limits; if you build too many pages, the build will use more memory than Node.js has available. The solution is what @mj8327 suggests, setting the
I get that Node.js has its memory limits, which can be a real pain when we're talking about building heaps of pages; it easily leads to out-of-memory errors. The tip from @mj8327 about raising the heap size helps.

But I'm here scratching my head, thinking there's got to be a way to scale up to projects with 100k+ or even 1M+ files without memory shooting through the roof. My experience with Astro's build source code isn't all that deep, yet I can't help but think about possible tweaks. Like, what if we handled the file list as a stream? That could keep memory use more or less steady: chunk the process into digestible bits, save each chunk to the hard drive piece by piece, and wipe the memory before tackling the next chunk. Given how much cheaper hard drive space is compared to RAM, this approach might save a ton on pricier machines, and on headaches, for those gigantic builds. Am I missing anything substantial? Would love to get your take!
Alright, I dove into the code to tackle that pesky memory issue, and it looks like the troublemaker pops up right here. Got me thinking: maybe there's a way to tweak the Vite settings to get some streaming and buffering action going, or, if we're dealing with a memory leak, to pinpoint what's triggering it. Any ideas?
The problem is that every file in an Astro project gets turned into a JavaScript module. All of those modules then need to be bundled by Vite (really Rollup). That's what you're seeing happen. Rollup has to have all of the modules in memory at once; there's no way to buffer them. Our project (really @bluwy) has contributed improvements to Rollup's memory management, but there are limits. The problem you're seeing is architectural.
@matthewp Even after boosting my Node's memory limit to 20GB, it still crashes with just 100k markdown files.
@agamm you're making some really solid points here; I'm learning a lot from your suggestions. Thanks. I trust your contributions can trigger some additions/improvements from team Astro that make the project even more robust. I mean, imagine building 10M+ pages without any out-of-memory errors or slowdowns... not that I personally need that, but who knows; that'd be freaking amazing!
Thanks! I hope it really helps and eventually gets there.
Hey y'all. My script basically copies md files from an input directory in chunks: build, save the output to the side, then remove the old files and copy the next batch, build again, and at the end merge all the build outputs into dist. Do you have an idea of an easier / more Astro-natural way of doing this? @matthewp @bluwy (I don't mind creating a PR if you could give me some starting points / guidance on how you'd do it.)
@agamm No, that idea doesn't work, because we can't assume you don't need all of these files at the same time. The content collections API does not restrict how many content files you can use at the same time; you could very realistically do
@matthewp
There are really no options I can think of when using that much MDX. The content collections cache feature might help, but only on subsequent builds.
@matthewp Can I use something other than MDX? And what if I don't link between pages?
I took a look at this and found some optimizations we can do (will send a PR later), but they still won't solve this, as the build "effort" will still scale proportionally with the number of files.

To support markdown features like image optimization, frontmatter access from JS, etc., each file needs to participate in the Rollup build. And at the bare minimum, Rollup needs to keep all these files in memory to enable advanced processing and output an optimized build. As the number of files increases, you need to use Node's

I don't think there's much we can do, as it's at the limit of JS tooling. Maybe a Rust-based toolchain for Vite (Rolldown) may enable higher-volume processing in the future. You could separate out your files and build part by part, but there might be duplicated code when stitched together.
@bluwy, thanks! What do we lose if we don't use the "advanced processing"? All of my site is static, using a very basic template.
Also, can you explain step by step the last solution you came up with? I'm down to try it.
Like mentioned above: image optimization, etc. If you have layouts for your markdown, the Rollup build can also optimize the bundled code, generate more efficient CSS output, etc. If your site contains only plain markdown, and you don't plan to use MDX or islands or Astro components, personally I'd set up a manual build chain that compiles each Markdown file to HTML directly and keep it really simple. Each strategy has its tradeoffs.
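As an illustration of how small such a manual chain can be, here is a toy converter handling only headings and paragraphs; a real setup would use a proper markdown parser such as `marked` or `remark`:

```javascript
// Convert a trivial subset of Markdown (#-headings and paragraphs)
// straight to HTML, with no bundler in the loop.
function mdToHtml(md) {
  return md
    .split(/\n{2,}/) // blank-line-separated blocks
    .map((block) => {
      const m = block.match(/^(#{1,6})\s+(.*)$/s);
      if (m) {
        const level = m[1].length;
        return `<h${level}>${m[2].trim()}</h${level}>`;
      }
      return `<p>${block.trim()}</p>`;
    })
    .join("\n");
}

module.exports = { mdToHtml };
```

Because each file is converted independently and written out immediately, memory use stays flat regardless of how many files the site has.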
I mean the solution you shared at #10485 (comment), which could work for simple cases but may have caveats down the road.
Hi, I've just encountered this error using data collections with JSON files. It doesn't matter whether they are many small JSON files or a few large ones (artificially oversized for the test); it still gives an error. I'm migrating a site from Gatsby 4.0 to Astro and I don't think I'll be able to :S (40k+ pages).

In Gatsby, I had a similar memory issue with createPages, and what I ended up doing was creating a generator function that would yield the contents of the JSONs one by one instead of loading them all into memory. It's a bit disappointing that I can't migrate my site to Astro, which I love! :S I'm trying to split the process, creating different
Yeah, we still have the problem too, and are considering staying with Gatsby as well (after months of working on Astro the migration will be painful, but probably necessary). @bluwy, is there any creative solution here?
Yep, @bluwy, we're stuck with this too, after leaving Gatsby for Astro. Any chance of some sort of workaround?
@TomGranot @agamm it's not a production-ready solution right now, but it would be very good to know whether content layer would fix it for you. It doesn't support MDX, but for markdown it uses a lot less memory than the current approach. There are details on how to try it here: #11360
MDX is one of the main reasons we moved to Astro, so it's a bit of a deal breaker, at least for us.
@ascorbic I'm on the same front as agamm; content layer is a no-go since we heavily rely on JSX for dynamic content.
MDX is planned; it's just not implemented in the experimental release, as it's a lot more complex.
I've used a method to get around the memory issue: build the entire website part by part, then use a script to merge the results. It's basically a simple copy, but remember to deal with sitemap and pagefind. This method works very well so far. Each time one part of the documents is built, the other documents are temporarily renamed to start with "_" so that they are ignored during the build.
Can you share your code? (This is what I came up with last time.)
Astro Info
If this issue only occurs in one browser, which browser is a problem?
No response
Describe the Bug
I have a website with > 100K markdown files that I need statically built.
When I played with testing the limits of Astro, I got:
What's the expected result?
No memory error. Astro should probably load md files into a buffer one by one, making sure nothing keeps growing in memory per file. At a minimum, give a warning or error about the limits.
To create the files:
Link to Minimal Reproducible Example
It crashed when I tried recreating 500k markdown files.
Participation