Excessive memory usage when converting MD to HTML #10426

DavorJ · 2024-11-28T18:20:59Z

For simplicity, let's say I have an MD file that embeds large blobs. These blobs can be 500MB, for example. Attached is an MD file (blob.md) with an embedded blob of 18MB for testing, as larger uploads are not possible on GitHub.

When converted to HTML, it should look like this:

Generating this HTML takes a few seconds and commits 4GB of memory.

Double the size of the MD, and you will at least double the required memory. A 40MB MD file commits about 10GB of memory, etc. This seems quite disproportionate: the MD already has the blob Base64-encoded in a string, and all that is required is to copy it into the HTML. So what is causing the excessive memory use?
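As a rough sanity check on the scaling, the figures quoted above can be turned into an overhead ratio. This is only a sketch: it assumes the sizes are approximate, that MB and GB mean 2^20 and 2^30 bytes, and the name `overhead_ratio` is made up for illustration.

```python
# Figures quoted in this report, treated as rough values.
MB = 2**20
GB = 2**30

observations = [
    (18 * MB, 4 * GB),   # 18MB MD file, about 4GB committed
    (40 * MB, 10 * GB),  # 40MB MD file, about 10GB committed
]


def overhead_ratio(input_bytes, committed_bytes):
    """Bytes of committed memory per byte of Markdown input."""
    return committed_bytes / input_bytes


for input_bytes, committed_bytes in observations:
    ratio = overhead_ratio(input_bytes, committed_bytes)
    print(f"{input_bytes // MB} MB input: about {ratio:.0f}x")
```

Both cases come out at a few hundred bytes of committed memory per input byte (roughly 230x and 256x), so the growth looks approximately linear with a very large constant factor.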

The issue seems to arise when using the --embed-resources argument:

pandoc blob.md --to html4 --output blob.html --from markdown # completes instantly with no RAM overhead.
pandoc blob.md --to html4 --output blob.html --from markdown --embed-resources # requires at least 4GB of RAM.
pandoc blob.md --to html4 --output blob.html --from markdown --standalone # instant but should be equivalent?!?

Version of pandoc used: 3.5 on Windows 10.

PS: You can create the MD file with R Markdown like this:

---
title: "Test Blob"
output: 
  flexdashboard::flex_dashboard:
---

# Page 1

## Row

### Column 1

```{r, echo=FALSE}
set.seed(2024)
blob <- serialize(sample.int(2^31, size = 18*1024^2/8), connection = NULL)
saveRDS(blob, "./blob.dmp", compress = FALSE)
downloadthis::download_file("./blob.dmp")
```
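For anyone without R at hand, a comparable test file could be generated along these lines. This is a minimal sketch, not a copy of what R Markdown produces: it assumes the relevant part of blob.md is a single link whose target is a Base64 `data:` URI, and the function name `make_blob_markdown`, the link text, and the MIME type are all hypothetical. Whether the result triggers the same memory behaviour has not been verified.

```python
import base64
import random


def make_blob_markdown(size_bytes, seed=2024):
    """Return Markdown text with one link whose target is a Base64 data: URI.

    Assumption: a plain Markdown link stands in for whatever markup
    downloadthis actually emits; the real blob.md may look different.
    """
    rng = random.Random(seed)
    payload = rng.randbytes(size_bytes)  # requires Python 3.9+
    encoded = base64.b64encode(payload).decode("ascii")
    return (
        "# Page 1\n\n"
        f"[Download blob.dmp](data:application/octet-stream;base64,{encoded})\n"
    )


if __name__ == "__main__":
    # 18 MiB of pseudo-random payload, matching the size used in this report.
    with open("blob.md", "w", encoding="ascii", newline="\n") as f:
        f.write(make_blob_markdown(18 * 1024**2))
```

Changing the `size_bytes` argument makes it easy to check how memory use grows with the size of the embedded blob.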


silby · 2024-11-28T19:38:33Z

Discussed in #10075.

jgm · 2024-11-29T14:46:51Z

Closing as a duplicate of #10075. The problem is with an inefficient URI parser in network-uri. The patches I submitted to network-uri should help somewhat, but they haven't made it into a released version yet (the last release is from 2022).

DavorJ added the bug label Nov 28, 2024

jgm closed this as completed Nov 29, 2024
