Excessive memory usage when converting MD to HTML #10426

Closed
DavorJ opened this issue Nov 28, 2024 · 2 comments

DavorJ commented Nov 28, 2024

For simplicity, let's say I have an MD file that embeds large blobs. These blobs can be 500MB, for example. Attached is an MD file (blob.md) with an embedded 18MB blob for testing, as larger uploads are not possible on GitHub.

When converted to HTML, it should look like this:

Generating this HTML takes a few seconds and commits 4GB of memory.

Double the size of the MD file and you at least double the required memory: a 40MB MD file commits about 10GB of memory, and so on. This seems quite disproportionate: the MD file already has the blob BASE64-encoded in a string, and all that is required is to copy it into the HTML. So what is causing the excessive memory use?
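To put a number on "disproportionate", here is a rough back-of-the-envelope calculation using only the two figures reported above. It assumes 1 GB = 1024 MB and treats the reported values as exact, although they are approximate; the function name `amplification` is just an illustrative label.

```python
# Figures reported above: (Markdown size in MB, committed memory in GB).
reported = [(18, 4), (40, 10)]


def amplification(md_mb: float, mem_gb: float) -> float:
    """Committed memory divided by input size, assuming 1 GB = 1024 MB."""
    return mem_gb * 1024 / md_mb


ratios = [amplification(md, mem) for md, mem in reported]
for (md, mem), ratio in zip(reported, ratios):
    print(f"{md} MB input, {mem} GB committed: about {ratio:.0f}x")
```

Both data points come out at a little over 200 times the input size, which would be consistent with memory growing roughly linearly in the size of the embedded blob, but with a very large constant factor.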

The issue seems to occur when using the --embed-resources argument:

pandoc blob.md --to html4 --output blob.html --from markdown # completes instantly with no RAM overhead.
pandoc blob.md --to html4 --output blob.html --from markdown --embed-resources # requires at least 4GB of RAM.
pandoc blob.md --to html4 --output blob.html --from markdown --standalone # instant but should be equivalent?!?

Version of pandoc used: 3.5 on Windows 10.

PS: You can create the MD file with Rmarkdown like this:
---
title: "Test Blob"
output: 
  flexdashboard::flex_dashboard:
---

# Page 1

## Row

### Column 1

```{r, echo=FALSE}
set.seed(2024)
blob <- serialize(sample.int(2^31, size = 18*1024^2/8), connection = NULL)
saveRDS(blob, "./blob.dmp", compress = FALSE)
downloadthis::download_file("./blob.dmp")
```
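
For anyone without an R setup, a minimal sketch along the following lines might serve as a stand-in for generating test inputs of different sizes. It is not the attached blob.md: it assumes that a plain Markdown image whose source is a base64 `data:` URI exercises the same code path as the download widget, which has not been verified here. The function name `make_blob_markdown`, the file names, and the `application/octet-stream` media type are all hypothetical choices.

```python
import base64
import os


def make_blob_markdown(path: str, payload_bytes: int) -> int:
    """Write a Markdown file with one image whose source is a base64 data URI.

    The payload is random bytes, not a real image (an assumption made purely
    to get a URI of the desired size). Returns the number of characters written.
    """
    encoded = base64.b64encode(os.urandom(payload_bytes)).decode("ascii")
    text = f"![blob](data:application/octet-stream;base64,{encoded})\n"
    with open(path, "w", encoding="ascii") as fh:
        fh.write(text)
    return len(text)
```

Calling, say, `make_blob_markdown("blob_1mb.md", 1024 * 1024)` and then doubling the payload size a few times would give a series of files to run through the pandoc commands above while watching memory use.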
DavorJ added the bug label Nov 28, 2024

silby (Contributor) commented Nov 28, 2024

Discussed in #10075.

jgm (Owner) commented Nov 29, 2024

Closing as a duplicate of #10075. The problem is with an inefficient URI parser in network-uri. The patches I submitted to network-uri should help somewhat, but they haven't made it into a released version yet (the last release is from 2022).

jgm closed this as completed Nov 29, 2024