For simplicity, let's say I have an MD file that embeds large blobs; these blobs can be, for example, 500MB. Attached is an MD file (blob.md) with an embedded 18MB blob for testing, since GitHub does not allow larger uploads.
When converted to HTML, it should look like this:
Generating this HTML takes a few seconds and commits 4GB of memory.
Doubling the size of the MD file at least doubles the required memory: a 40MB MD file commits about 10GB of memory, and so on. This seems quite disproportionate: the MD file already contains the blob base64-encoded in a string, and all that is required is to copy it into the HTML. So what is causing the excessive memory use?
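Taking the 18MB and 40MB figures above as input sizes (decimal units), the peak memory is a few hundred times the size of the input:

$$
\frac{4\ \text{GB}}{18\ \text{MB}} = \frac{4000}{18} \approx 222, \qquad \frac{10\ \text{GB}}{40\ \text{MB}} = \frac{10000}{40} = 250
$$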
The issue seems to be the --embed-resources argument:
pandoc blob.md --to html4 --output blob.html --from markdown # completes instantly with no RAM overhead.
pandoc blob.md --to html4 --output blob.html --from markdown --embed-resources # requires at least 4GB of RAM.
pandoc blob.md --to html4 --output blob.html --from markdown --standalone # instant but should be equivalent?!?
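To compare the three invocations above by measured memory rather than by feel, a minimal Python sketch can run pandoc as a child process and read that child's peak resident set size. It assumes a Unix-like system (it uses `os.posix_spawnp` and `os.wait4`, which Windows lacks, so it would not run as-is on the Windows 10 setup described here), Python 3.9+, and `pandoc` on `PATH`; the helper name `peak_rss_mib` is hypothetical. Peak RSS is not the same quantity as the committed memory Windows reports, so treat the numbers as indicative.

```python
import os
import sys


def peak_rss_mib(argv: list[str]) -> float:
    """Run argv as a child process and return its peak resident set size in MiB.

    Unix-only sketch: relies on os.posix_spawnp and os.wait4 (not on Windows).
    """
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    # wait4 reaps this specific child and returns its own resource usage.
    _, status, usage = os.wait4(pid, 0)
    code = os.waitstatus_to_exitcode(status)
    if code != 0:
        raise RuntimeError(f"{argv[0]} exited with status {code}")
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss * scale / 2**20


if __name__ == "__main__":
    # Same base command as in the issue; the extra flags vary per run.
    base = ["pandoc", "blob.md", "--to", "html4", "--output", "blob.html",
            "--from", "markdown"]
    for extra in ([], ["--standalone"], ["--embed-resources"]):
        label = " ".join(extra) or "(no extra flags)"
        print(f"{label:>20}: {peak_rss_mib(base + extra):8.1f} MiB")
```

Reaping each child individually with `os.wait4` gives per-command figures; `resource.getrusage(resource.RUSAGE_CHILDREN)` would instead report the largest child waited for so far, which mixes the runs together.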
Version of pandoc used: 3.5 on Windows 10.
P.S. You can create the MD file with R Markdown like this:
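For anyone without R, a comparable test file can be generated with a short Python script. This is a sketch under assumptions: the blob is taken to be a PNG image embedded as a `data:image/png;base64,...` URI in a Markdown image, and the helper names `noise_png` and `blob_markdown`, the `blob.md` output name, and the 2500 × 2400 size are hypothetical choices. The pixels are random, so the image does not compress and the PNG stays close to width × height × 3 bytes (about 18MB here, roughly 24MB once base64-encoded).

```python
import base64
import os
import struct
import zlib


def _png_chunk(kind: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, type, data, then CRC-32 of type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def noise_png(width: int, height: int) -> bytes:
    """Build a valid 8-bit RGB PNG filled with random pixels."""
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), default methods.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with filter type 0 (None), followed by RGB bytes.
    raw = b"".join(b"\x00" + os.urandom(width * 3) for _ in range(height))
    return (
        b"\x89PNG\r\n\x1a\n"
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(raw, 0))  # level 0: store only
        + _png_chunk(b"IEND", b"")
    )


def blob_markdown(png: bytes) -> str:
    """Wrap PNG bytes in a Markdown image whose target is a base64 data URI."""
    encoded = base64.b64encode(png).decode("ascii")
    return f"![blob](data:image/png;base64,{encoded})\n"


if __name__ == "__main__":
    md = blob_markdown(noise_png(2500, 2400))
    with open("blob.md", "w", encoding="ascii") as f:
        f.write(md)
    print(f"wrote blob.md ({len(md) / 1e6:.1f} MB)")
```

Scaling the width or height up produces the larger inputs (40MB and beyond) used for the memory comparisons.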
Closing as a duplicate of #10075. The problem is with an inefficient URI parser in network-uri. The patches I submitted to network-uri should help somewhat, but they haven't made it into a released version yet (the last release is from 2022).