You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I think we have an issue around decompression bombs, where the output file is many times larger than the input file. The demo on pdfparser.org gives the following fatal error: Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 25708532 bytes) in /var/www/html/sites/all/modules/custom/pdfparser_demo/pdfparser/src/Smalot/PdfParser/RawData/FilterHelper.php on line 244
The problem is, before it checks the Config class for disabled image parsing, you need to have called gzuncompress in FilterHelper.php. If this pushes it above the PHP memory limit, it'll simply die. This is why the config option doesn't stop the crash.
xpdf, as example of another popular PHP library, checks for compression bombs by doing the following in C++. This is lifted directly from the source code in Stream.cc
// check for a 'decompression bomb'
if (totalOut > 50000000 && totalIn < totalOut / 250) {
error(errSyntaxError, getPos(), "Decompression bomb in flate stream");
endOfBlock = eof = gTrue;
remain = 0;
}
I've tested this in FilterHelper by adding a max limit to the number of bytes gzuncompress can parse on line 254 as an argument. This will throw an Exception then, rather than going to a fatal error. I don't know what the optimum values would be, but, this could be useful.
Potentially it'd be something to add to the Config class, as an optional memory limit. I wanted to get the dev team's thoughts first before submitting a PR though.
I wanted to get the dev team's thoughts first before submitting a PR though.
If I understand you correctly, you add a new parameter to the decompression call to raise an exception if output value increases above a certain point?
As far as I remember the Config option you talk about was created to save memory when parsing a PDF with images. Using it in this way to avoid memory "explosions" is a nice extension.
You should send us a pull request and we can discuss details there.
Based on issues #104 and #357
I think we have an issue around decompression bombs, where the output file is many times larger than the input file. The demo on pdfparser.org gives the following fatal error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 25708532 bytes) in /var/www/html/sites/all/modules/custom/pdfparser_demo/pdfparser/src/Smalot/PdfParser/RawData/FilterHelper.php on line 244
The problem is, before it checks the
Config
class for disabled image parsing, you need to have calledgzuncompress
inFilterHelper.php
. If this pushes it above the PHP memory limit, it'll simply die. This is why the config option doesn't stop the crash.xpdf, as example of another popular PHP library, checks for compression bombs by doing the following in C++. This is lifted directly from the source code in Stream.cc
I've tested this in FilterHelper by adding a max limit to the number of bytes
gzuncompress
can parse on line 254 as an argument. This will throw an Exception then, rather than going to a fatal error. I don't know what the optimum values would be, but, this could be useful.Potentially it'd be something to add to the Config class, as an optional memory limit. I wanted to get the dev team's thoughts first before submitting a PR though.
The PDF attached is based on that posted by yaspr in issue #104
document_with_text_and_png_image_4pages.pdf
Originally posted by @b3n-l in #104 (comment)
The text was updated successfully, but these errors were encountered: