Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashing PHP process through memory exhaustion when decompressing images. #475

Closed
b3n-l opened this issue Oct 29, 2021 · 1 comment · Fixed by #476
Closed

Crashing PHP process through memory exhaustion when decompressing images. #475

b3n-l opened this issue Oct 29, 2021 · 1 comment · Fixed by #476

Comments

@b3n-l
Copy link
Contributor

b3n-l commented Oct 29, 2021

Based on issues #104 and #357

I think we have an issue around decompression bombs, where the output file is many times larger than the input file. The demo on pdfparser.org gives the following fatal error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 25708532 bytes) in /var/www/html/sites/all/modules/custom/pdfparser_demo/pdfparser/src/Smalot/PdfParser/RawData/FilterHelper.php on line 244

The problem is, before it checks the Config class for disabled image parsing, you need to have called gzuncompress in FilterHelper.php. If this pushes it above the PHP memory limit, it'll simply die. This is why the config option doesn't stop the crash.

xpdf, as example of another popular PHP library, checks for compression bombs by doing the following in C++. This is lifted directly from the source code in Stream.cc

// check for a 'decompression bomb'
  if (totalOut > 50000000 && totalIn < totalOut / 250) {
    error(errSyntaxError, getPos(), "Decompression bomb in flate stream");
    endOfBlock = eof = gTrue;
    remain = 0;
  }

I've tested this in FilterHelper by adding a max limit to the number of bytes gzuncompress can parse on line 254 as an argument. This will throw an Exception then, rather than going to a fatal error. I don't know what the optimum values would be, but, this could be useful.

Potentially it'd be something to add to the Config class, as an optional memory limit. I wanted to get the dev team's thoughts first before submitting a PR though.

The PDF attached is based on that posted by yaspr in issue #104
document_with_text_and_png_image_4pages.pdf

Originally posted by @b3n-l in #104 (comment)

@k00ni
Copy link
Collaborator

k00ni commented Oct 29, 2021

I wanted to get the dev team's thoughts first before submitting a PR though.

If I understand you correctly, you add a new parameter to the decompression call to raise an exception if output value increases above a certain point?

As far as I remember the Config option you talk about was created to save memory when parsing a PDF with images. Using it in this way to avoid memory "explosions" is a nice extension.

You should send us a pull request and we can discuss details there.

CC @izabala

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants