Using PdfReader causes a crash #2761
Comments
Thanks for the report. In general, I would consider it to be okay for completely broken PDF files to throw any exception. For the specific PDF file and the cyclic page tree, we might want to provide a dedicated error message. Feel free to submit a corresponding PR.
@Avgor46
According to https://redirect.github.com/ispras/oss-sydr-fuzz/pull/240/files#diff-0493c80d58164d8423b9d74653bea11bbc054b8797d45fbb5552cf670fa23961R30, this might be part of the pdf.js test suite as well.
I agree, but there is an important addition: Infinite loops as in GHSA-xcjx-m2pj-8g79 are not ok. pypdf should fail with an exception, not with a hanging application. A RecursionError might be ok, but (without having checked that specific case) refactoring of the code could also lead to an infinite loop which would no longer be ok.
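To illustrate the "fail with an exception, not a hang" point, here is a minimal sketch of cycle detection on a toy page tree. It is not pypdf's implementation: the plain-dict nodes, `iter_pages`, and `CyclicPageTreeError` are hypothetical names used only for this example. The traversal is iterative, so a cyclic tree neither recurses without bound nor loops forever.

```python
from typing import Iterator


class CyclicPageTreeError(Exception):
    """Hypothetical dedicated error for a page tree that revisits a node."""


def iter_pages(root: dict) -> Iterator[dict]:
    """Yield the leaf pages of a toy page tree in document order.

    Nodes are plain dicts; a node with a "Kids" list is an intermediate
    node, anything else is treated as a page. This layout is an assumption
    for illustration, not pypdf's internal representation.
    """
    stack = [root]
    seen = set()  # ids of nodes already visited
    while stack:
        node = stack.pop()
        if id(node) in seen:
            # A node reached twice means a cycle (or a shared node,
            # which a well-formed tree should not contain either).
            raise CyclicPageTreeError("page tree visits the same node twice")
        seen.add(id(node))
        kids = node.get("Kids")
        if kids is None:
            yield node
        else:
            # Push in reverse so children are popped in document order.
            stack.extend(reversed(kids))
```

With a check like this, a broken file surfaces as one specific exception that callers can catch, and a later refactoring from recursion to iteration cannot silently turn a RecursionError into an endless loop.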
Ok, so, I've found several other PoCs that cause different exceptions (AttributeError, ValueError, ...) in library code. I see 3 ways to close this issue:
What should we do next?
In general, it always depends on the actual PDF file, which needs investigation; there is no single "way" to go with these cases. Thus (1) might not work here. (3) sounds bad as well, as this might just hide/obfuscate real bugs we might have. (2) sounds the best, but I am not sure what the best solution for this would be. I prefer not to have lots of mass reports of potentially just broken files at once, neither in this issue nor in separate ones. It takes time to analyze each case and decide how to proceed, while the number of active/frequent contributors tends to be rather low. My preferred approach in this case would be to report and solve these issues gradually: open a small number of issues each time and ideally get them analyzed and solved with your support before opening the next batch.
See #1210. Let's have the discussion about whether it's fine to throw standard exceptions there :-)
Free-to-use PDF documents are valuable to pypdf, but also potentially to other libraries. If you hold the copyright on those PDFs, please open a PR to add them to https://github.com/py-pdf/sample-files/ :-)
Looking at https://github.com/py-pdf/pypdf/blob/main/pypdf/errors.py I would say we already decided on custom exceptions. In this specific case, I would have expected |
Hi!
I've been fuzzing PdfReader with sydr-fuzz via the langchain project and found a few errors. The question is: should the user handle Python errors raised from the pypdf library, or is this a bug in pypdf? The information needed to reproduce one of them is provided below.
Environment
Code + PDF
This is a minimal, complete example that shows the issue:
PoC
crash-b26d05712a29b241ac6f9dc7fff57428ba2d1a04.pdf
Traceback
This is the complete traceback I see: