-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ROB: Soft failure for flate encode image mode 1 with wrong LUT size #2900
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2900 +/- ##
=======================================
Coverage 96.44% 96.44%
=======================================
Files 52 52
Lines 8726 8728 +2
Branches 1589 1589
=======================================
+ Hits 8416 8418 +2
Misses 182 182
Partials 128 128 ☔ View full report in Codecov by Sentry. |
Is it possible to build tests from the pdf provided in the issue ? |
While it is possible, I honestly do not see the need for this here. The existing test already runs on an artificial 3x3 pixel image and covers all edge cases for the LUT. Adding another test does not really have any benefit IMHO and just requires downloading more data. |
I understand and agree with what you mean. My point is just I prefer to use actual data to be able to compare pypdf behavior versus other libraries/viewers. |
I would argue that this is not what tests are about - tests should be written to ensure that everything covered by them is working correctly and does not break for whatever reason. If we are able to ensure this with a small dummy image, this is much easier than downloading at least 1 MB of data beforehand and essentially verifying that no error is raised by default - with the small dummy image, we are able to verify the extraction result directly from within code as well. Where are we doing a comparison at the moment? We might have a table or written list and some benchmarking, but nothing which would indicate this from a test perspective. Nothing prevents us to use actual data in the case that we explicitly need comparison code later on. |
Sorry if you though I was waiting for something. Just a little overbusy |
No worries - I just added on your comment and would have waited for the discussion result anyway. |
## What's new ### New Features (ENH) - Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (#2920) by @hpierre001 ### Bug Fixes (BUG) - Fix font specificier for FreeText annotation (#2893) by @ssjkamei - Line breaks are not generated due to incorrect calculation of text leading (#2890) by @ssjkamei - Improve handling of spaces in text extraction (#2882) by @ssjkamei ### Robustness (ROB) - Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846 ### Documentation (DOC) - Use latest package versions (#2907) by @stefan6419846 - Correct example of reading FileAttachment annotation (#2906) by @j-t-1 ### Developer Experience (DEV) - Update pinned requirements (#2918) by @stefan6419846 - Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz ### Maintenance (MAINT) - Remove references to outdated Python versions (#2919) by @stefan6419846 - Generalize the method of obtaining space_code (#2891) by @ssjkamei - Unnecessary character mapping process (#2888) by @ssjkamei - New LZW decoding implementation (#2887) by @MartinThoma ### Testing (TST) - Add LzwCodec for encoding (#2883) by @MartinThoma ### Code Style (STY) - Capitalize error messages (#2903) by @j-t-1 - Modify error messages in PdfWriter (#2902) by @j-t-1 [Full Changelog](5.0.1...5.1.0)
Closes #2889.