Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROB: Soft failure for flate encode image mode 1 with wrong LUT size #2900

Merged
merged 1 commit into from
Oct 18, 2024

Conversation

stefan6419846
Copy link
Collaborator

Closes #2889.

Copy link

codecov bot commented Oct 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.44%. Comparing base (c8bfa5f) to head (db2d79c).
Report is 7 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2900   +/-   ##
=======================================
  Coverage   96.44%   96.44%           
=======================================
  Files          52       52           
  Lines        8726     8728    +2     
  Branches     1589     1589           
=======================================
+ Hits         8416     8418    +2     
  Misses        182      182           
  Partials      128      128           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pubpub-zz
Copy link
Collaborator

Is it possible to build tests from the pdf provided in the issue ?

@stefan6419846
Copy link
Collaborator Author

While it is possible, I honestly do not see the need for this here. The existing test already runs on an artificial 3x3 pixel image and covers all edge cases for the LUT. Adding another test does not really have any benefit IMHO and just requires downloading more data.

@pubpub-zz
Copy link
Collaborator

While it is possible, I honestly do not see the need for this here. The existing test already runs on an artificial 3x3 pixel image and covers all edge cases for the LUT. Adding another test does not really have any benefit IMHO and just requires downloading more data.

I understand and agree with what you mean. My point is just I prefer to use actual data to be able to compare pypdf behavior versus other libraries/viewers.

@stefan6419846
Copy link
Collaborator Author

I would argue that this is not what tests are about - tests should be written to ensure that everything covered by them is working correctly and does not break for whatever reason. If we are able to ensure this with a small dummy image, this is much easier than downloading at least 1 MB of data beforehand and essentially verifying that no error is raised by default - with the small dummy image, we are able to verify the extraction result directly from within code as well.

Where are we doing a comparison at the moment? We might have a table or written list and some benchmarking, but nothing which would indicate this from a test perspective. Nothing prevents us to use actual data in the case that we explicitly need comparison code later on.

@pubpub-zz
Copy link
Collaborator

Sorry if you though I was waiting for something. Just a little overbusy

@stefan6419846
Copy link
Collaborator Author

No worries - I just added on your comment and would have waited for the discussion result anyway.

@stefan6419846 stefan6419846 merged commit 80c3939 into main Oct 18, 2024
16 checks passed
@stefan6419846 stefan6419846 deleted the flate-mode1 branch October 18, 2024 17:21
stefan6419846 added a commit that referenced this pull request Oct 27, 2024
## What's new

### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (#2920) by @hpierre001

### Bug Fixes (BUG)
- Fix font specificier for FreeText annotation (#2893) by @ssjkamei
- Line breaks are not generated due to incorrect calculation of text leading (#2890) by @ssjkamei
- Improve handling of spaces in text extraction (#2882) by @ssjkamei

### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846

### Documentation (DOC)
- Use latest package versions (#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (#2906) by @j-t-1

### Developer Experience (DEV)
- Update pinned requirements (#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz

### Maintenance (MAINT)
- Remove references to outdated Python versions (#2919) by @stefan6419846
- Generalize the method of obtaining space_code (#2891) by @ssjkamei
- Unnecessary character mapping process (#2888) by @ssjkamei
- New LZW decoding implementation (#2887) by @MartinThoma

### Testing (TST)
- Add LzwCodec for encoding (#2883) by @MartinThoma

### Code Style (STY)
- Capitalize error messages (#2903) by @j-t-1
- Modify error messages in PdfWriter (#2902) by @j-t-1

[Full Changelog](5.0.1...5.1.0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PdfReadError: Too many lookup values while extracting image
2 participants