MAINT: New LZW decoding implementation #2887

MartinThoma · 2024-09-30T18:18:41Z

The basis for this implementation is https://github.com/empira/PDFsharp/blob/master/src/foundation/src/PDFsharp/src/PdfSharp/Pdf.Filters/LzwDecode.cs (MIT licensed)

Because this removes the LZWDecode class from a public module, this change has to ship in a major release.
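For readers unfamiliar with the filter, here is a minimal sketch of PDF-style LZW decoding. It is not the code from this PR or from PDFsharp; the function name `lzw_decode` and its structure are illustrative assumptions. It assumes the PDF defaults: codes packed MSB-first, a 9-bit starting width growing to at most 12 bits, code 256 as clear-table, code 257 as end-of-data, and `EarlyChange = 1`.

```python
CLEAR_TABLE = 256  # resets the code table
EOD = 257          # end-of-data marker
MAX_WIDTH = 12     # PDF LZW codes never exceed 12 bits


def lzw_decode(data: bytes) -> bytes:
    """Decode PDF-style LZW data (hypothetical helper, EarlyChange = 1 assumed)."""

    def fresh_table() -> list:
        # Codes 0..255 map to single bytes; 256 and 257 are reserved placeholders.
        return [bytes([i]) for i in range(256)] + [b"", b""]

    table = fresh_table()
    width = 9
    previous = None
    out = bytearray()
    bit_buffer = 0
    bit_count = 0

    for byte in data:
        # Append the next byte to the bit buffer (MSB-first).
        bit_buffer = (bit_buffer << 8) | byte
        bit_count += 8
        while bit_count >= width:
            bit_count -= width
            code = (bit_buffer >> bit_count) & ((1 << width) - 1)
            bit_buffer &= (1 << bit_count) - 1

            if code == CLEAR_TABLE:
                table = fresh_table()
                width = 9
                previous = None
                continue
            if code == EOD:
                return bytes(out)

            if code < len(table):
                entry = table[code]
            elif code == len(table) and previous is not None:
                # Code not in the table yet: previous sequence + its first byte.
                entry = previous + previous[:1]
            else:
                raise ValueError(f"Invalid LZW code {code}")

            if previous is not None and len(table) < (1 << MAX_WIDTH):
                table.append(previous + entry[:1])
            out += entry
            previous = entry

            # EarlyChange = 1: widen one code before the table actually fills.
            if width < MAX_WIDTH and len(table) + 1 >= (1 << width):
                width += 1

    return bytes(out)


# Sample stream from the LZW example in the PDF specification.
sample = bytes.fromhex("800B6050220C0C8501")
print(lzw_decode(sample))  # b'-----A---B'
```

The sample decodes the code sequence 256, 45, 258, 258, 65, 259, 66, 257. A production decoder would also need to handle `EarlyChange = 0` and truncated or malformed streams more gracefully than this sketch does.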

codecov · 2024-09-30T18:28:07Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.35%. Comparing base (8e1799e) to head (9bf0e56).
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2887      +/-   ##
==========================================
+ Coverage   96.27%   96.35%   +0.08%     
==========================================
  Files          52       52              
  Lines        8689     8735      +46     
  Branches     1733     1723      -10     
==========================================
+ Hits         8365     8417      +52     
+ Misses        187      186       -1     
+ Partials      137      132       -5


pypdf/_codecs/_codecs.py

pypdf/filters.py

pypdf/_codecs/_codecs.py

Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>

stefan6419846

Looks good to me now. It keeps the necessary bits of backwards compatibility, so I do not see any issue with merging this.

@hpierre001

## What's new

### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (#2920) by @hpierre001

### Bug Fixes (BUG)
- Fix font specificier for FreeText annotation (#2893) by @ssjkamei
- Line breaks are not generated due to incorrect calculation of text leading (#2890) by @ssjkamei
- Improve handling of spaces in text extraction (#2882) by @ssjkamei

### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846

### Documentation (DOC)
- Use latest package versions (#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (#2906) by @j-t-1

### Developer Experience (DEV)
- Update pinned requirements (#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz

### Maintenance (MAINT)
- Remove references to outdated Python versions (#2919) by @stefan6419846
- Generalize the method of obtaining space_code (#2891) by @ssjkamei
- Unnecessary character mapping process (#2888) by @ssjkamei
- New LZW decoding implementation (#2887) by @MartinThoma

### Testing (TST)
- Add LzwCodec for encoding (#2883) by @MartinThoma

### Code Style (STY)
- Capitalize error messages (#2903) by @j-t-1
- Modify error messages in PdfWriter (#2902) by @j-t-1

[Full Changelog](5.0.1...5.1.0)

MAINT: New LZW decoding implementation

a596b02