PdfReadError: Too many lookup values while extracting image #2889

michelcrypt4d4mus · 2024-10-04T03:06:24Z

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
macOS-13.6.7-arm64-arm-64bit

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.3.1, crypt_provider=('pycryptodome', '3.20.0'), PIL=10.4.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader
reader = PdfReader('New York State 100AnnvBook140701final10.pdf')

for page in reader.pages:
    print(page)
    for image in page.images:
        print(image)
        print(image.image)

New York State 100AnnvBook140701final10.pdf

Traceback

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 2454, in __iter__
    yield self[i]
          ~~~~^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 2450, in __getitem__
    return self.get_function(lst[index])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 490, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3.11/site-packages/pypdf/filters.py", line 791, in _xobj_to_image
    img, image_format, extension, _ = _handle_flate(
                                      ^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3.11/site-packages/pypdf/_xobj_image_helpers.py", line 218, in _handle_flate
    raise PdfReadError(
pypdf.errors.PdfReadError: Too many lookup values: Expected 8, got 1016.


stefan6419846 · 2024-10-04T08:12:48Z

Thanks for your report. Your PDF file seems to contain some invalid color lookup tables (LUTs), especially on page 13. The first broken LUT is too big; the following LUTs are too small.

Replacing lines 212 to 224 of pypdf/_xobj_image_helpers.py with the following code seems to fix it:

                if len(lookup) != expected_count:
                    if len(lookup) < expected_count:
                        logger_warning(
                            f"Not enough lookup values: Expected {expected_count}, got {len(lookup)}.",
                            __name__
                        )
                        lookup += bytes([0] * (expected_count - len(lookup)))
                    elif not check_if_whitespace_only(lookup[expected_count:]):
                        logger_warning(
                            f"Too many lookup values: Expected {expected_count}, got {len(lookup)}.",
                            __name__
                        )
                    lookup = lookup[:expected_count]

This right-pads the lookup table with null bytes when there are too few values, always truncates entries beyond the expected count, and emits warnings instead of raising hard errors.
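For readers who want to try the pad-or-truncate behaviour outside of pypdf, here is a minimal standalone sketch. The function name `normalize_lookup` is hypothetical and not part of pypdf. The standard `logging` module stands in for pypdf's `logger_warning`, and `bytes.strip()` only approximates `check_if_whitespace_only`, so treat it as an illustration of the idea rather than the library's actual code.

```python
import logging

logger = logging.getLogger(__name__)


def normalize_lookup(lookup: bytes, expected_count: int) -> bytes:
    """Return a lookup table of exactly expected_count bytes.

    Hypothetical helper: pads short tables with null bytes and
    truncates long ones, warning instead of raising.
    """
    actual_count = len(lookup)
    if actual_count < expected_count:
        logger.warning(
            "Not enough lookup values: Expected %d, got %d.",
            expected_count,
            actual_count,
        )
        # Right-pad with null bytes up to the expected size.
        return lookup + bytes(expected_count - actual_count)
    if actual_count > expected_count:
        # Assumption: bytes.strip() stands in for pypdf's whitespace check;
        # warn only when the surplus is more than ASCII whitespace.
        if lookup[expected_count:].strip():
            logger.warning(
                "Too many lookup values: Expected %d, got %d.",
                expected_count,
                actual_count,
            )
        # Always drop entries beyond the expected count.
        return lookup[:expected_count]
    return lookup
```

With the sizes from the traceback above, a 1016-byte table passed with `expected_count=8` would come back as its first 8 bytes, and a table shorter than 8 bytes would be padded to 8.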


stefan6419846 added the workflow-images and is-robustness-issue labels Oct 4, 2024

stefan6419846 added a commit that referenced this issue Oct 12, 2024

ROB: Soft failure for flate encode image mode 1 with wrong LUT size

db2d79c

Closes #2889.

stefan6419846 mentioned this issue Oct 12, 2024

ROB: Soft failure for flate encode image mode 1 with wrong LUT size #2900

Merged

stefan6419846 added a commit that referenced this issue Oct 18, 2024

ROB: Soft failure for flate encode image mode 1 with wrong LUT size (#…

80c3939

…2900) Closes #2889.

stefan6419846 closed this as completed in #2900 Oct 18, 2024
