Unable to find 'endstream' marker for obj starting at 13367. #2326

Closed
shafeeralip opened this issue Dec 4, 2023 · 1 comment · Fixed by #2335
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-robustness-issue: From a user's perspective, this is about robustness


@shafeeralip

Hello,

When executing this piece of code:

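import traceback

from pypdf import PdfReader, PdfWriter

# Placeholder paths: the input is the attached PDF, the output name is arbitrary.
dwnld_filepath = "EveBest.pdf"
file_path = "output.pdf"

try:
    input_pdf = PdfReader(dwnld_filepath)
    output_pdf = PdfWriter()
    # Copy only the first page into the new document.
    image = input_pdf.pages[0]
    output_pdf.add_page(image)  # PdfReadError is raised here while cloning the page
    output_pdf.write(file_path)
except Exception:
    traceback.print_exc()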

Here is the PDF file that causes the issue:

EveBest.pdf

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/Users/shafeerali/Documents/Nanonets/avanto/API/test.py", line 58, in <module>
    output_pdf.add_page(image)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/_writer.py", line 418, in add_page
    return self._add_page(page, list.append, excluded_keys)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/_writer.py", line 331, in _add_page
    page = cast("PageObject", page_org.clone(self, False, excluded_keys))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 199, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 310, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 199, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 310, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 199, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 310, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_base.py", line 300, in clone
    obj.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 116, in clone
    arr.append(data.clone(pdf_dest, force_duplicate, ignore_fields))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_base.py", line 292, in clone
    obj = self.get_object()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_base.py", line 312, in get_object
    obj = self.pdf.get_object(self)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/_reader.py", line 1401, in get_object
    retval = read_object(self.stream, self)  # type: ignore
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 1280, in read_object
    return DictionaryObject.read_from_stream(stream, pdf, forced_encoding)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 538, in read_from_stream
    data["__streamdata__"] = read_unsized_from_steam(stream, pdf)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 432, in read_unsized_from_steam
    raise PdfReadError(
pypdf.errors.PdfReadError: Unable to find 'endstream' marker for obj starting at 13367.
@pubpub-zz
Collaborator

The PDF is malformed: some of its references are wrong. The image is in object 20 (offset 13291). The next objects listed in the xref table are 21 (offset 15983) and 22 (offset 16069), but neither of them exists at those offsets. The next object actually present in the file is 23 (offset 17292).
I am analysing how to ignore the invalid objects.
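
For anyone who wants to check a file for this kind of problem, here is a minimal sketch of the idea: look at each offset claimed by the xref table and see whether an `N G obj` header for that object number really starts there. It does not use pypdf and is not what the fix does. The helper name `check_xref_offsets` and the sample bytes are made up for illustration, and the xref entries are passed in as a plain dict rather than parsed from the file.

```python
import re

# Matches an indirect-object header such as b"20 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def check_xref_offsets(data: bytes, xref: dict[int, int]) -> dict[int, bool]:
    """Report, per object number, whether its claimed offset starts that object.

    `xref` maps object number -> byte offset (hypothetical input; a real
    tool would have to parse the xref table or xref stream first).
    """
    result = {}
    for obj_num, offset in xref.items():
        # Anchor the match at the claimed offset; no header there means a bad entry.
        match = OBJ_HEADER.match(data, offset)
        result[obj_num] = bool(match) and int(match.group(1)) == obj_num
    return result


# Tiny synthetic body: objects 20 and 23 exist, object 21 does not.
sample = b"20 0 obj\n<< >>\nendobj\n23 0 obj\n<< >>\nendobj\n"
claimed = {20: 0, 21: 9, 23: sample.index(b"23 0 obj")}
print(check_xref_offsets(sample, claimed))
# {20: True, 21: False, 23: True}
```

On the file from this issue, the entries for objects 21 and 22 would be the ones expected to come back as `False`, assuming the offsets quoted above.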

@stefan6419846 added the is-robustness-issue and Has MCVE labels on Feb 15, 2024