ROB: Capture UnicodeDecodeError at PdfReader.pdf_header #1759

MartinThoma · 2023-03-31T13:58:15Z

codecov · 2023-03-31T14:12:40Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (1563e8e) 92.41% compared to head (38fe3ee) 92.41%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1759   +/-   ##
=======================================
  Coverage   92.41%   92.41%           
=======================================
  Files          34       34           
  Lines        6575     6579    +4     
  Branches     1301     1301           
=======================================
+ Hits         6076     6080    +4     
  Misses        326      326           
  Partials      173      173

Impacted Files	Coverage Δ
pypdf/_reader.py	`91.22% <100.00%> (+0.02%)`	⬆️
pypdf/_writer.py	`86.15% <100.00%> (+0.01%)`	⬆️

pubpub-zz

I'm very confused about this PR: if the message cannot be decoded, it means it does not respect the standard, and raising such an exception seems acceptable to me. If there really is a problem, I would use decode with 'backslashreplace' to get a better understanding of the unrecognised 'magic' and eventually raise an error.
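For illustration, here is a minimal sketch of the difference between strict decoding and 'backslashreplace'. The byte string is a made-up header, chosen only because it starts with the 0xac byte reported in #1758; it is not taken from the actual file.

```python
# Hypothetical header bytes: the leading 0xac is not a valid UTF-8 start byte.
raw = b"\xacPDF-1.4"

# Strict decoding (the default) raises UnicodeDecodeError.
strict_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc

# With errors="backslashreplace", undecodable bytes become \xNN escapes,
# so the malformed 'magic' stays readable in logs and error messages.
readable = raw.decode("utf-8", errors="backslashreplace")
print(repr(readable))
```

The escaped string can then be included in whatever error or warning is raised, which shows the offending bytes instead of hiding them behind a decode failure.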

MartinThoma · 2023-04-02T08:11:40Z

I think I've noticed PDFs in the past where the version was not set (correctly), but the rest of the document was just fine.

pubpub-zz · 2023-04-02T08:13:13Z

> I think I've noticed pdfs in the past where the version was not set (correctly), but the remaining document was just fine

So what about just using backslashreplace, to be tolerant while still being able to read the magic?

MartinThoma · 2023-04-06T12:13:38Z

#1768 is a better solution.

MartinThoma added 2 commits March 31, 2023 15:57

ROB: Capture UnicodeDecodeError at PdfReader.pdf_header

305dff5

Add test

38fe3ee

MartinThoma mentioned this pull request Mar 31, 2023

UnicodeDecodeError 'utf-8' codec can't decode byte 0xac in position 0: invalid start byte #1758

Closed

pubpub-zz reviewed Apr 2, 2023

MartinThoma closed this Apr 6, 2023
