BUG: Ignore UTF-8 decode errors #1865
talibhmukadam commented Jun 1, 2023
Codecov Report
Patch coverage:
@@ Coverage Diff @@
## main #1865 +/- ##
=======================================
Coverage 93.42% 93.42%
=======================================
Files 34 34
Lines 6634 6634
Branches 1303 1303
=======================================
Hits 6198 6198
Misses 284 284
Partials 152 152
Facing this issue: 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte
This PR will fix the issue.
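The thread does not show the diff itself, so the following is only a minimal sketch of the general technique the PR title names, not pypdf's actual change. Python's built-in `bytes.decode` accepts an `errors` argument, and `errors="ignore"` drops bytes that are not valid UTF-8 instead of raising `UnicodeDecodeError`. The helper name `decode_utf8_lenient` and the sample bytes are hypothetical, chosen only to reproduce the reported message (0x83 is a UTF-8 continuation byte, so it cannot start a character).

```python
def decode_utf8_lenient(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8, silently dropping invalid bytes."""
    return data.decode("utf-8", errors="ignore")


# Hypothetical sample: 0x83 is a continuation byte, invalid at position 0.
raw = b"\x83Hello"

# Strict decoding reproduces the error from the report.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte

# Lenient decoding skips the bad byte and keeps the rest.
print(decode_utf8_lenient(raw))  # Hello
```

Note that `errors="ignore"` discards data without any trace; `errors="replace"` is the usual alternative when you want a visible U+FFFD marker where the undecodable byte was.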
@MartinThoma Please merge the PR. It will fix a major issue. For some PDFs we are getting this error:
@tasfiqul-ghani
Co-authored-by: pubpub-zz <4083478+pubpub-zz@users.noreply.github.com>
Thank you for your contribution @talibhmukadam 🙏 If you want, I can add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-)
@talibhmukadam / @tasfiqul-ghani
@MartinThoma, @pubpub-zz thank you for reviewing and merging the fix so fast. If I may ask, what would be a possible ETA for releasing the new version of pypdf with this fix? Sorry to rush you, but it is blocking us from releasing a new feature 😢

@MartinThoma, yes, please feel free to add me to the contributors list. 😄

@pubpub-zz, I would love to share the PDF file, but the few PDF files on which we got the errors are resume files containing PII, which my organization won't allow me to share. I hope you understand.
I'm creating a release at the moment. It will be on PyPI in less than 2 hours. However, if you want to make sure that the fix stays in pypdf, we need a sample file. Otherwise a future change could break it again (but I also understand the PII restrictions 😢).
Which name should I use, and should I link to a profile (e.g. your GitHub profile)?
Deprecations (DEP)
- Deprecate PdfMerger (#1866)

Bug Fixes (BUG)
- Ignore UTF-8 decode errors (#1865)

Robustness (ROB)
- Handle missing /Type entry in Page tree (#1859)

[Full Changelog](3.9.0...3.9.1)