BUG: missing error on name without leading / #2387
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #2387 +/- ##
=======================================
Coverage 94.44% 94.44%
=======================================
Files 49 49
Lines 8027 8027
Branches 1618 1618
=======================================
Hits 7581 7581
Misses 276 276
Partials 170 170 |
Thanks for the PR. Did you previously get any errors regarding this from a specific PDF file? |
Here is a quick example where it produces a corrupted pdf file:
|
I am still having trouble understanding how this affects real-world usages - the above code will not generate a valid PDF anyway due to the missing page tree. Previously,
This will not change with your patch. Is this correct? |
I agree with @stefan6419846 |
The problem is that the binary representation of a name object always starts with "/", and an object like b"foo" is not a valid PDF object, so the generated file cannot be parsed correctly. If the "/" is forgotten, nothing stops the file from being created. Another point is that b"/" is not part of the name value; it is only used in the binary representation in the PDF file. Other libraries like https://github.com/pmaupin/pdfrw/blob/master/pdfrw/objects/pdfname.py don't need it for object creation, and for me that is the correct way to implement it. Requiring it is confusing and leads to corrupted files, especially when you use external libraries built on top of it. That's a problem I encountered more than a year ago, and the solution was to not use it. I have not changed the current implementation, because that would be a huge change, but it could be done in a major release. |
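The separation described above, a name value stored without the slash and a serialized form that always starts with it, can be sketched as follows. `SketchName` and `to_pdf_bytes` are hypothetical names used only for illustration; this is neither pypdf's `NameObject` nor pdfrw's `PdfName`, and the escaping is a simplified reading of the PDF name syntax (bytes outside `!` to `~`, delimiters, and `#` written as `#xx`).

```python
# Hypothetical sketch: not pypdf's or pdfrw's implementation.
DELIMITERS = b"()<>[]{}/%"


class SketchName:
    """Stores the name value without the leading slash."""

    def __init__(self, value: str) -> None:
        self.value = value

    def to_pdf_bytes(self) -> bytes:
        out = bytearray(b"/")  # the slash exists only in the serialized form
        for byte in self.value.encode("utf-8"):
            regular = 0x21 <= byte <= 0x7E and byte not in DELIMITERS and byte != 0x23
            if regular:
                out.append(byte)
            else:
                out += b"#%02X" % byte  # escape as '#' plus two hex digits
        return bytes(out)


print(SketchName("Type").to_pdf_bytes())  # b'/Type'
print(SketchName("A B").to_pdf_bytes())   # b'/A#20B'
```

With this design the slash can never be forgotten, because callers never supply it. The cost is the one raised in this thread: existing code that already passes "/Type" would need to change.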
We are currently preparing a major release, but we would need a corresponding deprecation process which delays the final hard change for some more time as per our deprecation policy: https://pypdf.readthedocs.io/en/latest/dev/deprecations.html |
I understand that your issue is that when the "/" is forgotten, nothing prevents the file from being written. The exception should be raised within the constructor. An alternative could be to add the forgotten "/" within the constructor. Doing this check during file writing would hurt performance too much.
pypdf has always considered the name to include the "/" as a convention. Changing this would break nearly all existing programs.
I don't think that changing the convention is a good idea: so many people are downloading and using pypdf as it is (https://piptrends.com/compare/pypdf-vs-pdfrw)
|
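Both options mentioned above, raising in the constructor or adding the forgotten "/" there, can be sketched by overriding `__new__`, assuming the name type subclasses `str` (an immutable type, so `__init__` cannot change the value). `StrictName` and `LenientName` are hypothetical classes for illustration and are not pypdf's actual `NameObject`.

```python
# Hypothetical sketch: str subclasses, not pypdf's actual NameObject.
class StrictName(str):
    """Rejects values that do not start with '/' at construction time."""

    def __new__(cls, value: str) -> "StrictName":
        if not value.startswith("/"):
            raise ValueError(f"name {value!r} must start with '/'")
        return super().__new__(cls, value)


class LenientName(str):
    """Adds the forgotten '/' at construction time instead of raising."""

    def __new__(cls, value: str) -> "LenientName":
        if not value.startswith("/"):
            value = "/" + value
        return super().__new__(cls, value)


print(LenientName("Type"))  # /Type
print(StrictName("/Type"))  # /Type
try:
    StrictName("Type")
except ValueError as exc:
    print("rejected:", exc)
```

Either variant runs the check once per object at creation, so nothing extra happens while the file is written.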
Co-authored-by: pubpub-zz <4083478+pubpub-zz@users.noreply.github.com>
OK, I've changed it to require the leading /, and corrected the errors found. |
I've not found a correct way to override the string constructor, so instead I raise an error, because a name without a leading slash produces a corrupted PDF in any case. |
@pubpub-zz Are you okay with the current implementation or do you prefer/propose an alternative one? |
My first merge 🎉 |
## What's new

Generating name objects (`NameObject`) without a leading slash is considered deprecated now. Previously, just a plain warning would be logged, leading to possibly invalid PDF files. According to our deprecation policy, this will log a *DeprecationWarning* for now.

### New Features (ENH)
- Add get_pages_from_field (#2494) by @pubpub-zz
- Add reattach_fields function (#2480) by @pubpub-zz
- Automatic access to pointed object for IndirectObject (#2464) by @pubpub-zz

### Bug Fixes (BUG)
- Missing error on name without leading / (#2387) by @Rak424
- encode_pdfdocencoding() always returns bytes (#2440) by @sbourlon
- BI in text content identified as image tag (#2459) by @pubpub-zz

### Robustness (ROB)
- Missing basefont entry in type 3 font (#2469) by @pubpub-zz

### Documentation (DOC)
- Improve lossless compression example (#2488) by @j-t-1
- Amend robustness documentation (#2479) by @j-t-1

### Developer Experience (DEV)
- Fix changelog for UTF-8 characters (#2462) by @stefan6419846

### Maintenance (MAINT)
- Add _get_page_number_from_indirect in writer (#2493) by @pubpub-zz
- Remove user assignment for feature requests (#2483) by @stefan6419846
- Remove reference to old 2.0.0 branch (#2482) by @stefan6419846

### Testing (TST)
- Fix benchmark failures (#2481) by @stefan6419846
- Broken test due to expired test file URL (#2468) by @pubpub-zz
- Resolve file naming conflict in test_iss1767 (#2445) by @sbourlon

[Full Changelog](4.0.2...4.1.0)
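A deprecation warning of this kind is easy to miss, because Python's default filters often hide `DeprecationWarning`. The sketch below shows the general pattern for emitting and surfacing one, assuming the standard `warnings` module is used; `check_name` is a hypothetical helper, and the notes above do not state how pypdf itself emits the warning.

```python
import warnings


# Hypothetical helper: illustrates the pattern only, not pypdf's code.
def check_name(value: str) -> str:
    if not value.startswith("/"):
        warnings.warn(
            f"name {value!r} lacks a leading slash; this is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
    return value


# Record warnings explicitly so that none are filtered out.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_name("Type")   # triggers the warning
    check_name("/Type")  # valid, no warning

print(len(caught))                  # 1
print(caught[0].category.__name__)  # DeprecationWarning
```

Running a test suite with `python -W error::DeprecationWarning` turns such warnings into exceptions, which helps find affected call sites before the hard change lands.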
The leading slash is not part of the Name value; it is only required in the binary representation. It should therefore not be needed for Name object creation, and the binary representation should always start with b"/".
Description in spec: