ENH: Make remove_annots_from_page more flexible #1831
py-pdf#1829) - add pytest-socket to pyproject.toml
Codecov Report
Attention:
Additional details and impacted files
@@              Coverage Diff               @@
##             main    #1831      +/-   ##
==========================================
- Coverage   94.54%   94.51%   -0.04%
==========================================
  Files          43       43
  Lines        7549     7561      +12
  Branches     1490     1497       +7
==========================================
+ Hits         7137     7146       +9
- Misses        253      255       +2
- Partials      159      160       +1
Thank you for the contribution <3 Overall, I'm inclined to add the improvement to pypdf, but I think a few decisions need to be discussed and potentially refined:
- rename delete_decide_function to annotation_filter_function
- add page to callback for annotation_filter_function
I'm glad you like it!
1. Method name: You're right, it's confusing. I changed
2. Callback parameters: I added the page as a callback argument. I didn't remove
3. Documentation: Is it better now?
4. Naming: I also didn't like my names. I settled on
@pubpub-zz What is your opinion about this PR?
subtypes: Optional[
    Union[AnnotationSubtype, Iterable[AnnotationSubtype]]
] = None,
annotation_filter_function: Optional[Callable] = None,
Suggested change:
- annotation_filter_function: Optional[Callable] = None,
+ annotation_filter_function: Optional[Callable[[DictionaryObject, ArrayObject, DictionaryObject], bool]] = None,
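A fully parameterized Callable type lets mypy reject filters with the wrong arity or return type, which a bare Callable cannot do. The sketch below is a minimal illustration only: plain dict and list stand in for pypdf's DictionaryObject and ArrayObject (which subclass dict and list), and the alias name AnnotationFilter and the function is_link are hypothetical.

```python
from typing import Callable, Dict, List

# Stand-ins (assumption): pypdf's DictionaryObject subclasses dict and
# ArrayObject subclasses list, so plain containers keep this runnable.
DictionaryObject = Dict[str, object]
ArrayObject = List[object]

# Hypothetical alias for the suggested signature: (page, an, obj) -> bool
AnnotationFilter = Callable[[DictionaryObject, ArrayObject, DictionaryObject], bool]


def is_link(page: DictionaryObject, an: ArrayObject, obj: DictionaryObject) -> bool:
    """Return True for link annotations; page and an are unused here."""
    return obj.get("/Subtype") == "/Link"


# mypy checks that is_link matches the three-argument, bool-returning signature.
flt: AnnotationFilter = is_link
print(flt({}, [], {"/Subtype": "/Link"}))  # True
```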
See below about the 2nd parameter.
annotation: ArrayObject,
obj: DictionaryObject
Maybe you can add a little bit more documentation:
- Why is the annotation an ArrayObject?
- What does obj represent?
I don't properly understand these objects myself. I kept the original names and type annotations (see this part of the old code where annotations were filtered) and passed all the relevant context objects to the filter function, in case they are useful for filtering. :)
In short, I don't know what to write myself, sorry. Can you do this?
subtypes: Optional[
    Union[AnnotationSubtype, Iterable[AnnotationSubtype]]
] = None,
annotation_filter_function: Optional[Callable] = None,
Just an idea: we could call the parameter exclude. That (in combination with the type annotation and the docs) seems pretty clear to me: whenever it evaluates to True, the entry is excluded. That would invert the logic.
I'm not happy with annotation_filter_function, as it's pretty lengthy and I never know whether we "filter out" (remove) or "filter to keep".
We could then make the default the constant-False function (which would not remove anything). This would mean the annotation would become simpler as the "Optional" could be removed.
Then the meaning of subtypes in combination with the new parameter has to be clear. I think it would be best if both are independent filter functions. If both are set, we can create a new function that combines both internally. That means we don't have to ignore one parameter, and we don't have to log anything.
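The proposal above (an exclude predicate with a constant-False default, combined internally with subtypes) can be sketched as follows. This is a minimal sketch under stated assumptions: plain dicts stand in for pypdf page and annotation objects, the names combine_filters and _exclude_nothing are hypothetical, and the thread does not settle how the two criteria combine. The sketch assumes OR, which is the combination consistent with a constant-False default (an unset exclude then adds nothing).

```python
from typing import Any, Callable, Dict, Iterable, Optional, Union

Page = Dict[str, Any]        # stand-in for a pypdf page dictionary
Annotation = Dict[str, Any]  # stand-in for a resolved annotation dictionary
ExcludeFn = Callable[[Page, Annotation], bool]


def _exclude_nothing(page: Page, annotation: Annotation) -> bool:
    """Constant-False default: nothing is excluded, so no Optional is needed."""
    return False


def combine_filters(
    subtypes: Optional[Union[str, Iterable[str]]] = None,
    exclude: ExcludeFn = _exclude_nothing,
) -> ExcludeFn:
    """Merge the two independent criteria into one predicate (True means remove)."""
    if subtypes is None:
        return exclude
    wanted = {subtypes} if isinstance(subtypes, str) else set(subtypes)

    def combined(page: Page, annotation: Annotation) -> bool:
        # Assumption: an annotation is removed if EITHER criterion matches.
        return annotation.get("/Subtype") in wanted or exclude(page, annotation)

    return combined


page: Page = {"/Annots": [{"/Subtype": "/Link"}, {"/Subtype": "/Text"}, {"/Subtype": "/Highlight"}]}
should_remove = combine_filters("/Link", lambda pg, a: a.get("/Subtype") == "/Highlight")
page["/Annots"] = [a for a in page["/Annots"] if not should_remove(page, a)]
print(page["/Annots"])  # [{'/Subtype': '/Text'}]
```

Because neither parameter is ignored when both are given, there is nothing to warn about, which matches the point made in the comment.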
Hm, I don't really like exclude, because it feels more natural to return True to keep something. Maybe include or filter?
> We could then make the default the constant-False function (which would not remove anything). This would mean the annotation would become simpler as the "Optional" could be removed.
I'm not sure I get this. I wanted to make default_annotation_filter_function the default value of annotation_filter_function, too, without Optional[Callable] = None. But mypy didn't let me, iirc.
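For reference, a module-level function can normally serve directly as the default value of a parameter annotated with a concrete Callable type; mypy accepts this as long as the default's signature matches. The sketch below uses the keep-on-True convention preferred above, a simplified one-argument filter, and the hypothetical names _keep_all and kept_annotations; plain dicts stand in for pypdf annotation objects.

```python
from typing import Any, Callable, Dict, List

Annotation = Dict[str, Any]  # stand-in for an annotation dictionary


def _keep_all(annotation: Annotation) -> bool:
    """Hypothetical default filter: keep every annotation."""
    return True


def kept_annotations(
    annotations: List[Annotation],
    annotation_filter_function: Callable[[Annotation], bool] = _keep_all,
) -> List[Annotation]:
    """Return the annotations for which the filter returns True (True = keep)."""
    # The default is a plain module-level function, so None/Optional is unnecessary.
    return [a for a in annotations if annotation_filter_function(a)]


annots = [{"/Subtype": "/Link"}, {"/Subtype": "/Text"}]
print(kept_annotations(annots))  # both annotations
print(kept_annotations(annots, lambda a: a["/Subtype"] != "/Link"))  # [{'/Subtype': '/Text'}]
```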
> I think it would be best if both are independent filter functions. If both are set, we can create a new function that combines both internally. That means we don't have to ignore one parameter. And we don't have to log anything.
Agreed.
"""
if annotation_filter_function is None:
    annotation_filter_function = default_annotation_filter_function

page = cast(DictionaryObject, page.get_object())
if PG.ANNOTS in page:
    i = 0
    while i < len(cast(ArrayObject, page[PG.ANNOTS])):
        an = cast(ArrayObject, page[PG.ANNOTS])[i]
an normally contains the IndirectObject pointing to obj. I don't see any use case where this is required. However, if somebody needs it, it is available through the indirect_reference property.
For me, just the page and the annotation (to be renamed accordingly in the example, which may be clearer) are sufficient.
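A two-argument filter (page, annotation) fits the existing removal loop: each entry is resolved to its dictionary before the filter sees it, and an entry is deleted without advancing the index. This is a minimal sketch, not pypdf code: FakeIndirect is a hypothetical stand-in that mimics only get_object(), plain dicts and lists stand in for pypdf objects, and remove_annots is a hypothetical name.

```python
from typing import Any, Callable, Dict, List


class FakeIndirect:
    """Hypothetical stand-in for an indirect reference: get_object() resolves it."""

    def __init__(self, target: Dict[str, Any]) -> None:
        self.target = target

    def get_object(self) -> Dict[str, Any]:
        return self.target


def remove_annots(
    page: Dict[str, Any],
    should_remove: Callable[[Dict[str, Any], Dict[str, Any]], bool],
) -> None:
    """Delete annotations in place, calling should_remove(page, annotation)."""
    annots: List[Any] = page.get("/Annots", [])
    i = 0
    while i < len(annots):
        an = annots[i]
        # Entries are usually indirect references; resolve them before filtering.
        obj = an.get_object() if hasattr(an, "get_object") else an
        if should_remove(page, obj):
            del annots[i]  # do not advance i: the next entry shifted into slot i
        else:
            i += 1


page = {"/Annots": [FakeIndirect({"/Subtype": "/Link"}), {"/Subtype": "/Text"}]}
remove_annots(page, lambda pg, annot: annot["/Subtype"] == "/Link")
print([a["/Subtype"] for a in page["/Annots"]])  # ['/Text']
```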
Not sure I properly understood your comment. You're basically suggesting a simpler filter function?
def default_annotation_filter_function(page: Any, an: Any, obj: Any) -> bool:
becomes:
def default_annotation_filter_function(page: Any, an: Any) -> bool:
And obj would still be available in an.indirect_reference?
I never truly understood these objects. Could you maybe write useful documentation for annotation_filter_function, as suggested here?
@@ -588,3 +588,8 @@ def replace(self, new_image: Any, **kwargs: Any) -> None:
        self.name = self.name[: self.name.rfind(".")] + extension
        self.data = byte_stream
        self.image = img


def default_annotation_filter_function(page: Any, an: Any, obj: Any) -> bool:
I don't see any advantage in having this function in _utils. An internal function next to _annotation_filter_function would be sufficient.
"""
if subtypes is not None:
    if annotation_filter_function is not None:
        logger_warning(
I would rather raise a ValueError.
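Raising instead of logging makes the conflict impossible to miss at the call site. A minimal sketch of that check, assuming the two parameters stay mutually exclusive; the helper name check_exclusive and the error message are hypothetical.

```python
from typing import Callable, Iterable, Optional, Union


def check_exclusive(
    subtypes: Optional[Union[str, Iterable[str]]] = None,
    annotation_filter_function: Optional[Callable[..., bool]] = None,
) -> None:
    """Reject calls that pass both mutually exclusive parameters."""
    if subtypes is not None and annotation_filter_function is not None:
        raise ValueError(
            "Pass either 'subtypes' or 'annotation_filter_function', not both"
        )


check_exclusive("/Link")  # fine: only one parameter given
```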
@MrTomRod
Sorry, I've lost the plot a bit, I'm very busy atm. What comment does "see below bout the 2nd param" refer to exactly? I think most of the suggestions make sense. Except if
Somehow destroys backwards compatibility.
> Sorry, I've lost the plot a bit, I'm very busy atm.
I meant
From my understanding, @MartinThoma means that, since your new parameter is mutually exclusive with the old one, we should have two functions. To keep the code simple, remove_annotation could call remove_filtered_annotations with a lambda function.
Backwards compatibility would be kept 😀
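The two-function split described above can be sketched like this: the general function takes a predicate, and the subtype-based function only builds a lambda and delegates, so its existing call signature stays unchanged. This is a minimal sketch under stated assumptions: the function names come from the comment but are hypothetical here (not an existing pypdf API), and plain dicts stand in for pypdf page and annotation objects.

```python
from typing import Any, Callable, Dict, Iterable, Union

Page = Dict[str, Any]        # stand-in for a pypdf page dictionary
Annotation = Dict[str, Any]  # stand-in for a resolved annotation dictionary


def remove_filtered_annotations(
    page: Page, should_remove: Callable[[Page, Annotation], bool]
) -> None:
    """Hypothetical general entry point: drop every annotation the predicate flags."""
    if "/Annots" in page:
        page["/Annots"] = [a for a in page["/Annots"] if not should_remove(page, a)]


def remove_annotation(page: Page, subtypes: Union[str, Iterable[str]]) -> None:
    """Hypothetical subtype-based entry point; it only delegates via a lambda."""
    wanted = {subtypes} if isinstance(subtypes, str) else set(subtypes)
    remove_filtered_annotations(page, lambda pg, annot: annot.get("/Subtype") in wanted)


page = {"/Annots": [{"/Subtype": "/Link"}, {"/Subtype": "/Text"}]}
remove_annotation(page, "/Link")
print(page["/Annots"])  # [{'/Subtype': '/Text'}]
```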
I'm closing this PR now without merging. I have the impression that this is a rather unusual use case and that the API might be confusing. I would prefer a numpy-style API
@MrTomRod Thank you for all the work you put into this PR. Although I didn't merge it, I do value it. If you want, I can add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-)
No worries, and no need to add me to the list. Thanks for your effort!
Changes:
- _remove_annots_from_page -> remove_annots_from_page
- subtypes: Optional[Iterable[str]] parameter replaced by delete_decide_function: Optional[Callable] = None
Closes #1829