ENH: Add Highlight text markup annotation #1740

Merged: 3 commits, Mar 26, 2023

Conversation

MartinThoma
Member

See #107

@MartinThoma
Member Author

MartinThoma commented Mar 23, 2023

I've got the quadpoints by using pymupdf and inspecting the resulting document:

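import fitz  # PyMuPDF

# Open the sample document and take its first page.
doc = fitz.open("crazyones.pdf")
page = doc[0]

# Find every occurrence of the search term on the page.
text_instances = page.search_for("crazy")

# Highlight each hit; the resulting annotations carry the QuadPoints.
for inst in text_instances:
    highlight = page.add_highlight_annot(inst)
    highlight.set_colors({"stroke": (0, 0, 1), "fill": (0.75, 0.8, 0.95)})
    highlight.update()

# Save the annotated copy for inspection.
doc.save("annotation.pdf")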

@pubpub-zz How difficult would a PageObject.search_for(text) method be to implement that returns something containing the QuadPoints? Some people would be curious: https://stackoverflow.com/q/47497309/562769 :-)
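
For readers wondering what "something containing the QuadPoints" could look like, here is a minimal sketch, not part of this PR, that turns axis-aligned rectangles (such as search hits) into flat QuadPoints lists. The helper names `rect_to_quad_points` and `quads_for_rects` are hypothetical, and the vertex order (top-left, top-right, bottom-left, bottom-right, in PDF user space with y pointing up) is an assumption: viewers can be picky about the order, so check the result in the viewer you target.

```python
from typing import Iterable, List, Tuple

# Hypothetical alias: (x0, y0, x1, y1) in PDF user space (origin bottom-left, y up).
Rect = Tuple[float, float, float, float]


def rect_to_quad_points(rect: Rect) -> List[float]:
    """Return 8 numbers describing one quadrilateral for an axis-aligned rect.

    Assumed order: top-left, top-right, bottom-left, bottom-right.
    """
    x0, y0, x1, y1 = rect
    # Normalise so the result does not depend on which corners were given.
    left, right = min(x0, x1), max(x0, x1)
    bottom, top = min(y0, y1), max(y0, y1)
    return [left, top, right, top, left, bottom, right, bottom]


def quads_for_rects(rects: Iterable[Rect]) -> List[float]:
    """Concatenate the quads of several rects into one flat list (8 numbers each)."""
    quad_points: List[float] = []
    for rect in rects:
        quad_points.extend(rect_to_quad_points(rect))
    return quad_points
```

A flat list like this is the shape a highlight annotation's QuadPoints entry takes: one group of eight numbers per highlighted region, so a match that wraps across two lines would yield sixteen numbers.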

@codecov

codecov bot commented Mar 23, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (4fc0040) 92.38% compared to head (b7bb307) 92.38%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1740   +/-   ##
=======================================
  Coverage   92.38%   92.38%           
=======================================
  Files          34       34           
  Lines        6553     6557    +4     
  Branches     1300     1301    +1     
=======================================
+ Hits         6054     6058    +4     
  Misses        326      326           
  Partials      173      173           
| Impacted Files | Coverage Δ |
|---|---|
| pypdf/generic/_annotations.py | 93.91% <100.00%> (+0.21%) ⬆️ |


@pubpub-zz
Collaborator

> @pubpub-zz How difficult would a PageObject.search_for(text) method be to implement that returns something containing the QuadPoints? Some people would be curious: https://stackoverflow.com/q/47497309/562769 :-)

It will be part of the extension of extract_text... I'll move it up in the stack.

@pubpub-zz
Collaborator

The in-progress PR #1723 about set_color should also allow changing the color: set_color() is definitely the right name.

@MartinThoma MartinThoma merged commit 3da3b25 into main Mar 26, 2023
@MartinThoma MartinThoma deleted the highlight-annotation branch March 26, 2023 10:19
MartinThoma added a commit that referenced this pull request Mar 26, 2023
Security (SEC):
-  Use Python's secrets module instead of random module (#1748)

New Features (ENH):
-  Add AnnotationBuilder.highlight text markup annotation (#1740)
-  Add AnnotationBuilder.popup (#1665)
-  Add AnnotationBuilder.polyline annotation support (#1726)
-  Add clone_from parameter in PdfWriter constructor (#1703)

Bug Fixes (BUG):
-  'DictionaryObject' object has no attribute 'indirect_reference' (#1729)

Robustness (ROB):
-  Handle params NullObject in decode_stream_data (#1738)

Documentation (DOC):
-  Project scope (#1743)

Maintenance (MAINT):
-  Add AnnotationFlag (#1746)
-  Add LazyDict.__str__ (#1727)

[Full Changelog](3.6.0...3.7.0)
Labels
is-feature A feature request