How to extract text bounding box coordinates? #204

hoangthanh283 · 2023-04-11T14:41:38Z

Thanks for your work!
I looked through the documentation for an API to extract text bounding boxes but could not find one. So I wonder whether extracting text bounding boxes is supported or not.

I tried with:

searcher = textpage.search("something", match_case=False, match_whole_word=False)
first_occurrence = searcher.get_next()

But it returns a tuple (int, int), the start character index and count of the next occurrence, instead of the list of bounding boxes of the form (left, bottom, right, top) that is mentioned in the README.


mara004 · 2023-04-11T19:42:46Z

> So I wonder do we support extracting text bounding boxes or not.

Yes we do.

You'll want the PdfTextPage API, notably count_rects() and get_rect().
Sorry about the outdated readme comment; that API changed with v4. I'll fix that.
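
For anyone landing here later, a minimal sketch of how the search result could be combined with count_rects() and get_rect(). It assumes pypdfium2 v4, that get_next() returns None once there are no more occurrences, and that get_rect() indexes into the rectangles computed by the most recent count_rects() call. The helper name find_text_rects and the file name example.pdf are made up for illustration and are not part of the library.

```python
def find_text_rects(textpage, query):
    """Collect bounding boxes for every occurrence of `query` on a text page.

    `textpage` is expected to behave like pypdfium2's PdfTextPage (v4).
    Returns one list of (left, bottom, right, top) tuples per occurrence.
    """
    searcher = textpage.search(query, match_case=False, match_whole_word=False)
    results = []
    while True:
        occurrence = searcher.get_next()
        if occurrence is None:  # assumed: None signals that no match is left
            break
        index, count = occurrence  # start character index, character count
        # Assumption: count_rects() has to run first, and get_rect() then
        # refers to the rectangles of that character range.
        n_rects = textpage.count_rects(index, count)
        results.append([textpage.get_rect(i) for i in range(n_rects)])
    return results


# Usage sketch (needs pypdfium2 and a real file; "example.pdf" is a placeholder):
#   import pypdfium2 as pdfium
#   pdf = pdfium.PdfDocument("example.pdf")
#   textpage = pdf[0].get_textpage()
#   for boxes in find_text_rects(textpage, "something"):
#       print(boxes)
```

A match can yield more than one rectangle, for example when it wraps across lines, which is why each occurrence maps to a list of boxes rather than a single one.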

I guess you only looked at the readme and missed the docs on RTD, right?

mara004 added the question and conversation labels Apr 11, 2023

mara004 self-assigned this Apr 11, 2023

mara004 closed this as completed Apr 11, 2023

mara004 added a commit that referenced this issue Apr 11, 2023

Correct a readme example (CC #204)

5e8271b

samshelley mentioned this issue Jun 19, 2023

Need coordinate conversion help #228

Closed
