Getting character level bounding boxes. #13880

Smit-Shukla · 2024-09-17T17:36:23Z


I can't find anything that gives me character-level bounding boxes for an image or page full of text.

My test cases are mixed: some contain multiple long paragraphs with titles, subtitles, and blanks, while others contain table layouts with short text inside the cells.

What I need is to match an input string against the text in the page or image and highlight it.

For a specific reason, I need character-level bounding boxes that give each character in the page or image its own coordinates.

I tried Tesseract with a custom config for this, but the text it extracts is not very accurate, and it doesn't handle spaces or punctuation well in images containing long texts and paragraphs.

I can't do this with PaddleOCR, because it gives line-level bounding boxes.

Since PaddleOCR extracts text from any kind of image and layout very accurately, I was wondering whether there is anything that can give character bounding boxes, or whether there is a workaround for this task.
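
To make the kind of workaround I have in mind concrete, here is a minimal sketch that approximates character boxes by slicing a line-level box into equal-width pieces and then builds a highlight box for a matched string. It is only an approximation: it assumes the line box has already been reduced to an axis-aligned rectangle `(x_min, y_min, x_max, y_max)`, that the text runs left to right, and that all characters are roughly the same width, which is not true for proportional fonts. The names `split_line_box` and `highlight_box` are hypothetical helpers, not part of PaddleOCR or Tesseract.

```python
from typing import List, Optional, Tuple

# Axis-aligned rectangle: (x_min, y_min, x_max, y_max). This layout is an
# assumption; a four-point OCR box would have to be converted first.
Box = Tuple[float, float, float, float]


def split_line_box(text: str, box: Box) -> List[Tuple[str, Box]]:
    """Give each character an equal-width slice of the line's box."""
    if not text:
        return []
    x_min, y_min, x_max, y_max = box
    step = (x_max - x_min) / len(text)
    return [
        (ch, (x_min + i * step, y_min, x_min + (i + 1) * step, y_max))
        for i, ch in enumerate(text)
    ]


def highlight_box(text: str, box: Box, query: str) -> Optional[Box]:
    """Return a box covering the first occurrence of query, or None."""
    if not query:
        return None
    start = text.find(query)
    if start < 0:
        return None
    chars = split_line_box(text, box)
    first = chars[start][1]
    last = chars[start + len(query) - 1][1]
    # Span from the left edge of the first matched character
    # to the right edge of the last one.
    return (first[0], first[1], last[2], last[3])
```

For example, `highlight_box("hello world", (0, 0, 110, 20), "world")` would cover the right-hand part of the line box. The estimate gets worse as character widths vary, which is why real per-character boxes would still be preferable.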

Replies: 0 comments