Getting character level bounding boxes. #13880
Unanswered
Smit-Shukla
asked this question in
Q&A
Replies: 0 comments
I can't find a tool that returns character-level bounding boxes for an image or page full of text.
My test cases are mixed: some contain multiple long paragraphs with titles, subtitles, and blanks, while others contain table layouts with short text inside them.
What I need is to match an input string against the text on the page/image and highlight it.
For my use case, I need character-level bounding boxes that give each character on the page/image its own coordinates.
I tried Tesseract with a custom config for this, but the text it extracts is not very accurate, and it handles spaces and punctuation poorly in images containing long passages and paragraphs.
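For context, here is a minimal sketch of how Tesseract's per-character output can be turned into image coordinates. It is not the exact config mentioned above. It assumes the standard box-file layout (`<symbol> <left> <bottom> <right> <top> <page>`, with the origin at the bottom-left of the image) and flips the y-axis so the boxes use a top-left origin like most drawing libraries. The names `CharBox` and `parse_tesseract_boxes` are hypothetical helpers made up for illustration.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """One character with a top-left-origin, axis-aligned box (hypothetical type)."""
    char: str
    left: int
    top: int
    right: int
    bottom: int


def parse_tesseract_boxes(box_text: str, image_height: int) -> List[CharBox]:
    """Parse box-file text into top-left-origin character boxes.

    Assumes each line looks like: <symbol> <left> <bottom> <right> <top> <page>
    with y measured upward from the bottom edge of the image.
    """
    boxes = []
    for line in box_text.splitlines():
        # Split from the right so the symbol itself stays in one piece.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, left, bottom, right, top, _page = parts
        boxes.append(
            CharBox(
                char=char,
                left=int(left),
                top=image_height - int(top),        # flip y-axis
                right=int(right),
                bottom=image_height - int(bottom),  # flip y-axis
            )
        )
    return boxes
```

With `pytesseract` installed, the input string could come from `pytesseract.image_to_boxes(image)`, and `image_height` from the image that was passed in.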
I can't do this with PaddleOCR, since it returns line-level bounding boxes.
Since PaddleOCR extracts text very accurately from any kind of image and layout, I was wondering whether anything can produce character-level bounding boxes, or whether there is a workaround for this task.
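One workaround I can think of, sketched below, is to estimate character boxes from a line-level result by splitting the line box evenly across the characters of the recognized text. This is only an approximation: it assumes an axis-aligned box `(x_min, y_min, x_max, y_max)` and equal-width characters, so it will drift on proportional fonts and rotated text. `estimate_char_boxes` and `find_match_boxes` are hypothetical helper names, not part of PaddleOCR.

```python
from typing import List, Tuple

# Axis-aligned box: (x_min, y_min, x_max, y_max). Assumed format for illustration.
Box = Tuple[float, float, float, float]


def estimate_char_boxes(text: str, line_box: Box) -> List[Box]:
    """Split a line box into equal-width slices, one per character."""
    if not text:
        return []
    x_min, y_min, x_max, y_max = line_box
    char_width = (x_max - x_min) / len(text)
    return [
        (x_min + i * char_width, y_min, x_min + (i + 1) * char_width, y_max)
        for i in range(len(text))
    ]


def find_match_boxes(query: str, text: str, line_box: Box) -> List[Box]:
    """Return one estimated box per occurrence of `query` inside `text`."""
    if not query:
        return []
    char_boxes = estimate_char_boxes(text, line_box)
    matches = []
    start = text.find(query)
    while start != -1:
        end = start + len(query) - 1
        # Merge the first and last character slices of the match.
        matches.append(
            (char_boxes[start][0], line_box[1], char_boxes[end][2], line_box[3])
        )
        start = text.find(query, start + 1)
    return matches
```

Each returned box could then be drawn onto the image to highlight the matched string. A match that spans two recognized lines would need the line texts to be joined first, which this sketch does not handle.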