layout: Empty Page output for default psm #3670
Using |
You are right. I am finding the issue in about 1% of images generated by |
Here is a zip file with some images that have this problem. A few OK images are also included. EmptyPage.zip.zip
Why is this message printed twice?
Does this also happen with |
Yes, it is also happening with
The problem seems to be related to dpi.
The image is recognized if I set the dpi to 200 or 150. I tried to display the earlier messages about the dpi being used, but they seem to be suppressed now, even with
|
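A minimal sketch of the dpi experiment described above, assuming the `tesseract` command-line tool is on the PATH and supports the `--dpi` option (Tesseract 4 and later). The helper names `tesseract_cmd` and `first_readable_dpi` and the injectable `run` parameter are hypothetical and exist only for illustration; they are not part of Tesseract.

```python
import subprocess


def tesseract_cmd(image_path, dpi=None, psm=None):
    """Build a tesseract CLI argument list that prints recognized text to stdout."""
    cmd = ["tesseract", image_path, "stdout"]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]  # override the resolution stored in the image
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def first_readable_dpi(image_path, dpis=(200, 150), run=None):
    """Return (dpi, text) for the first dpi override that yields text, else (None, "")."""
    if run is None:
        # Default runner: call the real CLI and capture its stdout.
        def run(cmd):
            return subprocess.run(cmd, capture_output=True, text=True).stdout
    for dpi in dpis:
        text = run(tesseract_cmd(image_path, dpi=dpi)).strip()
        if text:
            return dpi, text
    return None, ""
```

Calling `first_readable_dpi("line.png")` would try 200 dpi first and then 150 dpi, which are the two values reported to work in this thread; `line.png` is a placeholder file name.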
In general, if you know in advance that the input is one line, then you should use |
The dpi in this image is in the valid range (301), so Tesseract will respect it and will not try to estimate it. That's why there is no warning.
Your suggestion to make Tesseract do a second try can be improved by taking into account the image height and the number of blobs. For example, if the image height is below 60 pixels and the image has fewer than 100 blobs, Tesseract can try psm 6, and if that also fails it can then try psm 7.
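The heuristic above can be sketched on the caller's side as follows. This is only an illustration, not how Tesseract implements anything: `recognize` is a hypothetical callable that runs OCR with a given psm (`None` meaning the default), and `blob_count` is assumed to be supplied by the caller, since the blob count is normally internal to Tesseract.

```python
def fallback_psms(image_height, blob_count, max_height=60, max_blobs=100):
    """Return the psm values to retry for a small, sparse image, else none."""
    if image_height < max_height and blob_count < max_blobs:
        return [6, 7]  # single uniform block first, then single text line
    return []


def recognize_with_fallback(recognize, image_height, blob_count):
    """Try the default psm, then any fallbacks; return (text, psms_tried)."""
    tried = []
    for psm in [None] + fallback_psms(image_height, blob_count):
        tried.append(psm)
        text = recognize(psm).strip()
        if text:
            return text, tried
    return "", tried  # every attempt produced an empty page
```

Returning the list of attempted modes makes it easy to log which psm finally produced text, similar in spirit to the DEBUG message requested in the issue.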
Using the API, you can give Tesseract an alternative config file; if recognition fails, Tesseract will make a second attempt using that config file.
Lines 1268 to 1283 in b649222
|
I am trying to look for alternative ways to evaluate the recognition by different models since |
I don't know, you can try and see...
Empty page output for complex newspaper pages is handled in issue #3021.
For certain images the default psm gives `Empty Page` as output, while `--psm 6` and others give the correct result.

Suggest that in cases where the default psm results in `Empty Page`, Tesseract try recognizing the image with `--psm 6` automatically, along with a DEBUG message.

Example image: