Floating point exception when trying to OCR an image #3991
Comments
I should add that this also occurs with Tesseract 5.1.0.
I ran the program through gdb and it seems to fail in libopenjp2 (I just upgraded to 2.5.0, my bug report mentions 2.4.0):
Running on Debian GNU/Linux (x86_64), I can reproduce the crash.
It works for me on Windows too:
I have now run a test with the latest code for openjpeg and leptonica. The crash occurs in
If you build Tesseract with clang, only an FP division by zero raises an exception, so a clang build should not abort. It would also be possible to disable the code which enables FPU exceptions, either by removing it or by adding a condition which skips it.
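As an illustration of the "condition which skips it" idea, here is a minimal sketch. It is not Tesseract's actual code: the helper name `enable_fp_traps_unless_skipped` and the environment variable `SKIP_FPE_TRAPS` are hypothetical, and the per-compiler masks only mirror the behaviour described in this comment. `feenableexcept` is a glibc extension, so the sketch does nothing on other C libraries.

```cpp
#include <cfenv>
#include <cstdlib>

// Hypothetical helper (not Tesseract's code): enable SIGFPE traps unless the
// made-up environment variable SKIP_FPE_TRAPS is set.
// Returns the exception mask that was enabled, or 0 if nothing was enabled.
int enable_fp_traps_unless_skipped() {
  if (std::getenv("SKIP_FPE_TRAPS") != nullptr) {
    return 0;  // The condition that skips enabling FP exceptions.
  }
#if defined(__GLIBC__)
#  if defined(__clang__)
  // Assumption taken from the comment above: clang builds trap only on
  // division by zero.
  const int mask = FE_DIVBYZERO;
#  else
  const int mask = FE_DIVBYZERO | FE_OVERFLOW | FE_INVALID;
#  endif
  // feenableexcept returns -1 if the traps could not be enabled.
  if (feenableexcept(mask) == -1) {
    return 0;
  }
  return mask;
#else
  return 0;  // No feenableexcept available; leave the FP environment alone.
#endif
}
```

With something like this, a user hitting the crash could set the variable to get output for the affected image, while default runs would still abort on unexpected FP behaviour.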
Then the code which enables FPU exceptions is not enabled for Windows builds.
I just made a test with
Thanks for figuring this out. I wonder if we should ultimately consider this to be a bug in openjp2? I suppose another option is to enable some of these exceptions only for debug builds and not for release builds, but I don't know if that makes any sense.
Yes, I also think that is a bug in openjp2. Ignoring FP exceptions and calculating with NaN does not look like normal behaviour. I assume that you process a huge number of JPEG 2000 images every day. How many of those trigger this FPE? I enabled the exceptions in all builds because I want to know when the code does something unexpected which might result in unexpected OCR results. Debug builds won't find such cases because they are not used to process large quantities of different images. It would be possible to change that strategy and only report floating point exceptions without aborting, but I am afraid that people would simply ignore a warning message and not report it as an issue here.
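The "report without aborting" strategy mentioned above could be sketched with the standard `<cfenv>` status flags instead of traps. This is only an illustration under assumptions: the helper name `fp_flags_raised_by` is hypothetical and nothing like it is confirmed to exist in Tesseract. It also assumes traps are not enabled, since otherwise the computation would still raise SIGFPE.

```cpp
#include <cfenv>
#include <string>

// Hypothetical helper: run a computation and return the names of the
// floating point exception flags it raised, instead of trapping.
template <typename F>
std::string fp_flags_raised_by(F&& compute) {
  std::feclearexcept(FE_ALL_EXCEPT);  // Start from a clean flag state.
  compute();
  const int raised =
      std::fetestexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
  std::string names;
  if (raised & FE_DIVBYZERO) names += "FE_DIVBYZERO ";
  if (raised & FE_INVALID) names += "FE_INVALID ";
  if (raised & FE_OVERFLOW) names += "FE_OVERFLOW ";
  return names;  // Empty when none of the three flags was raised.
}
```

A caller could wrap the image decoding step and print a warning when the returned string is non-empty. The drawback is the one already named in this comment: a warning is easy to ignore.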
Not many, I believe, but it's hard to know for sure. But there is a percentage of images where we continue with an empty page if Tesseract exits abnormally. Unrelated to this issue, but worth mentioning, is that we do log this. This is described here: https://archive.org/developers/ocr.html#adaptive-ocr and this search query will find any items where at least one page caused Tesseract to crash:
I personally think aborting makes sense here. It could perhaps be optionally disabled with a flag, but I think there's great value in finding bugs like these, so we can keep it on. One thing to note though is that
That was my first test, and yes, that's because it does not enable any FP exceptions.
[off-topic] In cases where Tesseract prints 'Empty page!!', you can re-run Tesseract with Sauvola.
Basic Information
Operating System
No response
Other Operating System
Gentoo Linux, but also Ubuntu 18.04, 20.04, etc
uname -a
Linux gentoo-x13 5.11.7-gentoo-dist #1 SMP Wed Mar 17 21:03:41 -00 2021 x86_64 AMD Ryzen 7 PRO 4750U with Radeon Graphics AuthenticAMD GNU/Linux
Compiler
GCC 11.2.1
Virtualization / Containers
No response
CPU
AMD Ryzen 7 PRO 4750U with Radeon Graphics
Current Behavior
Expected Behavior
I would expect Tesseract to not receive SIGFPE.
Suggested Fix
No response
Other Information
The problem doesn't occur if I decompress the JPEG2000 to a TIFF, so perhaps there is a problem with the JPEG2000 handling.
The image is here: https://archive.org/~merlijn/UNI_1918030101_0003.jp2