Windows Build 577% Slower than Linux Build #1307
Comments
Could you please repeat your test with environment variable I expect the difference will be much smaller then. Windows multithreading does not perform very well. For a single-threaded Tesseract there should be nearly no difference, because the code was generated by the same kind of compiler (gcc) in both cases. Which version is |
Gave it a try with OMP_THREAD_LIMIT=1, then also added OMP_NUM_THREADS=1. tesseract-ocr-setup-4.0.0-alpha.20180109.exe is the version in use. I also seem to get this error at the end of OCRing, with or without those environment variables. I doubt it's related, but... |
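A single-threaded timing run like the one described above could be scripted roughly as follows. This is a minimal sketch, not the commenter's actual procedure: the helper names `build_env` and `time_command` and the file names in the usage comment are hypothetical, and only the `OMP_THREAD_LIMIT` variable comes from this thread.

```python
import os
import subprocess
import time


def build_env(thread_limit=None):
    """Return a copy of the current environment, optionally capping OpenMP threads."""
    env = os.environ.copy()
    if thread_limit is not None:
        # OMP_THREAD_LIMIT is the variable discussed in this thread.
        env["OMP_THREAD_LIMIT"] = str(thread_limit)
    return env


def time_command(cmd, thread_limit=None):
    """Run cmd once and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    result = subprocess.run(
        cmd,
        env=build_env(thread_limit),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, result.returncode


# Hypothetical usage (placeholder file names, tesseract must be on PATH):
# elapsed, rc = time_command(["tesseract", "page.png", "out"], thread_limit=1)
# print(f"{elapsed:.1f} s, exit code {rc}")
```

Running the same call with and without `thread_limit=1` on both operating systems would show how much of the gap is related to multithreading rather than to the recognition code itself.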
For further investigation more information is needed. Could you provide your test image somewhere? Which |
Here is the command: Unfortunately I can't provide the exact images from this example, as they contain personal details. They are 12 PNGs, all 2480x3508 pixels at 72 ppi and 8-bit depth, generated by ImageMagick from a PDF. The performance issue occurs with all PNGs, however (converted from any other format), not just this document. I am using the original tessdata provided with tesseract-ocr-setup-4.0.0-alpha.20180109.exe |
Thanks for that information. "Original tessdata" means that you are using Meanwhile, better models exist for Tesseract 4: get |
BTW, it is worth comparing with MSVC builds. |
MSVC has a good reputation regarding code quality and might have a better implementation of OpenMP than gcc for Windows. As I said before, MinGW-w64 (and therefore also the UB Mannheim executables) uses gcc, so the binary code for central parts (like the dot product) is the same as in the Linux build. Therefore there should be only a small difference for single-threaded Tesseract. |
@zdenop Please label Performance |
In order to add the jp2 lib, I just built both leptonica and tesseract using cmake with default options. I find that OCR with this build is much slower than with the version I had built with autotools/make. This may be because, with autotools, I had disabled openmp, opencl and graphics when running configure. How can I disable these three when building with cmake? |
@egorpugin How can openmp, opencl and graphics be disabled when building tesseract with cmake for running on Linux? Since I built leptonica with cmake, I have to use the same for tesseract (otherwise there are library name issues). |
The best way for now is to remove those options from |
I don't think removing options is a good idea.
Please keep in mind end users who don't want to or can't compile tesseract themselves. I think that if an option has no huge side effect or can easily be turned off, it should be compiled in. |
@leemorton, could you please repeat your performance test with the latest 64-bit installer? I assume that you used 32-bit Tesseract on Windows and 64-bit Tesseract on Linux, which might explain some of the performance difference. |
That bug was recently fixed. |
I am closing this issue, as there has been no recent activity and recent code does not show large performance differences between Linux and Windows. |
Environment
Current Behavior:
An identical machine spec with an identical workload and tesseract configuration results in consistently 577% slower performance on Windows 10 x64 compared with Debian Stretch x64. Essentially, the job takes 18 seconds on average with the Linux build and 1 minute 44 seconds with the Windows build. This has been tested on other machines and on fresh installations.
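The two timings above can be turned into a ratio with a few lines of arithmetic. This is only a worked check of the reported numbers (the variable names are illustrative): the 577% figure appears to correspond to the Windows time expressed as a percentage of the Linux time, which is equivalent to roughly 478% additional run time.

```python
# Timings as reported in this issue.
linux_seconds = 18
windows_seconds = 1 * 60 + 44  # 1 minute 44 seconds = 104 seconds

ratio = windows_seconds / linux_seconds   # how many times longer Windows takes
percent_of_linux = ratio * 100            # Windows time as a percentage of Linux time
percent_slower = (ratio - 1) * 100        # extra time relative to Linux

print(f"{ratio:.2f}x, {percent_of_linux:.0f}% of Linux time, "
      f"{percent_slower:.0f}% more time")
# prints "5.78x, 578% of Linux time, 478% more time"
```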
Expected Behavior:
Significantly less than 577% difference in performance.
What could be causing the win build to experience that level of overhead...?