Tesseract 4.0 segfault when using dictionary ces and psm 7 #1154

Closed
nektor211 opened this issue Sep 24, 2017 · 7 comments

Comments

@nektor211

Environment

  • Tesseract Version:
    tesseract 4.00.00alpha
    leptonica-1.74.4
    libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8

  • Commit Number:
    2a77d5a - found after bisecting from current master

  • Platform:
    Linux mc01 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Current Behavior:

Segfault when running:

$ tesseract bad.png - -l ces -psm 7
Warning. Invalid resolution 0 dpi. Using 70 instead.
contains_unichar_id(unichar_id):Error:Assert failed:in file ../ccutil/unicharset.h, line 513
Segmentation fault (core dumped)

The ces traineddata was downloaded from https://github.com/tesseract-ocr/tessdata/raw/4.00/ces.traineddata

(attached image: bad.png)

Expected Behavior:

$ tesseract bad.png - -l ces -psm 7
Warning. Invalid resolution 0 dpi. Using 70 instead.
nguzge gbopf/IUMNIC CZ spol. s r.o.

Suggested Fix:

@amitdo
Collaborator

amitdo commented Sep 24, 2017

@nektor211
Author

I cannot test it right now, but I think it worked OK when psm was not specified. I'll try the new data too, thanks.

@nektor211
Author

I've checked, and the new traineddata solved this issue.
As for which psm modes are affected: the segfault appeared for modes 7, 8, 11 and 12 (also for 14-, but those are probably not valid anyway).
I'm not sure whether to close this issue right away, but my problem is now solved. Thank you.
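
For anyone who wants to repeat the per-mode check, here is a minimal sketch of how such a sweep could be scripted. It is not the script used for the results above; the helper names `describe_exit` and `sweep_psm` are made up for illustration. It assumes `tesseract` is on the PATH and uses the `-psm` spelling from the original command (newer builds expect `--psm`, which can be passed via `psm_flag`).

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number (e.g. SIGSEGV for a segfault).
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exit code %d" % returncode


def sweep_psm(image, lang, modes, psm_flag="-psm", run=subprocess.run):
    """Run tesseract once per page segmentation mode and record the outcome.

    `run` is injectable so the sweep can be exercised without tesseract
    installed; by default it is subprocess.run.
    """
    results = {}
    for mode in modes:
        # Same shape as the command in the report: output goes to stdout ("-").
        cmd = ["tesseract", image, "-", "-l", lang, psm_flag, str(mode)]
        proc = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        results[mode] = describe_exit(proc.returncode)
    return results
```

Calling `sweep_psm("bad.png", "ces", range(0, 14))` returns a dict keyed by mode, so the crashing modes can be listed at a glance.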

@Shreeshrii
Collaborator

Another reason for #293

Please give the URLs of the files that worked and of the ones that didn't.

Which --oem are you using with it?

@nektor211
Author

I uploaded one that didn't work in the original post. One that works can be seen here:
1. Otherwise it worked fine across a few thousand images; only a single image failed, with the segfault mentioned above.

--oem was not set (as can be seen in the examples above), so the default value was used.

@amitdo
Collaborator

amitdo commented Oct 1, 2017

I suggest closing this issue, since it was solved with updated traineddata.

@IamDixit

I had a similar issue.
I solved it by downloading the updated traineddata from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#updated-data-files-for-version-400-september-15-2017
