working with uzn file not working #66

Closed
michi729 opened this issue Jan 24, 2014 · 9 comments

@michi729

From the command line you would run "tesseract.exe pic1.bmp pic1.txt -psm 4" with a pic1.uzn file in the current directory.
When I try:
Tesseract.TesseractEngine tesseract = new Tesseract.TesseractEngine(@"....path... tessdata", "eng", Tesseract.EngineMode.Default);
Tesseract.Pix picture = Tesseract.Pix.LoadFromFile(@"...path... pic1.bmp");
Tesseract.Page page = tesseract.Process(picture, Tesseract.PageSegMode.SingleColumn); // equivalent to -psm 4
...
string text = page.GetText();

this leads to an exception in GetText (just as tesseract.exe fails when there is no uzn file).
I therefore assume that the .NET wrapper does not find (or does not search for) the uzn file.

Could you please tell me what to do or if this is a bug?

@charlesw
Owner

Hi michi729,
I wasn't even aware of uzn files before now, which is probably why it doesn't work. Anyway, I've done a little reading, and it seems that Tesseract needs to know the input file name (which makes sense, since this is how it finds the uzn file). Do you think it makes sense to add this to the Page class? In that case you could do:

using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default)) {
    using(var img = Pix.LoadFromFile("./phototest.tif")) {
        using(var page = engine.Process(img)) {
            page.InputName = "phototest.tif";
            // do processing
        }
    }
}

Alternatively, I could overload the Process method so it takes the input name as an optional parameter. Do you have any preference? Also, could you kindly provide an example image and the corresponding uzn file, with a brief description of the expected output, so I can write a test case to verify the implementation? Note that this should of course not contain any confidential or copyrighted material.

@ghost ghost assigned charlesw Jan 24, 2014
@michi729
Author

Hi Charles, thanks for the quick response!
I will get back to you with an example picture as well as a uzn file in due course.
All the best, Michael

@michi729
Author

[Image: test.png]

Running "tesseract.exe test.png test -psm 4" with tesseract.exe, test.png and test.uzn in the same directory produces a test.txt with the content:
This is another test

Content of test.uzn:
100 130 200 30 Text
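A uzn file is a plain-text zone list, and each line appears to describe one zone as left, top, width and height in pixels followed by a zone type, as in the test.uzn above. Below is a minimal sketch of reading such a file, assuming that whitespace-separated layout; UznZone and UznParser are hypothetical names and are not part of the Tesseract .NET wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical type for one zone of a .uzn file; not part of the Tesseract .NET wrapper.
public sealed record UznZone(int Left, int Top, int Width, int Height, string Type);

public static class UznParser
{
    private static readonly char[] Separators = { ' ', '\t' };

    // Parses lines of the assumed form "left top width height type".
    // Blank lines are skipped; lines with fewer than five fields throw FormatException.
    public static List<UznZone> Parse(string content)
    {
        var zones = new List<UznZone>();
        foreach (var rawLine in content.Split('\n'))
        {
            var line = rawLine.Trim(); // also drops a trailing '\r' from Windows line endings
            if (line.Length == 0) continue;

            var parts = line.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length < 5)
                throw new FormatException($"Expected at least 5 fields in uzn line: '{line}'");

            int Field(int i) => int.Parse(parts[i], NumberStyles.Integer, CultureInfo.InvariantCulture);

            // Treat everything after the four numbers as the zone type.
            var type = string.Join(" ", parts, 4, parts.Length - 4);
            zones.Add(new UznZone(Field(0), Field(1), Field(2), Field(3), type));
        }
        return zones;
    }
}
```

For the test.uzn content above, UznParser.Parse("100 130 200 30 Text") would return a single zone whose top-left corner is at (100, 130), 200 pixels wide and 30 pixels high, with type "Text".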

@michi729 michi729 reopened this Jan 24, 2014
@charlesw
Owner

Thanks, just what I needed.

@michi729
Author

Hi Charles, I am not sure this should be added as a parameter. Tesseract itself just replaces the suffix of the current picture's file name, i.e. you could get the picture name from the path passed to LoadFromFile. What do you think?
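The naming rule described here (same path, suffix replaced by .uzn) can be sketched with System.IO.Path.ChangeExtension. UznPath is a hypothetical helper, not part of the wrapper, and the rule itself is an assumption taken from this comment rather than from the Tesseract source.

```csharp
#nullable enable
using System.IO;

// Hypothetical helper, not part of the Tesseract .NET wrapper.
public static class UznPath
{
    // Assumes the zone file has the same path as the image with the extension replaced by ".uzn".
    public static string For(string imagePath) => Path.ChangeExtension(imagePath, ".uzn");

    // Returns the .uzn path if that file exists next to the image, otherwise null.
    public static string? FindExisting(string imagePath)
    {
        string candidate = For(imagePath);
        return File.Exists(candidate) ? candidate : null;
    }
}
```

For example, UznPath.For("test.png") gives "test.uzn". This only helps when the image path is known, i.e. when the Pix was loaded from a file.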

@charlesw
Owner

In theory yes, however this would only work if the image was loaded from a file. Tesseract actually doesn't work this way; according to my analysis of the source, it relies on the image name being passed in as an additional parameter to its ProcessPage routine. It's a pretty simple fix really, so I should have it done sometime tomorrow, assuming no unforeseen issues arise.

@michi729
Author

You are right :-) And thanks for taking the time!

@charlesw
Owner

Just released an updated NuGet package (1.10) that supports uzn files through an optional parameter on Process, as previously discussed. Please note that a PSM of SingleColumn (4) does NOT work due to a bug in Tesseract 3.02 (https://code.google.com/p/tesseract-ocr/issues/detail?id=653); other modes do work. This issue will be resolved once Tesseract 3.03 has been released.

@michi729
Author

Hi Charles, thank you very much for your fast support :-)
