No text lines in ALTO output #13
Comments
Hi, what format is the input file and what does it contain?
The input is the PAGE XML file https://digi.bib.uni-mannheim.de/~stweil/FILE_0063_OCR-D-OCR-TESSEROCR.xml (link was also given above). It contains the layout information and OCR results for a single page from one of our books.
Okay, that explains it. Words (STRINGs) are mandatory in ALTO. So because there are no words in the PAGE XML, the ALTO exporter cannot add the text lines.
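For anyone who wants to check their own input before converting, here is a minimal sketch that lists the text lines of a PAGE XML document that contain no Word elements, which by the explanation above are the lines that would go missing in the ALTO output. It is not part of the converter; the function names and the sample document are made up for illustration, and it only looks at direct Word children of each TextLine.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace so the check works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def lines_without_words(page_xml):
    """Return the ids of all TextLine elements that have no Word child."""
    root = ET.fromstring(page_xml)
    missing = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Only direct children count: a Word belongs to exactly one TextLine.
        if not any(local_name(child.tag) == "Word" for child in elem):
            missing.append(elem.get("id"))
    return missing


# Hypothetical two-line sample: l1 has line-level text only, l2 has a Word.
SAMPLE = (
    '<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">'
    '<Page><TextRegion id="r1">'
    '<TextLine id="l1"><TextEquiv><Unicode>line only</Unicode></TextEquiv></TextLine>'
    '<TextLine id="l2"><Word id="w1"><TextEquiv><Unicode>word</Unicode></TextEquiv></Word></TextLine>'
    '</TextRegion></Page></PcGts>'
)

if __name__ == "__main__":
    print(lines_without_words(SAMPLE))
```

To run it on a real file, read the file in binary mode and pass the bytes to `lines_without_words`, so that the XML parser handles the encoding declaration itself.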
@chris1010010, thank you for your explanation.
@stweil No. You can set
For anyone who falls into this trap from another angle (besides the special case of ocrd-tesserocr-recognize):
The conversion of an example file from PAGE to ALTO creates an ALTO file without text lines:
Is ALTO support still incomplete?