hOCR-Proofreader

This is Niko Partanen's fork of hocr-proofreader. Of the various editors compared, this one seems to work best and to be the most maintainable, so further experimentation is now under way to see how well it suits proofreading some books on Uralic languages spoken in Russia. Since the books are rather large, the setup will be customized so that working through a whole book goes smoothly.

Note: The editor is set up to work with hOCR files that contain multiple pages. One way to get such a file out of Tesseract is to run it on a text file that lists the paths to the individual page images:

tesseract pagelist.txt book -l kpv hocr

I assume a multi-page TIFF would produce a similar file.
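
The page list passed to Tesseract is a plain text file with one image path per line. The following is a minimal sketch of how such a list could be generated, not part of this repository: the helper name `buildPageList`, the `scans` directory and the `page-N.png` naming scheme are all assumptions made for illustration.

```javascript
// Hypothetical helper (not part of hocr-proofreader): builds the contents of
// a Tesseract page list, one image path per line, from unordered file names.
function buildPageList(fileNames, dir = '.') {
  // Keep only common image formats; everything else is skipped.
  const isImage = (name) => /\.(png|tiff?|jpe?g)$/i.test(name);

  // Assumption: the page number is the run of digits just before the extension.
  const pageNumber = (name) => {
    const m = name.match(/(\d+)(?=\.[^.]+$)/);
    return m ? parseInt(m[1], 10) : 0;
  };

  return fileNames
    .filter(isImage)                                  // filter() copies, so the input is not mutated
    .sort((a, b) => pageNumber(a) - pageNumber(b))    // numeric order: page-2 before page-10
    .map((name) => `${dir}/${name}`)
    .join('\n') + '\n';
}
```

In Node.js the result could be written out with `require('fs').writeFileSync('pagelist.txt', buildPageList(names, 'scans'))` before running the Tesseract command above. Sorting by page number rather than alphabetically matters because the page order in the list determines the page order in the resulting hOCR file.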

The way I'm currently testing it looks like this:

The Save button simply saves the edited XML to a file.

hOCR-Proofreader

Web based JavaScript GUI library for proofreading/editing hOCR.

Features:

Two-view concept: original layout vs. hOCR text, linked together (e.g. when hovering over words on either side)
Original layout can be switched between the original image and the text rendered from hOCR at the same positions, which is very effective for finding OCR errors
Pure JavaScript without dependencies, using only current browser features
Embeddable in other projects
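
Linking the two views depends on the coordinates that hOCR stores for each element: a `title` attribute containing a `bbox x0 y0 x1 y1` property in image pixels. The sketch below only illustrates that convention and is not taken from `hocr-proofreader.js`; the function names `parseBbox` and `containsPoint` are hypothetical.

```javascript
// Extract [x0, y0, x1, y1] from an hOCR title attribute such as
// "bbox 36 92 618 361; x_wconf 95". Returns null when no bbox is present.
function parseBbox(title) {
  const m = /(?:^|;)\s*bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/.exec(title || '');
  return m ? m.slice(1, 5).map(Number) : null;
}

// True when the point (x, y), given in image pixel coordinates,
// lies inside the box (edges included).
function containsPoint(bbox, x, y) {
  const [x0, y0, x1, y1] = bbox;
  return x >= x0 && x <= x1 && y >= y0 && y <= y1;
}
```

With helpers like these, a hover handler on the layout side could look up which word box contains the pointer and highlight the matching word in the text view, assuming the pointer position has first been converted to image coordinates.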

Online-Demo: http://www.not-implemented.de/hocr-proofreader/

TODO

Full editor features (currently it's just "contentEditable = true"); there is a lot of work to do
Handling bounding-boxes on word/line/paragraph merge/split correctly
...
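
As an illustration of what bounding-box handling on merge and split could involve, here is a minimal sketch. It is not the project's implementation (the feature is still open): the function names are hypothetical, boxes are assumed to be `[x0, y0, x1, y1]` arrays, and splitting in proportion to character counts is only a rough heuristic chosen for this example.

```javascript
// Merging two elements: the new box is the smallest box covering both.
function mergeBboxes(a, b) {
  return [
    Math.min(a[0], b[0]),
    Math.min(a[1], b[1]),
    Math.max(a[2], b[2]),
    Math.max(a[3], b[3]),
  ];
}

// Splitting a word: without re-running OCR the true cut position is unknown,
// so this assumes roughly equal character widths and cuts the box horizontally
// in proportion to the number of characters on each side.
function splitBboxByLength(bbox, leftChars, rightChars) {
  const [x0, y0, x1, y1] = bbox;
  const total = leftChars + rightChars;
  const ratio = total === 0 ? 0.5 : leftChars / total; // avoid division by zero
  const cut = Math.round(x0 + (x1 - x0) * ratio);
  return [[x0, y0, cut, y1], [cut, y0, x1, y1]];
}
```

Merging is exact because the union box is well defined; splitting is only an approximation, which is one reason the task is listed as unfinished.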
