hocr-parser

Python parser for hOCR files using lxml

hOCR is an open standard for representing the results of optical character recognition (OCR). It encodes those results (the recognized text, layout, styles, etc.) as XHTML. This Python module parses an existing hOCR file and gives easy access to the hOCR elements and their attributes.
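To make the format concrete, here is a minimal sketch of what an hOCR document looks like and how its elements and attributes can be read. It uses lxml directly and is not this package's API. The sample markup and the helper names `parse_title` and `extract_words` are made up for illustration. It assumes the usual hOCR conventions: elements are typed by their `class` (for example `ocr_page`, `ocr_line`, `ocrx_word`) and carry properties such as `bbox` in a semicolon-separated `title` attribute.

```python
from lxml import etree

# Hand-written sample for illustration only, not output from a real OCR engine.
HOCR_SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="ocr_page" title="bbox 0 0 600 800">
      <span class="ocr_line" title="bbox 10 20 300 50">
        <span class="ocrx_word" title="bbox 10 20 120 50; x_wconf 93">Hello</span>
        <span class="ocrx_word" title="bbox 130 20 300 50; x_wconf 88">world</span>
      </span>
    </div>
  </body>
</html>"""

# hOCR is XHTML, so elements live in the XHTML namespace.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def parse_title(title):
    """Split an hOCR title attribute into {property_name: [values]}."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


def extract_words(hocr_bytes):
    """Return the text and bounding box of every ocrx_word element."""
    root = etree.fromstring(hocr_bytes)
    words = []
    for el in root.xpath("//x:span[@class='ocrx_word']", namespaces=XHTML_NS):
        props = parse_title(el.get("title", ""))
        bbox = tuple(int(v) for v in props.get("bbox", []))
        words.append({"text": "".join(el.itertext()).strip(), "bbox": bbox})
    return words


words = extract_words(HOCR_SAMPLE)
for word in words:
    print(word["text"], word["bbox"])
```

A parser module like this one exists to hide that boilerplate (namespaces, class lookups, splitting the `title` string) behind a friendlier interface.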

Install

Python 3.6+ is required, and you'll probably want to install this package into a virtual environment. Until I push the package to PyPI, you can install it directly from GitHub with pip:

pip install git+https://github.com/jlieth/hocr-parser

Similar projects

hocr-parser by Athento, and its forks. Uses BeautifulSoup for parsing.
hocr-tools by OCRopus. Not exactly a parser, but it has several tools for working with hOCR files.
hocr-spec-python by Konstantin Baierer, editor of the hOCR spec. An hOCR validator written in Python.
