ocrevalUAtion

This set of classes provides basic support for comparing two text files: a reference file (a ground-truth document) and the output of an OCR engine (a text file).

Options for specific behavior include ignoring case, diacritics, punctuation and stop-words, as well as Unicode and user-defined equivalences between characters.
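To illustrate what options such as ignore case, ignore diacritics and ignore punctuation can mean in practice, here is a minimal sketch of text normalization applied to both files before comparison. The class `NormalizationSketch` and its `normalize` method are hypothetical names chosen for this example; they are not part of ocrevalUAtion's API, and the tool's own rules (including its stop-word and character-equivalence handling) may differ.

```java
import java.text.Normalizer;
import java.util.Locale;

// Illustrative only: a hypothetical helper, not ocrevalUAtion's implementation.
class NormalizationSketch {

    static String normalize(String text, boolean ignoreCase,
                            boolean ignoreDiacritics, boolean ignorePunctuation) {
        String result = text;
        if (ignoreDiacritics) {
            // Decompose accented letters (NFD), then drop the combining marks.
            result = Normalizer.normalize(result, Normalizer.Form.NFD)
                               .replaceAll("\\p{M}+", "");
        }
        if (ignorePunctuation) {
            // Replace runs of Unicode punctuation with a single space.
            result = result.replaceAll("\\p{P}+", " ");
        }
        if (ignoreCase) {
            result = result.toLowerCase(Locale.ROOT);
        }
        // Collapse the whitespace left behind by the steps above.
        return result.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // "\u00C9" is E with acute accent; written as an escape to avoid
        // depending on the source file encoding.
        System.out.println(normalize("\u00C9lan, Vital!", true, true, true));
    }
}
```

Applying the same normalization to the reference and to the OCR output means that differences in case, accents or punctuation no longer count as errors in the comparison.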

It can be used through the provided graphical user interface (GUI) or from the command line.

Supported input formats include: plain text, FineReader 10 XML, PAGE XML, ALTO XML and hOCR HTML.

The output is a report with statistics (including the character error rate, CER, and the word error rate, WER) and a table showing the two input texts in parallel, with the differences highlighted.
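As a rough illustration of the two error rates named above, the following sketch computes CER and WER in the common textbook way: the Levenshtein edit distance between reference and OCR output, divided by the length of the reference, measured in characters for CER and in whitespace-separated words for WER. This is an assumption about the definitions, not a description of ocrevalUAtion's internals; the class `ErrorRateSketch` and its methods are hypothetical, and the figures reported by the tool may be computed differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: textbook CER/WER, not ocrevalUAtion's implementation.
class ErrorRateSketch {

    // Levenshtein distance between two sequences (unit cost for
    // insertion, deletion and substitution), using two rolling rows.
    static <T> int editDistance(List<T> ref, List<T> hyp) {
        int[] prev = new int[hyp.size() + 1];
        int[] curr = new int[hyp.size() + 1];
        for (int j = 0; j <= hyp.size(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= ref.size(); i++) {
            curr[0] = i;
            for (int j = 1; j <= hyp.size(); j++) {
                int cost = ref.get(i - 1).equals(hyp.get(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[hyp.size()];
    }

    // Split on whitespace; a blank string yields an empty list.
    static List<String> words(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    static List<Integer> chars(String text) {
        return text.codePoints().boxed().collect(Collectors.toList());
    }

    // Character error rate: character edit distance / reference length.
    static double cer(String reference, String output) {
        List<Integer> ref = chars(reference);
        if (ref.isEmpty()) {
            throw new IllegalArgumentException("reference must not be empty");
        }
        return (double) editDistance(ref, chars(output)) / ref.size();
    }

    // Word error rate: word edit distance / number of reference words.
    static double wer(String reference, String output) {
        List<String> ref = words(reference);
        if (ref.isEmpty()) {
            throw new IllegalArgumentException("reference must not be empty");
        }
        return (double) editDistance(ref, words(output)) / ref.size();
    }

    public static void main(String[] args) {
        String reference = "the quick brown fox";
        String ocrOutput = "the qu1ck brown fox";
        System.out.println("CER = " + cer(reference, ocrOutput));
        System.out.println("WER = " + wer(reference, ocrOutput));
    }
}
```

A single wrong character changes only a small fraction of the characters but invalidates a whole word, so WER is typically higher than CER for the same pair of texts.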

A gentle introduction to OCR evaluation and to this tool can be found at https://sites.google.com/site/textdigitisation/

You can download the latest release from the project's releases page.

Instructions on how to use ocrevalUAtion can be found in the wiki.

