Application which aims to visualise how Tesseract OCR works

risendy/tesseractOCRVisualiser

General info

An application that visualises how OCR works, using the Tesseract OCR library and its hOCR output format.

Requirements

Technologies

Project is created with:

Features

  • Displaying Tesseract OCR bounding boxes on the image (words, lines, paragraphs)
  • Displaying recognised phrases over the text on the image
  • Customisation of draw parameters: bounding box stroke colour, font size, etc.
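
To illustrate where the bounding boxes come from, here is a minimal sketch of reading them out of hOCR markup. It is not taken from this repository: `parseHocrBoxes` and the sample markup are hypothetical, and the sketch assumes the usual hOCR conventions (classes `ocr_par`, `ocr_line`, `ocrx_word`, and a `title` attribute containing `bbox x0 y0 x1 y1`). It uses a regular expression so that it runs in plain Node; in a browser, a real HTML parser would likely be more robust.

```javascript
// Hypothetical helper, not part of the project.
// Maps hOCR class names to the box kinds listed under Features.
const KINDS = { ocr_par: 'paragraph', ocr_line: 'line', ocrx_word: 'word' };

function parseHocrBoxes(hocr) {
  const boxes = [];
  // Walk over every tag and keep only those carrying a known class and a bbox.
  for (const [tag] of hocr.matchAll(/<[^>]+>/g)) {
    const cls = /\bclass=['"](ocr_par|ocr_line|ocrx_word)['"]/.exec(tag);
    const bbox = /\bbbox (\d+) (\d+) (\d+) (\d+)/.exec(tag);
    if (!cls || !bbox) continue;
    const [x0, y0, x1, y1] = bbox.slice(1).map(Number);
    // Convert corner coordinates to x/y/width/height, which is convenient for drawing rectangles.
    boxes.push({ kind: KINDS[cls[1]], x: x0, y: y0, width: x1 - x0, height: y1 - y0 });
  }
  return boxes;
}

// Made-up sample in the shape of hOCR output.
const sample = `
<p class='ocr_par' id='par_1_1' title='bbox 36 92 580 122'>
 <span class='ocr_line' id='line_1_1' title='bbox 36 92 580 122; baseline 0 -6'>
  <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>This</span>
 </span>
</p>`;

console.log(parseHocrBoxes(sample));
```

Each returned object could then be passed to whatever drawing routine applies the stroke colour and font size chosen by the user.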

Installation

Clone the repository

Install front-end dependencies

yarn install

Compile assets

yarn encore dev

Run composer

composer install

Create database

php bin/console doctrine:database:create

Run migrations

php bin/console doctrine:migrations:migrate

Screenshots

Homepage

Main page

Image with all bounding boxes

Single news page

Part of the image with recognised text

Comments section