An application that visualises how OCR works, using the Tesseract OCR library and its hOCR output format.
- PHP 7.3+
- MySQL database
- Tesseract Open Source OCR Engine - https://github.com/tesseract-ocr/tesseract
- ImageMagick
- PHP imagick module
The project is built with:
- Symfony 4
- Tesseract OCR for PHP - https://github.com/thiagoalessio/tesseract-ocr-for-php
- Imagick/ImageMagick
- PHPHtmlParser - https://github.com/paquettg/php-html-parser
- Displaying Tesseract OCR bounding boxes on the image (words, lines, paragraphs)
- Displaying recognised phrases over the corresponding text on the image
- Customisation of drawing parameters: bounding box stroke colour, font size, etc.
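The bounding boxes listed above come from the hOCR output, where each element carries a `title` attribute such as `bbox x0 y0 x1 y1`. The following is a minimal sketch of that extraction step, not the application's own code (which is PHP and uses PHPHtmlParser). It is written in Python with only the standard library, and the `SAMPLE` markup, `HocrBoxParser` and `extract_boxes` are hypothetical names and data made up for illustration. It assumes the usual Tesseract hOCR class names `ocr_par`, `ocr_line` and `ocrx_word`.

```python
import re
from html.parser import HTMLParser

# hOCR stores coordinates in the title attribute, e.g. "bbox 10 10 90 40; x_wconf 96".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")

# hOCR class names for the three levels mentioned in the feature list.
LEVELS = {"ocrx_word": "word", "ocr_line": "line", "ocr_par": "paragraph"}


class HocrBoxParser(HTMLParser):
    """Collects (level, (x0, y0, x1, y1)) tuples from hOCR markup."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        level = LEVELS.get(attrs.get("class") or "")
        match = BBOX_RE.search(attrs.get("title") or "")
        if level and match:
            # Convert the four captured coordinates to integers.
            self.boxes.append((level, tuple(int(v) for v in match.groups())))


def extract_boxes(hocr):
    """Return every word, line and paragraph box found in an hOCR string."""
    parser = HocrBoxParser()
    parser.feed(hocr)
    return parser.boxes


# Made-up hOCR fragment, used only to exercise the parser.
SAMPLE = """
<p class='ocr_par' title="bbox 10 10 200 40">
 <span class='ocr_line' title="bbox 10 10 200 40; baseline 0 -5">
  <span class='ocrx_word' title='bbox 10 10 90 40; x_wconf 96'>Hello</span>
  <span class='ocrx_word' title='bbox 100 10 200 40; x_wconf 93'>world</span>
 </span>
</p>
"""

if __name__ == "__main__":
    for level, box in extract_boxes(SAMPLE):
        print(level, box)
```

Each resulting tuple gives the top-left and bottom-right pixel coordinates of a box, which is the information a drawing layer needs to stroke a rectangle over the source image.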
yarn install
yarn encore dev
composer install
php bin/console doctrine:database:create
php bin/console doctrine:migrations:migrate