Chamanti OCR చామంతి

Mission

This ambitious project aims to build a state-of-the-art OCR framework that works for any language. It does not rely on glyph-level segmentation algorithms, which makes it well suited to highly agglutinative scripts such as Arabic and Devanagari. We are starting with Telugu, however. The models are Convolutional Recurrent Neural Networks (CRNNs) built with Keras in TensorFlow 2.0. A CRNN trained with the CTC (Connectionist Temporal Classification) loss function is the main workhorse.
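
To illustrate why CTC removes the need for glyph-level segmentation, here is a minimal sketch of greedy CTC decoding in plain Python. The network emits one probability vector per image column (time step), and decoding merges repeated labels and drops a special blank label, so no character boundaries are ever marked. The function name `ctc_greedy_decode`, the blank index 0, and the toy three-symbol alphabet are assumptions made for illustration; they are not taken from this repository.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Collapse per-frame predictions into a label string.

    frame_probs: list of per-frame probability lists (one per time step).
    alphabet:    list mapping class index -> symbol; alphabet[blank] is unused.
    """
    # Pick the most likely class for each frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]

    decoded = []
    prev = blank
    for idx in best:
        # Emit a symbol only when it is not blank and differs from the
        # previous frame; a blank between two equal symbols separates them.
        if idx != blank and idx != prev:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)


# Toy example: index 0 is the CTC blank, 1 -> "a", 2 -> "b".
ALPHABET = ["-", "a", "b"]
FRAMES = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.80, 0.10],  # a (new symbol, because a blank intervened)
    [0.10, 0.10, 0.80],  # b
    [0.10, 0.10, 0.80],  # b (repeat, merged)
]
DECODED = ctc_greedy_decode(FRAMES, ALPHABET)
```

Greedy decoding is the simplest option. A beam search over the same per-frame probabilities usually gives better results, at the cost of more computation.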

Dependencies

tensorflow
Lekhaka - My 'scribing' package for generating complex text, including Indian languages like Telugu, on the fly

Setup

Install TensorFlow
Download Lekhaka and place it in a parallel directory

Files

model_builder.py - The TensorFlow CRNN model with CTC loss
train.py - Main file to run
utils.py, post_process.py - Utilities to print images and probabilities to the terminal, etc.
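
As a rough idea of what printing an image to the terminal can look like, here is a small sketch that maps pixel intensities to characters. The function `image_to_text` and the character ramp are hypothetical and written for illustration only; the actual helpers in utils.py may work differently.

```python
def image_to_text(pixels, ramp=" .:-=+*#%@"):
    """Render a 2D list of intensities in [0, 1] as lines of characters.

    0.0 maps to the first ramp character (blank background) and
    1.0 maps to the last (darkest ink).
    """
    rows = []
    for row in pixels:
        chars = []
        for value in row:
            value = min(max(value, 0.0), 1.0)  # clamp out-of-range values
            chars.append(ramp[int(value * (len(ramp) - 1))])
        rows.append("".join(chars))
    return "\n".join(rows)


# A tiny 2x2 "image": ink in the top-right, half intensity bottom-left.
SAMPLE = image_to_text([[0.0, 1.0], [0.5, 0.0]])
print(SAMPLE)
```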

Training the CRNN

You can now train a CRNN to read Telugu text!

python3 train.py spec 1
python3 train.py banti banti_trained_instance.pkl
python3 train.py chamanti chamanti_trained_instance.pkl
