Fully Convolutional Neural Network for Text Detection in Scene/Document Images with Script ID Support.
This repo contains two types of deep neural networks:

- Pixel-level Text Detection Classification Network (TDCN), with specializations in
  - Word-Level SceneText, e.g. a street-view image.
  - Line-Level DocumentText, e.g. a scanned letter.

  It classifies each pixel in an input image into one of the following three categories:

  | class index | class name | color channel | description |
  |:-----------:|:----------:|:-------------:|:-----------:|
  | 0 | NonText | Red | Any non-text content |
  | 1 | Border | Green | Pixels on text borders |
  | 2 | Text | Blue | Text pixels |

- Page-level Script ID Classification Network (SICN)

  It classifies an input image into one of the following categories:

  | scriptID index | scriptID name | Country |
  |:--------------:|:-------------:|:-------:|
  | 0 | NonText | N/A |
  | 1 | Latin | US, UK, etc. |
  | 2 | Hebrew | Israel |
  | 3 | Cyrillic | Russia, Ukraine, etc. |
  | 4 | Arabic | Iran, Saudi Arabia, etc. |
  | 5 | Chinese | China, HongKong, etc. |
  | 6 | TextButUnknown | N/A |
The TDCN models are mainly based on the ICCV17 paper "Self-organized Text Detection with Minimal Post-processing via Border Learning".
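As a rough illustration of how the two kinds of output described above can be read, the sketch below picks the most likely label for each pixel of a TDCN probability map and for a SICN probability vector. The class tables are taken from this README; the function names and the plain nested-list inputs are assumptions made for illustration, since the actual models return arrays.

```python
# Class tables copied from this README; everything else is illustrative.
PIXEL_CLASSES = ("NonText", "Border", "Text")  # channel order: Red, Green, Blue
SCRIPT_CLASSES = ("NonText", "Latin", "Hebrew", "Cyrillic",
                  "Arabic", "Chinese", "TextButUnknown")


def argmax(values):
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def label_pixels(proba_map):
    """Map an H x W x 3 nested list of probabilities to class names."""
    return [[PIXEL_CLASSES[argmax(pixel)] for pixel in row] for row in proba_map]


def top_script(script_proba):
    """Return (name, probability) of the most likely script class."""
    idx = argmax(script_proba)
    return SCRIPT_CLASSES[idx], script_proba[idx]
```

Taking the per-pixel argmax is only the simplest possible reading of the map; the decoders described later apply further rules before accepting a text region.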
All models are trained with the Keras deep neural network library with the TensorFlow backend.
This repo contains the following data:
- data: sample image data for testing.
- lib: core model definitions and util functions.
- model: pretrained model weights.
  - textDetSceneModel.h5: pretrained TDCN weight for scene text detection
  - textDetDocumModel.h5: pretrained TDCN weight for document text detection
  - scriptIDModel.h5: pretrained SICN weight
- notebook: Python2 demo notebook.
- bin: Python2/3 command-line tool.
The Python code in this repo is compatible with both Python 2.7.x and Python 3. It depends on the core deep learning libraries
- Keras: >=2.0.7
- TensorFlow: >=1.1.0

and the image processing libraries
- OpenCV-Python: >=3.1.0
- Skimage: >=0.13.0
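One way to check an environment against the minimum versions listed above is sketched below. The helper names are hypothetical and not part of this repo; the import names `keras`, `tensorflow`, `cv2`, and `skimage` are the usual module names for these packages, and the version parsing is deliberately simple (leading integers only).

```python
import importlib
import re

# Minimum versions as listed in this README, keyed by import name.
MINIMUMS = {"keras": "2.0.7", "tensorflow": "1.1.0",
            "cv2": "3.1.0", "skimage": "0.13.0"}


def version_tuple(text):
    """Turn '2.0.7' or '1.15.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Return {name: True/False/None}; None means the package is missing."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        report[name] = meets_minimum(getattr(module, "__version__", "0"), minimum)
    return report
```

Calling `check_environment()` before loading the models gives a quick summary of which dependencies are missing or too old.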
# 1. TDCN
textDet_model = textDetCore.create_textDet_model()
textDet_model.load_weights( textDet_weight )
# 2. SICN
scriptID_model = textDetCore.create_scriptID_model()
scriptID_model.load_weights( scriptID_weight )
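The snippet above expects `textDet_weight` and `scriptID_weight` to hold paths to the pretrained weight files. A minimal sketch of how those paths might be assembled is shown below; the file names come from the repo listing in this README, while the `weight_path` helper and the default `model` directory location are assumptions.

```python
import os

# File names as listed under the model directory in this README.
WEIGHT_FILES = {
    "scene": "textDetSceneModel.h5",
    "document": "textDetDocumModel.h5",
    "scriptID": "scriptIDModel.h5",
}


def weight_path(kind, model_dir="model"):
    """Return the path of a pretrained weight file (model_dir is assumed)."""
    if kind not in WEIGHT_FILES:
        raise ValueError("unknown weight kind: %r" % (kind,))
    return os.path.join(model_dir, WEIGHT_FILES[kind])


# Pick the document detector plus the script ID classifier.
textDet_weight = weight_path("document")
scriptID_weight = weight_path("scriptID")
```

Swapping `"document"` for `"scene"` selects the word-level scene text weights instead.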
The simple decoder is written for text detection on a document image, or on any text image where the text regions can be expressed as a set of rectangular bounding boxes.
def simple_decoder( file_path,
                    textDet_model,
                    scriptID_model=None,
                    output_dir=None,
                    dom_font=None,
                    dark_text=None,
                    rotated_text=False,
                    proba_threshold=.5,
                    lh_threshold=15,
                    contrast_threshold=24.,
                    return_proba=True,
                    n_jobs=1,
                    verbose=2) :
    """
    INPUTS:
        ------------------------------------------------------------------------
        | Mandatory Parameters
        ------------------------------------------------------------------------
        file_path = str, path to a local file or URL to a web image
        textDet_model = keras model, pretrained text detection model
        scriptID_model = None or keras model, pretrained script ID model
                         if None, then no scriptID classification
        output_dir = None or str or 'SKIP', dir to save detected and corrected text regions,
                     if None, then text regions are returned as JPEG buffers
                     if 'SKIP', then text regions are not saved
        ------------------------------------------------------------------------
        | Prior Knowledge Parameters (better performance if they are provided)
        ------------------------------------------------------------------------
        dom_font = int or None, dominant font height in pixels
                   if None, then apply automatic estimation
        dark_text = bool or None, whether the text in the image is dark/black or not
                    if None, then apply automatic estimation
        rotated_text = bool, whether text regions are rotated or not
                       if False, then faster decoding is applied
        ------------------------------------------------------------------------
        | Simple Rules to Reject A Text Region
        ------------------------------------------------------------------------
        proba_threshold = float in (0,1), the minimum text probability to accept a text region
        lh_threshold = float, the minimum line height to accept a text region
        contrast_threshold = float in (0,255), the minimum intensity standard deviation to accept a text region
        ------------------------------------------------------------------------
        | Others
        ------------------------------------------------------------------------
        return_proba = bool, if true, return the raw outputs of both models
        n_jobs = int, if greater than 1, use multiple CPUs
        verbose = bool, if true, print out state messages
    OUTPUTS:
        output_lut = dict, containing all decoded results including
            'filename' -> input image file
            'resize' -> resize factor for text detection analysis
            'md5' -> image md5 tag
            'Pr(XXX)' -> script ID probability of a known scriptID class XXX
            'bboxes' -> list of bounding box dictionaries, where each element is a dict of
                'cntx' -> bbox's x coordinates
                'cnty' -> bbox's y coordinates
                'proba' -> text probability of this region
                'area' -> bbox area
                'contrast' -> bbox contrast
                'imgfile' -> file path of the dumped text region image, when output_dir is given
                'jpgbuf' -> jpeg buffer for the text region image, when output_dir is None
        proba_map = ( text_proba, script_proba )
            - text_proba, i.e. a text probability map, size of imgHeight-by-imgWidth-by-3
            - script_proba, i.e. a script ID probability map, size of 1-by-7
    """
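To make the three rejection rules in the docstring concrete, the sketch below re-applies them to the `bboxes` entries of a returned `output_lut`. This is not the decoder's own implementation: the helper names are hypothetical, and the line height is estimated here as the vertical extent of `cnty`, which is an assumption about how the coordinates are stored.

```python
def line_height(bbox):
    """Assumed estimate: vertical extent of the box's y coordinates."""
    return max(bbox["cnty"]) - min(bbox["cnty"])


def accept_region(bbox, proba_threshold=0.5, lh_threshold=15,
                  contrast_threshold=24.0):
    """Apply the three 'simple rules'; defaults mirror simple_decoder."""
    return (bbox["proba"] >= proba_threshold
            and line_height(bbox) >= lh_threshold
            and bbox["contrast"] >= contrast_threshold)


def filter_regions(output_lut, **thresholds):
    """Keep only the bounding boxes that pass every rule."""
    return [b for b in output_lut["bboxes"] if accept_region(b, **thresholds)]
```

A stricter second pass such as `filter_regions(output_lut, proba_threshold=0.8)` can then be run on cached results without repeating the detection step.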
The lazy decoder is a more complicated decoder for scene text detection. The extra complexity is needed because the dominant font size assumption may no longer hold for a scene text image, and the actual font height cannot be guessed from the input image size. We therefore analyze a given scene text image at a number of resolution scales to capture text regions of very different font sizes.
def lazy_decoder( file_path,
                  textDet_model,
                  scriptID_model=None,
                  num_resolutions=5,
                  output_dir=None,
                  proba_threshold=.33,
                  lh_threshold=8,
                  contrast_threshold=32.,
                  return_proba=True,
                  n_jobs=1,
                  verbose=2) :
    """
    INPUTS:
        ------------------------------------------------------------------------
        | Mandatory Parameters
        ------------------------------------------------------------------------
        file_path = str, path to a local file or URL to a web image
        textDet_model = keras model, pretrained text detection model
        scriptID_model = None or keras model, pretrained script ID model
                         if None, then no scriptID classification
        output_dir = None or str or 'SKIP', dir to save detected and corrected text regions,
                     if None, then text regions are returned as JPEG buffers
                     if 'SKIP', then text regions are not saved
        ------------------------------------------------------------------------
        | Simple Rules to Reject A Text Region
        ------------------------------------------------------------------------
        proba_threshold = float in (0,1), the minimum text probability to accept a text region
        lh_threshold = float, the minimum line height to accept a text region
        contrast_threshold = float in (0,255), the minimum intensity standard deviation to accept a text region
        ------------------------------------------------------------------------
        | Others
        ------------------------------------------------------------------------
        return_proba = bool, if true, return the raw outputs of both models
        n_jobs = int, if greater than 1, use multiple CPUs
        num_resolutions = int, number of analysis resolutions, default 5
        verbose = bool, if true, print out state messages
    OUTPUTS:
        output_lut = dict, containing all decoded results including
            'filename' -> input image file
            'resize' -> resize factor for text detection analysis
            'md5' -> image md5 tag
            'Pr(XXX)' -> script ID probability of a known scriptID class XXX
            'bboxes' -> list of bounding box dictionaries, where each element is a dict of
                'cntx' -> bbox's x coordinates
                'cnty' -> bbox's y coordinates
                'proba' -> text probability of this region
                'area' -> bbox area
                'contrast' -> bbox contrast
                'imgfile' -> file path of the dumped text region image, when output_dir is given
                'jpgbuf' -> jpeg buffer for the text region image, when output_dir is None
        proba_map = ( text_proba, script_proba )
            - text_proba, i.e. a text probability map, size of imgHeight-by-imgWidth-by-3
            - script_proba, i.e. a script ID probability map, size of 1-by-7
    """
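The idea of analyzing one image at several resolution scales can be sketched as follows. This README does not say how `lazy_decoder` chooses its scales, so the geometric spacing, the `min_scale` and `max_scale` bounds, and both function names below are purely illustrative assumptions.

```python
def resolution_scales(num_resolutions=5, min_scale=0.25, max_scale=2.0):
    """Return geometrically spaced resize factors (assumed scheme)."""
    if num_resolutions < 1:
        raise ValueError("num_resolutions must be at least 1")
    if num_resolutions == 1:
        return [1.0]
    ratio = (max_scale / min_scale) ** (1.0 / (num_resolutions - 1))
    return [min_scale * ratio ** i for i in range(num_resolutions)]


def scaled_sizes(height, width, scales):
    """Image size (rows, cols) at each scale, never smaller than 1 pixel."""
    return [(max(1, int(round(height * s))), max(1, int(round(width * s))))
            for s in scales]
```

Small text tends to be picked up at the larger scales and very large text at the smaller ones, which is why detections from all scales have to be considered together.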
Finally, we also provide a command-line tool for text detection.
python bin/textDetection.py -h
usage: textDetection.py [-h] [-i INPUT_FILES] [-o OUTPUT_DIR]
                        [-t {full,textDet,postProc}]
                        [-m {doc0,doc1,scene,custom}] [-v VERBOSE]
                        [-tl TH_LINEHEIGHT] [-tp TH_TEXTPROB]
                        [-tc TH_CONTRAST] [-mt {line,word}]
                        [-dt {simple,lazy}] [-nj N_JOBS] [-nr N_RES]
                        [--domFont DOM_FONT] [--darkText] [--rotText]
                        [--jpegBuffer] [--version]

Text Detection with ScriptID supports

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  -i INPUT_FILES, --inputFile INPUT_FILES
                        input test image files or cached json files
  -o OUTPUT_DIR, --outputDir OUTPUT_DIR
                        output detection dir (./)
  -t {full,textDet,postProc}, --task {full,textDet,postProc}
                        tasks in {full, textDet, postProc}

optional arguments:
  -m {doc0,doc1,scene,custom}, --mode {doc0,doc1,scene,custom}
                        working mode in {doc0(black and horizontal text),
                        doc1(black text), scene, custom}
  -v VERBOSE            verbosity level (0), higher means more print outs
  -tl TH_LINEHEIGHT, --threshLineHeight TH_LINEHEIGHT
                        minimal text region height (15)
  -tp TH_TEXTPROB, --threshTextProba TH_TEXTPROB
                        minimal text region probability (.5)
  -tc TH_CONTRAST, --threshContrast TH_CONTRAST
                        minimal text region contrast (16)
  -mt {line,word}, --modelType {line,word}
                        detector type in {line, word}
  -dt {simple,lazy}, --decoderType {simple,lazy}
                        decoder type in {simple, lazy}
  -nj N_JOBS, --nJobs N_JOBS
                        number of parallel cpu jobs
  -nr N_RES, --nRes N_RES
                        number of analysis resultions
  --domFont DOM_FONT    dominant text height in pixels
  --darkText            whether or not texts are darker
  --rotText             whether or not texts are rotated
  --jpegBuffer          whether or not save image inside json
  --version             show program's version number and exit
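When the tool is driven from another Python script, the argument list can be assembled programmatically. The sketch below uses only flags that appear in the help text above; the `build_command` helper and the default script path are hypothetical.

```python
import sys

# Choices copied from the help text above.
TASKS = ("full", "textDet", "postProc")
MODES = ("doc0", "doc1", "scene", "custom")


def build_command(input_list, output_dir, task="full", mode="doc0",
                  verbose=0, script="bin/textDetection.py"):
    """Assemble an argv list for the command-line tool (script path assumed)."""
    if task not in TASKS:
        raise ValueError("task must be one of %s" % (TASKS,))
    if mode not in MODES:
        raise ValueError("mode must be one of %s" % (MODES,))
    return [sys.executable, script,
            "-i", input_list, "-o", output_dir,
            "-t", task, "-m", mode, "-v", str(verbose)]
```

The resulting list can be passed to `subprocess.check_call`, which avoids shell quoting problems with paths that contain spaces.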
Below is sample testing code that you may want to try:
# make a tmp dir to host data
mkdir tmp && cd tmp
# find some testing data
find ../data/ -name "*_*.jpg" > test.list
# run text detection and save text images using the doc0 mode
python ../bin/textDetection.py -i test.list -o /tmp/result_detOnly -t full -m doc0 -v 2
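On systems without the `find` utility, the file list can be produced from Python instead. This is a minimal sketch that mirrors the `find ../data/ -name "*_*.jpg"` step above; the function names are hypothetical.

```python
import fnmatch
import os


def find_images(root, pattern="*_*.jpg"):
    """Recursively collect files under root whose names match pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def write_list(paths, list_file):
    """Write one path per line, the layout produced by the find command."""
    with open(list_file, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
```

For example, `write_list(find_images("../data"), "test.list")` creates the same kind of list file that is passed to `-i` above.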
You may find the IPython (Python2) notebook under notebook. Alternatively, you are welcome to use our provided Google Colab notebooks (open and clone your own):
- Python2 Notebook
- Python3 Notebook
If you use this repo for academic purposes, please cite the following paper.
@INPROCEEDINGS{WuICCV2017,
  author={Y. Wu and P. Natarajan},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  title={Self-Organized Text Detection with Minimal Post-processing via Border Learning},
  year={2017},
  pages={5010-5019},
  doi={10.1109/ICCV.2017.535},
  month={Oct},
}
For questions, please contact Dr. Yue Wu (yue_wu@isi.edu).
The Software is made available for academic or non-commercial purposes only. The license is for a copy of the program for an unlimited term. Individuals requesting a license for commercial use must pay for a commercial license.
USC Stevens Institute for Innovation
University of Southern California
1150 S. Olive Street, Suite 2300
Los Angeles, CA 90115, USA
ATTN: Accounting
DISCLAIMER. USC MAKES NO EXPRESS OR IMPLIED WARRANTIES, EITHER IN FACT OR BY OPERATION OF LAW, BY STATUTE OR OTHERWISE, AND USC SPECIFICALLY AND EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, VALIDITY OF THE SOFTWARE OR ANY OTHER INTELLECTUAL PROPERTY RIGHTS OR NON-INFRINGEMENT OF THE INTELLECTUAL PROPERTY OR OTHER RIGHTS OF ANY THIRD PARTY. SOFTWARE IS MADE AVAILABLE AS-IS. LIMITATION OF LIABILITY. TO THE MAXIMUM EXTENT PERMITTED BY LAW, IN NO EVENT WILL USC BE LIABLE TO ANY USER OF THIS CODE FOR ANY INCIDENTAL, CONSEQUENTIAL, EXEMPLARY OR PUNITIVE DAMAGES OF ANY KIND, LOST GOODWILL, LOST PROFITS, LOST BUSINESS AND/OR ANY INDIRECT ECONOMIC DAMAGES WHATSOEVER, REGARDLESS OF WHETHER SUCH DAMAGES ARISE FROM CLAIMS BASED UPON CONTRACT, NEGLIGENCE, TORT (INCLUDING STRICT LIABILITY OR OTHER LEGAL THEORY), A BREACH OF ANY WARRANTY OR TERM OF THIS AGREEMENT, AND REGARDLESS OF WHETHER USC WAS ADVISED OR HAD REASON TO KNOW OF THE POSSIBILITY OF INCURRING SUCH DAMAGES IN ADVANCE.
For commercial license pricing and annual commercial update and support pricing, please contact:
Rakesh Pandit
USC Stevens Institute for Innovation
University of Southern California
1150 S. Olive Street, Suite 2300
Los Angeles, CA 90115, USA
Tel: +1 213-821-3552
Fax: +1 213-821-5001
Email: rakeshvp@usc.edu and ccto: accounting@stevens.usc.edu