This project requires the following libraries:
- TensorFlow (version 0.10.0 or later, but earlier than 1.0)
- PIL
- web.py
- werkzeug
- h5py
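The TensorFlow version constraint above can be checked before anything else runs. The following is a minimal sketch, not part of the project: the helper names parse_version and is_supported_tensorflow are hypothetical, and pre-release suffixes such as "rc2" are simply ignored.

```python
def parse_version(version):
    """Turn a version string such as '0.12.1' or '0.11.0rc2' into a tuple of three ints."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at the first non-digit, e.g. the "rc2" in "0rc2"
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as "1.0"
    return tuple(parts)


def is_supported_tensorflow(version):
    """Return True for versions in the range [0.10.0, 1.0)."""
    return (0, 10, 0) <= parse_version(version) < (1, 0, 0)
```

In practice the string to check would come from tensorflow.__version__.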
Top-level modules:
- flags.py: Various flags used throughout the system
- generate.py: Module used to generate synthetic data
- image.py: Various image-processing utility functions
- numberlocator.py: Module to train and test the number locator
- readnumber.py: Module to train the digit recognizer model; also contains the primary function locate_and_read_number
- server.py: Web application code
- synthetic_models.py: Library of some of the neural network configurations attempted during training on synthetic data
- svhn_models.py: Library of some of the neural network configurations attempted during training on SVHN data
The "inputs" directory contains modules that deal with pre-processing input data, as well as utility classes for iterating over large datasets:
- inputs/datasource.py: Classes encapsulating generating input data for the classifiers
- inputs/batch.py: Classes for batch iterating over input data
- svhn.py: Functions for dealing with SVHN metadata
- sample_digit_data.py: Functions for dealing with synthetic data generated for digit classifiers
- sample_length_data.py: Functions for dealing with synthetic data generated for a stand-alone length classifier
The "models" directory contains classes representing various neural network configurations:
- models/base.py: Primary base class for all neural networks. Contains the core classifier code.
- models/multilayer.py: Multi-Layer Perceptron class
- models/convolution.py: CNN class. Contains single-logit and multi-logit variations
Other directories:
- "classifiers" contains the TensorFlow checkpoint and metadata files necessary for loading neural network parameters.
- "uploaded" contains all images uploaded to the web application.
- "templates" contains the template files used by web.py to render the web application.
Input data:
- SVHN data can be downloaded from http://ufldl.stanford.edu/housenumbers/. Before the data can be used, its metadata must be extracted by running "svhn.py parse [train|test|extra]".
- Synthetic data can be generated by the "generate.py" module. Command is "generate.py [by_digit|by_length] {# of images per label}".
Training for the number locator can be kicked off with "numberlocator.py train". "numberlocator.py locate {imgfile}" will locate potential number bounding boxes on the provided image.
Digit recognizer training can be started with "readnumber.py --train --[synthetic|svhn] --[joint|digit {1-5}|length]"
- The "--joint" option will train a multi-logit network to output the sequence length and all digit positions at once
- The "--digit {1-5}" option will train the network for the specified digit position
- The "--length" option will train the network on the digit sequence length
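To illustrate how a joint (multi-logit) output could be turned back into a number, here is a minimal sketch. It is not taken from the project. It assumes the network produces one score list for the sequence length (index k meaning "k digits") and one 10-entry score list per digit position; the function names are hypothetical.

```python
def argmax(values):
    """Index of the largest entry in a list of scores."""
    return max(range(len(values)), key=lambda i: values[i])


def decode_joint_output(length_scores, digit_scores):
    """Combine length and per-position digit scores into a digit string.

    length_scores: scores for sequence lengths 0, 1, 2, ... (assumed layout)
    digit_scores:  one list of 10 scores per digit position (assumed layout)
    """
    length = argmax(length_scores)
    # Never read more positions than the network actually provides.
    length = min(length, len(digit_scores))
    return "".join(str(argmax(digit_scores[i])) for i in range(length))
```

Only the first "length" positions are read, so whatever the network predicts for the unused trailing positions is ignored.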
The entire algorithm can be invoked via command line using "readnumber.py --run {imgfile}"
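The command-line forms described above can be summarized with a small argparse sketch. This is an illustration only: readnumber.py may define its options differently (for example through flags.py), and the function name build_parser is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented readnumber.py options (illustrative)."""
    parser = argparse.ArgumentParser(prog="readnumber.py")

    # Exactly one mode: train a model, or run the full pipeline on an image.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true")
    mode.add_argument("--run", metavar="IMGFILE")

    # Training data source.
    data = parser.add_mutually_exclusive_group()
    data.add_argument("--synthetic", action="store_true")
    data.add_argument("--svhn", action="store_true")

    # What the network is trained to predict.
    target = parser.add_mutually_exclusive_group()
    target.add_argument("--joint", action="store_true")
    target.add_argument("--digit", type=int, choices=range(1, 6), metavar="N")
    target.add_argument("--length", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

For example, "--train --svhn --digit 3" parses to train=True, svhn=True, digit=3, while combining "--joint" with "--length" is rejected.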
Deploying the web application:
The application can be deployed by simply starting server.py. This launches a development server on 0.0.0.0:8080, which does not perform well under high load. Alternatively, server.py can be run through any WSGI-compatible web server. For example, it can be served with uWSGI by running "uwsgi --http :8080 --wsgi-file server.py".
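For reference, uWSGI's "--wsgi-file" option loads the given file and, by default, looks for a module-level callable named "application". The sketch below shows the shape of such a callable using only the standard WSGI interface. It is a stand-in, not the contents of server.py; with web.py, a callable of this kind is typically obtained from the app object's wsgifunc() method.

```python
def application(environ, start_response):
    """Minimal WSGI callable: returns a fixed plain-text response for every request."""
    body = b"placeholder response\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI expects an iterable of byte strings.
    return [body]
```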