fontina
is a PyTorch library that helps with training models for the task of Visual Font
Recognition and doing inference with them.
- DeepFont-like network architecture. See Z. Wang, J. Yang, H. Jin, E. Shechtman, A. Agarwala, J. Brandt and T. Huang, “DeepFont: Identify Your Font from An Image”, In Proceedings of ACM International Conference on Multimedia (ACM MM) , 2015
- Configuration-based synthetic dataset generation
- Configuration-based model training via PyTorch Lightning
- Supports training and inference on Linux, MacOS and Windows.
Starting from a cloned repository directory:
# Create a virtual environment: this uses venv but any system
# would work!
python -m venv .venv
# Activate the virtual environment: this depends on the OS. See
# the two options below.
# .venv/Scripts/activate # Windows
source .venv/bin/activate # Linux
# Install the dependencies needed to use .
pip install .
Note Windows users must manually install the CUDA-based version of PyTorch, as pip will only install the CPU version on this platform. See PyTorch Get Started for the specific command to ru, which should be something along the lines of
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117
.
The following dependencies are only needed to develop fontina
.
# Install the developer dependencies.
pip install .[linting]
# Run linting and tests!
make lint
make test
If needed, the model can be trained on synthetic data. fontina
provides a synthetic
dataset generator that follows part of the recommendations from the DeepFont paper
to make the synthetic data look closer to the real data. To use the generator:
- Make a copy of
configs/sample.yaml
, e.g.configs/mymodel.yaml
- Open
configs/mymodel.yaml
and tweak thefonts
section:
fonts:
# ...
# Force an uniform white background for the generated images.
# backgrounds_path: "assets/backgrounds"
samples_per_font: 1000
# Fill in the paths of the fonts that need to be used to generate
# the data.
classes:
- name: Test Font
path: "assets/fonts/test/Test.ttf"
- name: Other Test Font
path: "assets/fonts/test2/Test2.ttf"
- Run the generation:
fontina-generate -c configs/mymodel.yaml -o outputs/font-images/mymodel
After this completes, there should be one directory per configured font in outputs/font-images/mymodel
.
fontina
currently only supports training a DeepFont-like architecture. The training process
has two major steps: unsupervised training of the stacked autoencoders and supervised training
of the full network.
Before starting, make a copy of configs/sample.yaml
, e.g. configs/mymodel.yaml
(or use the
existing one that was created for the dataset generation step).
- Open
configs/mymodel.yaml
and tweak thetraining
section:
training:
# Set this to True to train the stacked autoencoders.
only_autoencoder: True
# Don't use an existing checkpoint for the unsupervised training.
# scae_checkpoint_file: "outputs/models/autoenc/good-checkpoint.ckpt"
data_root: "outputs/font-images/mymodel"
# The directory that will contain the model checkpoints.
output_dir: "outputs/models/mymodel-scae"
# The size of the batch to use for training.
batch_size: 128
# The initial learning rate to use for training.
learning_rate: 0.01
epochs: 20
# Whether or not to use a fraction of the data to run a
# test cycle on the trained model.
run_test_cycle: True
- Then run the training with:
python src/fontina/train.py -c configs/mymodel.yaml
- Open
configs/mymodel.yaml
(or create a new one!) and tweak thetraining
section:
training:
# This will freeze the stacked autoencoders.
only_autoencoder: False
# Pick the best checkpoint from the unsupervised training.
scae_checkpoint_file: "outputs/models/mymodel-scae/good-checkpoint.ckpt"
data_root: "outputs/font-images/mymodel"
# The directory that will contain the model checkpoints.
output_dir: "outputs/models/mymodel-full"
# The size of the batch to use for training.
batch_size: 128
# The initial learning rate to use for training.
learning_rate: 0.01
epochs: 20
# Whether or not to use a fraction of the data to run a
# test cycle on the trained model.
run_test_cycle: True
- Then run the training with:
fontina-train -c configs/mymodel.yaml
fontina
automatically captures the performances of the training runs in a TensorBoard-compatible
way. It should be possible to visualize the recorded data by pointing TensorBoard to the logs
directory as follows:
tensorboard --logdir=lightning_logs
Once training is complete, the resulting model can be used to run inference.
fontina-predict -n 6 -w "outputs/models/mymodel-full/best_checkpoint.ckpt" -i "assets/images/test.png"
The AdobeVFR dataset is currently available for download at Dropbox, here. The license for using and distributing the dataset is available here, which cites:
This dataset ('Licensed Material') is made available to the scientific community for non-commercial research purposes such as academic research, teaching, scientific publications or personal experimentation.
The model, being trained on that dataset, retains the same spirit and the same license applies: the release model can only be used for non-commercial purposes.
- Download the dataset to
assets/AdobeVFR
- Unpack
assets/AdobeVFR/Raw Image/VFR_real_u/scrape-wtf-new.zip
in that directory so that theassets/AdobeVFR/Raw Image/VFR_real_u/scrape-wtf-new/
path exists - Run
fontina-train -c configs/adobe-vfr-autoencoder.yaml
. This will take a long while but progress can be checked with Tensorboard (see the previous sections) during training - Change
configs/adobe-vfr.yaml
so thatscae_checkpoint_file
points to the best checkpoint from step (3). - Run
fontina-train -c configs/adobe-vfr.yaml
. This will take a long while (but less than the unsupervised training round)
While only the full model is needed, the stand-alone autonencoder model is being released as well.
- Stand-alone autoencoder model: Google Drive
- Full model: Google Drive
Note The pre-trained model achieves a validation loss of 0.3523, with an accuracy of 0.8855 after 14 epochs. Unfortunately the test performance on
VFR_real_test
is much worse, with a top-1 accuracy of 0.05. I'm releasing the model in the hope that somebody could help me fixing this 😊😅