Skip to content

Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset

License

Notifications You must be signed in to change notification settings

JPLeoRX/detectron2-publaynet

Repository files navigation

Warning!!!

This repository is no longer maintained, as it was moved to another GitHub. Please see CaseDrive GitHub profile for any updates related to this research.


Here I present Detectron2 object detection models trained on PubLayNet dataset, ranging from 81.139 to 86.690 in validation AP scores (possibly even better results can be achieved with longer training times).

Preview

Dataset overview

PubLayNet is a very large (over 300k images & over 90 GB in weight) dataset for document layout analysis. It contains images of research papers and articles and annotations for various elements in a page such as “text”, “list”, “figure” etc in these research paper images. The dataset was obtained by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. Originally provided by IBM here.

Data Description Zipped File Name Purpose
Train 0 Dataset, 13 GB train-0.tar.gz Training
Train 1 Dataset, 13 GB train-1.tar.gz Training
Train 2 Dataset, 13 GB train-2.tar.gz Training
Train 3 Dataset, 13 GB train-3.tar.gz Training
Train 4 Dataset, 13 GB train-4.tar.gz Training
Train 5 Dataset, 13 GB train-5.tar.gz Training
Train 6 Dataset, 13 GB train-6.tar.gz Training
Evaluation Dataset, 3 GB val.tar.gz Evaluation
Labels Dataset, 314 MB labels.tar.gz

Models overview

Models were trained on train part of the dataset, consisting of 335 703 images, and evaluated on val part of the dataset with 11 245 images. All the AP scores were obtained on the val dataset. Inference times were taken from official Detectron model zoo descriptions.

To provide you with some options, and to experiment with different models I have trained 4 versions for this dataset. 2 of them are from basic object detection, and 2 contain segmentation masks. You can choose which better suits your task by comparing accuracy scores and inference times of these models.

Name Inference time (s/im) Box AP Folder name Model zoo config Trained model
R50-FPN 0.038 81.139 faster_rcnn_R_50_FPN_3x Click me! Click me!
R101-FPN 0.051 84.295 faster_rcnn_R_101_FPN_3x Click me! Click me!
Name Inference time (s/im) Box AP Mask AP Folder name Model zoo config Trained model
R50-FPN 0.043 83.666 82.268 mask_rcnn_R_50_FPN_3x Click me! Click me!
R101-FPN 0.056 86.690 82.105 mask_rcnn_R_101_FPN_3x Click me! Click me!

Model folder

Each model's directory in git will contain these files:

File name Description
download.txt A text file that contains a link from where you can download the trained model
train.py Training script, that trains the model found in training_output sub-folder for a given number of epochs
test.py Testing script, that runs the model found in training_output in inference mode for 6 randomly preselected images (3 from training, 3 from evaluation datasets) and displays all predictions on each image in a pop-up window
eval.py Testing script, that runs the model found in training_output in inference mode for all images found in dataset's val folder and val.json. Evaluation is performed through Detectron's COCOEvaluator
evaluation.txt As eval.py takes some time to execute (from 10 to 20 minutes) I've recorded last output of evaluation in this separate text file

Generally all trained models .pth files should be available through here or here, but if not - refer to individual download.txt links

Using the models

In usage_example module I've provided a sample script example.py that builds Detectron objects (config and predictor) for my trained model and runs inference on it, with a sample interpretation of Detectron's outputs. Please note that test.py and eval.py in model folders use common functions shared between models in this project. While example.py provides a completely clean setup, assuming you only want to download the models and use them for inference directly. It requires only Pillow, OpenCV, numpy and Detectron2 to run.

Hardware used

Believe it or not but training of these models was performed on a regular consumer-grade personal gaming PC with one NVIDIA 2070 SUPER (8GB) GPU, Intel Core i5-10600K CPU and 32 GB RAM.

Links

In case you’d like to check my other work or contact me: