Official repository of COO (ECCV 2022) | Paper | Dataset, Sample | Codes | Leaderboard | Poster | Video
We provide the COmic Onomatopoeia dataset (COO) and the source code used in our paper.

---
COO has many arbitrary texts, such as extremely curved, partially shrunk, or arbitrarily placed texts. Furthermore, some texts are separated into several parts. Each part is a truncated text and is not meaningful by itself; these parts should be linked to represent the intended meaning. Thus, we propose a novel task that predicts the links between truncated texts.
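As a toy illustration of the link prediction task, linked truncated parts can be stitched back into the intended onomatopoeia. The dictionary fields, coordinates, and texts below are hypothetical, not the actual COO annotation format:

```python
# Hypothetical truncated-text annotations; the fields and values are
# illustrative assumptions, not the actual COO annotation format.
parts = {
    0: {"polygon": [(10, 5), (40, 5), (40, 30), (10, 30)], "text": "ゴゴ"},
    1: {"polygon": [(50, 60), (80, 60), (80, 90), (50, 90)], "text": "ゴゴゴ"},
}
links = [(0, 1)]  # part 0 continues into part 1

# Follow each predicted link to recover the intended text.
for src, dst in links:
    full_text = parts[src]["text"] + parts[dst]["text"]
    print(full_text)  # -> ゴゴゴゴゴ
```

Neither part alone is a meaningful onomatopoeia; only the linked whole is.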

---
COO is a challenging text dataset. Detecting the onomatopoeia region and capturing the intended meaning of truncated texts are very difficult. If a model can recognize comic onomatopoeias, we expect that the model can also recognize other less difficult texts. We hope our work will encourage studies on more irregular texts and further improve text detection, recognition, and link prediction methods.

---
In text detection and recognition tasks, researchers use various datasets to validate their methods. We hope COO can also help validate their methods. In addition, we hope COO is used to analyze Japanese onomatopoeia.
We provide the annotations of COO.
Several files that help with preprocessing, visualization, and data analysis are in the `COO-data` folder.
COO has 61,465 polygons and 2,261 links between truncated texts.
The figure below shows the statistics of COO and its character types (182 types in total).
According to the license of Manga109, redistribution of the Manga109 images is not permitted.
Thus, you should download the images of Manga109 via the Manga109 webpage.
After downloading, unzip `Manga109.zip` and then move the `images` folder of Manga109 into the `COO-data` folder.
We need the `images` folder in the `COO-data` folder (i.e., `COO-data/images`) for preprocessing.
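The steps above can be sketched as a small helper. The archive layout (an `images` folder inside a top-level `Manga109` folder) is an assumption, so adjust the paths if your download differs:

```python
import os
import shutil
import zipfile

def prepare_images(zip_path: str = "Manga109.zip", coo_dir: str = "COO-data") -> str:
    """Unzip the Manga109 archive and move its images folder into COO-data.

    Assumes the archive extracts to Manga109/images/...; adjust if yours differs.
    """
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(".")                       # creates Manga109/images/...
    os.makedirs(coo_dir, exist_ok=True)
    dest = os.path.join(coo_dir, "images")
    shutil.move(os.path.join("Manga109", "images"), dest)
    return dest                                  # i.e. COO-data/images
```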

---
Run the following command.
```
pip install Flask==2.0.2 Shapely==1.8.0 manga109api==0.3.1 pillow natsort lmdb opencv-python numpy tqdm
```

---
See the `dataset` section in each model folder.
The source code used in our paper is provided in each folder.
For text detection, we used ABCNetv2 and MTSv3.
For text recognition, we used TRBA.
For link prediction, we used M4C-COO (a variant of M4C).
We will list the results of SOTA methods that provide the official code.
For the leaderboard, we report the performance of one pretrained model.
Note that we report the average value of three trials in our paper.
We welcome pull requests containing the official code (URL) of other SOTA methods.
- P: Precision, R: Recall, H: Hmean, *: only the detection part of the model is used.
- Based on the hmean, methods are sorted in descending order.
Method | P | R | H | Official Code | Pretrained model |
---|---|---|---|---|---|
DBNet++ (TPAMI 2022) | 90.8 | 60.9 | 72.9 | URL | download |
DBNet (AAAI 2020) | 90.9 | 60.3 | 72.5 | URL | download |
PAN (ICCV 2019) | 88.4 | 58.6 | 70.4 | URL | download |
PAN++* (TPAMI 2021) | 78.3 | 62.7 | 69.7 | URL | download |
MTSv3* (ECCV 2020) | 70.1 | 66.0 | 68.0 | URL | download |
PSENet (CVPR 2019) | 83.3 | 57.1 | 67.8 | URL | download |
ABCNetv2* (TPAMI 2021) | 67.2 | 65.1 | 66.1 | URL | download |
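For reference, the Hmean (H) column is the harmonic mean of precision and recall; a minimal check against the DBNet++ row above:

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the H column above)."""
    return 2 * precision * recall / (precision + recall)

# DBNet++ row: P = 90.8, R = 60.9
print(round(hmean(90.8, 60.9), 1))  # -> 72.9
```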
- Based on the accuracy, methods are sorted in descending order.
Method | Accuracy | Official Code | Pretrained model |
---|---|---|---|
TRBA+2D (ours) | 81.2 | URL | download |
MASTER (PR 2021) | 74.6 | URL | download |
ABINet w/o pretrain (CVPR 2021) | 70.6 | URL | download |
- P: Precision, R: Recall, H: Hmean
- Based on the hmean, methods are sorted in descending order.
Method | P | R | H | Official Code | Pretrained model |
---|---|---|---|---|---|
M4C-COO (ours) | 74.5 | 66.3 | 70.2 | URL | download |
M4C-COO with vocab 11640 (ours) | 59.4 | 44.6 | 51.0 | URL | download |
Distance-based rule (ours) | 1.1 | 74.5 | 2.1 | - | - |
When using the annotations of the COmic Onomatopoeia dataset (COO), or if you find this work useful for your research, please cite our paper:
```
@inproceedings{baek2022COO,
  title={COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts},
  author={Baek, Jeonghun and Matsui, Yusuke and Aizawa, Kiyoharu},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}
```
Feel free to contact us if you have any questions: Jeonghun Baek (ku21fang@gmail.com).
For the dataset: the annotation data of COO is licensed under CC BY 4.0.
The license of the Manga109 image data is described here.
For the code written by us: MIT license.
After examining the licenses of the original source code of each method used in our work, we found that redistribution of the source code is permitted.
Thus, to facilitate future work, we provide the source code in this repository.
Please let us know if there is a license issue with code redistribution; if so, we will remove the corresponding code and provide instructions to reproduce our work.