Data Prepration

We use two main classes for datasets ingestion, the first is .data/dataset_manager.py, which decodes a dataset_config.yaml file, and registers the datasets towards training and testing. The second is ./data/dataset_mapper.py, which converts the annotations for model propagation and evaluation.

Here's an example for a data configuration:

# ./data_configs/data_config_pretrain.yaml

ROOT: /data  # This is the root directory where the dataset directories are found, each has the dataset images 
             # and an annotations.json file

DATASETS:     # These are datasets we converted to COCO format and used for training
  - SynthText_coco
  - totaltext_train_coco

VAL_DATASETS:  # Validation is performed on these datasets
  - totaltext_test_coco

In each dataset directory, the model expected to find the images and an annotations.json in COCO format according to the following structure:

{
  "info": {},
  "licenses": [{
      "id": 1,
      "name": "<License Name>",
      "url": "<License URL>"
    }],
  "categories": [{
      "id": 1,
      "name": "word", 
      "supercategory": "documents"
  }],
  "type": "instances",
  "images": [ 
    {
      "id": 1,
      "file_name": "img199.jpg",
      "width": 2593,
      "height": 1936,
      "date_captured": "<date captured>",
      "license": 1,
      "coco_url": "",
      "url": ""
    },
    ...
  ],
  "annotations": [
    {"id": 2544, 
      "category_id": 1, 
      "category": "word", 
      "image_id": 300, 
      "word_length": 3, 
      "text": "ROC", 
      "iscrowd": 0, 
      "area": 666.5, 
      "bbox": [47.0, 111.0, 45.0, 25.0], 
      "segmentation": [[53.0, 113.0, 71.0, 118.0, 85.0, 111.0, 92.0, 126.0, 69.0, 136.0, 47.0, 126.0]], 
      "width": 360, 
      "height": 162, 
      "angle": 1.49, 
      "orientation": 0, 
      "rotated_box": [[46.21, 113.42], [91.04, 110.62], [92.53, 134.53], [47.71, 137.33]]}]}
    ...
  ]
}

We note that we enrich the baseline COCO annotations with additional fields, including:

    'text': 'Hello'   # A string containing the transcription of the annotation in UTF-8 format
    'text_length': 5  # The length of the text
    'angle' 5.5       # The rotated box angle measured in CCW degrees 
    'orientation' 0 / 1 / 2 / 3  # This is computed by the equation "orientation = (angle + 45) // 90 % 4"
    'rotated_box:     # (2 x 4 float list), measured in absolute image pixels and degrees, marking the coordinates of the box

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DATA.md

DATA.md

Data Prepration

Files

DATA.md

Latest commit

History

DATA.md

File metadata and controls

Data Prepration