Data Preparation

We use two main classes for dataset ingestion. The first is ./data/dataset_manager.py, which parses a dataset_config.yaml file and registers the datasets for training and testing. The second is ./data/dataset_mapper.py, which converts the annotations into the form used for model propagation and evaluation.

Here's an example for a data configuration:

# ./data_configs/data_config_pretrain.yaml

ROOT: /data  # Root directory containing the dataset directories; each holds the dataset images
             # and an annotations.json file

DATASETS:     # These are datasets we converted to COCO format and used for training
  - SynthText_coco
  - totaltext_train_coco

VAL_DATASETS:  # Validation is performed on these datasets
  - totaltext_test_coco
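
As a minimal sketch of how such a configuration might be consumed (this is not the code in dataset_manager.py; the helper name `resolve_annotation_files` is hypothetical), the snippet below takes the configuration already parsed into a dictionary, for example by a YAML loader, and maps every dataset name to the annotations.json file expected under ROOT. It assumes the layout `<ROOT>/<dataset_name>/annotations.json` described in the comments above.

```python
from pathlib import PurePosixPath


def resolve_annotation_files(config: dict) -> dict:
    """Map each split to {dataset_name: path of its annotations.json}.

    Hypothetical helper: assumes the layout
    <ROOT>/<dataset_name>/annotations.json.
    """
    root = PurePosixPath(config["ROOT"])
    resolved = {}
    for split in ("DATASETS", "VAL_DATASETS"):
        # A missing split is treated as empty rather than as an error.
        names = config.get(split) or []
        resolved[split] = {
            name: str(root / name / "annotations.json") for name in names
        }
    return resolved


# The example configuration, written as the dictionary a YAML loader would return.
config = {
    "ROOT": "/data",
    "DATASETS": ["SynthText_coco", "totaltext_train_coco"],
    "VAL_DATASETS": ["totaltext_test_coco"],
}

paths = resolve_annotation_files(config)
print(paths["VAL_DATASETS"]["totaltext_test_coco"])
# /data/totaltext_test_coco/annotations.json
```

Keeping the training and validation splits in separate keys mirrors the two lists in the configuration file.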

In each dataset directory, the model expects to find the images and an annotations.json file in COCO format, with the following structure:

{
  "info": {},
  "licenses": [{
      "id": 1,
      "name": "<License Name>",
      "url": "<License URL>"
    }],
  "categories": [{
      "id": 1,
      "name": "word", 
      "supercategory": "documents"
  }],
  "type": "instances",
  "images": [ 
    {
      "id": 1,
      "file_name": "img199.jpg",
      "width": 2593,
      "height": 1936,
      "date_captured": "<date captured>",
      "license": 1,
      "coco_url": "",
      "url": ""
    },
    ...
  ],
  "annotations": [
    {
      "id": 2544,
      "category_id": 1,
      "category": "word",
      "image_id": 300,
      "word_length": 3,
      "text": "ROC",
      "iscrowd": 0,
      "area": 666.5,
      "bbox": [47.0, 111.0, 45.0, 25.0],
      "segmentation": [[53.0, 113.0, 71.0, 118.0, 85.0, 111.0, 92.0, 126.0, 69.0, 136.0, 47.0, 126.0]],
      "width": 360,
      "height": 162,
      "angle": 1.49,
      "orientation": 0,
      "rotated_box": [[46.21, 113.42], [91.04, 110.62], [92.53, 134.53], [47.71, 137.33]]
    },
    ...
  ]
}
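
Before training, it can help to check that an annotations file follows this structure. The snippet below is a minimal sketch, not part of the repository: the helper `index_annotations` and the list of required keys are assumptions, and the sample document uses illustrative values in the same layout. It verifies the top-level keys, confirms that every annotation points at a known image, and groups annotations by `image_id`.

```python
import json
from collections import defaultdict

# Top-level keys this sketch treats as mandatory (an assumption).
REQUIRED_TOP_LEVEL = ("categories", "images", "annotations")


def index_annotations(coco: dict) -> dict:
    """Group annotations by image_id after basic structural checks."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in coco]
    if missing:
        raise ValueError(f"annotations file is missing keys: {missing}")

    image_ids = {image["id"] for image in coco["images"]}
    by_image = defaultdict(list)
    for ann in coco["annotations"]:
        if ann["image_id"] not in image_ids:
            raise ValueError(
                f"annotation {ann['id']} refers to unknown image {ann['image_id']}"
            )
        by_image[ann["image_id"]].append(ann)
    return dict(by_image)


# A tiny in-memory document with illustrative values.
sample = json.loads("""
{
  "categories": [{"id": 1, "name": "word", "supercategory": "documents"}],
  "images": [{"id": 1, "file_name": "img199.jpg", "width": 2593, "height": 1936}],
  "annotations": [
    {"id": 2544, "category_id": 1, "image_id": 1, "text": "ROC",
     "bbox": [47.0, 111.0, 45.0, 25.0]}
  ]
}
""")

grouped = index_annotations(sample)
print([ann["text"] for ann in grouped[1]])  # ['ROC']
```

For a file on disk, the same function can be applied to the result of `json.load` on the opened annotations.json.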
  

We enrich the baseline COCO annotations with additional fields, including:

    'text': 'Hello'   # A string containing the transcription of the annotation, UTF-8 encoded
    'text_length': 5  # The length of the text
    'angle': 5.5      # The rotated box angle, measured in degrees counterclockwise (CCW)
    'orientation': 0 / 1 / 2 / 3  # Computed as "orientation = (angle + 45) // 90 % 4"
    'rotated_box':    # (4 x 2 float list) The four [x, y] corner coordinates of the box, in absolute image pixels
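
The orientation equation can be checked with a short sketch. It assumes that `//` is floor division and `%` is a modulo with a non-negative result, as in Python; the function name `orientation_from_angle` is hypothetical. Under that reading, the angle is quantized into four 90-degree bins centered on 0, 90, 180 and 270 degrees.

```python
def orientation_from_angle(angle: float) -> int:
    """Quantize a CCW angle in degrees into one of four orientation bins."""
    # Shifting by 45 centers each bin on a multiple of 90 degrees;
    # the modulo wraps negative angles and angles beyond 315 back into 0..3.
    return int((angle + 45) // 90 % 4)


for angle in (1.49, 5.5, 90.0, 180.0, -90.0):
    print(angle, orientation_from_angle(angle))
# 1.49 0
# 5.5 0
# 90.0 1
# 180.0 2
# -90.0 3
```

The first value, 1.49, is the angle of the example annotation, and the result agrees with its stored orientation of 0.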