PyTorch EfficientDet API

A simple training, testing, and inference pipeline using Ross Wightman's EfficientDet models. Ross Wightman's repo is used as a submodule to load the EfficientDet models.

The training, testing, and inference code is custom written.

Get started with training within 5 minutes if you have the images and XML annotation files.

Get Started with Inference

Open In Colab Kaggle

Setup for Ubuntu

  1. Clone the repository.

    git clone --recursive
  2. Install requirements.

    1. Method 1: If you have CUDA and cuDNN set up already, run this in your environment of choice:

      pip install -r requirements.txt
    2. Method 2: If you want to install PyTorch with the CUDA Toolkit in your environment of choice, run one of the following commands, depending on the CUDA version you need.

      conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch


      conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

      OR install the version with CUDA support as per your choice from here.

      Then install the remaining requirements.
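
After either method, it can help to confirm that the installed PyTorch build actually sees the GPU before starting a training run. The snippet below is a minimal sketch of such a check, not part of this repository; the function name `cuda_report` is hypothetical.

```python
def cuda_report() -> str:
    """Describe the installed PyTorch build and whether it can see a GPU."""
    try:
        # Imported lazily so the check also runs before PyTorch is installed.
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."

    if torch.cuda.is_available():
        return (
            f"PyTorch {torch.__version__} built with CUDA {torch.version.cuda}; "
            f"using {torch.cuda.get_device_name(0)}."
        )
    return f"PyTorch {torch.__version__} is installed, but no CUDA device is visible."


if __name__ == "__main__":
    print(cuda_report())
```

If the report says that no CUDA device is visible, training can still run on the CPU, but it will be much slower.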

Setup on Windows

  1. First you need to install Microsoft Visual Studio from here. Sign in or sign up by clicking on this link and download the Visual Studio Community 2017 edition.

    Install it with the default settings. The installation should take around 6 GB. Mainly, we need the C++ Build Tools.

  2. Then install the proper pycocotools for Windows.

    pip install git+
  3. Clone the repository.

    git clone --recursive
  4. Install PyTorch with CUDA support using one of the following commands, depending on the CUDA version you need.

    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch


    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

    OR install the version with CUDA support as per your choice from here.

    Then install the remaining requirements except for pycocotools.

Train on Custom Dataset

Take the smoke dataset from Kaggle as an example. Let's say that the dataset is in the data/smoke_pascal_voc directory in the following format, and that smoke.yaml is in the data_configs directory.

├── data
│   ├── smoke_pascal_voc
│   │   ├── archive
│   │   │   ├── train
│   │   │   └── valid
│   └──
├── data_configs
│   └── smoke.yaml
├── efficientdet-pytorch
│   ├── effdet
│   ...
├── model_configs
│   └── model_config.yaml
├── models
│   ├──
│   ├──
│   └──
├── outputs
│   ├── inference
│   │   ├── res_1
│   │   └── res_2
│   └── training
│       ├── res_1
│       └── res_2
├── torch_utils
│   ├──
│   ├──
│   ├──
│   ├──
│   └──
├── requirements.txt

The content of the smoke.yaml should be the following:

# TRAIN_DIR should be relative to
TRAIN_DIR_IMAGES: data/smoke_pascal_voc/archive/train/images
TRAIN_DIR_LABELS: data/smoke_pascal_voc/archive/train/annotations
# VALID_DIR should be relative to
VALID_DIR_IMAGES: data/smoke_pascal_voc/archive/valid/images
VALID_DIR_LABELS: data/smoke_pascal_voc/archive/valid/annotations
# Class names.
CLASSES: ['smoke']
# Number of classes.
NC: 1
# Whether to save the predictions of the validation set while training.
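
A quick sanity check of the data config can catch mistakes before a training run starts. The sketch below is not part of this repository. It assumes the config has already been parsed into a dictionary (for example with `yaml.safe_load` from PyYAML, if that is installed), and it assumes that `NC` is expected to equal the number of entries in `CLASSES`, as in the example above. The names `REQUIRED_KEYS` and `check_data_config` are hypothetical.

```python
# Keys taken from the example smoke.yaml shown above.
REQUIRED_KEYS = (
    "TRAIN_DIR_IMAGES",
    "TRAIN_DIR_LABELS",
    "VALID_DIR_IMAGES",
    "VALID_DIR_LABELS",
    "CLASSES",
    "NC",
)


def check_data_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]

    # Assumption: NC should match the number of class names.
    if "CLASSES" in cfg and "NC" in cfg and cfg["NC"] != len(cfg["CLASSES"]):
        problems.append(
            f"NC is {cfg['NC']} but CLASSES lists {len(cfg['CLASSES'])} name(s)"
        )
    return problems


# The dictionary that the example smoke.yaml would parse to.
smoke_cfg = {
    "TRAIN_DIR_IMAGES": "data/smoke_pascal_voc/archive/train/images",
    "TRAIN_DIR_LABELS": "data/smoke_pascal_voc/archive/train/annotations",
    "VALID_DIR_IMAGES": "data/smoke_pascal_voc/archive/valid/images",
    "VALID_DIR_LABELS": "data/smoke_pascal_voc/archive/valid/annotations",
    "CLASSES": ["smoke"],
    "NC": 1,
}
print(check_data_config(smoke_cfg))
```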

Note that the images and annotations can be in the same directory as well. In that case, TRAIN_DIR_IMAGES and TRAIN_DIR_LABELS will have the same path. The same applies to the VALID images and labels. The will take care of that.
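
To illustrate why a shared directory works, the sketch below pairs each image with an XML file by file name. It is not the repository's data loading code. It assumes that an annotation has the same base name as its image (for example `img_1.jpg` and `img_1.xml`) and that images use one of the listed extensions; `IMAGE_SUFFIXES` and `pair_images_with_labels` are hypothetical names.

```python
from pathlib import Path

# Assumption: the image extensions that should be considered.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pair_images_with_labels(images_dir, labels_dir):
    """Return (image_path, xml_path) pairs for images that have a matching XML file.

    images_dir and labels_dir may be the same directory or different ones.
    """
    images_dir, labels_dir = Path(images_dir), Path(labels_dir)
    pairs = []
    for image_path in sorted(images_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skips the XML files when both kinds share one directory
        xml_path = labels_dir / f"{image_path.stem}.xml"
        if xml_path.is_file():
            pairs.append((image_path, xml_path))
    return pairs
```

Because the lookup is by file name, passing the same path for both arguments and passing two separate paths behave the same way. Images without an annotation file are skipped in this sketch.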

Next, to start the training, you can use the following command.

Command format:

python --model <name of the model (default tf_efficientdet_lite0)> --config <path to the data config> --device <computation device (default cuda:0 if a GPU is available)> --epochs <number of epochs to train for> --workers <number of parallel workers (default 4)> --batch-size <batch size for data loading (default 8)>

In this case, the exact command would be:

python --model tf_efficientdet_lite0 --config data_configs/smoke.yaml --device cuda:0 --epochs 5 --workers 4 --batch-size 8  
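
For readers who want to see how the documented flags and defaults fit together, here is a minimal `argparse` sketch. It is not the repository's actual parser. The defaults for `--model`, `--workers`, and `--batch-size` come from the command format above; the `cpu` fallback, the `--epochs` default of 5, and making `--config` required are assumptions, as is the function name `build_train_parser`.

```python
import argparse


def build_train_parser(gpu_available: bool) -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented training flags."""
    parser = argparse.ArgumentParser(description="Sketch of the training flags")
    parser.add_argument("--model", default="tf_efficientdet_lite0",
                        help="name of the model")
    parser.add_argument("--config", required=True,  # assumption: no default
                        help="path to the data config")
    parser.add_argument("--device",
                        # Assumption: fall back to the CPU when no GPU is available.
                        default="cuda:0" if gpu_available else "cpu",
                        help="computation device")
    parser.add_argument("--epochs", type=int, default=5,  # assumption: default not documented
                        help="number of epochs to train for")
    parser.add_argument("--workers", type=int, default=4,
                        help="number of parallel workers")
    parser.add_argument("--batch-size", type=int, default=8,
                        help="batch size for data loading")
    return parser


# Parse an explicit argument list so the example runs anywhere.
args = build_train_parser(gpu_available=False).parse_args(
    ["--config", "data_configs/smoke.yaml"]
)
print(args.model, args.device, args.workers, args.batch_size)
```

Note that `argparse` exposes `--batch-size` as `args.batch_size`, with the hyphen replaced by an underscore.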

The terminal output should be similar to the following:

Number of training samples: 665
Number of validation samples: 72

3,191,405 total parameters.
3,191,405 training parameters.
Epoch     0: adjusting learning rate of group 0 to 1.0000e-03.
Epoch: [0]  [ 0/84]  eta: 0:02:17  lr: 0.000013  loss: 1.6518 (1.6518)  time: 1.6422  data: 0.2176  max mem: 1525
Epoch: [0]  [83/84]  eta: 0:00:00  lr: 0.001000  loss: 1.6540 (1.8020)  time: 0.0769  data: 0.0077  max mem: 1548
Epoch: [0] Total time: 0:00:08 (0.0984 s / it)
creating index...
index created!
Test:  [0/9]  eta: 0:00:02  model_time: 0.0928 (0.0928)  evaluator_time: 0.0245 (0.0245)  time: 0.2972  data: 0.1534  max mem: 1548
Test:  [8/9]  eta: 0:00:00  model_time: 0.0318 (0.0933)  evaluator_time: 0.0237 (0.0238)  time: 0.1652  data: 0.0239  max mem: 1548
Test: Total time: 0:00:01 (0.1691 s / it)
Averaged stats: model_time: 0.0318 (0.0933)  evaluator_time: 0.0237 (0.0238)
Accumulating evaluation results...
DONE (t=0.03s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.002
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.009
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.007
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.029
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.074
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.028
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.088
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.167
Epoch: [4]  [ 0/84]  eta: 0:00:20  lr: 0.001000  loss: 0.9575 (0.9575)  time: 0.2461  data: 0.1662  max mem: 1548
Epoch: [4]  [83/84]  eta: 0:00:00  lr: 0.001000  loss: 1.1325 (1.1624)  time: 0.0762  data: 0.0078  max mem: 1548
Epoch: [4] Total time: 0:00:06 (0.0801 s / it)
creating index...
index created!
Test:  [0/9]  eta: 0:00:02  model_time: 0.0369 (0.0369)  evaluator_time: 0.0237 (0.0237)  time: 0.2494  data: 0.1581  max mem: 1548
Test:  [8/9]  eta: 0:00:00  model_time: 0.0323 (0.0330)  evaluator_time: 0.0226 (0.0227)  time: 0.1076  data: 0.0271  max mem: 1548
Test: Total time: 0:00:01 (0.1116 s / it)
Averaged stats: model_time: 0.0323 (0.0330)  evaluator_time: 0.0226 (0.0227)
Accumulating evaluation results...
DONE (t=0.03s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.137
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.313
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.118
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.029
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.175
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.428
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.204
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.306
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.347
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.140
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.424
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.683
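
To compare runs, the summary lines in a log like the one above can be turned into numbers. The following is a minimal sketch that assumes the log was saved as text and that the summary lines keep the layout shown here; `SUMMARY_LINE` and `parse_coco_summary` are hypothetical names, not part of this repository.

```python
import re

# Matches lines such as:
#  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.137
SUMMARY_LINE = re.compile(
    r"Average (?P<kind>Precision|Recall)\s+\((?:AP|AR)\) @\[ IoU=(?P<iou>[\d.:]+)\s*"
    r"\| area=\s*(?P<area>\w+) \| maxDets=\s*(?P<max_dets>\d+) \] = (?P<value>-?\d+\.\d+)"
)


def parse_coco_summary(log_text: str) -> dict:
    """Map (kind, IoU range, area, maxDets) to the reported value.

    If the text covers several epochs, later values overwrite earlier ones,
    so the result reflects the last evaluation in the text.
    """
    metrics = {}
    for match in SUMMARY_LINE.finditer(log_text):
        key = (match["kind"], match["iou"], match["area"], int(match["max_dets"]))
        metrics[key] = float(match["value"])
    return metrics
```

For the final epoch in the log above, `parse_coco_summary(log_text)[("Precision", "0.50:0.95", "all", 100)]` would give the reported AP of 0.137.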


Inference on Images using Pretrained Models

Use the efficientdet-pytorch models trained on the COCO dataset.

Command format:

python --input <path/to/input/image> --model <model_name>


python --input data/inference_data/image_1.jpg --model tf_efficientdet_lite0

Inference on Images using Custom Trained Model

Use your custom trained model to run inference on any image. Providing the path to the config file is mandatory here, because the class information is read from it.

Command format:

python --input <path/to/input/image> --model <model_name> --weights <path/to/saved_model_weights> --config <path/to/config file>


python --input data/inference_data/image_1.jpg --model tf_efficientdet_lite0 --weights outputs/training/res_19/last_model_state.pth --config data_configs/smoke.yaml
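
The rule that custom weights need a config can be expressed as a small argument check. The sketch below is only an illustration of that rule, not the repository's inference script; the function name `parse_inference_args` is hypothetical, and treating `--input` and `--model` as required is an assumption.

```python
import argparse


def parse_inference_args(argv):
    """Parse the documented inference flags and enforce the weights/config rule."""
    parser = argparse.ArgumentParser(description="Sketch of the inference flags")
    parser.add_argument("--input", required=True, help="path to the input image")
    parser.add_argument("--model", required=True, help="model name")  # assumption: required
    parser.add_argument("--weights", default=None, help="path to saved model weights")
    parser.add_argument("--config", default=None, help="path to the data config")
    args = parser.parse_args(argv)

    # Custom weights carry no class names, so the data config must supply them.
    if args.weights is not None and args.config is None:
        parser.error("--config is required when --weights is given")
    return args
```

With this check, passing `--weights` without `--config` exits with a usage error instead of failing later when class names are needed.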

Inference on Videos using Pretrained Models

Command format:

python --input <path/to/input/video> --model <model_name>


python --input data/inference_data/video_2.mp4 --model tf_efficientdet_lite0

Inference on Videos using Custom Trained Models

Command format:

python --input <path/to/input/video> --model <model_name> --weights <path/to/saved_model_weights> --config <path/to/config file>


python --input data/inference_data/video_3.mp4 --model tf_efficientdet_lite0 --weights outputs/training/res_19/last_model_state.pth --config data_configs/smoke.yaml