Skip to content

Training framework for easy creation, training and exporting of deep learning models.

License

Notifications You must be signed in to change notification settings

luxonis/luxonis-train

Repository files navigation

Luxonis Training Framework

Ubuntu Windows MacOS

License PyBadge Ruff CI codecov

🌟 Overview

LuxonisTrain is a user-friendly tool designed to streamline the training of deep learning models, especially for edge devices. Built on top of PyTorch Lightning, it simplifies the process of training, testing, and exporting models with minimal coding required.

✨ Key Features

  • No Coding Required: Define your training pipeline entirely through a single YAML configuration file.
  • Predefined Configurations: Utilize ready-made configs for common computer vision tasks to start quickly.
  • Customizable: Extend functionality with custom components using an intuitive Python API.
  • Edge Optimized: Focus on models optimized for deployment on edge devices with limited compute resources.

Warning

The project is in a beta state and might be unstable or contain bugs - please report any feedback.

πŸš€ Quick Start

Get started with LuxonisTrain in just a few steps:

  1. Install LuxonisTrain

    pip install luxonis-train

    This will create the luxonis_train executable in your PATH.

  2. Use the provided configs/detection_light_model.yaml configuration file

    You can download the file by executing the following command:

    wget https://raw.githubusercontent.com/luxonis/luxonis-train/main/configs/detection_light_model.yaml
  3. Find a suitable dataset for your task

    We will use a sample COCO dataset from RoboFlow in this example.

  4. Start training

    luxonis_train train                   \
      --config detection_light_model.yaml \
      loader.params.dataset_dir "roboflow://team-roboflow/coco-128/2/coco"
  5. Monitor progress with TensorBoard

    tensorboard --logdir output/tensorboard_logs

    Open the provided URL in your browser to visualize the training progress

πŸ“œ Table Of Contents

πŸ› οΈ Installation

LuxonisTrain requires Python 3.10 or higher. We recommend using a virtual environment to manage dependencies.

Install via pip:

pip install luxonis-train

This will also install the luxonis_train CLI. For more information on how to use it, see CLI Usage.

πŸ“ Usage

You can use LuxonisTrain either from the command line or via the Python API. We will demonstrate both ways in the following sections.

πŸ’» CLI

The CLI is the most straightforward way how to use LuxonisTrain. The CLI provides several commands for training, testing, tuning, exporting and more.

Available commands:

  • train - Start the training process
  • test - Test the model on a specific dataset view
  • infer - Run inference on a dataset, image directory, or a video file.
  • export - Export the model to either ONNX or BLOB format that can be run on edge devices
  • archive - Create an NN Archive file that can be used with our DepthAI API (coming soon)
  • tune - Tune the hyperparameters of the model for better performance
  • inspect - Inspect the dataset you are using and visualize the annotations

To get help on any command:

luxonis_train <command> --help

Specific usage examples can be found in the respective sections below.

βš™οΈ Configuration

LuxonisTrain uses YAML configuration files to define the training pipeline. Here's a breakdown of the key sections:

model:
  name: model_name

  # Use a predefined detection model instead of defining
  # the model architecture manually
  predefined_model:
    name: DetectionModel
    params:
      variant: light

# Download and parse the coco dataset from RoboFlow.
# Save it internally as `coco_test` dataset for future reference.
loader:
  params:
    dataset_name: coco_test
    dataset_dir: "roboflow://team-roboflow/coco-128/2/coco"

trainer:
  batch_size: 8
  epochs: 200
  n_workers: 8
  validation_interval: 10

  preprocessing:
    train_image_size: [384, 384]

    # Uses the imagenet normalization by default
    normalize:
      active: true

    # Augmentations are powered by Albumentations
    augmentations:
      - name: Defocus
      - name: Sharpen
      - name: Flip

  callbacks:
    - name: ExportOnTrainEnd
    - name: ArchiveOnTrainEnd
    - name: TestOnTrainEnd

  optimizer:
    name: SGD
    params:
      lr: 0.02

  scheduler:
    name: ConstantLR

For an extensive list of all the available options, see Configuration.

We provide a set of predefined configuration files for the most common computer vision tasks. You can find them in the configs directory.

πŸ—ƒοΈ Data Preparation

LuxonisTrain supports several ways of loading data:

  • using a data directory in one of the supported formats
  • using an already existing dataset in our custom LuxonisDataset format
  • using a custom loader
    • to learn how to implement and use custom loaders, see Customizations

πŸ“‚ Data Directory

The easiest way to load data is to use a directory with the dataset in one of the supported formats.

Supported formats:

  • COCO - We support COCO JSON format in two variants:
  • Pascal VOC XML
  • YOLO Darknet TXT
  • YOLOv4 PyTorch TXT
  • MT YOLOv6
  • CreateML JSON
  • TensorFlow Object Detection CSV
  • Classification Directory - A directory with subdirectories for each class
    dataset_dir/
    β”œβ”€β”€ train/
    β”‚   β”œβ”€β”€ class1/
    β”‚   β”‚   β”œβ”€β”€ img1.jpg
    β”‚   β”‚   β”œβ”€β”€ img2.jpg
    β”‚   β”‚   └── ...
    β”‚   β”œβ”€β”€ class2/
    β”‚   └── ...
    β”œβ”€β”€ valid/
    └── test/
    
  • Segmentation Mask Directory - A directory with images and corresponding masks.
    dataset_dir/
    β”œβ”€β”€ train/
    β”‚   β”œβ”€β”€ img1.jpg
    β”‚   β”œβ”€β”€ img1_mask.png
    β”‚   β”œβ”€β”€ ...
    β”‚   └── _classes.csv
    β”œβ”€β”€ valid/
    └── test/
    
    The masks are stored as grayscale PNG images where each pixel value corresponds to a class. The mapping from pixel values to classes is defined in the _classes.csv file.
    Pixel Value, Class
    0, background
    1, class1
    2, class2
    3, class3
    

Preparing your Data

  1. Organize your dataset into one of the supported formats.
  2. Place your dataset in a directory accessible by the training script.
  3. Update the dataset_dir parameter in the configuration file to point to the dataset directory.

The dataset_dir can be one of the following:

  • Local path to the dataset directory
  • URL to a remote dataset
    • The dataset will be downloaded to a "data" directory in the current working directory
    • Supported URL protocols:
      • s3://bucket/path/to/directory fo AWS S3
      • gs://buclet/path/to/directory for Google Cloud Storage
      • roboflow://workspace/project/version/format for RoboFlow
        • workspace - name of the workspace the dataset belongs to
        • project - name of the project the dataset belongs to
        • version - version of the dataset
        • format - one of coco, darknet, voc, yolov4pytorch, mt-yolov6, createml, tensorflow, folder, png-mask-semantic
        • example: roboflow://team-roboflow/coco-128/2/coco

Example:

loader:
  params:
    dataset_name: "coco_test"
    dataset_dir: "roboflow://team-roboflow/coco-128/2/coco"

πŸ’Ύ LuxonisDataset

LuxonisDataset is our custom dataset format designed for easy and efficient dataset management. To learn more about how to create a dataset in this format from scratch, see the Luxonis ML repository.

To use the LuxonisDataset as a source of the data, specify the following in the config file:

loader:
  params:
    # name of the dataset
    dataset_name: "dataset_name"

    # one of local (default), s3, gcs
    bucket_storage: "local"

Tip

To inspect the loader output, use the luxonis_train inspect command:

luxonis_train inspect --config configs/detection_light_model.yaml

The inspect command is currently only available in the CLI

πŸ‹οΈβ€β™‚οΈ Training

Once your configuration file and dataset are ready, start the training process.

CLI:

luxonis_train train --config configs/detection_light_model.yaml

Tip

To change a configuration parameter from the command line, use the following syntax:

luxonis_train train                           \
  --config configs/detection_light_model.yaml \
  loader.params.dataset_dir "roboflow://team-roboflow/coco-128/2/coco"

Python API:

from luxonis_train import LuxonisModel

model = LuxonisModel(
  "configs/detection_light_model.yaml",
  {"loader.params.dataset_dir": "roboflow://team-roboflow/coco-128/2/coco"}
)
model.train()

Expected Output:

INFO     Using predefined model: `DetectionModel`
INFO     Main metric: `MeanAveragePrecision`
INFO     GPU available: True (cuda), used: True
INFO     TPU available: False, using: 0 TPU cores
INFO     HPU available: False, using: 0 HPUs
...
INFO     Training finished
INFO     Checkpoints saved in: output/1-coral-wren

Monitoring with TensorBoard:

If not explicitly disabled, the training process will be monitored by TensorBoard. To start the TensorBoard server, run:

tensorboard --logdir output/tensorboard_logs

Open the provided URL to visualize training metrics.

✍ Testing

Evaluate your trained model on a specific dataset view (train, val, or test).

CLI:

luxonis_train test --config configs/detection_light_model.yaml \
                   --view val                                  \
                   --weights path/to/checkpoint.ckpt

Python API:

from luxonis_train import LuxonisModel

model = LuxonisModel("configs/detection_light_model.yaml")
model.test(weights="path/to/checkpoint.ckpt")

The testing process can be started automatically at the end of the training by using the TestOnTrainEnd callback. To learn more about callbacks, see Callbacks.

🧠 Inference

Run inference on images, datasets, or videos.

CLI:

  • Inference on a Dataset View:
luxonis_train infer --config configs/detection_light_model.yaml \
                    --view val                                  \
                    --weights path/to/checkpoint.ckpt
  • Inference on a Video File:
luxonis_train infer --config configs/detection_light_model.yaml \
                    --weights path/to/checkpoint.ckpt           \
                    --source-path path/to/video.mp4
  • Inference on an Image Directory:
luxonis_train infer --config configs/detection_light_model.yaml \
                    --weights path/to/checkpoint.ckpt           \
                    --source-path path/to/images                \
                    --save-dir path/to/save_directory

Python API:

from luxonis_train import LuxonisModel

model = LuxonisModel("configs/detection_light_model.yaml")

# infer on a dataset view
model.infer(weights="path/to/checkpoint.ckpt", view="val")

# infer on a video file
model.infer(weights="path/to/checkpoint.ckpt", source_path="path/to/video.mp4")

# infer on an image directory and save the results
model.infer(
    weights="path/to/checkpoint.ckpt",
    source_path="path/to/images",
    save_dir="path/to/save_directory",
)

πŸ€– Exporting

Export your trained models to formats suitable for deployment on edge devices.

Supported formats:

  • ONNX: Open Neural Network Exchange format.
  • BLOB: Format compatible with OAK-D cameras.

To configure the exporter, you can specify the exporter section in the config file.

You can see an example export configuration here.

CLI:

luxonis_train export --config configs/example_export.yaml --weights path/to/weights.ckpt

Python API:

from luxonis_train import LuxonisModel

model = LuxonisModel("configs/example_export.yaml")
model.export(weights="path/to/weights.ckpt")

Model export can be run automatically at the end of the training by using the ExportOnTrainEnd callback.

The exported models are saved in the export directory within your output folder.

πŸ—‚οΈ NN Archive

Create an NN Archive file for easy deployment with the DepthAI API.

The archive contains the exported model together with all the metadata needed for running the model.

CLI:

luxonis_train archive                         \
  --config configs/detection_light_model.yaml \
  --weights path/to/checkpoint.ckpt

Python API:

from luxonis_train import LuxonisModel

model = LuxonisModel("configs/detection_light_model.yaml")
model.archive(weights="path/to/checkpoint.ckpt")

The archive can be created automatically at the end of the training by using the ArchiveOnTrainEnd callback.

πŸ”¬ Tuning

Optimize your model's performance using hyperparameter tuning powered by Optuna.

Configuration:

Include a tuner section in your configuration file.

tuner:
  study_name: det_study
  n_trials: 10
  storage:
    storage_type: local
  params:
    trainer.optimizer.name_categorical: ["Adam", "SGD"]
    trainer.optimizer.params.lr_float: [0.0001, 0.001]
    trainer.batch_size_int: [4, 16, 4]

CLI:

luxonis_train tune --config configs/example_tuning.yaml

Python API:

from luxonis_train import LuxonisModel

model = LuxonisModel("configs/example_tuning.yaml")
model.tune()

🎨 Customizations

LuxonisTrain is highly modular, allowing you to customize various components:

  • Loaders: Handles data loading and preprocessing.
  • Nodes: Represents computational units in the model architecture.
  • Losses: Define the loss functions used to train the model.
  • Metrics: Measure the model's performance during training.
  • Visualizers: Visualize the model's predictions during training.
  • Callbacks: Allow custom code to be executed at different stages of training.
  • Optimizers: Control how the model's weights are updated.
  • Schedulers: Adjust the learning rate during training.
  • Training Strategy: Specify a custom combination of optimizer and scheduler to tailor the training process for specific use cases.

Creating Custom Components:

Implement custom components by subclassing the respective base classes and/or registering them. Registered components can be referenced in the config file. Custom components need to inherit from their respective base classes:

Examples:

Custom Callback:

import lightning.pytorch as pl

from luxonis_train import LuxonisLightningModule
from luxonis_train.utils.registry import CALLBACKS


@CALLBACKS.register_module()
class CustomCallback(pl.Callback):
    def __init__(self, message: str, **kwargs):
        super().__init__(**kwargs)
        self.message = message

    # Will be called at the end of each training epoch.
    # Consult the PyTorch Lightning documentation for more callback methods.
    def on_train_epoch_end(
        self,
        trainer: pl.Trainer,
        pl_module: LuxonisLightningModule,
    ) -> None:
        print(self.message)

Custom Loss:

from torch import Tensor

from luxonis_train import BaseLoss, TaskType

# Subclasses of `BaseNode`, `BaseLoss`, `BaseMetric`
# and `BaseVisualizer` are registered automatically.
class CustomLoss(BaseLoss):
    supported_tasks = [TaskType.CLASSIFICATION, TaskType.SEGMENTATION]

    def __init__(self, smoothing: float, **kwargs):
        super().__init__(**kwargs)
        self.smoothing = smoothing

    def forward(self, predictions: Tensor, targets: Tensor) -> Tensor:
        # Implement the actual loss logic here
        value = predictions.sum() * self.smoothing
        return value.abs()

Using custom components in the configuration file:

model:
  nodes:
  - name: SegmentationHead
    losses:
    - name: CustomLoss
      params:
        smoothing: 0.0001

trainer:
  callbacks:
    - name: CustomCallback
      params:
        lr: "Hello from the custom callback!"

Note

Files containing the custom components must be sourced before the training script is run. To do that in CLI, you can use the --source argument:

luxonis_train --source custom_components.py train --config config.yaml

Python API:

You have to import the custom components before creating the LuxonisModel instance.

from custom_components import *
from luxonis_train import LuxonisModel

model = LuxonisModel("config.yaml")
model.train()

For more information on how to define custom components, consult the respective in-source documentation.

πŸ“š Tutorials and Examples

We are actively working on providing examples and tutorials for different parts of the library which will help you to start more easily. The tutorials can be found here and will be updated regularly.

πŸ”‘ Credentials

When using cloud services, avoid hard-coding credentials or placing them directly in your configuration files. Instead:

  • Use environment variables to store sensitive information.
  • Use a .env file and load it securely, ensuring it's excluded from version control.

Supported Cloud Services:

  • AWS S3, requires:
    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
    • AWS_S3_ENDPOINT_URL
  • Google Cloud Storage, requires:
    • GOOGLE_APPLICATION_CREDENTIALS
  • RoboFlow, requires:
    • ROBOFLOW_API_KEY

For logging and tracking, we support:

  • MLFlow, requires:
    • MLFLOW_S3_BUCKET
    • MLFLOW_S3_ENDPOINT_URL
    • MLFLOW_TRACKING_URI
  • WandB, requires:
    • WANDB_API_KEY

For remote database storage, we support:

  • POSTGRES_PASSWORD
  • POSTGRES_HOST
  • POSTGRES_PORT
  • POSTGRES_DB

🀝 Contributing

We welcome contributions! Please read our Contribution Guide to get started. Whether it's reporting bugs, improving documentation, or adding new features, your help is appreciated.

About

Training framework for easy creation, training and exporting of deep learning models.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages