shcheklein/ensemble-dvc-template

DVC Model Ensemble

A skeleton DVC pipeline for evaluating multiple models together. It is an experiment in response to this SO question. There is no actual ML training code, no frameworks, and no actual data. It is meant to showcase a possible project layout and how to run the pipeline to see metrics, plots, etc.

Explore the CLI, VS Code extension, Studio, and Codespaces tools to run experiments, visualize them, and share the results.

Structure

.
├── LICENSE
├── README.md

# Project / pipeline definition (dvc.yaml) and
# project artifacts and dependencies snapshot (dvc.lock)

├── dvc.lock
├── dvc.yaml

# Metrics and plots that are logged via the `dvclive` logger
# https://dvc.org/doc/dvclive
# This is similar to TB, W&B, MLflow, etc. loggers

├── dvclive
│   ├── model-1
│   │   ├── metrics.json
│   │   └── plots
│   │       └── metrics
│   │           └── acc.tsv

...  # Any number of models

│   └── model-N

# Evaluation script that can read multiple models or their metrics
# and dumps aggregate metrics

├── evaluate.py
├── evaluation
│   └── metrics.json

# Model data, training script, and the model itself.
# If data and model are large they could be tracked and saved to any
# supported remote storage. Here we put them in Git for simplicity.

├── model-1
│   ├── data
│   │   └── data.csv
│   ├── data.dvc
│   ├── model.pkl
│   ├── params.yaml
│   └── train.py

...  # Any number of models

├── model-N
    ...
└── requirements.txt
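The aggregation step in this layout can be sketched as follows. This is a minimal illustration, not the repository's actual `evaluate.py`: the function name `aggregate_metrics`, the choice of a plain mean, and the shape of the output JSON are all assumptions. It only relies on the paths shown in the tree above (`dvclive/<model>/metrics.json` in, `evaluation/metrics.json` out).

```python
import json
from pathlib import Path


def aggregate_metrics(live_root="dvclive", out_path="evaluation/metrics.json"):
    """Collect each model's metrics.json and write per-metric averages.

    Hypothetical helper: the real evaluate.py may aggregate differently.
    """
    # One entry per model directory, e.g. dvclive/model-1/metrics.json.
    per_model = {}
    for metrics_file in sorted(Path(live_root).glob("*/metrics.json")):
        per_model[metrics_file.parent.name] = json.loads(metrics_file.read_text())

    # Average every numeric top-level metric across the models that report it.
    values_by_name = {}
    for metrics in per_model.values():
        for name, value in metrics.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                values_by_name.setdefault(name, []).append(value)

    summary = {
        "models": per_model,
        "mean": {name: sum(vals) / len(vals) for name, vals in values_by_name.items()},
    }

    # Write the aggregate file, creating the evaluation/ directory if needed.
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(summary, indent=2))
    return summary
```

Calling `aggregate_metrics()` from the project root would read whatever model directories exist under `dvclive/`, so adding a `model-N` requires no change to the script.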

CLI

Using the set of dvc exp commands, it's possible to iterate on models and compare different iterations with each other.

vim model-1/params.yaml  # change params
dvc exp run              # run an experiment
dvc exp show             # show all experiments

# Queue multiple experiments and run them:
dvc exp run --queue -S model-1/params.yaml:res=0.8,0.82,0.84,0.86
dvc exp run --run-all

# Show experiments again:
dvc exp show

Also, it's possible to show plots with:

dvc plots show
# or
dvc plots diff
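The same per-model plot data can also be inspected directly from the files under `dvclive/`. The sketch below reads each model's `acc.tsv` and returns the last logged value. The helper name `last_accuracy` is hypothetical, and it assumes the file is tab-separated with a header row containing an `acc` column; check the actual files before relying on it.

```python
import csv
from pathlib import Path


def last_accuracy(live_root="dvclive", column="acc"):
    """Return {model_name: last logged value} from each model's acc.tsv.

    Assumes a tab-separated file with a header row that has `column` in it.
    """
    result = {}
    for tsv in sorted(Path(live_root).glob("*/plots/metrics/acc.tsv")):
        with tsv.open(newline="") as handle:
            rows = list(csv.DictReader(handle, delimiter="\t"))
        if rows:  # skip models that have not logged anything yet
            # dvclive/<model>/plots/metrics/acc.tsv -> parents[2] is <model>
            result[tsv.parents[2].name] = float(rows[-1][column])
    return result
```

This is only a quick way to compare final values in a script; `dvc plots show` and `dvc plots diff` remain the way to render the full curves.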


VS Code Extension

The extension can be installed from the marketplace. It provides a visual layer for DVC experiments, plots, and common data management actions.


Studio

Open the public project for this repository.

Studio (see docs here) provides a collaborative interface to share experiments and to view and manage ML models in the model registry.


Codespaces

The project can also be run in GitHub Codespaces, either in the browser or in the desktop application, and is deployed with one click.