DVC Model Ensemble

A skeleton DVC pipeline for evaluating multiple models together, created as an experiment in response to this SO question. It contains no actual ML training code, no frameworks, and no real data. Its purpose is to showcase a possible project layout and how to run the pipeline to see metrics, plots, etc.

Explore the CLI, VS Code extension, Studio, and Codespaces tools to run experiments, visualize, and share the results.

Structure

.
├── LICENSE
├── README.md

# Project / pipeline definition (dvc.yaml) and
# project artifacts and dependencies snapshot (dvc.lock)

├── dvc.lock
├── dvc.yaml

# Metrics and plots that are logged via the `dvclive` logger
# https://dvc.org/doc/dvclive
# This is similar to TB, W&B, MLflow, etc. loggers

├── dvclive
│   ├── model-1
│   │   ├── metrics.json
│   │   └── plots
│   │       └── metrics
│   │           └── acc.tsv

...  # Any number of models

│   └── model-N

# Evaluation script that can read multiple models or their metrics
# and dump aggregate metrics

├── evaluate.py
├── evaluation
│   └── metrics.json

# Model data, training script, and the model itself.
# If the data and model are large, they can be tracked and saved to any
# supported remote storage. Here we put them in Git for simplicity.

├── model-1
│   ├── data
│   │   └── data.csv
│   ├── data.dvc
│   ├── model.pkl
│   ├── params.yaml
│   └── train.py

...  # Any number of models

├── model-N
    ...
└── requirements.txt
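
The layout above says evaluate.py reads the metrics of several models and dumps aggregate metrics. The repository's actual script is not shown here, so the following is only a minimal sketch of that idea, assuming each model has a flat dvclive/<model>/metrics.json file. The function name aggregate_metrics and the choice of a plain mean are hypothetical and used for illustration only.

```python
import json
from pathlib import Path
from statistics import mean


def aggregate_metrics(live_dir="dvclive", out_file="evaluation/metrics.json"):
    """Average every numeric metric found in <live_dir>/<model>/metrics.json."""
    collected = {}  # metric name -> list of values, one per model
    for metrics_path in sorted(Path(live_dir).glob("*/metrics.json")):
        metrics = json.loads(metrics_path.read_text())
        for name, value in metrics.items():
            # Keep only top-level numbers; bool is a subclass of int, so skip it
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                collected.setdefault(name, []).append(value)

    # Hypothetical aggregation: the plain mean across models
    summary = {f"mean_{name}": mean(values) for name, values in collected.items()}

    # Write the aggregate next to where the layout expects it
    out_path = Path(out_file)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(summary, indent=2))
    return summary
```

Calling aggregate_metrics() from the project root would read every model directory under dvclive and write evaluation/metrics.json. A real ensemble would more likely load the models themselves and score their combined predictions; averaging per-model metrics is just the simplest stand-in.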

CLI

Using the set of dvc exp commands, it's possible to iterate on models and compare different iterations with each other.

vim model-1/params.yaml  # change params
dvc exp run              # run an experiment
dvc exp show             # show all experiments

# Queue multiple experiments and run them:
dvc exp run --queue -S model-1/params.yaml:res=0.8,0.82,0.84,0.86
dvc exp run --run-all

# Show experiments again:
dvc exp show
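
When the parameter sweep is generated by a script rather than typed by hand, the same commands can be assembled programmatically. The sketch below only builds argument lists using the flags shown above (--queue, -S, --run-all); the helper names queue_command and run_all_command are hypothetical, and nothing is executed until the lists are passed to subprocess.

```python
def queue_command(params_file, param, values):
    """Build the argv that queues one experiment per value of `param`."""
    # Join the values into the comma-separated sweep used by -S
    sweep = ",".join(str(v) for v in values)
    return ["dvc", "exp", "run", "--queue", "-S", f"{params_file}:{param}={sweep}"]


def run_all_command():
    """Build the argv that runs every queued experiment."""
    return ["dvc", "exp", "run", "--run-all"]
```

For example, queue_command("model-1/params.yaml", "res", [0.8, 0.82, 0.84, 0.86]) reproduces the queue command above, and each list can be run from the project root with subprocess.run(argv, check=True).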

cli.mp4

Also, it's possible to show plots with:

dvc plots show
# or
dvc plots diff
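
The data behind these plots lives in the per-model TSV files shown in the project layout (dvclive/<model>/plots/metrics/acc.tsv), so it can also be inspected directly. The following is a rough sketch, assuming each TSV has a header row with a column named after the metric (here acc); that column layout and the function name last_logged_values are assumptions, not something this README specifies.

```python
import csv
from pathlib import Path


def last_logged_values(live_dir="dvclive", metric="acc"):
    """Return {model name: last logged value of `metric`} from each model's TSV."""
    result = {}
    for tsv_path in sorted(Path(live_dir).glob(f"*/plots/metrics/{metric}.tsv")):
        with tsv_path.open(newline="") as handle:
            rows = list(csv.DictReader(handle, delimiter="\t"))
        if rows:
            # The model directory sits three levels above the TSV file
            model = tsv_path.parents[2].name
            # Assumes a column named after the metric exists in the header
            result[model] = float(rows[-1][metric])
    return result
```

This gives a quick per-model comparison in the terminal; dvc plots show remains the way to render the full curves.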

VS Code Extension

The extension can be installed from the marketplace. It provides a visual layer for DVC experiments, plots, and common data management actions.

vs-code.mp4

Studio

Open the public project for this repository.

Studio (see docs here) provides a collaborative interface to share experiments and to view and manage ML models in the model registry.

studio.mp4

Codespaces

The project can also be run in GitHub Codespaces, either in the browser or in the desktop app, and is deployed with one click:

codespaces.mp4

shcheklein/ensemble-dvc-template

Folders and files

Latest commit

History

Repository files navigation

DVC Model Ensemble

Stucture

CLI

VS Code Extension

Studio

Codespaces

About

Topics

Resources

License

Stars

Watchers

Forks

Languages