ClearML experiment tracking integration (ultralytics#8620)
* Add titles to matplotlib plots

* Add ClearML Experiment Tracking integration.

* Add ClearML Data Version Management automatic download when requested

* Add ClearML Hyperparameter Optimization

* ClearML save period integration

* Fix wandb breaking when used with ClearML dataset

* Fix wandb breaking when used with ClearML resume and dataset

* Add ClearML documentation

* fixed small bug in clearml integration that misreports epoch number

* Final ClearML additions before refactor

* Add correct epoch reporting

* Add remote execution and autoscaling docs for ClearML integration

* Added images to clearml integration docs

* fixed logo alignment bug and added hpo screenshot clearml

* Fixed small epoch number bug in clearml integration

* Remove saved model flush clearml

* Cleanup clearml readme section

* Cleaned up clearml logger docstring

* Remove resume readme section clearml

* Clearml integration cleanup

* Updated ClearML documentation

* Added dark vs light icons ClearML Readme

* Clearml Readme styling

* Add better gifs

* Fixed gif file size

* Add better images in tutorial notebook

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Addressed comments in PR ultralytics#8620

* Fixed circular import

* Fixed circular import

* Update tutorial.ipynb

* Update tutorial.ipynb

* Inline comment

* Restructured tutorial notebook

* Add correct ClearML link to README

* Update tutorial.ipynb

* Update general.py

* Update __init__.py

* Update __init__.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update __init__.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update __init__.py

* Update README.md

* Update __init__.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* spelling

* Update tutorial.ipynb

* notebook cutt.ly links

* Update README.md

* Update README.md

* cutt.ly links in tutorial

* Removed labels as they show up on last subplot only

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
3 people authored and Clay Januhowski committed Sep 8, 2022
1 parent e818d26 commit 72fca1c
Showing 13 changed files with 575 additions and 21 deletions.
21 changes: 14 additions & 7 deletions README.md
@@ -151,7 +151,8 @@ python train.py --data coco.yaml --cfg yolov5n.yaml --weights '' --batch-size 12
- [Train Custom Data](https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data)  🚀 RECOMMENDED
- [Tips for Best Training Results](https://github.com/ultralytics/yolov5/wiki/Tips-for-Best-Training-Results)  ☘️ RECOMMENDED
- [Weights & Biases Logging](https://github.com/ultralytics/yolov5/issues/1289)  🌟 NEW
- [ClearML Logging](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/clearml) 🌟 NEW
- [Weights & Biases Logging](https://github.com/ultralytics/yolov5/issues/1289)
- [Roboflow for Datasets, Labeling, and Active Learning](https://github.com/ultralytics/yolov5/issues/4975)  🌟 NEW
- [Multi-GPU Training](https://github.com/ultralytics/yolov5/issues/475)
- [PyTorch Hub](https://github.com/ultralytics/yolov5/issues/36)  ⭐ NEW
@@ -190,17 +191,23 @@ Get started in seconds with our verified environments. Click each icon below for
## <div align="center">Integrations</div>

<div align="center">
<a href="https://wandb.ai/site?utm_campaign=repo_yolo_readme">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-wb-long.png" width="49%"/>
<a href="https://cutt.ly/yolov5-readme-clearml#gh-light-mode-only">
<img src="https://github.com/thepycoder/clearml_screenshots/raw/main/banner_github.png#gh-light-mode-only" width="32%" />
</a>
<a href="https://cutt.ly/yolov5-readme-clearml#gh-dark-mode-only">
<img src="https://github.com/thepycoder/clearml_screenshots/raw/main/banner_github_light.png#gh-dark-mode-only" width="32%" />
</a>
<a href="https://roboflow.com/?ref=ultralytics">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-roboflow-long.png" width="49%"/>
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-roboflow-long.png" width="33%"/>
</a>
<a href="https://wandb.ai/site?utm_campaign=repo_yolo_readme">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-wb-long.png" width="33%"/>
</a>
</div>

|Weights and Biases|Roboflow ⭐ NEW|
|:-:|:-:|
|Automatically track and visualize all your YOLOv5 training runs in the cloud with [Weights & Biases](https://wandb.ai/site?utm_campaign=repo_yolo_readme)|Label and export your custom datasets directly to YOLOv5 for training with [Roboflow](https://roboflow.com/?ref=ultralytics) |
|ClearML ⭐ NEW|Roboflow|Weights and Biases
|:-:|:-:|:-:|
|Automatically track, visualize and even remotely train YOLOv5 using [ClearML](https://cutt.ly/yolov5-readme-clearml) (open-source!)|Label and export your custom datasets directly to YOLOv5 for training with [Roboflow](https://roboflow.com/?ref=ultralytics) |Automatically track and visualize all your YOLOv5 training runs in the cloud with [Weights & Biases](https://wandb.ai/site?utm_campaign=repo_yolo_readme)

<!-- ## <div align="center">Compete and Win</div>
1 change: 1 addition & 0 deletions requirements.txt
@@ -17,6 +17,7 @@ protobuf<=3.20.1 # https://github.com/ultralytics/yolov5/issues/8012
# Logging -------------------------------------
tensorboard>=2.4.1
# wandb
# clearml

# Plotting ------------------------------------
pandas>=1.1.4
2 changes: 2 additions & 0 deletions train.py
@@ -90,6 +90,8 @@ def train(hyp, opt, device, callbacks): # hyp is path/to/hyp.yaml or hyp dictio
data_dict = None
if RANK in {-1, 0}:
loggers = Loggers(save_dir, weights, opt, hyp, LOGGER) # loggers instance
if loggers.clearml:
data_dict = loggers.clearml.data_dict # None if no ClearML dataset or filled in by ClearML
if loggers.wandb:
data_dict = loggers.wandb.data_dict
if resume:
27 changes: 26 additions & 1 deletion tutorial.ipynb
@@ -7,6 +7,7 @@
"provenance": [],
"collapsed_sections": [],
"machine_shape": "hm",
"toc_visible": true,
"include_colab_link": true
},
"kernelspec": {
@@ -913,6 +914,30 @@
"# 4. Visualize"
]
},
{
"cell_type": "markdown",
"source": [
"## ClearML Logging and Automation 🌟 NEW\n",
"\n",
"[ClearML](https://cutt.ly/yolov5-notebook-clearml) is completely integrated into YOLOv5 to track your experimentation, manage dataset versions and even remotely execute training runs.\n",
"\n",
"To enable ClearML (Check cells above):\n",
"- `pip install clearml`\n",
"- run `clearml-init` to connect to a ClearML server (**deploy your own open-source server [here](https://github.com/allegroai/clearml-server)**, or use our free hosted server [here](https://cutt.ly/yolov5-notebook-clearml))\n",
"\n",
"You'll get all the great expected features from an experiment manager: live updates, model upload, experiment comparison etc. but ClearML also tracks uncommitted changes and installed packages for example. Thanks to that ClearML Tasks (which is what we call experiments) are also reproducible on different machines! With only 1 extra line, we can schedule a YOLOv5 training task on a queue to be executed by any number of ClearML Agents (workers).\n",
"\n",
"You can use ClearML Data to version your dataset and then pass it to YOLOv5 simply using its unique ID. This will help you keep track of your data without adding extra hassle. \n",
"\n",
"Explore the [ClearML Tutorial](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/clearml) for more info!\n",
"\n",
"<a href=\"https://cutt.ly/yolov5-notebook-clearml\">\n",
"<img alt=\"ClearML Experiment Management UI\" src=\"https://github.com/thepycoder/clearml_screenshots/raw/main/scalars.jpg\" width=\"1280\"/></a>"
],
"metadata": {
"id": "Lay2WsTjNJzP"
}
},
{
"cell_type": "markdown",
"metadata": {
@@ -1105,4 +1130,4 @@
"outputs": []
}
]
}
}
4 changes: 4 additions & 0 deletions utils/general.py
@@ -14,6 +14,7 @@
import re
import shutil
import signal
import sys
import threading
import time
import urllib
@@ -449,6 +450,9 @@ def check_file(file, suffix=''):
torch.hub.download_url_to_file(url, file)
assert Path(file).exists() and Path(file).stat().st_size > 0, f'File download failed: {url}' # check
return file
elif file.startswith('clearml://'): # ClearML Dataset ID
assert 'clearml' in sys.modules, "ClearML is not installed, so cannot use ClearML dataset. Try running 'pip install clearml'."
return file
else: # search
files = []
for d in 'data', 'models', 'utils': # search directories
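
The `clearml://` branch in this hunk only checks the prefix and that the `clearml` module has already been imported, then returns the string unchanged. As a rough illustration of how a caller might later separate the dataset ID from the prefix, here is a minimal sketch. The helper name `split_clearml_uri` is hypothetical and is not part of this commit.

```python
CLEARML_PREFIX = 'clearml://'  # same prefix that check_file() tests for


def split_clearml_uri(file):
    """Return the dataset ID if `file` looks like 'clearml://<id>', else None.

    Hypothetical helper for illustration only; not part of the commit.
    """
    if isinstance(file, str) and file.startswith(CLEARML_PREFIX):
        return file[len(CLEARML_PREFIX):]  # everything after the prefix
    return None


print(split_clearml_uri('clearml://abc123'))   # a made-up dataset ID
print(split_clearml_uri('data/coco128.yaml'))  # an ordinary path, not a ClearML URI
```

Keeping the prefix check in one place means a plain file path and a ClearML dataset reference can share the same `--data` argument without ambiguity.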
70 changes: 59 additions & 11 deletions utils/loggers/__init__.py
@@ -11,11 +12,12 @@
from torch.utils.tensorboard import SummaryWriter

from utils.general import colorstr, cv2, emojis
from utils.loggers.clearml.clearml_utils import ClearmlLogger
from utils.loggers.wandb.wandb_utils import WandbLogger
from utils.plots import plot_images, plot_results
from utils.torch_utils import de_parallel

LOGGERS = ('csv', 'tb', 'wandb') # text-file, TensorBoard, Weights & Biases
LOGGERS = ('csv', 'tb', 'wandb', 'clearml') # *.csv, TensorBoard, Weights & Biases, ClearML
RANK = int(os.getenv('RANK', -1))

try:
@@ -32,6 +33,13 @@
except (ImportError, AssertionError):
wandb = None

try:
import clearml

assert hasattr(clearml, '__version__') # verify package import not local dir
except (ImportError, AssertionError):
clearml = None


class Loggers():
# YOLOv5 Loggers class
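
The guarded `import clearml` in this hunk follows the same optional-dependency pattern already used for `wandb`: import the package, confirm it exposes `__version__` (so a local directory of the same name is not mistaken for the installed package), and fall back to `None` otherwise. A generic sketch of that pattern is shown below. The function name `optional_import` is an assumption made for illustration and does not appear in the commit.

```python
import importlib


def optional_import(name):
    """Import `name` if it is a real installed package, else return None.

    Illustrative helper only; the commit inlines this logic per package.
    """
    try:
        module = importlib.import_module(name)
        # A local directory that shadows the package can import as a namespace
        # package without __version__, so treat that case as "not installed".
        assert hasattr(module, '__version__')
        return module
    except (ImportError, AssertionError):
        return None


clearml = optional_import('clearml')  # None when ClearML is not installed
print('ClearML available:', clearml is not None)
```

Note that the `assert` is skipped when Python runs with `-O`, so an explicit `if not hasattr(...)` check would be more robust if that mode matters.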
@@ -61,10 +69,14 @@ def __init__(self, save_dir=None, weights=None, opt=None, hyp=None, logger=None,
setattr(self, k, None) # init empty logger dictionary
self.csv = True # always log to csv

# Message
# Messages
if not wandb:
prefix = colorstr('Weights & Biases: ')
s = f"{prefix}run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)"
s = f"{prefix}run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs in Weights & Biases"
self.logger.info(emojis(s))
if not clearml:
prefix = colorstr('ClearML: ')
s = f"{prefix}run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 runs in ClearML"
self.logger.info(emojis(s))

# TensorBoard
@@ -82,12 +94,17 @@ def __init__(self, save_dir=None, weights=None, opt=None, hyp=None, logger=None,
self.wandb = WandbLogger(self.opt, run_id)
# temp warn. because nested artifacts not supported after 0.12.10
if pkg.parse_version(wandb.__version__) >= pkg.parse_version('0.12.11'):
self.logger.warning(
"YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected."
)
s = "YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected."
self.logger.warning(s)
else:
self.wandb = None

# ClearML
if clearml and 'clearml' in self.include:
self.clearml = ClearmlLogger(self.opt, self.hyp)
else:
self.clearml = None

def on_train_start(self):
# Callback runs on train start
pass
@@ -97,9 +114,12 @@ def on_pretrain_routine_end(self):
paths = self.save_dir.glob('*labels*.jpg') # training labels
if self.wandb:
self.wandb.log({"Labels": [wandb.Image(str(x), caption=x.name) for x in paths]})
if self.clearml:
pass # ClearML saves these images automatically using hooks

def on_train_batch_end(self, ni, model, imgs, targets, paths, plots):
# Callback runs on train batch end
# ni: number integrated batches (since train start)
if plots:
if ni == 0:
if self.tb and not self.opt.sync_bn: # --sync known issue https://github.com/ultralytics/yolov5/issues/3754
@@ -109,9 +129,12 @@ def on_train_batch_end(self, ni, model, imgs, targets, paths, plots):
if ni < 3:
f = self.save_dir / f'train_batch{ni}.jpg' # filename
plot_images(imgs, targets, paths, f)
if self.wandb and ni == 10:
if (self.wandb or self.clearml) and ni == 10:
files = sorted(self.save_dir.glob('train*.jpg'))
self.wandb.log({'Mosaics': [wandb.Image(str(f), caption=f.name) for f in files if f.exists()]})
if self.wandb:
self.wandb.log({'Mosaics': [wandb.Image(str(f), caption=f.name) for f in files if f.exists()]})
if self.clearml:
self.clearml.log_debug_samples(files, title='Mosaics')

def on_train_epoch_end(self, epoch):
# Callback runs on train epoch end
@@ -122,12 +145,17 @@ def on_val_image_end(self, pred, predn, path, names, im):
# Callback runs on val image end
if self.wandb:
self.wandb.val_one_image(pred, predn, path, names, im)
if self.clearml:
self.clearml.log_image_with_boxes(path, pred, names, im)

def on_val_end(self):
# Callback runs on val end
if self.wandb:
if self.wandb or self.clearml:
files = sorted(self.save_dir.glob('val*.jpg'))
self.wandb.log({"Validation": [wandb.Image(str(f), caption=f.name) for f in files]})
if self.wandb:
self.wandb.log({"Validation": [wandb.Image(str(f), caption=f.name) for f in files]})
if self.clearml:
self.clearml.log_debug_samples(files, title='Validation')

def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
# Callback runs at the end of each fit (train+val) epoch
Expand All @@ -142,6 +170,10 @@ def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
if self.tb:
for k, v in x.items():
self.tb.add_scalar(k, v, epoch)
elif self.clearml: # log to ClearML if TensorBoard not used
for k, v in x.items():
title, series = k.split('/')
self.clearml.task.get_logger().report_scalar(title, series, v, epoch)

if self.wandb:
if best_fitness == fi:
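
In the ClearML branch of this hunk, each metric key is split on `/` into a plot title and a series name before being passed to `report_scalar`. The sketch below reproduces only that splitting step with plain dictionaries. The metric names are assumed examples, and `group_scalars` is a hypothetical helper that is not part of the commit.

```python
from collections import defaultdict


def group_scalars(x):
    """Group {'title/series': value} pairs as {title: {series: value}}.

    Illustrative only: mirrors `title, series = k.split('/')` from the hunk.
    """
    grouped = defaultdict(dict)
    for k, v in x.items():
        title, series = k.split('/')  # ValueError unless the key has exactly one '/'
        grouped[title][series] = v
    return dict(grouped)


# Assumed example keys; the real keys come from the training loop.
metrics = {'train/box_loss': 0.05, 'train/obj_loss': 0.02, 'metrics/precision': 0.7}
for title, series in group_scalars(metrics).items():
    print(title, series)
```

Because the unpacking expects exactly two parts, a key with no `/` or with more than one would raise `ValueError`, so this approach relies on the keys following the `title/series` convention.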
@@ -151,12 +183,22 @@ def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
self.wandb.log(x)
self.wandb.end_epoch(best_result=best_fitness == fi)

if self.clearml:
self.clearml.current_epoch_logged_images = set() # reset epoch image limit
self.clearml.current_epoch += 1

def on_model_save(self, last, epoch, final_epoch, best_fitness, fi):
# Callback runs on model save event
if self.wandb:
if ((epoch + 1) % self.opt.save_period == 0 and not final_epoch) and self.opt.save_period != -1:
self.wandb.log_model(last.parent, self.opt, epoch, fi, best_model=best_fitness == fi)

if self.clearml:
if ((epoch + 1) % self.opt.save_period == 0 and not final_epoch) and self.opt.save_period != -1:
self.clearml.task.update_output_model(model_path=str(last),
model_name='Latest Model',
auto_delete_file=False)

def on_train_end(self, last, best, plots, epoch, results):
# Callback runs on training end
if plots:
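
The `save_period` condition in `on_model_save` is shared by the wandb and ClearML branches in this hunk. To make it easier to see which epochs trigger an upload, here is a small sketch that evaluates the same boolean expression in isolation. The function name `should_upload` is hypothetical; the expression itself is copied from the diff.

```python
def should_upload(epoch, save_period, final_epoch):
    """Evaluate the periodic-save condition from on_model_save().

    `epoch` is zero-based; `save_period == -1` means periodic saving is disabled.
    """
    return ((epoch + 1) % save_period == 0 and not final_epoch) and save_period != -1


epochs = 10  # assumed run length for the example
for period in (3, 5, -1):
    hits = [e for e in range(epochs) if should_upload(e, period, final_epoch=(e == epochs - 1))]
    print(f'save_period={period}: upload at epochs {hits}')
```

The explicit `save_period != -1` check is needed because `n % -1` is always `0` in Python, so the modulo test alone would match every epoch. A `save_period` of `0` would raise `ZeroDivisionError` with this expression.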
@@ -165,7 +207,7 @@ def on_train_end(self, last, best, plots, epoch, results):
files = [(self.save_dir / f) for f in files if (self.save_dir / f).exists()] # filter
self.logger.info(f"Results saved to {colorstr('bold', self.save_dir)}")

if self.tb:
if self.tb and not self.clearml: # These images are already captured by ClearML by now, we don't want doubles
for f in files:
self.tb.add_image(f.stem, cv2.imread(str(f))[..., ::-1], epoch, dataformats='HWC')

@@ -180,6 +222,12 @@ def on_train_end(self, last, best, plots, epoch, results):
aliases=['latest', 'best', 'stripped'])
self.wandb.finish_run()

if self.clearml:
# Save the best model here
if not self.opt.evolve:
self.clearml.task.update_output_model(model_path=str(best if best.exists() else last),
name='Best Model')

def on_params_update(self, params):
# Update hyperparams or configs of the experiment
# params: A dict containing {param: value} pairs