ClearML experiment tracking integration (ultralytics#8620)

* Add titles to matplotlib plots
* Add ClearML Experiment Tracking integration.
* Add ClearML Data Version Management automatic download when requested
* Add ClearML Hyperparameter Optimization
* ClearML save period integration
* Fix wandb breaking when used with ClearML dataset
* Fix wandb breaking when used with ClearML resume and dataset
* Add ClearML documentation
* fixed small bug in clearml integration that misreports epoch number
* Final ClearMl additions before refactor
* Add correct epoch reporting
* Add remote execution and autoscaling docs for ClearML integration
* Added images to clearml integration docs
* fixed logo alignment bug and added hpo screenshot clearml
* Fixed small epoch number bug in clearml integration
* Remove saved model flush clearml
* Cleanup clearml readme section
* Cleaned up clearml logger docstring
* Remove resume readme section clearml
* Clearml integration cleanup
* Updated ClearML documentation
* Added dark vs light icons ClearML Readme
* Clearml Readme styling
* Add better gifs
* Fixed gif file size
* Add better images in tutorial notebook
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Addressed comments in PR ultralytics#8620
* Fixed circular import
* Fixed circular import
* Update tutorial.ipynb
* Update tutorial.ipynb
* Inline comment
* Restructured tutorial notebook
* Add correct ClearML link to README
* Update tutorial.ipynb
* Update general.py
* Update __init__.py
* Update __init__.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Update __init__.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Update __init__.py
* Update README.md
* Update __init__.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* spelling
* Update tutorial.ipynb
* notebook cutt.ly links
* Update README.md
* Update README.md
* cutt.ly links in tutorial
* Removed labels as they show up on last subplot only

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
ctjanuhowski · Sep 8, 2022 · 72fca1c
1 parent e818d26
commit 72fca1c

Showing 13 changed files with 575 additions and 21 deletions.
diff --git a/README.md b/README.md
@@ -151,7 +151,8 @@ python train.py --data coco.yaml --cfg yolov5n.yaml --weights '' --batch-size 12
 - [Train Custom Data](https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data)  🚀 RECOMMENDED
 - [Tips for Best Training Results](https://github.com/ultralytics/yolov5/wiki/Tips-for-Best-Training-Results)  ☘️
   RECOMMENDED
-- [Weights & Biases Logging](https://github.com/ultralytics/yolov5/issues/1289)  🌟 NEW
+- [ClearML Logging](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/clearml) 🌟 NEW
+- [Weights & Biases Logging](https://github.com/ultralytics/yolov5/issues/1289)
 - [Roboflow for Datasets, Labeling, and Active Learning](https://github.com/ultralytics/yolov5/issues/4975)  🌟 NEW
 - [Multi-GPU Training](https://github.com/ultralytics/yolov5/issues/475)
 - [PyTorch Hub](https://github.com/ultralytics/yolov5/issues/36)  ⭐ NEW
@@ -190,17 +191,23 @@ Get started in seconds with our verified environments. Click each icon below for
 ## <div align="center">Integrations</div>
 
 <div align="center">
-    <a href="https://wandb.ai/site?utm_campaign=repo_yolo_readme">
-        <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-wb-long.png" width="49%"/>
+    <a href="https://cutt.ly/yolov5-readme-clearml#gh-light-mode-only">
+        <img src="https://github.com/thepycoder/clearml_screenshots/raw/main/banner_github.png#gh-light-mode-only" width="32%" />
+    </a>
+    <a href="https://cutt.ly/yolov5-readme-clearml#gh-dark-mode-only">
+        <img src="https://github.com/thepycoder/clearml_screenshots/raw/main/banner_github_light.png#gh-dark-mode-only" width="32%" />
     </a>
     <a href="https://roboflow.com/?ref=ultralytics">
-        <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-roboflow-long.png" width="49%"/>
+        <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-roboflow-long.png" width="33%"/>
+    </a>
+    <a href="https://wandb.ai/site?utm_campaign=repo_yolo_readme">
+        <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-wb-long.png" width="33%"/>
     </a>
 </div>
 
-|Weights and Biases|Roboflow ⭐ NEW|
-|:-:|:-:|
-|Automatically track and visualize all your YOLOv5 training runs in the cloud with [Weights & Biases](https://wandb.ai/site?utm_campaign=repo_yolo_readme)|Label and export your custom datasets directly to YOLOv5 for training with [Roboflow](https://roboflow.com/?ref=ultralytics) |
+|ClearML ⭐ NEW|Roboflow|Weights and Biases
+|:-:|:-:|:-:|
+|Automatically track, visualize and even remotely train YOLOv5 using [ClearML](https://cutt.ly/yolov5-readme-clearml) (open-source!)|Label and export your custom datasets directly to YOLOv5 for training with [Roboflow](https://roboflow.com/?ref=ultralytics) |Automatically track and visualize all your YOLOv5 training runs in the cloud with [Weights & Biases](https://wandb.ai/site?utm_campaign=repo_yolo_readme)
 
 <!-- ## <div align="center">Compete and Win</div>
 

diff --git a/requirements.txt b/requirements.txt
@@ -17,6 +17,7 @@ protobuf<=3.20.1  # https://github.com/ultralytics/yolov5/issues/8012
 # Logging -------------------------------------
 tensorboard>=2.4.1
 # wandb
+# clearml
 
 # Plotting ------------------------------------
 pandas>=1.1.4

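Note on the commented-out `# clearml` entry above: the package stays optional, and the `utils/loggers/__init__.py` changes in this commit guard the import with a try/except plus a `__version__` check. A minimal sketch of that guard pattern follows. The helper name `optional_import` is hypothetical and not part of this commit, which inlines the logic with a plain `import clearml`; `importlib` is used here only so the sketch works for any module name.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is missing or lacks __version__.

    Hypothetical helper for illustration only; the commit inlines this logic.
    """
    try:
        module = importlib.import_module(name)
        # A local directory with the same name can shadow the real package;
        # such an import typically has no __version__ attribute.
        assert hasattr(module, '__version__')
        return module
    except (ImportError, AssertionError):
        return None


# A package that is not installed resolves to None instead of raising.
missing = optional_import('surely_not_a_real_package_xyz')

# The standard-library os module imports fine but defines no __version__,
# so it exercises the AssertionError branch and also resolves to None.
no_version = optional_import('os')
```

Calling `optional_import('clearml')` would follow the same path: the module when ClearML is installed, `None` otherwise. Because the check relies on `assert`, it is skipped when Python runs with `-O`.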
diff --git a/train.py b/train.py
@@ -90,6 +90,8 @@ def train(hyp, opt, device, callbacks):  # hyp is path/to/hyp.yaml or hyp dictio
     data_dict = None
     if RANK in {-1, 0}:
         loggers = Loggers(save_dir, weights, opt, hyp, LOGGER)  # loggers instance
+        if loggers.clearml:
+            data_dict = loggers.clearml.data_dict  # None if no ClearML dataset or filled in by ClearML
         if loggers.wandb:
             data_dict = loggers.wandb.data_dict
             if resume:

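Note on ordering in the hunk above: the ClearML assignment runs first and the Weights & Biases assignment runs second, so the W&B value takes precedence whenever both loggers are active. A minimal sketch of that ordering, using `SimpleNamespace` stand-ins for the logger objects (the function name `resolve_data_dict` and the dictionary contents are hypothetical):

```python
from types import SimpleNamespace


def resolve_data_dict(loggers):
    """Mirror the assignment order of the train.py hunk (illustrative only)."""
    data_dict = None
    if loggers.clearml:
        data_dict = loggers.clearml.data_dict  # may be None if no ClearML dataset was requested
    if loggers.wandb:
        data_dict = loggers.wandb.data_dict  # runs second, so it overrides the ClearML value
    return data_dict


# Stand-in logger containers; the real Loggers class carries many more attributes.
clearml_only = SimpleNamespace(clearml=SimpleNamespace(data_dict={'source': 'clearml'}), wandb=None)
both = SimpleNamespace(clearml=SimpleNamespace(data_dict={'source': 'clearml'}),
                       wandb=SimpleNamespace(data_dict={'source': 'wandb'}))
neither = SimpleNamespace(clearml=None, wandb=None)
```

With these stand-ins, `resolve_data_dict(clearml_only)` yields the ClearML dictionary, `resolve_data_dict(both)` yields the W&B one, and `resolve_data_dict(neither)` stays `None`.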
diff --git a/tutorial.ipynb b/tutorial.ipynb
@@ -7,6 +7,7 @@
       "provenance": [],
       "collapsed_sections": [],
       "machine_shape": "hm",
+      "toc_visible": true,
       "include_colab_link": true
     },
     "kernelspec": {
@@ -913,6 +914,30 @@
         "# 4. Visualize"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## ClearML Logging and Automation 🌟 NEW\n",
+        "\n",
+        "[ClearML](https://cutt.ly/yolov5-notebook-clearml) is completely integrated into YOLOv5 to track your experimentation, manage dataset versions and even remotely execute training runs.\n",
+        "\n",
+        "To enable ClearML (Check cells above):\n",
+        "- `pip install clearml`\n",
+        "- run `clearml-init` to connect to a ClearML server (**deploy your own open-source server [here](https://github.com/allegroai/clearml-server)**, or use our free hosted server [here](https://cutt.ly/yolov5-notebook-clearml))\n",
+        "\n",
+        "You'll get all the great expected features from an experiment manager: live updates, model upload, experiment comparison etc. but ClearML also tracks uncommitted changes and installed packages for example. Thanks to that ClearML Tasks (which is what we call experiments) are also reproducible on different machines! With only 1 extra line, we can schedule a YOLOv5 training task on a queue to be executed by any number of ClearML Agents (workers).\n",
+        "\n",
+        "You can use ClearML Data to version your dataset and then pass it to YOLOv5 simply using its unique ID. This will help you keep track of your data without adding extra hassle. \n",
+        "\n",
+        "Explore the [ClearML Tutorial](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/clearml) for more info!\n",
+        "\n",
+        "<a href=\"https://cutt.ly/yolov5-notebook-clearml\">\n",
+        "<img alt=\"ClearML Experiment Management UI\" src=\"https://github.com/thepycoder/clearml_screenshots/raw/main/scalars.jpg\" width=\"1280\"/></a>"
+      ],
+      "metadata": {
+        "id": "Lay2WsTjNJzP"
+      }
+    },
     {
       "cell_type": "markdown",
       "metadata": {
@@ -1105,4 +1130,4 @@
       "outputs": []
     }
   ]
-}
+}
diff --git a/utils/general.py b/utils/general.py
@@ -14,6 +14,7 @@
 import re
 import shutil
 import signal
+import sys
 import threading
 import time
 import urllib
@@ -449,6 +450,9 @@ def check_file(file, suffix=''):
             torch.hub.download_url_to_file(url, file)
             assert Path(file).exists() and Path(file).stat().st_size > 0, f'File download failed: {url}'  # check
         return file
+    elif file.startswith('clearml://'):  # ClearML Dataset ID
+        assert 'clearml' in sys.modules, "ClearML is not installed, so cannot use ClearML dataset. Try running 'pip install clearml'."
+        return file
     else:  # search
         files = []
         for d in 'data', 'models', 'utils':  # search directories

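Note on the new `clearml://` branch above: `'clearml' in sys.modules` is true only if the module has already been imported in the running process, so the check presumably relies on an earlier import such as the guarded one in `utils/loggers/__init__.py`. A minimal sketch of the branch in isolation follows. The function name `check_clearml_ref`, the `loaded_modules` parameter, and the ID `abc123` are hypothetical and exist only to make the behaviour testable without ClearML installed.

```python
import sys


def check_clearml_ref(file, loaded_modules=None):
    """Return a clearml:// reference unchanged if the clearml module is loaded.

    loaded_modules defaults to sys.modules; it is a parameter here only so the
    sketch can be exercised without ClearML installed (an assumption of this example).
    """
    loaded_modules = sys.modules if loaded_modules is None else loaded_modules
    if not str(file).startswith('clearml://'):
        raise ValueError(f'not a ClearML dataset reference: {file}')
    assert 'clearml' in loaded_modules, \
        "ClearML is not installed, so cannot use ClearML dataset. Try running 'pip install clearml'."
    return file  # the reference is passed through untouched; no download happens here


# Simulate a process where clearml has been imported.
ok = check_clearml_ref('clearml://abc123', loaded_modules={'clearml': object()})

# Simulate a process where it has not.
try:
    check_clearml_ref('clearml://abc123', loaded_modules={})
    failed = False
except AssertionError:
    failed = True
```

As in the hunk, the reference string is returned as-is, which suggests that resolving the dataset is left to the ClearML logger rather than to `check_file`.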
diff --git a/utils/loggers/__init__.py b/utils/loggers/__init__.py
@@ -11,11 +11,12 @@
 from torch.utils.tensorboard import SummaryWriter
 
 from utils.general import colorstr, cv2, emojis
+from utils.loggers.clearml.clearml_utils import ClearmlLogger
 from utils.loggers.wandb.wandb_utils import WandbLogger
 from utils.plots import plot_images, plot_results
 from utils.torch_utils import de_parallel
 
-LOGGERS = ('csv', 'tb', 'wandb')  # text-file, TensorBoard, Weights & Biases
+LOGGERS = ('csv', 'tb', 'wandb', 'clearml')  # *.csv, TensorBoard, Weights & Biases, ClearML
 RANK = int(os.getenv('RANK', -1))
 
 try:
@@ -32,6 +33,13 @@
 except (ImportError, AssertionError):
     wandb = None
 
+try:
+    import clearml
+
+    assert hasattr(clearml, '__version__')  # verify package import not local dir
+except (ImportError, AssertionError):
+    clearml = None
+
 
 class Loggers():
     # YOLOv5 Loggers class
@@ -61,10 +69,14 @@ def __init__(self, save_dir=None, weights=None, opt=None, hyp=None, logger=None,
             setattr(self, k, None)  # init empty logger dictionary
         self.csv = True  # always log to csv
 
-        # Message
+        # Messages
         if not wandb:
             prefix = colorstr('Weights & Biases: ')
-            s = f"{prefix}run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)"
+            s = f"{prefix}run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs in Weights & Biases"
+            self.logger.info(emojis(s))
+        if not clearml:
+            prefix = colorstr('ClearML: ')
+            s = f"{prefix}run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 runs in ClearML"
             self.logger.info(emojis(s))
 
         # TensorBoard
@@ -82,12 +94,17 @@ def __init__(self, save_dir=None, weights=None, opt=None, hyp=None, logger=None,
             self.wandb = WandbLogger(self.opt, run_id)
             # temp warn. because nested artifacts not supported after 0.12.10
             if pkg.parse_version(wandb.__version__) >= pkg.parse_version('0.12.11'):
-                self.logger.warning(
-                    "YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected."
-                )
+                s = "YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected."
+                self.logger.warning(s)
         else:
             self.wandb = None
 
+        # ClearML
+        if clearml and 'clearml' in self.include:
+            self.clearml = ClearmlLogger(self.opt, self.hyp)
+        else:
+            self.clearml = None
+
     def on_train_start(self):
         # Callback runs on train start
         pass
@@ -97,9 +114,12 @@ def on_pretrain_routine_end(self):
         paths = self.save_dir.glob('*labels*.jpg')  # training labels
         if self.wandb:
             self.wandb.log({"Labels": [wandb.Image(str(x), caption=x.name) for x in paths]})
+        if self.clearml:
+            pass  # ClearML saves these images automatically using hooks
 
     def on_train_batch_end(self, ni, model, imgs, targets, paths, plots):
         # Callback runs on train batch end
+        # ni: number integrated batches (since train start)
         if plots:
             if ni == 0:
                 if self.tb and not self.opt.sync_bn:  # --sync known issue https://github.com/ultralytics/yolov5/issues/3754
@@ -109,9 +129,12 @@ def on_train_batch_end(self, ni, model, imgs, targets, paths, plots):
             if ni < 3:
                 f = self.save_dir / f'train_batch{ni}.jpg'  # filename
                 plot_images(imgs, targets, paths, f)
-            if self.wandb and ni == 10:
+            if (self.wandb or self.clearml) and ni == 10:
                 files = sorted(self.save_dir.glob('train*.jpg'))
-                self.wandb.log({'Mosaics': [wandb.Image(str(f), caption=f.name) for f in files if f.exists()]})
+                if self.wandb:
+                    self.wandb.log({'Mosaics': [wandb.Image(str(f), caption=f.name) for f in files if f.exists()]})
+                if self.clearml:
+                    self.clearml.log_debug_samples(files, title='Mosaics')
 
     def on_train_epoch_end(self, epoch):
         # Callback runs on train epoch end
@@ -122,12 +145,17 @@ def on_val_image_end(self, pred, predn, path, names, im):
         # Callback runs on val image end
         if self.wandb:
             self.wandb.val_one_image(pred, predn, path, names, im)
+        if self.clearml:
+            self.clearml.log_image_with_boxes(path, pred, names, im)
 
     def on_val_end(self):
         # Callback runs on val end
-        if self.wandb:
+        if self.wandb or self.clearml:
             files = sorted(self.save_dir.glob('val*.jpg'))
-            self.wandb.log({"Validation": [wandb.Image(str(f), caption=f.name) for f in files]})
+            if self.wandb:
+                self.wandb.log({"Validation": [wandb.Image(str(f), caption=f.name) for f in files]})
+            if self.clearml:
+                self.clearml.log_debug_samples(files, title='Validation')
 
     def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
         # Callback runs at the end of each fit (train+val) epoch
@@ -142,6 +170,10 @@ def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
         if self.tb:
             for k, v in x.items():
                 self.tb.add_scalar(k, v, epoch)
+        elif self.clearml:  # log to ClearML if TensorBoard not used
+            for k, v in x.items():
+                title, series = k.split('/')
+                self.clearml.task.get_logger().report_scalar(title, series, v, epoch)
 
         if self.wandb:
             if best_fitness == fi:
@@ -151,12 +183,22 @@ def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
             self.wandb.log(x)
             self.wandb.end_epoch(best_result=best_fitness == fi)
 
+        if self.clearml:
+            self.clearml.current_epoch_logged_images = set()  # reset epoch image limit
+            self.clearml.current_epoch += 1
+
     def on_model_save(self, last, epoch, final_epoch, best_fitness, fi):
         # Callback runs on model save event
         if self.wandb:
             if ((epoch + 1) % self.opt.save_period == 0 and not final_epoch) and self.opt.save_period != -1:
                 self.wandb.log_model(last.parent, self.opt, epoch, fi, best_model=best_fitness == fi)
 
+        if self.clearml:
+            if ((epoch + 1) % self.opt.save_period == 0 and not final_epoch) and self.opt.save_period != -1:
+                self.clearml.task.update_output_model(model_path=str(last),
+                                                      model_name='Latest Model',
+                                                      auto_delete_file=False)
+
     def on_train_end(self, last, best, plots, epoch, results):
         # Callback runs on training end
         if plots:
@@ -165,7 +207,7 @@ def on_train_end(self, last, best, plots, epoch, results):
         files = [(self.save_dir / f) for f in files if (self.save_dir / f).exists()]  # filter
         self.logger.info(f"Results saved to {colorstr('bold', self.save_dir)}")
 
-        if self.tb:
+        if self.tb and not self.clearml:  # These images are already captured by ClearML by now, we don't want doubles
             for f in files:
                 self.tb.add_image(f.stem, cv2.imread(str(f))[..., ::-1], epoch, dataformats='HWC')
 
@@ -180,6 +222,12 @@ def on_train_end(self, last, best, plots, epoch, results):
                                    aliases=['latest', 'best', 'stripped'])
             self.wandb.finish_run()
 
+        if self.clearml:
+            # Save the best model here
+            if not self.opt.evolve:
+                self.clearml.task.update_output_model(model_path=str(best if best.exists() else last),
+                                                      name='Best Model')
+
     def on_params_update(self, params):
         # Update hyperparams or configs of the experiment
         # params: A dict containing {param: value} pairs