Distrib (#635)

* [WIP] Added cifar10 distributed example
* [WIP] Metric with all reduce decorator and tests
* [WIP] Added tests for accumulation metric
* [WIP] Updated with reinit_is_reduced
* [WIP] Distrib adaptation for other metrics
* [WIP] Warnings for EpochMetric and Precision/Recall when distrib
* Updated metrics and tests to run on distributed configuration
  - Test on 2 GPUS single node
  - Added cmd in .travis.yml to indicate how to test locally
  - Updated travis to run tests in 4 processes
* Minor fixes and cosmetics
* Fixed bugs and improved contrib/cifar10 example
* Updated docs
* Update metrics.rst
* Updated docs and set device as "cuda" in distributed instead of raising error
* [WIP] Fix missing _is_reduced in precision/recall with tests
* Updated other tests
* Updated travis and renamed tbptt test gpu -> cuda
* Distrib (#573)
* [WIP] Added cifar10 distributed example
* [WIP] Metric with all reduce decorator and tests
* [WIP] Added tests for accumulation metric
* [WIP] Updated with reinit_is_reduced
* [WIP] Distrib adaptation for other metrics
* [WIP] Warnings for EpochMetric and Precision/Recall when distrib
* Updated metrics and tests to run on distributed configuration
  - Test on 2 GPUS single node
  - Added cmd in .travis.yml to indicate how to test locally
  - Updated travis to run tests in 4 processes
* Minor fixes and cosmetics
* Fixed bugs and improved contrib/cifar10 example
* Updated docs
* Fixes issue #543 (#572)
* Fixes issue #543
  Previous CM implementation suffered from the problem if target contains non-contiguous indices. New implementation is almost taken from torchvision's https://github.com/pytorch/vision/blob/master/references/segmentation/utils.py#L75-L117
  This commit also removes the case of targets as (batchsize, num_categories, ...) where num_categories excludes background class.
  Confusion matrix computation is possible almost similarly for (batchsize, ...), but when target is all zero (0, ..., 0) = no classes (background class), then confusion matrix does not count any true/false predictions.
* Update confusion_matrix.py
* Update metrics.rst
* Updated docs and set device as "cuda" in distributed instead of raising error
* [WIP] Fix missing _is_reduced in precision/recall with tests
* Updated other tests
* Added mlflow logger (#558)
* Added mlflow logger without tests
* Added mlflow tests, updated mlflow logger code and other tests
* Updated docs and added mlflow in travis
* Added tests for mlflow OptimizerParamsHandler - additionally added OptimizerParamsHandler for plx with tests
* Update to PyTorch v1.2.0 (#580)
* Update .travis.yml
* Update .travis.yml
* Fixed tests and improved travis
* Fix SSL problem of failing travis (#581)
* Update .travis.yml
* Update .travis.yml
* Fixed tests and improved travis
* Fixes SSL problem to download model weights
* Fixed travis for deploy and nightly
* Fixes #583 (#584)
* Fixes docs build warnings (#585)
* Return removable handle from Engine.add_event_handler(). (#588)
* Add tests for event removable handle.
  Add feature tests for engine.add_event_handler returning removable event handles.
* Return RemovableEventHandle from Engine.add_event_handler.
* Fixup removable event handle test in python 2.7.
  Explicitly trigger gc, allowing cycle detection between engine and state, in removable handle weakref test. Python 2.7 cycle detection appears to be less aggressive than python 3+.
* Add removable event handler docs.
  Add autodoc configuration for RemovableEventHandler, expand "concepts" documentation with event remove example following event add example.
* Update concepts.rst
* Updated travis and renamed tbptt test gpu -> cuda
* Compute IoU, Precision, Recall based on CM on CPU
* Fixes incomplete merge with 1856c8e
* Update distrib branch and CIFAR10 example (#647)
* Added tests with gloo, minor updates and fixes
* Added single/multi node tests with gloo and [WIP] with nccl
* Added tests for multi-node nccl, improved examples/contrib/cifar10 example
* Experiments: 1n1gpu, 1n2gpus, 2n2gpus
* Fix flake8
* Fixes #645 (#646) - fix CI and improve create_lr_scheduler_with_warmup
* Fix tests for python 2.7
* Finalized Cifar10 example (#649)
* Added gcp tb logger image and updated README
* Added gcp ai platform scripts to run trainings
* Improved docs and readmes
pytorch · Oct 24, 2019 · 53190db
1 parent e223e9e
Showing 52 changed files with 3,392 additions and 347 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -37,13 +37,16 @@ before_install: &before_install
 
 install:
   - python setup.py install
-  - pip install numpy mock pytest codecov pytest-cov
+  - pip install numpy mock pytest codecov pytest-cov pytest-xdist
   # Examples dependencies
   - pip install matplotlib pandas
   - pip install gym==0.10.11
 
 script:
-  - py.test --cov ignite --cov-report term-missing
+  - CUDA_VISIBLE_DEVICES="" py.test --tx 4*popen//python=python$TRAVIS_PYTHON_VERSION --cov ignite --cov-report term-missing -vvv tests/
+  # Run test on cuda device
+  # As no GPUs on travis -> all tests will be skipped
+  - CUDA_VISIBLE_DEVICES=0 py.test --cov ignite --cov-append --cov-report term-missing -vvv tests/ -k "on_cuda"
 
   # Smoke tests for the examples
   # Mnist
@@ -72,6 +75,12 @@ script:
   - mkdir -p /home/travis/.cache/torch/checkpoints/ && wget "https://download.pytorch.org/models/vgg16-397923af.pth" -O/home/travis/.cache/torch/checkpoints/vgg16-397923af.pth
   - python examples/fast_neural_style/neural_style.py train --epochs 1 --cuda 0 --dataset test --dataroot . --image_size 32 --style_image examples/fast_neural_style/images/style_images/mosaic.jpg --style_size 32
 
+  # tests for distributed ops
+  # As no GPUs on travis -> all tests will be skipped
+  # 2 is the number of processes <-> number of available GPUs
+  - export WORLD_SIZE=2
+  - py.test --cov ignite --cov-append --cov-report term-missing --dist=each --tx $WORLD_SIZE*popen//python=python$TRAVIS_PYTHON_VERSION tests -m distributed -vvv
+
 after_success:
   - codecov
 

diff --git a/README.rst b/README.rst
@@ -95,8 +95,42 @@ The code in **ignite.contrib** is not as fully maintained as the core part of th
 
 Examples
 ========
-Please check out the `examples
-<https://github.com/pytorch/ignite/tree/master/examples>`_ to see how to use `ignite` to train various types of networks, as well as how to use `visdom <https://github.com/facebookresearch/visdom>`_ or `tensorboardX <https://github.com/lanpa/tensorboard-pytorch>`_ for training visualizations.
+
+We provide several examples ported from `pytorch/examples <https://github.com/pytorch/examples>`_ using `ignite`
+to display how it helps to write compact and full-featured training loops in a few lines of code:
+
+MNIST example
+--------------
+
+Basic neural network training on MNIST dataset with/without `ignite.contrib` module:
+
+- `MNIST with ignite.contrib TQDM/Tensorboard/Visdom loggers <https://github.com/pytorch/ignite/tree/master/examples/contrib/mnist>`_
+- `MNIST with native TQDM/Tensorboard/Visdom logging <https://github.com/pytorch/ignite/tree/master/examples/mnist>`_
+
+Distributed CIFAR10 example
+---------------------------
+
+Training a small variant of ResNet on CIFAR10 in various configurations: 1) single GPU, 2) single node with multiple GPUs, 3) multiple nodes with multiple GPUs.
+
+- `CIFAR10 <https://github.com/pytorch/ignite/tree/master/examples/contrib/cifar10>`_
+
+
+Other examples
+--------------
+
+- `DCGAN <https://github.com/pytorch/ignite/tree/master/examples/gan>`_
+- `Reinforcement Learning <https://github.com/pytorch/ignite/tree/master/examples/reinforcement_learning>`_
+- `Fast Neural Style <https://github.com/pytorch/ignite/tree/master/examples/fast_neural_style>`_
+
+
+Notebooks
+---------
+
+- `Text Classification using Convolutional Neural Networks <https://github.com/pytorch/ignite/blob/master/examples/notebooks/TextCNN.ipynb>`_
+- `Variational Auto Encoders <https://github.com/pytorch/ignite/blob/master/examples/notebooks/VAE.ipynb>`_
+- `Training Cycle-GAN on Horses to Zebras <https://github.com/pytorch/ignite/blob/master/examples/notebooks/CycleGAN.ipynb>`_
+- `Finetuning EfficientNet-B0 on CIFAR100 <https://github.com/pytorch/ignite/blob/master/examples/notebooks/EfficientNet_Cifar100_finetuning.ipynb>`_
+- `Convolutional Neural Networks for Classifying Fashion-MNIST Dataset <https://github.com/pytorch/ignite/blob/master/examples/notebooks/FashionMNIST.ipynb>`_
 
 
 Contributing

diff --git a/docs/source/examples.rst b/docs/source/examples.rst
@@ -1,17 +1,33 @@
 Examples
 ========
 
-Scripts
--------
-
-There are several examples ported from `pytorch/examples <https://github.com/pytorch/examples>`_ using `ignite`
+We provide several examples ported from `pytorch/examples <https://github.com/pytorch/examples>`_ using `ignite`
 to display how it helps to write compact and full-featured training loops in a few lines of code:
 
-- `Mnist <https://github.com/pytorch/ignite/tree/master/examples/mnist>`_
+MNIST example
+-------------
+
+Basic neural network training on MNIST dataset with/without `ignite.contrib` module:
+
+- `MNIST with ignite.contrib TQDM/Tensorboard/Visdom loggers <https://github.com/pytorch/ignite/tree/master/examples/contrib/mnist>`_
+- `MNIST with native TQDM/Tensorboard/Visdom logging <https://github.com/pytorch/ignite/tree/master/examples/mnist>`_
+
+Distributed CIFAR10 example
+---------------------------
+
+Training a small variant of ResNet on CIFAR10 in various configurations: 1) single GPU, 2) single node with multiple GPUs, 3) multiple nodes with multiple GPUs.
+
+- `CIFAR10 <https://github.com/pytorch/ignite/tree/master/examples/contrib/cifar10>`_
+
+
+Other examples
+--------------
+
 - `DCGAN <https://github.com/pytorch/ignite/tree/master/examples/gan>`_
 - `Reinforcement Learning <https://github.com/pytorch/ignite/tree/master/examples/reinforcement_learning>`_
 - `Fast Neural Style <https://github.com/pytorch/ignite/tree/master/examples/fast_neural_style>`_
 
+
 Notebooks
 ---------
 

diff --git a/docs/source/metrics.rst b/docs/source/metrics.rst
@@ -7,65 +7,182 @@ fashion without having to store the entire output history of a model.
 In practice a user needs to attach the metric instance to an engine. The metric
 value is then computed using the output of the engine's `process_function`:
 
-    .. code-block:: python
+.. code-block:: python
 
-        def process_function(engine, batch):
-            # ...
-            return y_pred, y
+    def process_function(engine, batch):
+        # ...
+        return y_pred, y
 
-        engine = Engine(process_function)
-        metric = Accuracy()
-        metric.attach(engine, "accuracy")
+    engine = Engine(process_function)
+    metric = Accuracy()
+    metric.attach(engine, "accuracy")
 
 If the engine's output is not in the format `y_pred, y`, the user can
 use the `output_transform` argument to transform it:
 
+.. code-block:: python
+
+    def process_function(engine, batch):
+        # ...
+        return {'y_pred': y_pred, 'y_true': y, ...}
+
+    engine = Engine(process_function)
+
+    def output_transform(output):
+        # `output` variable is returned by above `process_function`
+        y_pred = output['y_pred']
+        y = output['y_true']
+        return y_pred, y  # output format is according to `Accuracy` docs
+
+    metric = Accuracy(output_transform=output_transform)
+    metric.attach(engine, "accuracy")
+
+
+.. Note ::
+
+    Most of the implemented metrics are adapted to distributed computations and reduce their internal states across GPUs
+    before computing the metric value. This is helpful when running evaluation on multiple nodes/GPU instances with a
+    distributed data sampler. The following code snippet shows how to adapt metrics:
+
     .. code-block:: python
 
-        def process_function(engine, batch):
-            # ...
-            return {'y_pred': y_pred, 'y_true': y, ...}
+        device = "cuda:{}".format(local_rank)
+        model = torch.nn.parallel.DistributedDataParallel(model,
+                                                          device_ids=[local_rank, ],
+                                                          output_device=local_rank)
+        test_sampler = DistributedSampler(test_dataset)
+        test_loader = DataLoader(test_dataset, batch_size=batch_size, sampler=test_sampler,
+                                 num_workers=num_workers, pin_memory=True)
 
-        engine = Engine(process_function)
+        evaluator = create_supervised_evaluator(model, metrics={'accuracy': Accuracy(device=device)}, device=device)
 
-        def output_transform(output):
-            # `output` variable is returned by above `process_function`
-            y_pred = output['y_pred']
-            y = output['y_true']
-            return y_pred, y  # output format is according to `Accuracy` docs
 
-        metric = Accuracy(output_transform=output_transform)
-        metric.attach(engine, "accuracy")
+Metric arithmetics
+------------------
 
 Metrics could be combined together to form new metrics. This could be done through arithmetics, such
 as ``metric1 + metric2``, use PyTorch operators, such as ``(metric1 + metric2).pow(2).mean()``,
 or use a lambda function, such as ``MetricsLambda(lambda a, b: torch.mean(a + b), metric1, metric2)``.
 
 For example:
 
-    .. code-block:: python
+.. code-block:: python
 
-        precision = Precision(average=False)
-        recall = Recall(average=False)
-        F1 = (precision * recall * 2 / (precision + recall)).mean()
+    precision = Precision(average=False)
+    recall = Recall(average=False)
+    F1 = (precision * recall * 2 / (precision + recall)).mean()
 
-    .. note::  This example computes the mean of F1 across classes. To combine
-        precision and recall to get F1 or other F metrics, we have to be careful
-        that `average=False`, i.e. to use the unaveraged precision and recall,
-        otherwise we will not be computing F-beta metrics.
+.. note::  This example computes the mean of F1 across classes. To combine
+    precision and recall to get F1 or other F metrics, we have to be careful
+    that `average=False`, i.e. to use the unaveraged precision and recall,
+    otherwise we will not be computing F-beta metrics.
 
 Metrics also support indexing operation (if metric's result is a vector/matrix/tensor). For example, this can be useful to compute mean metric (e.g. precision, recall or IoU) ignoring the background:
 
-    .. code-block:: python
+.. code-block:: python
+
+    cm = ConfusionMatrix(num_classes=10)
+    iou_metric = IoU(cm)
+    iou_no_bg_metric = iou_metric[:9]  # We assume that the background index is 9
+    mean_iou_no_bg_metric = iou_no_bg_metric.mean()
+    # mean_iou_no_bg_metric.compute() -> tensor(0.12345)
+
+How to create a custom metric
+-----------------------------
+
+To create a custom metric, one needs to create a new class inheriting from :class:`~ignite.metrics.Metric` and override
+three methods:
+
+- `reset()`: resets internal variables and accumulators
+- `update(output)`: updates internal variables and accumulators with the provided batch output `(y_pred, y)`
+- `compute()`: computes the custom metric and returns the result
+
+For illustration, let us implement a multi-class accuracy metric with a
+specific condition (e.g. ignoring user-defined classes):
+
+.. code-block:: python
+
+    import torch
+
+    from ignite.metrics import Metric
+    from ignite.exceptions import NotComputableError
+
+    # These decorators help with distributed settings
+    from ignite.metrics.metric import sync_all_reduce, reinit__is_reduced
+
+
+    class CustomAccuracy(Metric):
+
+        def __init__(self, ignored_class, output_transform=lambda x: x, device=None):
+            self.ignored_class = ignored_class
+            self._num_correct = None
+            self._num_examples = None
+            super(CustomAccuracy, self).__init__(output_transform=output_transform, device=device)
+
+        @reinit__is_reduced
+        def reset(self):
+            self._num_correct = 0
+            self._num_examples = 0
+            super(CustomAccuracy, self).reset()
+
+        @reinit__is_reduced
+        def update(self, output):
+            y_pred, y = output
+
+            indices = torch.argmax(y_pred, dim=1)
+
+            mask = (y != self.ignored_class)
+            mask &= (indices != self.ignored_class)
+            y = y[mask]
+            indices = indices[mask]
+            correct = torch.eq(indices, y).view(-1)
+
+            self._num_correct += torch.sum(correct).item()
+            self._num_examples += correct.shape[0]
+
+        @sync_all_reduce("_num_examples", "_num_correct")
+        def compute(self):
+            if self._num_examples == 0:
+                raise NotComputableError('CustomAccuracy must have at least one example before it can be computed.')
+            return self._num_correct / self._num_examples
+
+
+We imported the necessary classes, :class:`~ignite.metrics.Metric` and :class:`~ignite.exceptions.NotComputableError`, and the
+decorators that adapt the metric to the distributed setting. In the `reset` method, we reset the internal variables `_num_correct`
+and `_num_examples`, which are used to compute the custom metric. In the `update` method, we define how to update
+the internal variables. Finally, in the `compute` method, we compute the metric value.
+
+We can check this implementation in a simple case:
+
+.. code-block:: python
+
+    import torch
+    torch.manual_seed(8)
+
+    m = CustomAccuracy(ignored_class=3)
+
+    batch_size = 4
+    num_classes = 5
+
+    y_pred = torch.rand(batch_size, num_classes)
+    y = torch.randint(0, num_classes, size=(batch_size, ))
+
+    m.update((y_pred, y))
+    res = m.compute()
+
+    print(y, torch.argmax(y_pred, dim=1))
+    # Out: tensor([2, 2, 2, 3]) tensor([2, 1, 0, 0])
+
+    print(m._num_correct, m._num_examples, res)
+    # Out: 1 3 0.3333333333333333
+
+
+Metrics and distributed computations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-        cm = ConfusionMatrix(num_classes=10)
-        iou_metric = IoU(cm)
-        iou_no_bg_metric = iou_metric[:9]  # We assume that the background index is 9
-        mean_iou_no_bg_metric = iou_no_bg_metric.mean()
-        # mean_iou_no_bg_metric.compute() -> tensor(0.12345)
+In the above example, the `CustomAccuracy` constructor has a `device` argument, and the `reset`, `update` and `compute` methods are decorated with `reinit__is_reduced` and `sync_all_reduce`. These features adapt the metric to distributed computations on CUDA devices, assuming the backend supports the `"all_reduce" operation <https://pytorch.org/docs/stable/distributed.html#torch.distributed.all_reduce>`_. The user can specify the device (by default, `cuda`) at the metric's initialization. This device *can* be used to store internal variables and to collect the results from all participating devices. More precisely, in the above example we added `@sync_all_reduce("_num_examples", "_num_correct")` over the `compute` method. This means that when `compute` is called, the metric's internal variables `self._num_examples` and `self._num_correct` are summed over all participating devices. Once collected, these internal variables can be used to compute the final metric value.
 
 
-Complete list of metrics:
+Complete list of metrics
+------------------------
 
     - :class:`~ignite.metrics.Accuracy`
     - :class:`~ignite.metrics.Average`
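
Note on the "Metrics and distributed computations" paragraph in the `docs/source/metrics.rst` diff above: the paragraph says the counters `_num_correct` and `_num_examples` are summed over all participating devices before the metric value is computed. The sketch below is not part of this commit and does not use ignite or `torch.distributed`. It is a pure-Python illustration of why the counters are summed rather than the per-process accuracies averaged. The helper `all_reduce_sum` and the per-rank numbers are hypothetical and chosen only for illustration.

```python
# Pure-Python illustration; no torch or ignite required.
# Each tuple is the hypothetical (num_correct, num_examples) held by one process.
per_rank_counters = [(90, 100), (5, 10)]


def all_reduce_sum(values):
    """Hypothetical stand-in for an all-reduce SUM: every rank receives the same total."""
    total = sum(values)
    return [total for _ in values]


# Reduce each counter separately, one reduction per named attribute.
reduced_correct = all_reduce_sum([c for c, _ in per_rank_counters])
reduced_examples = all_reduce_sum([n for _, n in per_rank_counters])

# After the reduction, every rank computes the same global accuracy.
global_accuracy = [c / n for c, n in zip(reduced_correct, reduced_examples)]

# Averaging per-rank accuracies instead gives the small shard too much weight.
naive_mean = sum(c / n for c, n in per_rank_counters) / len(per_rank_counters)

print(global_accuracy)
print(naive_mean)
```

With these numbers the reduced counters give 95 correct out of 110 examples on every rank, whereas the mean of the per-rank accuracies is about 0.7. The two differ whenever the shards have unequal sizes.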

diff --git a/examples/contrib/cifar10/.gitignore b/examples/contrib/cifar10/.gitignore
@@ -0,0 +1,6 @@
+output
+cifar10
+.polyaxonignore
+.polyaxon
+plx_configs
+gcp_configs