merge master #287

Merged · 12 commits · Feb 23, 2021
5 changes: 4 additions & 1 deletion README.md
@@ -25,8 +25,11 @@ The tool manages automated machine learning (AutoML) experiments, **dispatches a
* Researchers and data scientists who want to easily **implement and experiment with new AutoML algorithms**, be it a hyperparameter tuning algorithm, a neural architecture search algorithm, or a model compression algorithm.
* ML Platform owners who want to **support AutoML in their platform**.

- ### **[NNI v2.0 has been released!](https://github.com/microsoft/nni/releases) &nbsp;<a href="#nni-released-reminder"><img width="48" src="docs/img/release_icon.png"></a>**
+ ## **What's NEW!** &nbsp;<a href="#nni-released-reminder"><img width="48" src="docs/img/release_icon.png"></a>
+ * **New release**: [v2.0 is available](https://github.com/microsoft/nni/releases) - _released on Jan-14-2021_
+ * **New demo available**: [Youtube entry](https://www.youtube.com/channel/UCKcafm6861B2mnYhPbZHavw) | [Bilibili entry](https://space.bilibili.com/1649051673) - _last updated on Feb-19-2021_

+ * **New use case sharing**: [Cost-effective Hyper-parameter Tuning using AdaptDL with NNI](https://medium.com/casl-project/cost-effective-hyper-parameter-tuning-using-adaptdl-with-nni-e55642888761) - _posted on Feb-23-2021_

## **NNI capabilities at a glance**

41 changes: 41 additions & 0 deletions SECURITY.md
@@ -0,0 +1,41 @@
<!-- BEGIN MICROSOFT SECURITY.MD V0.0.5 BLOCK -->

## Security

Microsoft takes the security of our software products and services seriously, including all source code repositories managed through our GitHub organizations, such as [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our other GitHub organizations](https://opensource.microsoft.com/).

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below.

## Reporting Security Issues

**Please do not report security vulnerabilities through public GitHub issues.**

Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).

If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

* Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
* Full paths of source file(s) related to the manifestation of the issue
* The location of the affected source code (tag/branch/commit or direct URL)
* Any special configuration required to reproduce the issue
* Step-by-step instructions to reproduce the issue
* Proof-of-concept or exploit code (if possible)
* Impact of the issue, including how an attacker might exploit the issue

This information will help us triage your report more quickly.

If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs.

## Preferred Languages

We prefer all communications to be in English.

## Policy

Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd).

<!-- END MICROSOFT SECURITY.MD BLOCK -->
27 changes: 8 additions & 19 deletions dependencies/recommended.txt
@@ -1,23 +1,12 @@
  # Recommended because some non-commonly-used modules/examples depend on those packages.

+ -f https://download.pytorch.org/whl/torch_stable.html
  tensorflow
- torch >= 1.6+cpu, != 1.7+cpu -f https://download.pytorch.org/whl/torch_stable.html
- torchvision >= 0.8+cpu -f https://download.pytorch.org/whl/torch_stable.html
+ torch == 1.6.0+cpu ; sys_platform != "darwin"
+ torch == 1.6.0 ; sys_platform == "darwin"
+ torchvision == 0.7.0+cpu ; sys_platform != "darwin"
+ torchvision == 0.7.0 ; sys_platform == "darwin"
  pytorch-lightning >= 1.1.1, < 1.2
  onnx
  peewee
  thop
  graphviz
- tensorflow
-
- # the following content will be read by setup.py.
- # please follow the logic in setup.py.
-
- # SMAC
- ConfigSpaceNNI
- smac4nni
-
- # BOHB
- ConfigSpace==0.4.7
- statsmodels==0.12.0
-
- # PPOTuner
- enum34
- gym
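An aside on the `; sys_platform != "darwin"` suffixes added above: these are standard PEP 508 environment markers, which pip evaluates on the installing machine, so the `+cpu` wheels apply everywhere except macOS. A minimal sketch of how such a marker evaluates, using the `packaging` library (illustrative only, not part of this diff):

```python
# Illustrative only: evaluate a PEP 508 environment marker like those above.
from packaging.markers import Marker

marker = Marker('sys_platform != "darwin"')
print(marker.evaluate())  # True on Linux/Windows, False on macOS
```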
13 changes: 13 additions & 0 deletions dependencies/recommended_legacy.txt
@@ -0,0 +1,13 @@
-f https://download.pytorch.org/whl/torch_stable.html
tensorflow == 1.15.4
torch == 1.5.1+cpu
torchvision == 0.6.1+cpu

# It will install pytorch-lightning 0.8.x, and unit tests won't work.
# The latest version conflicts with tensorboard and tensorflow 1.x.
pytorch-lightning

keras == 2.1.6
onnx
peewee
graphviz
4 changes: 2 additions & 2 deletions dependencies/required.txt
@@ -1,5 +1,5 @@
astor
- hyperopt==0.1.2
+ hyperopt == 0.1.2
json_tricks
netifaces
psutil
@@ -9,7 +9,7 @@ responses
schema
PythonWebHDFS
colorama
- scikit-learn>=0.23.2
+ scikit-learn >= 0.23.2
websockets
filelock
prettytable
13 changes: 13 additions & 0 deletions dependencies/required_extra.txt
@@ -0,0 +1,13 @@
# the following content will be read by setup.py.
# please follow the logic in setup.py.

# SMAC
ConfigSpaceNNI
smac4nni

# BOHB
ConfigSpace==0.4.7
statsmodels==0.12.0

# PPOTuner
gym
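The header comment notes that setup.py reads this file, but the parsing logic itself is not shown in this PR. As a hedged sketch only (the function below is hypothetical, not NNI's actual setup.py code), the `# SMAC` / `# BOHB` / `# PPOTuner` comments could be turned into named extras like so:

```python
# Hypothetical sketch, not NNI's actual setup.py logic: group packages in
# required_extra.txt under their "# Section" comment headers.
from collections import defaultdict

def parse_extras(path='dependencies/required_extra.txt'):
    extras = defaultdict(list)
    section = None
    for raw in open(path):
        line = raw.strip()
        if not line:
            continue
        if line.startswith('#'):
            text = line.lstrip('#').strip()
            # Heuristic: one-word comments ("SMAC", "BOHB", "PPOTuner") name a
            # section; longer sentences are ordinary comments.
            if ' ' not in text:
                section = text
        elif section is not None:
            extras[section].append(line)
    return dict(extras)

# e.g. {'SMAC': ['ConfigSpaceNNI', 'smac4nni'],
#       'BOHB': ['ConfigSpace==0.4.7', 'statsmodels==0.12.0'],
#       'PPOTuner': ['gym']}
```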
14 changes: 10 additions & 4 deletions docs/en_US/NAS/retiarii/ApiReference.rst
@@ -42,10 +42,16 @@ Graph Mutation APIs
Trainers
--------

- .. autoclass:: nni.retiarii.trainer.pytorch.PyTorchImageClassificationTrainer
+ .. autoclass:: nni.retiarii.trainer.FunctionalTrainer
     :members:

- .. autoclass:: nni.retiarii.trainer.pytorch.PyTorchMultiModelTrainer
+ .. autoclass:: nni.retiarii.trainer.pytorch.lightning.LightningModule
     :members:

+ .. autoclass:: nni.retiarii.trainer.pytorch.lightning.Classification
+    :members:

+ .. autoclass:: nni.retiarii.trainer.pytorch.lightning.Regression
+    :members:

Oneshot Trainers
@@ -75,8 +81,8 @@
Retiarii Experiments
--------------------

- .. autoclass:: nni.retiarii.experiment.RetiariiExperiment
+ .. autoclass:: nni.retiarii.experiment.pytorch.RetiariiExperiment
     :members:

- .. autoclass:: nni.retiarii.experiment.RetiariiExeConfig
+ .. autoclass:: nni.retiarii.experiment.pytorch.RetiariiExeConfig
     :members:
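For orientation (an aside, not part of this diff): after this change the experiment classes live under the PyTorch-specific module, so user code would import them as below; the paths are taken directly from the autoclass directives above.

.. code-block:: python

   from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig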
27 changes: 16 additions & 11 deletions docs/en_US/NAS/retiarii/Tutorial.rst
@@ -149,7 +149,7 @@ Create a Trainer and Exploration Strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**Classic search approach:**
- In this approach, trainer is for training each explored model, while strategy is for sampling the models. Both trainer and strategy are required to explore the model space.
+ In this approach, the trainer trains each explored model, while the strategy samples the models. Both are required to explore the model space. We recommend PyTorch-Lightning for writing the full training process.
**Oneshot (weight-sharing) search approach:**
In this approach, users only need a oneshot trainer, because this trainer takes charge of both search and training.
@@ -163,10 +163,10 @@ In the following table, we list the available trainers and strategies.
   * - Trainer
     - Strategy
     - Oneshot Trainer
-  * - PyTorchImageClassificationTrainer
+  * - Classification
     - TPEStrategy
     - DartsTrainer
-  * - PyTorchMultiModelTrainer
+  * - Regression
     - RandomStrategy
     - EnasTrainer
   * -
@@ -182,15 +182,20 @@ Here is a simple example of using trainer and strategy.
.. code-block:: python

-   trainer = PyTorchImageClassificationTrainer(base_model,
-                                               dataset_cls="MNIST",
-                                               dataset_kwargs={"root": "data/mnist", "download": True},
-                                               dataloader_kwargs={"batch_size": 32},
-                                               optimizer_kwargs={"lr": 1e-3},
-                                               trainer_kwargs={"max_epochs": 1})
-   simple_startegy = RandomStrategy()
+   import nni.retiarii.trainer.pytorch.lightning as pl
+   from nni.retiarii import blackbox
+   from torchvision import transforms
+
+   transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
+   train_dataset = blackbox(MNIST, root='data/mnist', train=True, download=True, transform=transform)
+   test_dataset = blackbox(MNIST, root='data/mnist', train=False, download=True, transform=transform)
+   lightning = pl.Classification(train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
+                                 val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
+                                 max_epochs=10)

+ .. Note:: For NNI to capture the dataset and dataloader and distribute them across different runs, please wrap your dataset with ``blackbox`` and use ``pl.DataLoader`` instead of ``torch.utils.data.DataLoader``. See the ``blackbox_module`` section below for details.

- Users can refer to `this document <./WriteTrainer.rst>`__ for how to write a new trainer, and refer to `this document <./WriteStrategy.rst>`__ for how to write a new strategy.
+ Users can refer to the `API reference <./ApiReference.rst>`__ for detailed usage of trainers, to `write a trainer <./WriteTrainer.rst>`__ for how to write a new trainer, and to `this document <./WriteStrategy.rst>`__ for how to write a new strategy.
Set up an Experiment
^^^^^^^^^^^^^^^^^^^^
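The body of this section is collapsed in the diff. As a rough, hedged sketch of what setting up and launching a Retiarii experiment typically looks like (``base_model``, ``mutators`` and ``simple_strategy`` are placeholder names, and the configuration fields are assumptions based on the ``RetiariiExeConfig`` class in the API reference, not content from this PR):

.. code-block:: python

   # Hedged sketch, not the collapsed content itself.
   from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig

   exp = RetiariiExperiment(base_model, lightning, mutators, simple_strategy)
   config = RetiariiExeConfig('local')   # run trials on the local machine
   config.trial_concurrency = 2          # number of models trained in parallel
   config.max_trial_number = 10          # stop after exploring 10 models
   exp.run(config, 8081)                 # NNI web UI served on port 8081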
117 changes: 86 additions & 31 deletions docs/en_US/NAS/retiarii/WriteTrainer.rst
@@ -3,59 +3,114 @@ Customize A New Trainer

Trainers are necessary to evaluate the performance of newly explored models. In the NAS scenario, this further divides into two use cases:

- 1. **Classic trainers**: trainers that are used to train and evaluate one single model.
+ 1. **Single-arch trainers**: trainers that are used to train and evaluate one single model.
2. **One-shot trainers**: trainers that handle training and searching simultaneously, from an end-to-end perspective.

- Classic trainers
- ----------------
+ Single-arch trainers
+ --------------------

- All classic trainers need to inherit ``nni.retiarii.trainer.BaseTrainer``, implement the ``fit`` method and decorated with ``@register_trainer`` if it is intended to be used together with Retiarii. The decorator serialize the trainer that is used and its argument to fit for the requirements of NNI.
- The init function of trainer should take model as its first argument, and the rest of the arguments should be named (``*args`` and ``**kwargs`` may not work as expected) and JSON serializable. This means, currently, passing a complex object like ``torchvision.datasets.ImageNet()`` is not supported. Trainer should use NNI standard API to communicate with tuning algorithms. This includes ``nni.report_intermediate_result`` for periodical metrics and ``nni.report_final_result`` for final metrics.

+ With PyTorch-Lightning
+ ^^^^^^^^^^^^^^^^^^^^^^

+ It's recommended to write training code in PyTorch-Lightning style: write a LightningModule that defines all elements needed for training (e.g., loss function, optimizer) and a trainer that takes (optional) dataloaders to execute the training. Before that, please read the `PyTorch-Lightning documentation <https://pytorch-lightning.readthedocs.io/>`__ to learn its basic concepts and components.

In practice, a new training module for NNI should inherit ``nni.retiarii.trainer.pytorch.lightning.LightningModule``, which has a ``set_model`` method that is called after ``__init__`` to save the candidate model (generated by a strategy) as ``self.model``. The rest of the process (like ``training_step``) is the same as writing any other lightning module. Trainers should also communicate with strategies via two API calls (``nni.report_intermediate_result`` for periodical metrics and ``nni.report_final_result`` for final metrics), added in ``on_validation_epoch_end`` and ``teardown`` respectively.

An example is as follows:

.. code-block:: python
-   from nni.retiarii import register_trainer
-   from nni.retiarii.trainer import BaseTrainer
+   from nni.retiarii.trainer.pytorch.lightning import LightningModule  # please import this one

-   @register_trainer
-   class MnistTrainer(BaseTrainer):
-       def __init__(self, model, optimizer_class_name='SGD', learning_rate=0.1):
-           super().__init__()
-           self.model = model
-           self.criterion = nn.CrossEntropyLoss()
-           self.train_dataset = MNIST(train=True)
-           self.valid_dataset = MNIST(train=False)
-           self.optimizer = getattr(torch.optim, optimizer_class_name)(lr=learning_rate)

-       def validate():
-           pass

-       def fit(self) -> None:
-           for i in range(10):  # number of epochs:
-               for x, y in DataLoader(self.dataset):
-                   self.optimizer.zero_grad()
-                   pred = self.model(x)
-                   loss = self.criterion(pred, y)
-                   loss.backward()
-                   self.optimizer.step()
-           acc = self.validate()  # get validation accuracy
-           nni.report_final_result(acc)

+   @blackbox_module
+   class AutoEncoder(LightningModule):
+       def __init__(self):
+           super().__init__()
+           self.decoder = nn.Sequential(
+               nn.Linear(3, 64),
+               nn.ReLU(),
+               nn.Linear(64, 28*28)
+           )

+       def forward(self, x):
+           embedding = self.model(x)  # let's search for encoder
+           return embedding

+       def training_step(self, batch, batch_idx):
+           # training_step defines the train loop.
+           # It is independent of forward.
+           x, y = batch
+           x = x.view(x.size(0), -1)
+           z = self.model(x)  # model is the one that is searched for
+           x_hat = self.decoder(z)
+           loss = F.mse_loss(x_hat, x)
+           # Logging to TensorBoard by default
+           self.log('train_loss', loss)
+           return loss

+       def validation_step(self, batch, batch_idx):
+           x, y = batch
+           x = x.view(x.size(0), -1)
+           z = self.model(x)
+           x_hat = self.decoder(z)
+           loss = F.mse_loss(x_hat, x)
+           self.log('val_loss', loss)

+       def configure_optimizers(self):
+           optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
+           return optimizer

+       def on_validation_epoch_end(self):
+           nni.report_intermediate_result(self.trainer.callback_metrics['val_loss'].item())

+       def teardown(self, stage):
+           if stage == 'fit':
+               nni.report_final_result(self.trainer.callback_metrics['val_loss'].item())
Then, users need to wrap everything (including LightningModule, trainer and dataloaders) into a ``Lightning`` object, and pass this object into a Retiarii experiment.

.. code-block:: python

   import nni.retiarii.trainer.pytorch.lightning as pl
   from nni.retiarii.experiment.pytorch import RetiariiExperiment

   lightning = pl.Lightning(AutoEncoder(),
                            pl.Trainer(max_epochs=10),
                            train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
                            val_dataloaders=pl.DataLoader(test_dataset, batch_size=100))
   experiment = RetiariiExperiment(base_model, lightning, mutators, strategy)
With FunctionalTrainer
^^^^^^^^^^^^^^^^^^^^^^

There is another way to customize a new trainer with functional APIs, which provides more flexibility. Users only need to write a fit function that wraps everything. This function takes one positional argument (the model) and possibly keyword arguments. This way, users get everything under their control, but expose less information to the framework and thus leave fewer opportunities for optimization. An example is as follows:

.. code-block:: python

   from nni.retiarii.trainer import FunctionalTrainer
   from nni.retiarii.experiment.pytorch import RetiariiExperiment

   def fit(model, dataloader):
       train(model, dataloader)
       acc = test(model, dataloader)
       nni.report_final_result(acc)

   trainer = FunctionalTrainer(fit, dataloader=DataLoader(foo, bar))
   experiment = RetiariiExperiment(base_model, trainer, mutators, strategy)
One-shot trainers
-----------------

- One-shot trainers should inheirt ``nni.retiarii.trainer.BaseOneShotTrainer``, which is basically same as ``BaseTrainer``, but only with one extra method ``export()``, which is expected to return the searched best architecture.
+ One-shot trainers should inherit ``nni.retiarii.trainer.BaseOneShotTrainer`` and implement the ``fit()`` method (which conducts the fitting and searching process) and the ``export()`` method (which returns the best architecture found).

Writing a one-shot trainer is very different from writing a classic trainer. First of all, there are no more restrictions on init method arguments; any Python arguments are acceptable. Secondly, the model fed into one-shot trainers might be a model with Retiarii-specific modules, such as LayerChoice and InputChoice. Such a model cannot forward-propagate directly, and trainers need to decide how to handle those modules.

A typical example is DartsTrainer, where learnable parameters are used to combine multiple choices in LayerChoice. Retiarii provides easy-to-use utility functions for module-replacement purposes, namely ``replace_layer_choice`` and ``replace_input_choice``. A simplified example is as follows:

.. code-block:: python

-   from nni.retiarii.trainer import BaseOneShotTrainer
+   from nni.retiarii.trainer.pytorch import BaseOneShotTrainer
    from nni.retiarii.trainer.pytorch.utils import replace_layer_choice, replace_input_choice
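The remainder of this example is collapsed in the diff. For a sense of where it goes, here is a rough, hedged sketch of a DARTS-style trainer built on these utilities — not the file's actual content: the mixture module, the way candidates are pulled out of a ``LayerChoice``, and the single-level training loop are all simplifications and assumptions (real DARTS alternates weight and architecture updates).

.. code-block:: python

   # Hedged sketch only -- NOT the collapsed content of this file.
   import torch
   import torch.nn as nn
   from nni.retiarii.trainer.pytorch import BaseOneShotTrainer
   from nni.retiarii.trainer.pytorch.utils import replace_layer_choice

   class MixedOp(nn.Module):
       """Replaces a LayerChoice: softmax-combine all candidate ops."""
       def __init__(self, layer_choice):
           super().__init__()
           # Assumption: iterating a LayerChoice yields its candidate modules.
           self.ops = nn.ModuleList(layer_choice)
           self.alpha = nn.Parameter(torch.randn(len(self.ops)) * 1e-3)

       def forward(self, x):
           weights = torch.softmax(self.alpha, -1)
           return sum(w * op(x) for w, op in zip(weights, self.ops))

       def export(self):
           return int(self.alpha.argmax())  # index of the strongest candidate

   class SimplifiedDartsTrainer(BaseOneShotTrainer):
       def __init__(self, model, loss, dataloader, epochs=10):
           self.model = model
           self.loss = loss
           self.dataloader = dataloader
           self.epochs = epochs
           # Swap every LayerChoice for a MixedOp; assumed to return
           # (name, module) pairs that we keep for export.
           self.mixed_ops = replace_layer_choice(self.model, MixedOp)

       def fit(self):
           # Single-level optimization for brevity; DARTS proper alternates
           # updates of model weights and architecture parameters (alpha).
           optimizer = torch.optim.SGD(self.model.parameters(), lr=0.01, momentum=0.9)
           for _ in range(self.epochs):
               for x, y in self.dataloader:
                   optimizer.zero_grad()
                   self.loss(self.model(x), y).backward()
                   optimizer.step()

       def export(self):
           # Report the chosen candidate per mutated layer.
           return {name: op.export() for name, op in self.mixed_ops}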