This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

TensorFlow 2.0 MNIST example, without IT #1790

Merged · 9 commits · Nov 26, 2019
6 changes: 3 additions & 3 deletions README.md
@@ -71,7 +71,7 @@ The tool dispatches and runs trial jobs generated by tuning algorithms to search
<li><b>Examples</b></li>
<ul>
<li><a href="examples/trials/mnist-pytorch">MNIST-pytorch</li></a>
<li><a href="examples/trials/mnist">MNIST-tensorflow</li></a>
<li><a href="examples/trials/mnist-tfv2">MNIST-tensorflow</li></a>
<li><a href="examples/trials/mnist-keras">MNIST-keras</li></a>
<li><a href="docs/en_US/TrialExample/GbdtExample.md">Auto-gbdt</a></li>
<li><a href="docs/en_US/TrialExample/Cifar10Examples.md">Cifar10-pytorch</li></a>
@@ -245,15 +245,15 @@ Linux and MacOS
* Run the MNIST example.

```bash
nnictl create --config nni/examples/trials/mnist/config.yml
nnictl create --config nni/examples/trials/mnist-tfv2/config.yml
```

Windows

* Run the MNIST example.

```bash
nnictl create --config nni\examples\trials\mnist\config_windows.yml
nnictl create --config nni\examples\trials\mnist-tfv2\config_windows.yml
```

* Wait for the message `INFO: Successfully started experiment!` in the command line. This message indicates that your experiment has been successfully started. You can explore the experiment using the `Web UI url`.
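
* Optionally, inspect or stop the experiment from the command line. The block below is only a sketch of commonly used `nnictl` subcommands; run `nnictl --help` to confirm the exact set supported by your NNI version.

```bash
nnictl experiment list   # list running experiments and their web UI ports
nnictl trial ls          # list the trial jobs of the current experiment
nnictl stop              # stop the experiment when you are done
```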
18 changes: 13 additions & 5 deletions docs/en_US/TrialExample/MnistExamples.md
@@ -2,7 +2,8 @@

The CNN MNIST classifier is to deep learning what `hello world` is to programming languages. Thus, we use MNIST as an example to introduce different features of NNI. The examples are listed below:

- [MNIST with NNI API](#mnist)
- [MNIST with NNI API (TensorFlow v2.x)](#mnist-tfv2)
- [MNIST with NNI API (TensorFlow v1.x)](#mnist-tfv1)
- [MNIST with NNI annotation](#mnist-annotation)
- [MNIST in keras](#mnist-keras)
- [MNIST -- tuning with batch tuner](#mnist-batch)
@@ -11,12 +12,19 @@ CNN MNIST classifier for deep learning is similar to `hello world` for programmi
- [distributed MNIST (tensorflow) using kubeflow](#mnist-kubeflow-tf)
- [distributed MNIST (pytorch) using kubeflow](#mnist-kubeflow-pytorch)

<a name="mnist"></a>
**MNIST with NNI API**
<a name="mnist-tfv2"></a>
**MNIST with NNI API (TensorFlow v2.x)**

This is a simple network with two convolutional layers, two pooling layers, and a fully connected layer. We tune hyper-parameters such as the dropout rate, convolution kernel size, and hidden size. It can be tuned with most NNI built-in tuners, such as TPE, SMAC, and Random. We also provide an example YAML file that enables an assessor.

`code directory: examples/trials/mnist/`
`code directory: examples/trials/mnist-tfv2/`
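
The trial follows the standard NNI pattern: fetch tuned hyper-parameters from the tuner, train the model, and report accuracy back. A minimal sketch of that flow, with a dummy stand-in for the real Keras training loop (the full trial script lives in the code directory above):

```python
import nni

def main():
    # Default hyper-parameters, used when no tuner overrides them
    params = {
        'dropout_rate': 0.5,
        'conv_size': 5,
        'hidden_size': 1024,
        'batch_size': 32,
        'learning_rate': 1e-4,
    }
    # Merge in the values sampled by the tuner for this trial
    params.update(nni.get_next_parameter() or {})

    for epoch in range(10):
        # Stand-in for one epoch of real training and validation
        accuracy = 0.1 + 0.08 * epoch
        nni.report_intermediate_result(accuracy)  # drawn as the learning curve in the web UI

    nni.report_final_result(accuracy)  # the metric the tuner optimizes

if __name__ == '__main__':
    main()
```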

<a name="mnist-tfv1"></a>
**MNIST with NNI API (TensorFlow v1.x)**

The same network as the example above, but written with the TensorFlow v1.x API.

`code directory: examples/trials/mnist-tfv1/`

<a name="mnist-annotation"></a>
**MNIST with NNI annotation**
@@ -65,4 +73,4 @@ This example shows how to run distributed training on Kubeflow through NNI.

Similar to the previous example; the difference is that this example is implemented in PyTorch, so it uses the Kubeflow PyTorch operator.

`code directory: examples/trials/mnist-distributed-pytorch/`
`code directory: examples/trials/mnist-distributed-pytorch/`
17 changes: 17 additions & 0 deletions examples/trials/mnist-tfv2/config.yml
@@ -0,0 +1,17 @@
authorName: NNI Example
experimentName: MNIST TF v2.x
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
trainingServicePlatform: local # choices: local, remote, pai
searchSpacePath: search_space.json
useAnnotation: false
tuner:
builtinTunerName: TPE # choices: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner,
# GPTuner, SMAC (SMAC should be installed through nnictl)
classArgs:
optimize_mode: maximize # choices: maximize, minimize
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 0
29 changes: 29 additions & 0 deletions examples/trials/mnist-tfv2/config_assessor.yml
@@ -0,0 +1,29 @@
authorName: NNI Example
experimentName: MNIST TF v2.x with assessor
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 50
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
#SMAC (SMAC should be installed through nnictl)
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
assessor:
#choice: Medianstop, Curvefitting
builtinAssessorName: Curvefitting
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
epoch_num: 20
threshold: 0.9
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 0
41 changes: 41 additions & 0 deletions examples/trials/mnist-tfv2/config_frameworkcontroller.yml
@@ -0,0 +1,41 @@
authorName: NNI Example
experimentName: MNIST TF v2.x
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai, kubeflow
trainingServicePlatform: frameworkcontroller
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
assessor:
builtinAssessorName: Medianstop
classArgs:
optimize_mode: maximize
gpuNum: 0
trial:
codeDir: .
taskRoles:
- name: worker
taskNum: 1
command: python3 mnist.py
gpuNum: 1
cpuNum: 1
memoryMB: 8192
image: msranni/nni:latest
frameworkAttemptCompletionPolicy:
minFailedTaskCount: 1
minSucceededTaskCount: 1
frameworkcontrollerConfig:
storage: nfs
nfs:
# Your NFS server IP, like 10.10.10.10
server: {your_nfs_server_ip}
# Your NFS server export path, like /var/nfs/nni
path: {your_nfs_server_export_path}
32 changes: 32 additions & 0 deletions examples/trials/mnist-tfv2/config_kubeflow.yml
@@ -0,0 +1,32 @@
authorName: NNI Example
experimentName: MNIST TF v2.x
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 1
#choice: local, remote, pai, kubeflow
trainingServicePlatform: kubeflow
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
codeDir: .
worker:
replicas: 1
command: python3 mnist.py
gpuNum: 0
cpuNum: 1
memoryMB: 8192
image: msranni/nni:latest
kubeflowConfig:
operator: tf-operator
apiVersion: v1alpha2
storage: nfs
nfs:
server: 10.10.10.10
path: /var/nfs/general
32 changes: 32 additions & 0 deletions examples/trials/mnist-tfv2/config_pai.yml
@@ -0,0 +1,32 @@
authorName: NNI Example
experimentName: MNIST TF v2.x
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
#SMAC (SMAC should be installed through nnictl)
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 0
cpuNum: 1
memoryMB: 8196
#The docker image to run nni job on pai
image: msranni/nni:latest
(Review thread on the `image: msranni/nni:latest` line above)

Contributor:
Tensorflow 1.x is installed in msranni/nni right?

liuzhe-lz (Contributor, Author), Nov 26, 2019:
I think so. Because nobody is upgrading it.
I'll remove this config file.

Contributor:
So is it appropriate for us to put an example here that doesn't even work?

liuzhe-lz (Contributor, Author):
I removed docker-based config files. Now this example does work, but only supports limited training services.

liuzhe-lz (Contributor, Author):
Because distributed platforms are not supported, I moved this example down in docs...

paiConfig:
#The username to login pai
userName: username
#The password to login pai
passWord: password
#The host of restful server of pai
host: 10.10.10.10
21 changes: 21 additions & 0 deletions examples/trials/mnist-tfv2/config_windows.yml
@@ -0,0 +1,21 @@
authorName: NNI Example
experimentName: MNIST TF v2.x
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
#SMAC (SMAC should be installed through nnictl)
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
command: python mnist.py
codeDir: .
gpuNum: 0
146 changes: 146 additions & 0 deletions examples/trials/mnist-tfv2/mnist.py
@@ -0,0 +1,146 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""
NNI example trial code.

- Experiment type: Hyper-parameter Optimization
- Trial framework: Tensorflow v2.x (Keras API)
- Model: LeNet-5
- Dataset: MNIST
"""

import logging

import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.layers import (Conv2D, Dense, Dropout, Flatten, MaxPool2D)
from tensorflow.keras.optimizers import Adam

import nni

_logger = logging.getLogger('mnist_example')
_logger.setLevel(logging.INFO)


class MnistModel(Model):
"""
LeNet-5 Model with customizable hyper-parameters
"""
def __init__(self, conv_size, hidden_size, dropout_rate):
"""
Initialize hyper-parameters.

Parameters
----------
conv_size : int
Kernel size of convolutional layers.
hidden_size : int
Dimensionality of last hidden layer.
dropout_rate : float
Dropout rate between two fully connected (dense) layers, to prevent co-adaptation.
"""
super().__init__()
self.conv1 = Conv2D(filters=32, kernel_size=conv_size, activation='relu')
self.pool1 = MaxPool2D(pool_size=2)
self.conv2 = Conv2D(filters=64, kernel_size=conv_size, activation='relu')
self.pool2 = MaxPool2D(pool_size=2)
self.flatten = Flatten()
self.fc1 = Dense(units=hidden_size, activation='relu')
self.dropout = Dropout(rate=dropout_rate)
self.fc2 = Dense(units=10, activation='softmax')

def call(self, x):
"""Override ``Model.call`` to build LeNet-5 model."""
x = self.conv1(x)
x = self.pool1(x)
x = self.conv2(x)
x = self.pool2(x)
x = self.flatten(x)
x = self.fc1(x)
x = self.dropout(x)
return self.fc2(x)


class ReportIntermediates(Callback):
"""
Callback class for reporting intermediate accuracy metrics.

    This callback reports validation accuracy to the NNI framework at the end of every epoch,
    so you can view the learning curve on the web UI.

If an assessor is configured in experiment's YAML file,
it will use these metrics for early stopping.
"""
def on_epoch_end(self, epoch, logs=None):
"""Reports intermediate accuracy to NNI framework"""
# TensorFlow 2.0 API reference claims the key is `val_acc`, but in fact it's `val_accuracy`
if 'val_acc' in logs:
nni.report_intermediate_result(logs['val_acc'])
else:
nni.report_intermediate_result(logs['val_accuracy'])


def load_dataset():
"""Download and reformat MNIST dataset"""
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]
return (x_train, y_train), (x_test, y_test)


def main(params):
"""
Main program:
- Build network
- Prepare dataset
- Train the model
- Report accuracy to tuner
"""
model = MnistModel(
conv_size=params['conv_size'],
hidden_size=params['hidden_size'],
dropout_rate=params['dropout_rate']
)
optimizer = Adam(learning_rate=params['learning_rate'])
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
_logger.info('Model built')

(x_train, y_train), (x_test, y_test) = load_dataset()
_logger.info('Dataset loaded')

model.fit(
x_train,
y_train,
batch_size=params['batch_size'],
epochs=10,
verbose=0,
callbacks=[ReportIntermediates()],
validation_data=(x_test, y_test)
)
_logger.info('Training completed')

result = model.evaluate(x_test, y_test, verbose=0)
nni.report_final_result(result[1]) # send final accuracy to NNI tuner and web UI
_logger.info('Final accuracy reported: %s', result[1])


if __name__ == '__main__':
params = {
'dropout_rate': 0.5,
'conv_size': 5,
'hidden_size': 1024,
'batch_size': 32,
'learning_rate': 1e-4,
}

# fetch hyper-parameters from HPO tuner
# comment out following two lines to run the code without NNI framework
tuned_params = nni.get_next_parameter()
params.update(tuned_params)

_logger.info('Hyper-parameters: %s', params)
main(params)
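
The script above is what the config files launch via `python3 mnist.py` (or `python mnist.py` on Windows). A minimal sketch of the two ways to run it, assuming the repository layout used in this PR; for the standalone run, first comment out the two parameter-fetch lines as the code comment suggests:

```bash
# As an NNI experiment, from the repository root
nnictl create --config examples/trials/mnist-tfv2/config.yml

# Or standalone, without a tuner (after commenting out the parameter-fetch lines)
python3 examples/trials/mnist-tfv2/mnist.py
```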
7 changes: 7 additions & 0 deletions examples/trials/mnist-tfv2/search_space.json
@@ -0,0 +1,7 @@
{
"dropout_rate": { "_type": "uniform", "_value": [0.5, 0.9] },
"conv_size": { "_type": "choice", "_value": [2, 3, 5, 7] },
"hidden_size": { "_type": "choice", "_value": [124, 512, 1024] },
"batch_size": { "_type": "choice", "_value": [16, 32] },
"learning_rate": { "_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1] }
}
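
Each key in this search space corresponds to one hyper-parameter consumed by `mnist.py`: `uniform` draws a float from the given range, and `choice` picks one of the listed values. One possible configuration handed to a trial by `nni.get_next_parameter()` might look like this (the values shown are just an illustrative sample):

```python
sampled_params = {           # one possible draw from the search space above
    'dropout_rate': 0.62,    # uniform over [0.5, 0.9]
    'conv_size': 3,          # choice of 2, 3, 5, 7
    'hidden_size': 512,      # choice of 124, 512, 1024
    'batch_size': 16,        # choice of 16, 32
    'learning_rate': 0.001,  # choice of 0.0001, 0.001, 0.01, 0.1
}
```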