This repository has been archived by the owner on Sep 18, 2024. It is now read-only.
TensorFlow 2.0 MNIST example, without IT #1790
Merged

Changes shown are from 6 of the PR's 9 commits (commits by liuzhe-lz):

3f1fb77  Add MNIST example for TensorFlow v2.x (#1760)
7df92bd  Merge branch 'master' into dev-tf2
1cf5b83  test tf v1 example
aecc75d  change tf version back
07d336b  change batch size
43e66a8  fix typo
8edf4da  accept comment
b0213ef  remove docker-based configs
3e7e91d  prefer tfv1 in doc
9 files renamed without changes.
@@ -0,0 +1,17 @@ (new file: experiment config, local training service)

authorName: NNI Example
experimentName: MNIST TF v2.x
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
trainingServicePlatform: local  # choices: local, remote, pai
searchSpacePath: search_space.json
useAnnotation: false
tuner:
  builtinTunerName: TPE  # choices: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner,
                         # GPTuner, SMAC (SMAC should be installed through nnictl)
  classArgs:
    optimize_mode: maximize  # choices: maximize, minimize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
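For orientation, the `command: python3 mnist.py` above is what NNI runs for each trial, and the script talks back to NNI through a small reporting API. A minimal sketch of that trial-side protocol, assuming it runs inside an NNI experiment; `train_one_epoch` is a hypothetical stand-in for the real Keras training loop, not part of this PR:

import random

import nni

def train_one_epoch(params):
    """Hypothetical placeholder for one epoch of training; returns a fake validation accuracy."""
    return random.random()

params = {'learning_rate': 1e-4}             # script defaults...
params.update(nni.get_next_parameter())      # ...overridden by values the tuner samples from search_space.json

final_acc = 0.0
for epoch in range(10):
    final_acc = train_one_epoch(params)      # in mnist.py this is model.fit(...) with a reporting callback
    nni.report_intermediate_result(final_acc)  # drawn as the learning curve in the web UI; read by an assessor if configured
nni.report_final_result(final_acc)           # the single number the tuner optimizes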
@@ -0,0 +1,29 @@ (new file: experiment config with an assessor, local training service)

authorName: NNI Example
experimentName: MNIST TF v2.x with assessor
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 50
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
assessor:
  #choice: Medianstop, Curvefitting
  builtinAssessorName: Curvefitting
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
    epoch_num: 20
    threshold: 0.9
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
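For context on the assessor settings above, here is a sketch (not the assessor's actual implementation): the Curvefitting assessor only ever sees the sequence of intermediate results each trial reports, one value per epoch in this example.

# What the assessor receives for one trial is just the ordered list of values the trial
# passed to nni.report_intermediate_result, e.g. one validation accuracy per epoch
# (numbers below are made up for illustration):
trial_history = [0.912, 0.948, 0.961, 0.967, 0.970]

# Conceptually, it extrapolates this curve toward `epoch_num` (20 here) and stops the
# trial early when the predicted final value looks unpromising; with
# optimize_mode: maximize it keeps trials whose curves are still trending upward.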
@@ -0,0 +1,41 @@ (new file: experiment config, frameworkcontroller training service)

authorName: NNI Example
experimentName: MNIST TF v2.x
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai, kubeflow
trainingServicePlatform: frameworkcontroller
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
assessor:
  builtinAssessorName: Medianstop
  classArgs:
    optimize_mode: maximize
  gpuNum: 0
trial:
  codeDir: .
  taskRoles:
    - name: worker
      taskNum: 1
      command: python3 mnist.py
      gpuNum: 1
      cpuNum: 1
      memoryMB: 8192
      image: msranni/nni:latest
      frameworkAttemptCompletionPolicy:
        minFailedTaskCount: 1
        minSucceededTaskCount: 1
frameworkcontrollerConfig:
  storage: nfs
  nfs:
    # Your NFS server IP, like 10.10.10.10
    server: {your_nfs_server_ip}
    # Your NFS server export path, like /var/nfs/nni
    path: {your_nfs_server_export_path}
@@ -0,0 +1,32 @@ (new file: experiment config, kubeflow training service)

authorName: NNI Example
experimentName: MNIST TF v2.x
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 1
#choice: local, remote, pai, kubeflow
trainingServicePlatform: kubeflow
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  codeDir: .
  worker:
    replicas: 1
    command: python3 mnist.py
    gpuNum: 0
    cpuNum: 1
    memoryMB: 8192
    image: msranni/nni:latest
kubeflowConfig:
  operator: tf-operator
  apiVersion: v1alpha2
  storage: nfs
  nfs:
    server: 10.10.10.10
    path: /var/nfs/general
@@ -0,0 +1,32 @@ (new file: experiment config, pai training service)

authorName: NNI Example
experimentName: MNIST TF v2.x
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
  cpuNum: 1
  memoryMB: 8196
  #The docker image to run nni job on pai
  image: msranni/nni:latest
paiConfig:
  #The username to login pai
  userName: username
  #The password to login pai
  passWord: password
  #The host of restful server of pai
  host: 10.10.10.10
@@ -0,0 +1,21 @@ (new file: experiment config, local training service, using the `python` launcher)

authorName: NNI Example
experimentName: MNIST TF v2.x
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python mnist.py
  codeDir: .
  gpuNum: 0
@@ -0,0 +1,146 @@ (new file: mnist.py, the trial code)

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""
NNI example trial code.

- Experiment type: Hyper-parameter Optimization
- Trial framework: Tensorflow v2.x (Keras API)
- Model: LeNet-5
- Dataset: MNIST
"""

import logging

import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.layers import (Conv2D, Dense, Dropout, Flatten, MaxPool2D)
from tensorflow.keras.optimizers import Adam

import nni

_logger = logging.getLogger('mnist_example')
_logger.setLevel(logging.INFO)


class MnistModel(Model):
    """
    LeNet-5 model with customizable hyper-parameters.
    """
    def __init__(self, conv_size, hidden_size, dropout_rate):
        """
        Initialize hyper-parameters.

        Parameters
        ----------
        conv_size : int
            Kernel size of convolutional layers.
        hidden_size : int
            Dimensionality of last hidden layer.
        dropout_rate : float
            Dropout rate between two fully connected (dense) layers, to prevent co-adaptation.
        """
        super().__init__()
        self.conv1 = Conv2D(filters=32, kernel_size=conv_size, activation='relu')
        self.pool1 = MaxPool2D(pool_size=2)
        self.conv2 = Conv2D(filters=64, kernel_size=conv_size, activation='relu')
        self.pool2 = MaxPool2D(pool_size=2)
        self.flatten = Flatten()
        self.fc1 = Dense(units=hidden_size, activation='relu')
        self.dropout = Dropout(rate=dropout_rate)
        self.fc2 = Dense(units=10, activation='softmax')

    def call(self, x):
        """Override ``Model.call`` to build the LeNet-5 model."""
        x = self.conv1(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.pool2(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.dropout(x)
        return self.fc2(x)


class ReportIntermediates(Callback):
    """
    Callback class for reporting intermediate accuracy metrics.

    This callback sends validation accuracy to the NNI framework after every epoch,
    so you can view the learning curve on the web UI.

    If an assessor is configured in the experiment's YAML file,
    it will use these metrics for early stopping.
    """
    def on_epoch_end(self, epoch, logs=None):
        """Report intermediate accuracy to the NNI framework."""
        # The TensorFlow 2.0 API reference claims the key is `val_acc`, but in fact it's `val_accuracy`
        if 'val_acc' in logs:
            nni.report_intermediate_result(logs['val_acc'])
        else:
            nni.report_intermediate_result(logs['val_accuracy'])


def load_dataset():
    """Download and reformat the MNIST dataset."""
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    x_train = x_train[..., tf.newaxis]
    x_test = x_test[..., tf.newaxis]
    return (x_train, y_train), (x_test, y_test)


def main(params):
    """
    Main program:
      - Build the network
      - Prepare the dataset
      - Train the model
      - Report accuracy to the tuner
    """
    model = MnistModel(
        conv_size=params['conv_size'],
        hidden_size=params['hidden_size'],
        dropout_rate=params['dropout_rate']
    )
    optimizer = Adam(learning_rate=params['learning_rate'])
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    _logger.info('Model built')

    (x_train, y_train), (x_test, y_test) = load_dataset()
    _logger.info('Dataset loaded')

    model.fit(
        x_train,
        y_train,
        batch_size=params['batch_size'],
        epochs=10,
        verbose=0,
        callbacks=[ReportIntermediates()],
        validation_data=(x_test, y_test)
    )
    _logger.info('Training completed')

    result = model.evaluate(x_test, y_test, verbose=0)

    nni.report_final_result(result[1])  # send final accuracy to the NNI tuner and web UI
    _logger.info('Final accuracy reported: %s', result[1])


if __name__ == '__main__':
    params = {
        'dropout_rate': 0.5,
        'conv_size': 5,
        'hidden_size': 1024,
        'batch_size': 32,
        'learning_rate': 1e-4,
    }

    # fetch hyper-parameters from the HPO tuner;
    # comment out the following two lines to run the code without the NNI framework
    tuned_params = nni.get_next_parameter()
    params.update(tuned_params)

    _logger.info('Hyper-parameters: %s', params)
    main(params)
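One note on the `val_acc` / `val_accuracy` branch in `ReportIntermediates` above: with `metrics=['accuracy']` and `validation_data` passed to `fit()`, the `logs` dict that TF 2.x hands to `on_epoch_end` typically looks like the sketch below (values are made up), which is why the callback falls back to `val_accuracy`.

# Illustrative shape of the Keras `logs` dict in on_epoch_end (made-up values):
logs = {
    'loss': 0.0412,          # training loss for the epoch
    'accuracy': 0.9871,      # training accuracy ('acc' in some older 1.x-style setups)
    'val_loss': 0.0389,      # validation loss
    'val_accuracy': 0.9902,  # validation accuracy, the value reported to NNI
}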
@@ -0,0 +1,7 @@ (new file: search_space.json)

{
    "dropout_rate": { "_type": "uniform", "_value": [0.5, 0.9] },
    "conv_size": { "_type": "choice", "_value": [2, 3, 5, 7] },
    "hidden_size": { "_type": "choice", "_value": [124, 512, 1024] },
    "batch_size": { "_type": "choice", "_value": [16, 32] },
    "learning_rate": { "_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1] }
}
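Each trial receives one point sampled from this space via `nni.get_next_parameter()`. A sketch of how that merges with the script's defaults; the sampled values below are just one possible draw, not a fixed output:

# Defaults defined in mnist.py's __main__ block:
params = {
    'dropout_rate': 0.5,
    'conv_size': 5,
    'hidden_size': 1024,
    'batch_size': 32,
    'learning_rate': 1e-4,
}

# One possible dict returned by nni.get_next_parameter() for this search space
# (illustrative draw):
tuned_params = {
    'dropout_rate': 0.62,    # uniform in [0.5, 0.9]
    'conv_size': 3,          # choice of [2, 3, 5, 7]
    'hidden_size': 512,      # choice of [124, 512, 1024]
    'batch_size': 16,        # choice of [16, 32]
    'learning_rate': 0.001,  # choice of [0.0001, 0.001, 0.01, 0.1]
}

params.update(tuned_params)  # tuner values override the defaults, exactly as in mnist.py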
Review discussion

Reviewer: Tensorflow 1.x is installed in msranni/nni, right?

liuzhe-lz: I think so, because nobody is upgrading it. I'll remove this config file.

Reviewer: So is it appropriate for us to put an example here that doesn't even work?

liuzhe-lz: I removed the docker-based config files. Now this example does work, but it only supports a limited set of training services.

liuzhe-lz: Because distributed platforms are not supported, I moved this example down in the docs...