Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

support specifying gpus for tuner and advisor #1556

Merged
merged 6 commits into from
Sep 20, 2019
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 41 additions & 22 deletions docs/en_US/Tutorial/ExperimentConfig.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ tuner:
classArgs:
#choice: maximize, minimize
optimize_mode:
gpuNum:
gpuIndices:
QuanluZhang marked this conversation as resolved.
Show resolved Hide resolved
trial:
command:
codeDir:
Expand Down Expand Up @@ -71,14 +71,13 @@ tuner:
classArgs:
#choice: maximize, minimize
optimize_mode:
gpuNum:
gpuIndices:
assessor:
#choice: Medianstop
builtinAssessorName:
classArgs:
#choice: maximize, minimize
optimize_mode:
gpuNum:
trial:
command:
codeDir:
Expand Down Expand Up @@ -113,14 +112,13 @@ tuner:
classArgs:
#choice: maximize, minimize
optimize_mode:
gpuNum:
gpuIndices:
assessor:
#choice: Medianstop
builtinAssessorName:
classArgs:
#choice: maximize, minimize
optimize_mode:
gpuNum:
trial:
command:
codeDir:
Expand Down Expand Up @@ -245,11 +243,11 @@ machineList:
* __builtinTunerName__ and __classArgs__
* __builtinTunerName__

__builtinTunerName__ specifies the name of system tuner, NNI sdk provides four kinds of tuner, including {__TPE__, __Random__, __Anneal__, __Evolution__, __BatchTuner__, __GridSearch__}
__builtinTunerName__ specifies the name of system tuner, NNI sdk provides different tuners introduced [here](../Tuner/BuiltinTuner.md).

* __classArgs__

__classArgs__ specifies the arguments of tuner algorithm. If the __builtinTunerName__ is in {__TPE__, __Random__, __Anneal__, __Evolution__}, user should set __optimize_mode__.
__classArgs__ specifies the arguments of tuner algorithm. Please refer to [this file](../Tuner/BuiltinTuner.md) for the configurable arguments of each built-in tuner.
* __codeDir__, __classFileName__, __className__ and __classArgs__
* __codeDir__

Expand All @@ -264,16 +262,16 @@ machineList:

__classArgs__ specifies the arguments of tuner algorithm.

* __gpuNum__

__gpuNum__ specifies the gpu number to run the tuner process. The value of this field should be a positive number. If the field is not set, NNI will not set `CUDA_VISIBLE_DEVICES` in script (that is, will not control the visibility of GPUs on trial command through `CUDA_VISIBLE_DEVICES`), and will not manage gpu resource.
* __gpuIndices__

Note: users could only specify one way to set tuner, for example, set {tunerName, optimizationMode} or {tunerCommand, tunerCwd}, and could not set them both.
__gpuIndices__ specifies the gpus that can be used by the tuner process. Single or multiple GPU indices can be specified, multiple GPU indices are seperated by comma(,), such as `1` or `0,1,3`. If the field is not set, `CUDA_VISIBLE_DEVICES` will be '' in script, that is, no GPU is visible to tuner.

* __includeIntermediateResults__

If __includeIntermediateResults__ is true, the last intermediate result of the trial that is early stopped by assessor is sent to tuner as final result. The default value of __includeIntermediateResults__ is false.

Note: users could only use one way to specify tuner, either specifying `builtinTunerName` and `classArgs`, or specifying `codeDir`, `classFileName`, `className` and `classArgs`.

* __assessor__

* Description
Expand All @@ -282,7 +280,7 @@ machineList:
* __builtinAssessorName__ and __classArgs__
* __builtinAssessorName__

__builtinAssessorName__ specifies the name of system assessor, NNI sdk provides one kind of assessor {__Medianstop__}
__builtinAssessorName__ specifies the name of built-in assessor, NNI sdk provides different assessors introducted [here](../Assessor/BuiltinAssessor.md).
* __classArgs__

__classArgs__ specifies the arguments of assessor algorithm
Expand All @@ -305,11 +303,39 @@ machineList:

__classArgs__ specifies the arguments of assessor algorithm.

* __gpuNum__
Note: users could only use one way to specify assessor, either specifying `builtinAssessorName` and `classArgs`, or specifying `codeDir`, `classFileName`, `className` and `classArgs`. If users do not want to use assessor, assessor fileld should leave to empty.

* __advisor__
* Description

__gpuNum__ specifies the gpu number to run the assessor process. The value of this field should be a positive number.
__advisor__ specifies the advisor algorithm in the experiment, there are two kinds of ways to specify advisor. One way is to use advisor provided by NNI sdk, need to set __builtinAdvisorName__ and __classArgs__. Another way is to use users' own advisor file, and need to set __codeDirectory__, __classFileName__, __className__ and __classArgs__.
* __builtinAdvisorName__ and __classArgs__
* __builtinAdvisorName__

Note: users' could only specify one way to set assessor, for example,set {assessorName, optimizationMode} or {assessorCommand, assessorCwd}, and users could not set them both.If users do not want to use assessor, assessor fileld should leave to empty.
__builtinAdvisorName__ specifies the name of a built-in advisor, NNI sdk provides [different advisors](../Tuner/BuiltinTuner.md).

* __classArgs__

__classArgs__ specifies the arguments of the advisor algorithm. Please refer to [this file](../Tuner/BuiltinTuner.md) for the configurable arguments of each built-in advisor.
* __codeDir__, __classFileName__, __className__ and __classArgs__
* __codeDir__

__codeDir__ specifies the directory of advisor code.
* __classFileName__

__classFileName__ specifies the name of advisor file.
* __className__

__className__ specifies the name of advisor class.
* __classArgs__

__classArgs__ specifies the arguments of advisor algorithm.

* __gpuIndices__

__gpuIndices__ specifies the gpus that can be used by the tuner process. Single or multiple GPU indices can be specified, multiple GPU indices are seperated by comma(,), such as `1` or `0,1,3`. If the field is not set, `CUDA_VISIBLE_DEVICES` will be '' in script, that is, no GPU is visible to tuner.

Note: users could only use one way to specify advisor, either specifying `builtinAdvisorName` and `classArgs`, or specifying `codeDir`, `classFileName`, `className` and `classArgs`.

* __trial(local, remote)__

Expand Down Expand Up @@ -560,7 +586,6 @@ machineList:
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
trial:
command: python3 mnist.py
codeDir: /nni/mnist
Expand All @@ -586,14 +611,12 @@ machineList:
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
assessor:
#choice: Medianstop
builtinAssessorName: Medianstop
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
trial:
command: python3 mnist.py
codeDir: /nni/mnist
Expand All @@ -620,15 +643,13 @@ machineList:
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
assessor:
codeDir: /nni/assessor
classFileName: myassessor.py
className: MyAssessor
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
trial:
command: python3 mnist.py
codeDir: /nni/mnist
Expand Down Expand Up @@ -656,7 +677,6 @@ machineList:
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
trial:
command: python3 mnist.py
codeDir: /nni/mnist
Expand Down Expand Up @@ -780,7 +800,6 @@ machineList:
builtinAssessorName: Medianstop
classArgs:
optimize_mode: maximize
gpuNum: 0
trial:
codeDir: .
worker:
Expand Down
2 changes: 2 additions & 0 deletions examples/trials/nas_cifar10/config_ppo.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,8 @@ tuner:
trials_per_update: 60
epochs_per_update: 12
minibatch_size: 10
#could the No. 0 gpu for this tuner
gpuIndices: 0
trial:
command: sh ./macro_cifar10.sh
codeDir: ./
Expand Down
11 changes: 11 additions & 0 deletions src/nni_manager/common/utils.ts
Original file line number Diff line number Diff line change
Expand Up @@ -219,6 +219,11 @@ function getMsgDispatcherCommand(tuner: any, assessor: any, advisor: any, multiP
if (advisor.classFileName !== undefined && advisor.classFileName.length > 1) {
command += ` --advisor_class_filename ${advisor.classFileName}`;
}
if (advisor.gpuIndices !== undefined) {
command = `CUDA_VISIBLE_DEVICES=${advisor.gpuIndices} ` + command;
} else {
command = `CUDA_VISIBLE_DEVICES='' ` + command;
}
} else {
command += ` --tuner_class_name ${tuner.className}`;
if (tuner.classArgs !== undefined) {
Expand All @@ -243,6 +248,12 @@ function getMsgDispatcherCommand(tuner: any, assessor: any, advisor: any, multiP
command += ` --assessor_class_filename ${assessor.classFileName}`;
}
}

if (tuner.gpuIndices !== undefined) {
command = `CUDA_VISIBLE_DEVICES=${tuner.gpuIndices} ` + command;
} else {
command = `CUDA_VISIBLE_DEVICES='' ` + command;
}
}

return command;
Expand Down
9 changes: 4 additions & 5 deletions src/nni_manager/rest_server/restValidationSchemas.ts
Original file line number Diff line number Diff line change
Expand Up @@ -170,26 +170,25 @@ export namespace ValidationSchemas {
classFileName: joi.string(),
className: joi.string(),
classArgs: joi.any(),
gpuNum: joi.number().min(0),
checkpointDir: joi.string().allow('')
checkpointDir: joi.string().allow(''),
gpuIndices: joi.string()
}),
tuner: joi.object({
builtinTunerName: joi.string().valid('TPE', 'Random', 'Anneal', 'Evolution', 'SMAC', 'BatchTuner', 'GridSearch', 'NetworkMorphism', 'MetisTuner', 'GPTuner', 'PPOTuner'),
codeDir: joi.string(),
classFileName: joi.string(),
className: joi.string(),
classArgs: joi.any(),
gpuNum: joi.number().min(0),
checkpointDir: joi.string().allow(''),
includeIntermediateResults: joi.boolean()
includeIntermediateResults: joi.boolean(),
gpuIndices: joi.string()
}),
assessor: joi.object({
builtinAssessorName: joi.string().valid('Medianstop', 'Curvefitting'),
codeDir: joi.string(),
classFileName: joi.string(),
className: joi.string(),
classArgs: joi.any(),
gpuNum: joi.number().min(0),
checkpointDir: joi.string().allow('')
}),
clusterMetaData: joi.array().items(joi.object({
Expand Down
1 change: 0 additions & 1 deletion src/sdk/pynni/nni/ppo_tuner/ppo_tuner.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,6 @@ class PPOTuner
"""

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
import copy
import logging
import numpy as np
Expand Down
2 changes: 0 additions & 2 deletions test/naive_test/local.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,14 +14,12 @@ tuner:
className: NaiveTuner
classArgs:
optimize_mode: maximize
gpuNum: 0
assessor:
codeDir: .
classFileName: naive_assessor.py
className: NaiveAssessor
classArgs:
optimize_mode: maximize
gpuNum: 0
trial:
command: python3 naive_trial.py
codeDir: .
Expand Down
27 changes: 12 additions & 15 deletions tools/nni_cmd/config_schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ def setPathCheck(key):
'optimize_mode': setChoice('optimize_mode', 'maximize', 'minimize'),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
QuanluZhang marked this conversation as resolved.
Show resolved Hide resolved
},
('Evolution'): {
'builtinTunerName': setChoice('builtinTunerName', 'Evolution'),
Expand All @@ -85,12 +85,12 @@ def setPathCheck(key):
Optional('population_size'): setNumberRange('population_size', int, 0, 99999),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
('BatchTuner', 'GridSearch', 'Random'): {
'builtinTunerName': setChoice('builtinTunerName', 'BatchTuner', 'GridSearch', 'Random'),
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'TPE': {
'builtinTunerName': 'TPE',
Expand All @@ -100,7 +100,7 @@ def setPathCheck(key):
Optional('constant_liar_type'): setChoice('constant_liar_type', 'min', 'max', 'mean')
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'NetworkMorphism': {
'builtinTunerName': 'NetworkMorphism',
Expand All @@ -112,7 +112,7 @@ def setPathCheck(key):
Optional('n_output_node'): setType('n_output_node', int),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'MetisTuner': {
'builtinTunerName': 'MetisTuner',
Expand All @@ -124,7 +124,7 @@ def setPathCheck(key):
Optional('cold_start_num'): setType('cold_start_num', int),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'GPTuner': {
'builtinTunerName': 'GPTuner',
Expand All @@ -140,7 +140,7 @@ def setPathCheck(key):
Optional('selection_num_starting_points'): setType('selection_num_starting_points', int),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'PPOTuner': {
'builtinTunerName': 'PPOTuner',
Expand All @@ -158,15 +158,15 @@ def setPathCheck(key):
Optional('cliprange'): setType('cliprange', float),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'customized': {
'codeDir': setPathCheck('codeDir'),
'classFileName': setType('classFileName', str),
'className': setType('className', str),
Optional('classArgs'): dict,
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
}
}

Expand All @@ -178,7 +178,7 @@ def setPathCheck(key):
Optional('R'): setType('R', int),
Optional('eta'): setType('eta', int)
},
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'BOHB':{
'builtinAdvisorName': Or('BOHB'),
Expand All @@ -194,14 +194,14 @@ def setPathCheck(key):
Optional('bandwidth_factor'): setNumberRange('bandwidth_factor', float, 0, 9999),
Optional('min_bandwidth'): setNumberRange('min_bandwidth', float, 0, 9999),
},
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'customized':{
'codeDir': setPathCheck('codeDir'),
'classFileName': setType('classFileName', str),
'className': setType('className', str),
Optional('classArgs'): dict,
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
}
}

Expand All @@ -212,7 +212,6 @@ def setPathCheck(key):
Optional('optimize_mode'): setChoice('optimize_mode', 'maximize', 'minimize'),
Optional('start_step'): setNumberRange('start_step', int, 0, 9999),
},
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
},
'Curvefitting': {
'builtinAssessorName': 'Curvefitting',
Expand All @@ -223,14 +222,12 @@ def setPathCheck(key):
Optional('threshold'): setNumberRange('threshold', float, 0, 9999),
Optional('gap'): setNumberRange('gap', int, 1, 9999),
},
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
},
'customized': {
'codeDir': setPathCheck('codeDir'),
'classFileName': setType('classFileName', str),
'className': setType('className', str),
Optional('classArgs'): dict,
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999)
}
}

Expand Down
Loading