support specifying gpus for tuner and advisor (#1556)
* support specifying gpu for tuner and advisor
QuanluZhang authored Sep 20, 2019
1 parent 04d2d7c commit 0b7d626
Showing 8 changed files with 81 additions and 45 deletions.
63 changes: 41 additions & 22 deletions docs/en_US/Tutorial/ExperimentConfig.md
@@ -35,7 +35,7 @@ tuner:
classArgs:
#choice: maximize, minimize
optimize_mode:
gpuNum:
gpuIndices:
trial:
command:
codeDir:
@@ -71,14 +71,13 @@ tuner:
classArgs:
#choice: maximize, minimize
optimize_mode:
gpuNum:
gpuIndices:
assessor:
#choice: Medianstop
builtinAssessorName:
classArgs:
#choice: maximize, minimize
optimize_mode:
gpuNum:
trial:
command:
codeDir:
@@ -113,14 +112,13 @@ tuner:
classArgs:
#choice: maximize, minimize
optimize_mode:
gpuNum:
gpuIndices:
assessor:
#choice: Medianstop
builtinAssessorName:
classArgs:
#choice: maximize, minimize
optimize_mode:
gpuNum:
trial:
command:
codeDir:
@@ -245,11 +243,11 @@ machineList:
* __builtinTunerName__ and __classArgs__
* __builtinTunerName__

__builtinTunerName__ specifies the name of system tuner, NNI sdk provides four kinds of tuner, including {__TPE__, __Random__, __Anneal__, __Evolution__, __BatchTuner__, __GridSearch__}
__builtinTunerName__ specifies the name of a built-in tuner. The NNI SDK provides different tuners, which are introduced [here](../Tuner/BuiltinTuner.md).

* __classArgs__

__classArgs__ specifies the arguments of tuner algorithm. If the __builtinTunerName__ is in {__TPE__, __Random__, __Anneal__, __Evolution__}, user should set __optimize_mode__.
__classArgs__ specifies the arguments of the tuner algorithm. Please refer to [this file](../Tuner/BuiltinTuner.md) for the configurable arguments of each built-in tuner.
* __codeDir__, __classFileName__, __className__ and __classArgs__
* __codeDir__

@@ -264,16 +262,16 @@ machineList:

__classArgs__ specifies the arguments of tuner algorithm.

* __gpuNum__

__gpuNum__ specifies the gpu number to run the tuner process. The value of this field should be a positive number. If the field is not set, NNI will not set `CUDA_VISIBLE_DEVICES` in script (that is, will not control the visibility of GPUs on trial command through `CUDA_VISIBLE_DEVICES`), and will not manage gpu resource.
* __gpuIndices__

Note: users could only specify one way to set tuner, for example, set {tunerName, optimizationMode} or {tunerCommand, tunerCwd}, and could not set them both.
__gpuIndices__ specifies the GPUs that can be used by the tuner process. Single or multiple GPU indices can be specified; multiple indices are separated by commas (`,`), such as `1` or `0,1,3` (see the example below). If the field is not set, `CUDA_VISIBLE_DEVICES` will be set to '' in the script, that is, no GPU will be visible to the tuner.

* __includeIntermediateResults__

If __includeIntermediateResults__ is true, the last intermediate result of a trial that is early stopped by the assessor is sent to the tuner as that trial's final result. The default value of __includeIntermediateResults__ is false.

Note: users can use only one way to specify the tuner, either `builtinTunerName` and `classArgs`, or `codeDir`, `classFileName`, `className` and `classArgs`; the two ways cannot be combined.
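
For example, a built-in tuner restricted to the first two GPUs could be configured as follows (a minimal sketch; the tuner name and `classArgs` shown are only illustrative):

```yaml
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
  gpuIndices: 0,1
```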

* __assessor__

* Description
@@ -282,7 +280,7 @@ machineList:
* __builtinAssessorName__ and __classArgs__
* __builtinAssessorName__

__builtinAssessorName__ specifies the name of system assessor, NNI sdk provides one kind of assessor {__Medianstop__}
__builtinAssessorName__ specifies the name of a built-in assessor. The NNI SDK provides different assessors, which are introduced [here](../Assessor/BuiltinAssessor.md).
* __classArgs__

__classArgs__ specifies the arguments of assessor algorithm
@@ -305,11 +303,39 @@ machineList:

__classArgs__ specifies the arguments of assessor algorithm.

* __gpuNum__
Note: users can use only one way to specify the assessor, either `builtinAssessorName` and `classArgs`, or `codeDir`, `classFileName`, `className` and `classArgs`. If users do not want to use an assessor, the assessor field should be left empty.

* __advisor__
* Description

__gpuNum__ specifies the gpu number to run the assessor process. The value of this field should be a positive number.
__advisor__ specifies the advisor algorithm in the experiment. There are two ways to specify an advisor. One way is to use an advisor provided by the NNI SDK, which requires setting __builtinAdvisorName__ and __classArgs__. The other way is to use a user-defined advisor file, which requires setting __codeDir__, __classFileName__, __className__ and __classArgs__.
* __builtinAdvisorName__ and __classArgs__
* __builtinAdvisorName__

Note: users' could only specify one way to set assessor, for example,set {assessorName, optimizationMode} or {assessorCommand, assessorCwd}, and users could not set them both.If users do not want to use assessor, assessor fileld should leave to empty.
__builtinAdvisorName__ specifies the name of a built-in advisor. The NNI SDK provides [different advisors](../Tuner/BuiltinTuner.md).

* __classArgs__

__classArgs__ specifies the arguments of the advisor algorithm. Please refer to [this file](../Tuner/BuiltinTuner.md) for the configurable arguments of each built-in advisor.
* __codeDir__, __classFileName__, __className__ and __classArgs__
* __codeDir__

__codeDir__ specifies the directory of advisor code.
* __classFileName__

__classFileName__ specifies the name of advisor file.
* __className__

__className__ specifies the name of advisor class.
* __classArgs__

__classArgs__ specifies the arguments of advisor algorithm.

* __gpuIndices__

__gpuIndices__ specifies the GPUs that can be used by the advisor process. Single or multiple GPU indices can be specified; multiple indices are separated by commas (`,`), such as `1` or `0,1,3` (see the example below). If the field is not set, `CUDA_VISIBLE_DEVICES` will be set to '' in the script, that is, no GPU will be visible to the advisor.

Note: users can use only one way to specify the advisor, either `builtinAdvisorName` and `classArgs`, or `codeDir`, `classFileName`, `className` and `classArgs`.
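
For example, a built-in advisor pinned to GPU 0 could be configured as follows (a minimal sketch; the advisor name and `classArgs` shown are only illustrative):

```yaml
advisor:
  builtinAdvisorName: Hyperband
  classArgs:
    optimize_mode: maximize
  gpuIndices: 0
```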

* __trial(local, remote)__

@@ -560,7 +586,6 @@ machineList:
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
trial:
command: python3 mnist.py
codeDir: /nni/mnist
@@ -586,14 +611,12 @@ machineList:
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
assessor:
#choice: Medianstop
builtinAssessorName: Medianstop
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
trial:
command: python3 mnist.py
codeDir: /nni/mnist
@@ -620,15 +643,13 @@ machineList:
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
assessor:
codeDir: /nni/assessor
classFileName: myassessor.py
className: MyAssessor
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
trial:
command: python3 mnist.py
codeDir: /nni/mnist
@@ -656,7 +677,6 @@ machineList:
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
gpuNum: 0
trial:
command: python3 mnist.py
codeDir: /nni/mnist
@@ -780,7 +800,6 @@ machineList:
builtinAssessorName: Medianstop
classArgs:
optimize_mode: maximize
gpuNum: 0
trial:
codeDir: .
worker:
3 changes: 3 additions & 0 deletions examples/trials/nas_cifar10/config_ppo.yml
@@ -15,6 +15,9 @@ tuner:
trials_per_update: 60
epochs_per_update: 12
minibatch_size: 10
# use GPU 0 for this tuner
# to specify multiple GPUs, separate the indices with commas, e.g. gpuIndices: 0,1,2
gpuIndices: 0
trial:
command: sh ./macro_cifar10.sh
codeDir: ./
11 changes: 11 additions & 0 deletions src/nni_manager/common/utils.ts
@@ -219,6 +219,11 @@ function getMsgDispatcherCommand(tuner: any, assessor: any, advisor: any, multiP
if (advisor.classFileName !== undefined && advisor.classFileName.length > 1) {
command += ` --advisor_class_filename ${advisor.classFileName}`;
}
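// Prepend CUDA_VISIBLE_DEVICES to the dispatcher command so the advisor process only sees the configured GPUs; when gpuIndices is not set, hide all GPUs ('').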
if (advisor.gpuIndices !== undefined) {
command = `CUDA_VISIBLE_DEVICES=${advisor.gpuIndices} ` + command;
} else {
command = `CUDA_VISIBLE_DEVICES='' ` + command;
}
} else {
command += ` --tuner_class_name ${tuner.className}`;
if (tuner.classArgs !== undefined) {
@@ -243,6 +248,12 @@ function getMsgDispatcherCommand(tuner: any, assessor: any, advisor: any, multiP
command += ` --assessor_class_filename ${assessor.classFileName}`;
}
}

if (tuner.gpuIndices !== undefined) {
command = `CUDA_VISIBLE_DEVICES=${tuner.gpuIndices} ` + command;
} else {
command = `CUDA_VISIBLE_DEVICES='' ` + command;
}
}

return command;
9 changes: 4 additions & 5 deletions src/nni_manager/rest_server/restValidationSchemas.ts
@@ -170,26 +170,25 @@ export namespace ValidationSchemas {
classFileName: joi.string(),
className: joi.string(),
classArgs: joi.any(),
gpuNum: joi.number().min(0),
checkpointDir: joi.string().allow('')
checkpointDir: joi.string().allow(''),
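// single GPU index or comma-separated indices for the dispatcher process, e.g. '0' or '0,1,3'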
gpuIndices: joi.string()
}),
tuner: joi.object({
builtinTunerName: joi.string().valid('TPE', 'Random', 'Anneal', 'Evolution', 'SMAC', 'BatchTuner', 'GridSearch', 'NetworkMorphism', 'MetisTuner', 'GPTuner', 'PPOTuner'),
codeDir: joi.string(),
classFileName: joi.string(),
className: joi.string(),
classArgs: joi.any(),
gpuNum: joi.number().min(0),
checkpointDir: joi.string().allow(''),
includeIntermediateResults: joi.boolean()
includeIntermediateResults: joi.boolean(),
gpuIndices: joi.string()
}),
assessor: joi.object({
builtinAssessorName: joi.string().valid('Medianstop', 'Curvefitting'),
codeDir: joi.string(),
classFileName: joi.string(),
className: joi.string(),
classArgs: joi.any(),
gpuNum: joi.number().min(0),
checkpointDir: joi.string().allow('')
}),
clusterMetaData: joi.array().items(joi.object({
1 change: 0 additions & 1 deletion src/sdk/pynni/nni/ppo_tuner/ppo_tuner.py
@@ -23,7 +23,6 @@ class PPOTuner
"""

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
import copy
import logging
import numpy as np
2 changes: 0 additions & 2 deletions test/naive_test/local.yml
@@ -14,14 +14,12 @@ tuner:
className: NaiveTuner
classArgs:
optimize_mode: maximize
gpuNum: 0
assessor:
codeDir: .
classFileName: naive_assessor.py
className: NaiveAssessor
classArgs:
optimize_mode: maximize
gpuNum: 0
trial:
command: python3 naive_trial.py
codeDir: .
27 changes: 12 additions & 15 deletions tools/nni_cmd/config_schema.py
@@ -76,7 +76,7 @@ def setPathCheck(key):
'optimize_mode': setChoice('optimize_mode', 'maximize', 'minimize'),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
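# gpuIndices accepts a single GPU index as an int, or several indices as a comma-separated string, e.g. 0 or '0,1,3'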
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
('Evolution'): {
'builtinTunerName': setChoice('builtinTunerName', 'Evolution'),
@@ -85,12 +85,12 @@ def setPathCheck(key):
Optional('population_size'): setNumberRange('population_size', int, 0, 99999),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
('BatchTuner', 'GridSearch', 'Random'): {
'builtinTunerName': setChoice('builtinTunerName', 'BatchTuner', 'GridSearch', 'Random'),
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'TPE': {
'builtinTunerName': 'TPE',
@@ -100,7 +100,7 @@ def setPathCheck(key):
Optional('constant_liar_type'): setChoice('constant_liar_type', 'min', 'max', 'mean')
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'NetworkMorphism': {
'builtinTunerName': 'NetworkMorphism',
@@ -112,7 +112,7 @@ def setPathCheck(key):
Optional('n_output_node'): setType('n_output_node', int),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'MetisTuner': {
'builtinTunerName': 'MetisTuner',
@@ -124,7 +124,7 @@ def setPathCheck(key):
Optional('cold_start_num'): setType('cold_start_num', int),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'GPTuner': {
'builtinTunerName': 'GPTuner',
@@ -140,7 +140,7 @@ def setPathCheck(key):
Optional('selection_num_starting_points'): setType('selection_num_starting_points', int),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'PPOTuner': {
'builtinTunerName': 'PPOTuner',
@@ -158,15 +158,15 @@ def setPathCheck(key):
Optional('cliprange'): setType('cliprange', float),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'customized': {
'codeDir': setPathCheck('codeDir'),
'classFileName': setType('classFileName', str),
'className': setType('className', str),
Optional('classArgs'): dict,
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
}
}

@@ -178,7 +178,7 @@ def setPathCheck(key):
Optional('R'): setType('R', int),
Optional('eta'): setType('eta', int)
},
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'BOHB':{
'builtinAdvisorName': Or('BOHB'),
@@ -194,14 +194,14 @@ def setPathCheck(key):
Optional('bandwidth_factor'): setNumberRange('bandwidth_factor', float, 0, 9999),
Optional('min_bandwidth'): setNumberRange('min_bandwidth', float, 0, 9999),
},
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
'customized':{
'codeDir': setPathCheck('codeDir'),
'classFileName': setType('classFileName', str),
'className': setType('className', str),
Optional('classArgs'): dict,
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
}
}

@@ -212,7 +212,6 @@ def setPathCheck(key):
Optional('optimize_mode'): setChoice('optimize_mode', 'maximize', 'minimize'),
Optional('start_step'): setNumberRange('start_step', int, 0, 9999),
},
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
},
'Curvefitting': {
'builtinAssessorName': 'Curvefitting',
@@ -223,14 +222,12 @@ def setPathCheck(key):
Optional('threshold'): setNumberRange('threshold', float, 0, 9999),
Optional('gap'): setNumberRange('gap', int, 1, 9999),
},
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999),
},
'customized': {
'codeDir': setPathCheck('codeDir'),
'classFileName': setType('classFileName', str),
'className': setType('className', str),
Optional('classArgs'): dict,
Optional('gpuNum'): setNumberRange('gpuNum', int, 0, 99999)
}
}
