Run an Experiment on Multiple Machines

NNI supports running an experiment on multiple machines over SSH; this is called remote mode. NNI assumes that you have access to those machines and have already set up the environment for running deep learning training code on them.

For example, suppose you have three machines and log in to each of them with the account bob (note: the account does not have to be the same on every machine):

IP	Username	Password
10.1.1.1	bob	bob123
10.1.1.2	bob	bob123
10.1.1.3	bob	bob123
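
Because remote mode depends on SSH access to every machine, it can help to confirm that each host accepts TCP connections on its SSH port before going further. The following is a minimal sketch, not part of NNI: the function names ssh_port_open and unreachable are hypothetical, and the check only tests TCP reachability, so it does not verify the username or password.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only proves that something is listening on the
    port, not that the SSH credentials are valid.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable(machines, timeout=3.0):
    """Return the (ip, port) pairs that did not accept a connection."""
    return [(ip, port) for ip, port in machines
            if not ssh_port_open(ip, port, timeout)]
```

For the three machines in the table, calling unreachable([("10.1.1.1", 22), ("10.1.1.2", 22), ("10.1.1.3", 22)]) should return an empty list if every host is reachable on the default SSH port.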

Set up the NNI environment

Install NNI on each of your machines following the install guide here.

Run an experiment

Install NNI on another machine that has network access to the three machines above, or use any one of those machines to run the nnictl command-line tool.

We use examples/trials/mnist-annotation as the example here. Run cat ~/nni/examples/trials/mnist-annotation/config_remote.yml to see the full configuration file:

authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.1.1.1
    username: bob
    passwd: bob123
    #port can be skipped if using the default SSH port 22
    #port: 22
  - ip: 10.1.1.2
    username: bob
    passwd: bob123
  - ip: 10.1.1.3
    username: bob
    passwd: bob123

Simply fill in the machineList section and then run:

nnictl create --config ~/nni/examples/trials/mnist-annotation/config_remote.yml

to start the experiment.
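
When there are many machines, typing the machineList section by hand is error-prone. The sketch below generates that section from a list of hosts, using the same keys (ip, username, passwd, port) and indentation as the configuration file above. The helper name render_machine_list is hypothetical and not part of NNI, and the sketch assumes the values contain no characters that need YAML quoting.

```python
def render_machine_list(machines):
    """Render a machineList section in the layout used by config_remote.yml.

    Hypothetical helper: values are written verbatim, so passwords with
    special YAML characters would need quoting by hand.
    """
    lines = ["machineList:"]
    for m in machines:
        lines.append(f"  - ip: {m['ip']}")
        lines.append(f"    username: {m['username']}")
        lines.append(f"    passwd: {m['passwd']}")
        # The port key is only written when it differs from the SSH default.
        if m.get("port", 22) != 22:
            lines.append(f"    port: {m['port']}")
    return "\n".join(lines)


# The three example machines from this document.
machines = [
    {"ip": f"10.1.1.{i}", "username": "bob", "passwd": "bob123"}
    for i in (1, 2, 3)
]
print(render_machine_list(machines))
```

The printed text can be pasted over the machineList section of the configuration file before running nnictl create.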

Version check

NNI has supported a version check feature since version 0.6, refer
