This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Multiphase job hangs when search space is exhausted with NoMoreTrialError raised #1204

Closed
chicm-ms opened this issue Jun 25, 2019 · 1 comment
Assignees
Labels
bug Something isn't working nnidev

Comments

@chicm-ms
Contributor

chicm-ms commented Jun 25, 2019

Short summary about the issue/question:

Brief what process you are following:

How to reproduce it:
The Batch tuner and the Gridsearch tuner raise NoMoreTrialError when the search space is exhausted, but nni does not handle the error properly for multiphase jobs.

I reproduced it using nni/test/config_test/multi_phase/multi_phase_batch.test.yml.
For example, use a small search space with the batch tuner:

{
    "test":
    {
        "_type" : "choice",
        "_value" : [1, 100]
    }
}

Then request more than 2 trials. The search space holds only 2 candidates, so the third request raises NoMoreTrialError and the multiphase job hangs.
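To make the exhaustion concrete, here is a minimal sketch that does not use nni at all. `TinyBatchTuner` and the local `NoMoreTrialError` class are hypothetical stand-ins that only mimic the behaviour described above: each candidate value is handed out once, and the next request raises.

```python
class NoMoreTrialError(Exception):
    """Local stand-in for nni.NoMoreTrialError (not the real class)."""


class TinyBatchTuner:
    """Hypothetical batch-style tuner: hands out each candidate exactly once."""

    def __init__(self, search_space):
        # The search space above has a single "choice" entry; take its values.
        (spec,) = search_space.values()
        self._values = list(spec["_value"])
        self._next = 0

    def generate_parameters(self, parameter_id):
        # parameter_id is unused here; it only mirrors the call shape.
        if self._next >= len(self._values):
            raise NoMoreTrialError("no more parameters now.")
        value = self._values[self._next]
        self._next += 1
        return value


search_space = {"test": {"_type": "choice", "_value": [1, 100]}}
tuner = TinyBatchTuner(search_space)

# The first two requests succeed, one per candidate value.
served = [tuner.generate_parameters(i) for i in range(2)]

# The third request has nothing left to hand out.
try:
    tuner.generate_parameters(2)
    exhausted = False
except NoMoreTrialError:
    exhausted = True
```

With two candidates, any request beyond the second reaches the `raise` branch, which matches the "more than 2 trials" condition in the report.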

[06/25/2019, 11:33:28 AM] DEBUG (nni.msg_dispatcher_base/Thread-1) process_command: command: [CommandType.ReportMetricData], data: [OrderedDict([('type', 'REQUEST_PARAMETER'), ('sequence', 0), ('parameter_index', 1), ('trial_job_id', 'xl6az')])]
[06/25/2019, 11:33:28 AM] ERROR (nni.msg_dispatcher_base/Thread-1) no more parameters now.
Traceback (most recent call last):
  File "/home/quzha/anaconda3/envs/nni/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 102, in command_queue_worker
    self.process_command(command, data)
  File "/home/quzha/anaconda3/envs/nni/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 160, in process_command
    command_handlers[command](data)
  File "/home/quzha/anaconda3/envs/nni/lib/python3.7/site-packages/nni/msg_dispatcher.py", line 146, in handle_report_metric_data
    param = self.tuner.generate_parameters(param_id, trial_job_id=data['trial_job_id'])
  File "/home/quzha/anaconda3/envs/nni/lib/python3.7/site-packages/nni/batch_tuner/batch_tuner.py", line 90, in generate_parameters
    raise nni.NoMoreTrialError('no more parameters now.')
nni.NoMoreTrialError: no more parameters now.
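
The traceback shows the exception escaping the REQUEST_PARAMETER handler, so presumably no reply reaches the waiting trial. One possible way to avoid the hang is to catch the error and still answer the request. This is only a sketch under assumptions, not nni's code and not necessarily what the eventual fix does: `handle_request_parameter`, the `send` callback, the reply layout, and the `NO_MORE_PARAMETERS` sentinel are all hypothetical names.

```python
class NoMoreTrialError(Exception):
    """Local stand-in for nni.NoMoreTrialError (not the real class)."""


# Hypothetical sentinel payload telling the trial that nothing is left.
NO_MORE_PARAMETERS = {"no_more_trial": True}


def handle_request_parameter(tuner, data, send):
    """Answer a REQUEST_PARAMETER message even when the tuner is exhausted."""
    try:
        params = tuner.generate_parameters(
            data["parameter_index"], trial_job_id=data["trial_job_id"]
        )
    except NoMoreTrialError:
        # Reply with the sentinel instead of letting the exception escape,
        # so the requesting trial is not left waiting for an answer.
        params = NO_MORE_PARAMETERS
    send(
        {
            "trial_job_id": data["trial_job_id"],
            "parameter_index": data["parameter_index"],
            "parameters": params,
        }
    )


class ExhaustedTuner:
    """Hypothetical tuner whose search space is already used up."""

    def generate_parameters(self, parameter_id, **kwargs):
        raise NoMoreTrialError("no more parameters now.")


# Collect replies in a list instead of sending them anywhere.
replies = []
request = {"trial_job_id": "xl6az", "parameter_index": 1}
handle_request_parameter(ExhaustedTuner(), request, replies.append)
```

The design choice here is that every request gets exactly one reply, so the trial side can decide to stop when it sees the sentinel rather than blocking.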

nni Environment:

  • nni version:
  • nni mode(local|pai|remote):
  • OS:
  • python version:
  • is conda or virtualenv used?:
  • is running in docker?:

need to update document(yes/no):

Anything else we need to know:

@chicm-ms chicm-ms added the bug Something isn't working label Jun 25, 2019
@chicm-ms chicm-ms changed the title from "Dispatcher does not handle NoMoreTrialError properly" to "Multiphase job hangs when search space is exhausted with NoMoreTrialError raised" Jun 25, 2019
@scarlett2018 scarlett2018 added this to the July 2019 Release milestone Jul 3, 2019
@scarlett2018 scarlett2018 assigned chicm-ms and unassigned Crysple Sep 9, 2019
@rabbit008
Contributor

#1539 fixed this issue
