[2022‐03‐24] WebSocket for dispatcher
Historically, NNI manager used anonymous pipes for IPC with the HPO tuner, and named pipes for IPC with the NAS strategy.
The two cases used different mechanisms because of the boot order:
- For HPO, the tuner is a child process of NNI manager. The Node.js `child_process` library can create anonymous pipes and pass them to children, which works well.
- For NAS, the strategy is the parent process of NNI manager. Unfortunately, Python does not have such a library for creating anonymous pipes for child processes, so we had to find another way.
A named pipe is, at first glance, the closest replacement for an anonymous pipe. However, it turns out to be really problematic.
The biggest problem is that named pipes are platform dependent. Named pipes on Windows and POSIX behave completely differently, and it is a pain to program and debug with low-level Windows APIs.
WebSocket, on the other hand, is a well defined standard and is supported by all modern systems and languages, which fills the world with rainbows and unicorns.
Since WebSocket is also a stream-based channel, the high-level NNI protocol does not need to change.
Now IPC between NNI manager and Python process works in this way:
- Define a WebSocket URL.
- NNI manager initiates a WebSocket server (on top of the REST server) to serve the URL.
- The Python client waits for NNI manager to boot, and then connects to the URL.
- Everything is done. Now they can do duplex communication.
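The "wait, then connect" step on the Python side can be sketched roughly as follows. This is only an illustration under assumptions, not NNI's actual code: `connect_with_retry` and its parameters are hypothetical names, and the `connect` callable stands in for whichever WebSocket client library is used. It is assumed to raise `OSError` (for example `ConnectionRefusedError`) while the server is not listening yet.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[str], T],
    url: str,
    retries: int = 30,
    delay: float = 1.0,
) -> T:
    """Call connect(url) until it succeeds or the retries run out.

    Hypothetical helper: `connect` is any function that opens a WebSocket
    connection and raises OSError while the server is not up yet.
    """
    last_error: Optional[OSError] = None
    for attempt in range(retries):
        try:
            return connect(url)
        except OSError as error:
            # NNI manager has probably not finished booting yet.
            last_error = error
            if attempt < retries - 1:
                time.sleep(delay)
    raise ConnectionError(f"could not connect to {url}") from last_error
```

Passing the connect function in keeps the retry logic independent of the WebSocket library, so the same helper could serve both the tuner and the strategy.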
The only difference between HPO and NAS is that:
- Tuner connects in `__main__`.
- Strategy connects after `Experiment.start()`.
The URL has been decided to be `ws://localhost:{port}/{url-prefix}/tuner`.
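As a small illustration of the URL template, here is a sketch of how it could be filled in. The function name `tuner_channel_url` is hypothetical, and dropping the prefix segment when it is empty is an assumption of this sketch rather than something the journal specifies.

```python
def tuner_channel_url(port: int, url_prefix: str = "") -> str:
    """Build ws://localhost:{port}/{url-prefix}/tuner.

    Assumption: an empty prefix is skipped instead of leaving a double slash,
    and leading or trailing slashes on the prefix are ignored.
    """
    parts = [part for part in (url_prefix.strip("/"), "tuner") if part]
    return f"ws://localhost:{port}/" + "/".join(parts)


print(tuner_channel_url(8080, "nni"))
```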
But I would rather say it's a recommended practice than a protocol.
"Abstract interface" without real code has proved to be terrible design. New contributors can only see hard coded strings and have no idea where they come from.
Therefore, in HPO this URL is generated by NNI manager and sent to the tuner via the environment variable `NNI_TUNER_COMMAND_CHANNEL`.
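On the tuner side, picking up the URL might look like the sketch below. Only the variable name `NNI_TUNER_COMMAND_CHANNEL` comes from this journal; the function name, the optional `env` parameter (useful for testing) and the scheme check are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse


def tuner_url_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the command channel URL that NNI manager passed to the tuner."""
    env = os.environ if env is None else env
    url = env.get("NNI_TUNER_COMMAND_CHANNEL")
    if not url:
        raise RuntimeError("NNI_TUNER_COMMAND_CHANNEL is not set")
    # Assumption: anything other than a WebSocket URL is a configuration error.
    if urlparse(url).scheme not in ("ws", "wss"):
        raise ValueError(f"not a WebSocket URL: {url}")
    return url
```

Reading the URL from the environment rather than rebuilding it in Python keeps the string defined in exactly one place, which is the point made above about hard-coded strings.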
NAS will need something different I guess. Maybe it will be an argv replacing `dispatcher_pipe`.
Dispatcher and "reusable" TrialKeeper now use exactly the same mechanism. They can, and should, share the same infrastructure library, just picking different URLs.
This wiki is a journal that tracks the development of NNI. It's not guaranteed to be up-to-date. Read NNI documentation for latest information: https://nni.readthedocs.io/en/latest/