This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

[2022‐03‐24] WebSocket for dispatcher

cruiseliu edited this page Mar 23, 2022 · 4 revisions

Background

Historically, NNI manager used anonymous pipes for IPC with the HPO tuner, and named pipes for IPC with the NAS strategy.

The two cases used different mechanisms because of their boot order:

  • For HPO, the tuner is a child process of NNI manager. The Node.js child_process library can create anonymous pipes and pass them to children, which works well.
  • For NAS, the strategy is the parent process of NNI manager. Unfortunately, Python does not have a comparable library for creating anonymous pipes for child processes, so we had to find another way.

A named pipe is, at first glance, the closest replacement for an anonymous pipe. However, it turns out to be really problematic.

The biggest problem is that named pipes are platform-dependent. Named pipes on Windows and on POSIX behave completely differently, and it is a pain to program and debug with low-level Windows APIs.

WebSocket

WebSocket, on the other hand, is a well-defined standard supported by all modern systems and languages, which fills the world with rainbows and unicorns.

Since a WebSocket connection is also an ordered, bidirectional channel, the high-level NNI protocol does not need to change.

IPC between NNI manager and the Python process now works as follows:

  1. Define a WebSocket URL.
  2. NNI manager starts a WebSocket server (on top of the REST server) to serve the URL.
  3. The Python client waits for NNI manager to boot, and then connects to the URL.
  4. Everything is done. Now they can communicate in full duplex.
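
Step 3 is the only part that needs some care, because the client may start before the server is listening. Below is a minimal sketch of the "wait for NNI manager to boot" part using only the Python standard library. The function name wait_for_server, the timeouts, and the polling approach are assumptions made for illustration, not NNI's actual implementation; the WebSocket handshake itself would then be done by whatever WebSocket client library the Python side uses.

```python
import socket
import time
from urllib.parse import urlsplit


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 0.1) -> bool:
    """Hypothetical helper: poll until the host/port in a ws:// URL accepts TCP connections.

    Returns True once a connection succeeds, or False if `timeout` seconds pass first.
    """
    parts = urlsplit(url)
    host = parts.hostname or "localhost"
    port = parts.port or 80  # ws:// defaults to port 80 when no port is given
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A client would call something like wait_for_server(url) first and open the WebSocket connection only if it returns True. A successful TCP connect only shows that the port is open, so the real client should still handle a failed handshake.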

The only difference between HPO and NAS is when the Python side connects:

  1. The tuner connects in __main__.
  2. The strategy connects after Experiment.start().

The URL

The URL has been decided to be ws://localhost:{port}/{url-prefix}/tuner, but I would rather call it a recommended practice than a protocol.
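
To make the template concrete, here is a small sketch of how it could be filled in. The helper name format_tuner_url is hypothetical, and its handling of an empty prefix (dropping the extra slash) is an assumption rather than something NNI specifies.

```python
def format_tuner_url(port: int, url_prefix: str = "") -> str:
    """Hypothetical helper: fill in ws://localhost:{port}/{url-prefix}/tuner."""
    prefix = url_prefix.strip("/")
    # Assumption: with no prefix, the path collapses to /tuner.
    path = f"/{prefix}/tuner" if prefix else "/tuner"
    return f"ws://localhost:{port}{path}"
```

For example, format_tuner_url(8080, "nni") gives ws://localhost:8080/nni/tuner.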

An "abstract interface" without real code has proved to be a terrible design: new contributors only see hard-coded strings and have no idea where they come from.

Therefore, in HPO this URL is generated by NNI manager and sent to the tuner via the environment variable NNI_TUNER_COMMAND_CHANNEL.
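
The hand-off through the environment can be sketched as follows. The real parent is NNI manager (Node.js); this Python version only illustrates the mechanism. Apart from the variable name NNI_TUNER_COMMAND_CHANNEL, everything here (the function name, the stand-in child code) is an assumption.

```python
import os
import subprocess
import sys

ENV_NAME = "NNI_TUNER_COMMAND_CHANNEL"  # variable name taken from the text above

# Stand-in for the tuner's __main__: it just reads the URL and prints it.
CHILD_CODE = "import os; print(os.environ['NNI_TUNER_COMMAND_CHANNEL'])"


def launch_child_with_channel(url: str) -> str:
    """Hypothetical helper: spawn a child with the URL in its environment.

    Returns the URL as the child process saw it.
    """
    env = dict(os.environ)  # copy, so the parent's own environment is untouched
    env[ENV_NAME] = url
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The point of this design is that the child never hard-codes the URL: it only knows the name of the variable, and the parent stays the single place where the URL is generated.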

NAS will need something different, I guess. Maybe it will be an argv entry replacing dispatcher_pipe.

In the Future

Dispatcher and "reusable" TrialKeeper now use exactly the same mechanism. They can, and should, share the same infrastructure library, just picking different URLs.