Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API versioning #14

Closed
windsource opened this issue Sep 5, 2023 · 11 comments
Closed

API versioning #14

windsource opened this issue Sep 5, 2023 · 11 comments
Assignees
Labels
enhancement New feature or request. Issue will appear in the change log "Features"
Milestone

Comments

@windsource
Copy link
Contributor

Reported by @lingnoi:

Description

Currently the communication between agent/cli and server is done using the API definition in ankaios.proto. The same is used for the communication through the control interface.
Now consider the following problem:

  • Theoretically it is possible that different agents with different version of the API are running in production
  • It's not able to send error status through the message responses

Goals

  • Concept to tackle the mention problem is created

Tasks

tbd.

@windsource windsource added the enhancement New feature or request. Issue will appear in the change log "Features" label Sep 5, 2023
@windsource windsource added this to the backlog milestone Sep 5, 2023
@krucod3
Copy link
Contributor

krucod3 commented Sep 21, 2023

An idea would be to send the version with the AgentHello and decline agent connections with an unsupported version.

@windsource
Copy link
Contributor Author

As a first step we separate the communication between server and agent on the one hand and the control interface on the other hand in #172.

@windsource
Copy link
Contributor Author

Currently there is a very strange error message when the version of the ank CLI does not match server and agent:

❯ ./ank get workloads
error: Failed to get workloads: 'Command failed: 'Failed to get complete state.
Error: Channel preliminary closed.''

In this case server and agent are v0.3.1 and ank CLI is main (~0.4.0). I would expect an error message that tells me that the version does not match and in best case also which version of the ank CLI I have to use to connect to the server.

Especially when there are different ways of installing server/agent and CLI this problem gets severe (e.g. when using #308).

@krucod3
Copy link
Contributor

krucod3 commented Jul 11, 2024

We at least need a concept here.

@krucod3 krucod3 modified the milestones: backlog, v0.5 Jul 11, 2024
@krucod3 krucod3 self-assigned this Sep 10, 2024
@krucod3
Copy link
Contributor

krucod3 commented Sep 10, 2024

For agents the version can be added to the AgentHello message as the connection is maintained. For the CLI this will not work, as the message is not sent there and the connection is opened per request.

Option 1:

We could add a CliHello message which is required to be sent on every CLI connection (before every request).
(Probably the message should be named differently as the API used by the CLI could also be used by third party applications. The name CliHello was taken as an example here to make it more obvious which interface we are talking about.)
Drawback: additional time needed for the CLI connection

Option 2:

The version can be added to the ToServer and FromServer messages. This way we cover both the CLI and the Agents.
Drawback: The version will be sent on every request adding unnecessary network traffic for the agents.

@krucod3
Copy link
Contributor

krucod3 commented Sep 11, 2024

Current CLI behavior

The ank CLI currently opens a connection when a command is triggered (for command line completions a separate process is currently started which is also opening a connection). If the command has a watch functionality, e.g., ank apply the connection is maintained until the desired result is achieved, or the command is interrupted.

Possible third party apps docking at the server command interface (CLI interface currently)

If a third party app docks at the API to which the CLI currently connects (can be done if one wants to control Ankaios from the outside rather then from the inside using the Control Interface) the connection would also be maintained until the third party app exits (which is probably when the system is going down).

Decision about versioning for the server command interface

Comparing option 1 and 2 from above and taking into account how connections are used to control the Ankaios server from outside of the cluster, it seems like option 1 sending a hello at the start of each connection makes more sense here in regards of resource usage, especially in production environments.

Control Interface versioning

As the users of the Control Interface would probably also be a small subset of workloads that are connected all the time and regularly send/receive data, it would also make more sense to add a single, initial Hello message instead of a version with every request.
As we are already starting work on the SDKs, the programmatical overhead of adding this message would be handled by the SDKs and will not even influence (most) the developers.

Hello as a request?

Since it seems like all connections to Ankaios need an initial Hello messages, it would make sense to make this a part of the Request object in ank_base.proto. The server has no problem to differentiate the 3 different Hellos only from the context, but beside the version, they may carry additional information:

  • Agent Hello requires the agent name and later maybe attributes of an agent (e.g., hw: [gpu, ssd])
  • Workload Hello
  • CLI/third-party-app (Commander) Hello could carry a name for debugging purposes

For the sake of simplicity we could leave the name as a required attribute of a single Hello message.

Having the Hello as part of the request-response model would also solve some weird issues we had while introducing the ServerHello message with #367.

Version comparison

One easy way to start here (at least at the start), would be to compare the Ankaios release versions. This would also spare us having to manage too many versions of different parts of the system.
Major version differences shall definitely be regarded as incompatible and patch version differences only as compatible.
Minor version differences are a bit more complicated. Per definition a minor version change should still be compatible‡, so we have to take care that this is the case. As the CompleteState already has a version on it's own, we can disregard it here and consider that new config options for workloads are handled elsewhere. This leaves us with taking care only that the protocol itself has not changed in a minor version update.

‡ Note that we are at v0 at the moment and thigs a bit different. For now we will need to compare minor versions only and regard them as incompatible. This behavior needs to be updated once the v1 major release is done. I would propose to create a ticket about this release and track there all changes needed for the release in order to ensure all works properly afterwards.

Behavior upon incompatible version detection

Connections from incompatible agents, CLIs and third-party-apps can just be rejected with a corresponding error message.
Unlike with incompatible agents/CLIs/third-party-apps, workloads cannot just be stopped because of an incompatibility. An error message shall be sent in all cases, but to prevent undefined behavior resulting from the incompatible versions, Ankaios could also close the Control Interface instance in such a case. This would give a workload the possibility to report the incompatibility home in the worst case.

@krucod3
Copy link
Contributor

krucod3 commented Sep 11, 2024

@christoph-hamm, @windsource, maybe you can have a look at the concept above and share your thoughts.

@windsource
Copy link
Contributor Author

Although I think that adding the version to the ToServer and FromServer message would be acceptable, using a initial Hello message is also fine for me. Also using the Ankaios version and then compare versions using the SemVer scheme sounds good.

@krucod3
Copy link
Contributor

krucod3 commented Sep 16, 2024

As we are not expecting an response with every hello, I'll just add two more separate objects for the CLI and the workloads. The code remains much cleaner this way, otherwise we would need some funny workarounds for the edge cases.

@krucod3
Copy link
Contributor

krucod3 commented Sep 16, 2024

I also tried to work with the semver crate for the versions, but as we need to convert multiple times from and to proto, I have to represent the version in proto:

  • either as string and make version error handling on multiple places (because of the conversions)
  • or as a message with multiple integers and a string (for the pre) in order to avoid the parsing errors, but I end up adding a lot of complexity to the message and wasting more space then with a single string ...

I'll just go one step back, regard the version as string all over the place and check compatibility with a regex. This is kind of old-school, but pragmatic when taking into account the current code base.

@krucod3
Copy link
Contributor

krucod3 commented Oct 29, 2024

All done.

@krucod3 krucod3 closed this as completed Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request. Issue will appear in the change log "Features"
Projects
None yet
Development

No branches or pull requests

2 participants