List agents #155

Closed
7 tasks done
windsource opened this issue Jan 16, 2024 · 8 comments
Labels
enhancement New feature or request. Issue will appear in the change log "Features"

Comments

@windsource
Contributor

windsource commented Jan 16, 2024

Description

Currently there is no CLI command to list the agents that are connected to the server. Such a command is required to quickly see whether all agents could connect to the server and are available to run workloads.

Goals

Add a command ank get agents to list all agents.

Final result

Summary

Implemented a new Ankaios CLI command that shows the Ankaios agents currently connected to the Ankaios cluster, together with the number of workloads each agent has in the workloadStates field of the CompleteState:

> ank -k get agents

NAME      WORKLOADS
agent_A   3
agent_B   0
agent_C   1

The connected Ankaios agents have been added to the CompleteState as well, inside a field called agents. This ensures that workloads can request the connected agents over the control interface. The agents field is an associative data structure with the agent names as keys and another associative data structure as values, reserved for future use. The values can later be populated with agent attributes, e.g. resource statistics of an agent as mentioned in #282.

The workloads are counted from their workload states inside the CompleteState rather than from the agent assignments in the desiredState. This correctly handles the situation where a workload has been deleted from the desiredState but its deletion has not been scheduled yet, e.g. because inter-workload dependencies are not yet fulfilled.
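
The difference between the two counting strategies can be sketched as follows. This is only an illustration in Python on plain dictionaries, not the actual Ankaios implementation; the function names and the trimmed-down sample state are assumptions, and it assumes that one workload name under an agent counts as one workload.

```python
from collections import Counter


def count_by_workload_states(complete_state):
    """Count workloads per connected agent from the workloadStates field."""
    states = complete_state.get("workloadStates", {})
    return {
        agent: len(states.get(agent, {}))
        for agent in complete_state.get("agents", {})
    }


def count_by_desired_state(complete_state):
    """Count workloads per connected agent from the desiredState assignments."""
    workloads = complete_state.get("desiredState", {}).get("workloads", {})
    assigned = Counter(w["agent"] for w in workloads.values())
    return {agent: assigned[agent] for agent in complete_state.get("agents", {})}


# Hypothetical sample: hello1 was already removed from the desiredState,
# but its deletion is still pending, so it still has a workload state.
sample_state = {
    "desiredState": {"workloads": {"nginx": {"agent": "agent_A"}}},
    "workloadStates": {
        "agent_A": {"nginx": {"id-1": {"state": "Running"}}},
        "agent_B": {"hello1": {"id-2": {"state": "Running"}}},
    },
    "agents": {"agent_A": {}, "agent_B": {}},
}

print(count_by_workload_states(sample_state))
print(count_by_desired_state(sample_state))
```

In this sample, counting via workloadStates still reports one workload on agent_B, while counting via the desiredState would already report zero.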

Tasks

@inf17101
Contributor

inf17101 commented Aug 2, 2024

I have updated the task list above to contain all necessary steps.

@inf17101
Contributor

inf17101 commented Aug 5, 2024

@windsource, @krucod3: I have added some prototypes and questions below:

Do we need a table output like the one "kubectl get nodes" produces?

> kubectl get nodes

NAME       STATUS   ROLES    AGE     VERSION
node-1     Ready    <none>   7d      v1.21.3
node-2     Ready    <none>   7d      v1.21.3
node-3     Ready    <none>   7d      v1.21.3

The table could then be filled with Ankaios-related data:

> ank -k get agents

AGENT NAME   ASSIGNED WORKLOADS
agent_A      3
agent_B      0
agent_C      1

"ASSIGNED WORKLOADS" means that the workloads are merely assigned to those agents in the desiredState; it does not mean that the workloads are already handled by those agents. The server already has this information about the assigned workloads, as it maintains the CompleteState internally. The CLI can fetch the CompleteState to extract this information.

Or did you have something in mind without a table output, just a plain list of agent names like this:

> ank -k get agents

agent_A
agent_B
agent_C

In addition, I had a discussion with @krucod3 and we both think that it would be useful to offer the "list agents" feature to workloads, too. This means a workload can request the agent list over the control interface to check which agents are available.

The agents can be listed inside the desiredState, then it is usable over the control interface together with the authorization feature.

New example desiredState where agent_A and agent_B are connected to the Ankaios server and the Ankaios agent agent_C is not:

desiredState:
  apiVersion: v0.1
  workloads:
    hello1:
      agent: agent_B
      tags: []
      dependencies:
        filesystem_init: ADD_COND_SUCCEEDED
      restartPolicy: NEVER
      runtime: podman
      runtimeConfig: |
        image: alpine:latest
        commandOptions: [ "--rm"]
        commandArgs: [ "echo", "Hello Ankaios"]
    hello2:
      agent: agent_B
      tags:
      - key: owner
        value: Ankaios team
      dependencies: {}
      restartPolicy: ON_FAILURE
      runtime: podman
      runtimeConfig: |
        image: alpine:latest
        commandOptions: [ "--entrypoint", "/bin/sh" ]
        commandArgs: [ "-c", "echo 'Restarted on failure.'; sleep 2"]
    hello3:
      agent: agent_C
      tags: []
      dependencies:
        filesystem_init: ADD_COND_SUCCEEDED
      restartPolicy: NEVER
      runtime: podman
      runtimeConfig: |
        image: alpine:latest
        commandArgs: [ "echo", "Hello Ankaios"]
    nginx:
      agent: agent_A
      tags:
      - key: owner
        value: Ankaios team
      dependencies: {}
      restartPolicy: ON_FAILURE
      runtime: podman
      runtimeConfig: |
        image: docker.io/nginx:latest
        commandOptions: ["-p", "8081:80"]
workloadStates:
  agent_C:
    hello3:
      2aef335312f4e87e60b2b2f91befbe556aaf5159fe0e6a1b804e80a95acf1b1e:
        state: Pending
        subState: Initial
        additionalInfo: ''
  agent_A:
    nginx:
      7d6ea2b79cea1e401beee1553a9d3d7b5bcbb37f1cfdb60db1fbbcaa140eb17d:
        state: Running
        subState: Ok
        additionalInfo: ''
  agent_B:
    hello2:
      50dfcc07f50a94f113f33fe1329eeebce93d134176d90b9a8ef3c2cfa15d1a7b:
        state: Succeeded
        subState: Ok
        additionalInfo: ''
    hello1:
      9f4dce2c90669cdcbd2ef8eddb4e38d6238abf721bbebffd820121ce1633f705:
        state: Pending
        subState: WaitingToStart
        additionalInfo: ''
agents:
  agent_A: {}
  agent_B: {}

The "agents" field has to be a map data structure to allow further enhancements in the future, as discussed with @krucod3.
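
As a rough illustration of how a workload could use such a map, the following Python sketch finds agents that are referenced by workloads but are not connected. It works on a plain dictionary shaped like the example above; the function name is hypothetical and the way the state is fetched over the control interface is left out.

```python
def unconnected_agents(complete_state):
    """Return the agents referenced by workloads but missing from 'agents'."""
    connected = set(complete_state.get("agents", {}))
    workloads = complete_state.get("desiredState", {}).get("workloads", {})
    referenced = {workload["agent"] for workload in workloads.values()}
    return sorted(referenced - connected)


# Trimmed-down version of the example state above (only the agent fields).
example_state = {
    "desiredState": {
        "workloads": {
            "hello1": {"agent": "agent_B"},
            "hello2": {"agent": "agent_B"},
            "hello3": {"agent": "agent_C"},
            "nginx": {"agent": "agent_A"},
        }
    },
    "agents": {"agent_A": {}, "agent_B": {}},
}

print(unconnected_agents(example_state))
```

Because the values are maps, attributes added per agent later would not change this lookup.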

@krucod3
Contributor

krucod3 commented Aug 6, 2024

The agent information shall definitely be provided over the complete state. This way we can use the already existing features for filtering and authorization.
With #282 we already plan adding more data for each agent, so starting with an empty dict for now is the way to go.
As for the output of the CLI, I think it definitely makes sense to provide a table like the one for get workloads. A simple list would just be an intermediate solution, and as we already plan to enrich the output with additional data (#282), it makes sense to start with the table right away.
The code used for the get workloads feature can probably be extracted and reused here. I would also suggest directly adding the watch feature for get agents (or at least considering it during development, depending on the effort), as it is already planned for get workloads with #228.

@windsource
Contributor Author

@inf17101 and @krucod3

I like the table output for ank get agents, but for better readability I would reduce each column header to a single word. If required, the columns can be explained in ank get agents -h.

So I would prefer:

> ank get agents
NAME      WORKLOADS
agent_A   3
agent_B   0
agent_C   1

I also think that the agents should become part of the complete state such that workloads can access that information. An empty object for every agent as described above is fine for me.
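
For illustration, a table in this shape can be produced with a few lines of Python. This is a sketch and not the code of the ank CLI; the function name and the three-space column gap are assumptions taken from the sample output above.

```python
def render_agents_table(rows):
    """Render (agent name, workload count) pairs as a two-column table."""
    table = [("NAME", "WORKLOADS")] + [(name, str(count)) for name, count in rows]
    # Pad the first column to its widest entry, then add a three-space gap.
    width = max(len(name) for name, _ in table)
    return "\n".join(f"{name.ljust(width)}   {count}" for name, count in table)


print(render_agents_table([("agent_A", 3), ("agent_B", 0), ("agent_C", 1)]))
```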

@inf17101
Contributor

inf17101 commented Aug 7, 2024

Ok, I will implement it like this.

@inf17101
Contributor

inf17101 commented Aug 16, 2024

@krucod3: Just out of curiosity, I tested with two Raspberry Pis whether we detect an Ankaios agent as disconnected when I unplug the network cable on a node running an Ankaios agent (no graceful disconnect of the agent). The result is that the Ankaios server does not detect the agent as disconnected. In this case the execution states of the workloads managed by the disconnected agent are not set to ExecutionState::AgentDisconnected, and we cannot remove the agent from the CompleteState.

I think we cannot detect this as we are not sending something like heartbeats.
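
To make the idea concrete, a heartbeat-based detection could look roughly like the Python sketch below. This is purely hypothetical: Ankaios does not do this today, and the class name, the timeout value and the explicit timestamps are assumptions for illustration only.

```python
class HeartbeatMonitor:
    """Track the last heartbeat per agent and report agents that timed out."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def record(self, agent, now):
        """Remember that a heartbeat from 'agent' arrived at time 'now'."""
        self.last_seen[agent] = now

    def expire(self, now):
        """Remove and return the agents whose last heartbeat is too old."""
        stale = sorted(
            agent
            for agent, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
        for agent in stale:
            del self.last_seen[agent]
        return stale


monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.record("agent_A", now=0.0)
monitor.record("agent_B", now=0.0)
monitor.record("agent_A", now=4.0)  # agent_B stays silent, e.g. cable unplugged
print(monitor.expire(now=6.0))
```

The server could then treat the returned agents like gracefully disconnected ones, i.e. update the workload states and remove the agents from the CompleteState.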

@inf17101
Contributor

I have changed the code to use the workloadStates field inside the CompleteState to count the number of workloads for each agent. This ensures the correct number of workloads for each agent even if a workload has been deleted from the desiredState but its deletion has not been scheduled yet.

@inf17101
Contributor

The command is implemented; see "Final result" in the issue description for more details. The PR was reviewed and merged.
