List agents #155

windsource · 2024-01-16T10:07:31Z

Description

Currently there is no CLI command to list the agents that are connected to the server. Such a command is needed to quickly check whether all agents could connect to the server and are available to run workloads.

Goals

Add a command ank get agents to list all agents.

Final result

Summary

Implemented a new Ankaios CLI command that shows the Ankaios agents currently connected to the Ankaios cluster together with their number of workloads. The number is derived from the workloadStates field of the CompleteState:

> ank -k get agents

NAME      WORKLOADS
agent_A   3
agent_B   0
agent_C   1
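
The table layout above can be reproduced in a few lines. The following is a minimal Python sketch for illustration only, not the Ankaios implementation; the function name `render_agent_table`, its input shape, and the three-space column gap are assumptions chosen to match the sample output.

```python
def render_agent_table(workload_counts):
    """Render agent names and workload counts as a left-aligned table.

    `workload_counts` maps agent name -> number of workloads
    (hypothetical input shape, not an Ankaios data type).
    """
    header = ("NAME", "WORKLOADS")
    rows = [header] + [
        (name, str(count)) for name, count in sorted(workload_counts.items())
    ]
    # Pad the first column to its longest entry plus a three-space gap.
    name_width = max(len(name) for name, _ in rows) + 3
    return "\n".join(f"{name:<{name_width}}{count}" for name, count in rows)


print(render_agent_table({"agent_A": 3, "agent_B": 0, "agent_C": 1}))
```

Sorting by agent name keeps the output stable between calls, which is convenient when the command is run repeatedly.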

The connected Ankaios agents have been added to the CompleteState as well, in a field called agents. This ensures that a workload can request the connected agents over the control interface. The agents field is an associative data structure with the agent names as keys and another associative data structure as values. The values are reserved for future use: agent attributes can be populated there, e.g. resource statistics of an agent as mentioned in #282.

The workloads are counted per agent from the workload states inside the CompleteState instead of from the workloads assigned to the agent in the desiredState. This correctly handles the situation in which a workload is deleted from the desiredState but the deletion is not scheduled yet, e.g. because inter-workload dependencies are not yet fulfilled.
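
The difference between the two counting strategies can be illustrated with a small Python sketch. It is a simplified model and not the Ankaios code: the state is a plain dictionary mirroring the YAML layout of the CompleteState, the function names are hypothetical, and counting one entry per workload name (rather than per instance ID) is an assumption.

```python
def count_from_workload_states(complete_state):
    """Count workloads per connected agent using the workloadStates field."""
    # Start every connected agent at 0 so agents without workloads are listed.
    counts = {agent: 0 for agent in complete_state.get("agents", {})}
    for agent, workloads in complete_state.get("workloadStates", {}).items():
        if agent in counts:
            # Assumption: one entry per workload name, regardless of instance IDs.
            counts[agent] = len(workloads)
    return counts


def count_from_desired_state(complete_state):
    """Count workloads per connected agent using only desiredState assignments."""
    counts = {agent: 0 for agent in complete_state.get("agents", {})}
    workloads = complete_state.get("desiredState", {}).get("workloads", {})
    for workload in workloads.values():
        if workload.get("agent") in counts:
            counts[workload["agent"]] += 1
    return counts


# hello1 was removed from the desiredState, but its deletion is not scheduled
# yet, so it still has a workload state on agent_B.
complete_state = {
    "desiredState": {"workloads": {"hello2": {"agent": "agent_B"}}},
    "workloadStates": {
        "agent_B": {
            "hello1": {"<id-1>": {"state": "Running"}},
            "hello2": {"<id-2>": {"state": "Running"}},
        }
    },
    "agents": {"agent_B": {}},
}

print(count_from_workload_states(complete_state))  # {'agent_B': 2}
print(count_from_desired_state(complete_state))    # {'agent_B': 1}
```

In this model only the count based on workloadStates reflects that agent_B still handles two workloads.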

Tasks

Share a prototype in the issue showing what the CLI output looks like
Create swdd requirements
Create stests
Implement the agent list inside the server, since the server needs to know which agents are connected (needed to resolve the bug where the ank CLI wait mode gets stuck)
Implement the command in the ank cli
Extend and fix utests
Unblock the tickets "Ank CLI set state wait mode stucks when a new state deletes not initially started workloads" #320 and "Agent provides basic node resource availability" #282


inf17101 · 2024-08-02T14:21:45Z

I have updated the task list above to contain all necessary steps.

inf17101 · 2024-08-05T09:54:57Z

@windsource, @krucod3: I have added some prototypes and questions below:

Do we need a table output like the one "kubectl get nodes" prints?

> kubectl get nodes

NAME       STATUS   ROLES    AGE     VERSION
node-1     Ready    <none>   7d      v1.21.3
node-2     Ready    <none>   7d      v1.21.3
node-3     Ready    <none>   7d      v1.21.3

and fill the table with Ankaios-related data:

> ank -k get agents

AGENT NAME   ASSIGNED WORKLOADS
agent_A      3
agent_B      0
agent_C      1

"ASSIGNED WORKLOADS" means that the workloads are only assigned to those agents in the desiredState; it does not mean that the workloads are already handled by those agents. The server already has the information about the assigned workloads because it maintains the CompleteState internally. The CLI can fetch the CompleteState to extract this information.

Or did you have something in mind without a table output, just a simple list of agent names like this:

> ank -k get agents

agent_A
agent_B
agent_C

Next, I had a discussion with @krucod3 and we both think that it would also be useful to have the "list agents" feature for workloads. This means that a workload can request the agent list over the control interface to check which agents are available.

The agents can be listed inside the desiredState, then it is usable over the control interface together with the authorization feature.

New example state in which agent_A and agent_B are connected to the Ankaios server and the Ankaios agent agent_C is not:

desiredState:
  apiVersion: v0.1
  workloads:
    hello1:
      agent: agent_B
      tags: []
      dependencies:
        filesystem_init: ADD_COND_SUCCEEDED
      restartPolicy: NEVER
      runtime: podman
      runtimeConfig: |
        image: alpine:latest
        commandOptions: [ "--rm"]
        commandArgs: [ "echo", "Hello Ankaios"]
    hello2:
      agent: agent_B
      tags:
      - key: owner
        value: Ankaios team
      dependencies: {}
      restartPolicy: ON_FAILURE
      runtime: podman
      runtimeConfig: |
        image: alpine:latest
        commandOptions: [ "--entrypoint", "/bin/sh" ]
        commandArgs: [ "-c", "echo 'Restarted on failure.'; sleep 2"]
    hello3:
      agent: agent_C
      tags: []
      dependencies:
        filesystem_init: ADD_COND_SUCCEEDED
      restartPolicy: NEVER
      runtime: podman
      runtimeConfig: |
        image: alpine:latest
        commandArgs: [ "echo", "Hello Ankaios"]
    nginx:
      agent: agent_A
      tags:
      - key: owner
        value: Ankaios team
      dependencies: {}
      restartPolicy: ON_FAILURE
      runtime: podman
      runtimeConfig: |
        image: docker.io/nginx:latest
        commandOptions: ["-p", "8081:80"]
workloadStates:
  agent_C:
    hello3:
      2aef335312f4e87e60b2b2f91befbe556aaf5159fe0e6a1b804e80a95acf1b1e:
        state: Pending
        subState: Initial
        additionalInfo: ''
  agent_A:
    nginx:
      7d6ea2b79cea1e401beee1553a9d3d7b5bcbb37f1cfdb60db1fbbcaa140eb17d:
        state: Running
        subState: Ok
        additionalInfo: ''
  agent_B:
    hello2:
      50dfcc07f50a94f113f33fe1329eeebce93d134176d90b9a8ef3c2cfa15d1a7b:
        state: Succeeded
        subState: Ok
        additionalInfo: ''
    hello1:
      9f4dce2c90669cdcbd2ef8eddb4e38d6238abf721bbebffd820121ce1633f705:
        state: Pending
        subState: WaitingToStart
        additionalInfo: ''
agents:
  agent_A: {}
  agent_B: {}

The "agents" field has to be a map data structure to allow further enhancements in the future, as discussed with @krucod3.
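
To illustrate how a consumer could use such a map, here is a hedged Python sketch. It works on a plain dictionary that mirrors the relevant parts of the example state above; the function names are hypothetical, and the code does not use any real Ankaios control interface API.

```python
def connected_agents(complete_state):
    """Return the names of all connected agents, sorted alphabetically."""
    return sorted(complete_state.get("agents", {}))


def workloads_on_disconnected_agents(complete_state):
    """Return workloads whose assigned agent is missing from the agents map."""
    agents = complete_state.get("agents", {})
    workloads = complete_state.get("desiredState", {}).get("workloads", {})
    return sorted(
        name for name, spec in workloads.items() if spec.get("agent") not in agents
    )


# Reduced version of the example state: only the fields needed here.
complete_state = {
    "desiredState": {
        "workloads": {
            "hello1": {"agent": "agent_B"},
            "hello2": {"agent": "agent_B"},
            "hello3": {"agent": "agent_C"},
            "nginx": {"agent": "agent_A"},
        }
    },
    # Empty maps as values leave room for per-agent attributes later.
    "agents": {"agent_A": {}, "agent_B": {}},
}

print(connected_agents(complete_state))                  # ['agent_A', 'agent_B']
print(workloads_on_disconnected_agents(complete_state))  # ['hello3']
```

Because the values are maps rather than list entries, attributes can later be added per agent without changing how existing consumers look up agent names.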

krucod3 · 2024-08-06T06:44:44Z

The agent information shall definitely be provided over the complete state. This way we can use the already existing features for filtering and authorization.
With #282 we already plan adding more data for each agent, so starting with an empty dict for now is the way to go.
As for the output of the CLI, I think it definitely makes sense to provide a table like the one for get workloads. A simple list would only be an intermediate solution, and since we already plan to enrich the output with additional data (#282), it makes sense to start with the table directly.
The code used for the get workloads feature can probably be extracted and reused here. I would also suggest directly adding the watch feature for get agents (or at least considering it during development, depending on the effort), as it is already planned for get workloads with #228.

windsource · 2024-08-07T07:56:12Z

@inf17101 and @krucod3

I like the table output for ank get agents, but for better readability I would reduce each column header to a single word. If required, the columns can be explained in ank get agents -h.

So I would prefer:

> ank get agents
NAME      WORKLOADS
agent_A   3
agent_B   0
agent_C   1

I also think that the agents should become part of the complete state such that workloads can access that information. An empty object for every agent as described above is fine for me.

inf17101 · 2024-08-07T07:59:38Z

Ok, I will implement it like this.

inf17101 · 2024-08-16T09:47:55Z

@krucod3: Just out of curiosity, I tested with two Raspberry Pis whether we detect an Ankaios agent as disconnected when I unplug the network cable on a node running an Ankaios agent (no graceful disconnect of the agent). The result is that the Ankaios server does not detect this as an agent disconnect. In this case, the execution states of the workloads managed by the disconnected agent are not set to ExecutionState::AgentDisconnected, and we cannot remove the agent from the CompleteState.

I think we cannot detect this as we are not sending something like heartbeats.
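
For illustration, a heartbeat-based detection could look roughly like the following Python sketch. This is a generic timeout tracker and not an existing or planned Ankaios mechanism; the class name `HeartbeatTracker`, its methods, and the timeout value are all assumptions. Timestamps are passed in explicitly to keep the example deterministic.

```python
class HeartbeatTracker:
    """Track the last heartbeat per agent and report agents that timed out."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # agent name -> timestamp of the last heartbeat

    def record(self, agent, now):
        """Store the time at which a heartbeat from `agent` arrived."""
        self.last_seen[agent] = now

    def expired(self, now):
        """Return agents whose last heartbeat is older than the timeout."""
        return sorted(
            agent
            for agent, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


tracker = HeartbeatTracker(timeout_s=5)
tracker.record("agent_A", now=0)
tracker.record("agent_B", now=0)
tracker.record("agent_A", now=4)  # agent_B stays silent, e.g. cable unplugged

print(tracker.expired(now=7))  # ['agent_B']
```

A server using this approach could treat the expired agents as disconnected, even though no graceful disconnect was received.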

inf17101 · 2024-08-26T15:03:45Z

I have changed the code to use the workloadStates field inside the CompleteState to count the number of workloads for each agent. This ensures the correct number of workloads for each agent even if a workload has been deleted from the desiredState of the CompleteState but its deletion has not been scheduled yet.

inf17101 · 2024-08-30T13:52:51Z

Command implemented; see the "Final result" section in the issue description for more details. The PR was reviewed and merged.

windsource added the enhancement label (New feature or request. Issue will appear in the change log "Features") Jan 16, 2024

krucod3 added this to the backlog milestone Jan 25, 2024

krucod3 modified the milestones: backlog, v0.5 May 2, 2024

krucod3 mentioned this issue Jun 6, 2024

Agent provides basic node resource availability #282

Closed

3 tasks

inf17101 mentioned this issue Jul 4, 2024

Unknown Ankaios agents listed in Add Workload input form FelixMoelders/ankaios-dashboard#24

Closed

inf17101 mentioned this issue Jul 26, 2024

Ank CLI set state wait mode stucks when a new state deletes not initially started workloads #320

Closed

inf17101 self-assigned this Aug 2, 2024

inf17101 mentioned this issue Aug 6, 2024

Add new Ankaios cli command ank get agents #343

Merged

1 task

inf17101 closed this as completed Aug 30, 2024

inf17101 mentioned this issue Sep 4, 2024

Requirements linkage fixes for list agents #366

Merged

1 task
