Add node resource availability #378

Merged
Changes from 17 commits
Commits
41 commits
980468a
initial commit
HorjuRares Sep 12, 2024
7c8846a
Add Resource Measurement
HorjuRares Sep 13, 2024
f19d6e1
More useless things
HorjuRares Sep 17, 2024
10acac6
Delete agent/src/resource_measurement/resource_measurement_sender
HorjuRares Sep 17, 2024
d52f8d1
Sending of AgentResource
HorjuRares Sep 17, 2024
2cb7343
Add the actual measurement
HorjuRares Sep 18, 2024
e953922
Add data load to resource availability message
HorjuRares Sep 19, 2024
b3c0b46
Add resource availability in ank get agents
HorjuRares Sep 20, 2024
1642df2
Measure only free memory
HorjuRares Sep 23, 2024
e8f1924
Finish impl and start adding unit tests
HorjuRares Sep 26, 2024
eba7e62
Add unit tests
HorjuRares Sep 26, 2024
dc80538
Fix unit tests
HorjuRares Sep 27, 2024
381d54f
Update documentation
HorjuRares Sep 30, 2024
db90f29
Rename node resource availability message
HorjuRares Oct 1, 2024
8f0516f
Merge branch 'main' into 282_node_resource_availability
HorjuRares Oct 1, 2024
217f9f5
Merge branch 'eclipse-ankaios:main' into 282_node_resource_availability
HorjuRares Oct 1, 2024
d37599b
Update AgentMap
HorjuRares Oct 7, 2024
8a9a2ab
Fix failing unit tests
HorjuRares Oct 7, 2024
bad40bc
Update docs
HorjuRares Oct 7, 2024
3574b0f
Update ank/doc/swdesign/README.md
HorjuRares Oct 7, 2024
fc740de
Merge branch 'eclipse-ankaios:main' into 282_node_resource_availability
HorjuRares Oct 7, 2024
7552bc2
Update resource availability swdd
HorjuRares Oct 7, 2024
aff5340
Update agent resource availability swdd
HorjuRares Oct 7, 2024
026d059
Update agent manager unit tests
HorjuRares Oct 8, 2024
333d1ad
Refactor the node resource availability
HorjuRares Oct 9, 2024
ae3b096
Update doc and test cases
HorjuRares Oct 10, 2024
7fd17ec
Update server/doc/swdesign/README.md
HorjuRares Oct 10, 2024
b95ce56
Update filtering
HorjuRares Oct 10, 2024
5e2e217
Update filtering
HorjuRares Oct 10, 2024
c363453
Update stests and documentation
HorjuRares Oct 11, 2024
8afc34a
Apply suggestions from code review - Documentation
HorjuRares Oct 14, 2024
7887aec
Update docs and agent_manager
HorjuRares Oct 14, 2024
2e854a7
Update server/doc/swdesign/README.md
HorjuRares Oct 14, 2024
178548f
Update server/doc/swdesign/README.md
HorjuRares Oct 14, 2024
0ec9807
Fix match assertion and PERCENTAGE_BASE
HorjuRares Oct 14, 2024
5174275
Improve get agents readability
HorjuRares Oct 14, 2024
1510706
Update doc
HorjuRares Oct 14, 2024
31851b6
Rename CpuLoad to CpuUsage
HorjuRares Oct 14, 2024
49b777b
Update requirements versions
HorjuRares Oct 15, 2024
7ec20dc
Update messages
HorjuRares Oct 15, 2024
85d09e6
Fix CLI format and trace messages
HorjuRares Oct 15, 2024
124 changes: 123 additions & 1 deletion Cargo.lock

1 change: 1 addition & 0 deletions agent/Cargo.toml
@@ -39,6 +39,7 @@ serde_json = "1.0"
uuid = { version = "1.3", features = ["v4", "fast-rng"] }
sha256 = "1.5"
umask = "2.1.0"
sysinfo = "0.31"
regex = "1.10"

[dev-dependencies]
16 changes: 16 additions & 0 deletions agent/doc/swdesign/README.md
@@ -2759,6 +2759,22 @@ Needs:
- impl
- utest

#### AgentManager sends the node resource availability to the server
`swdd~agent-sends-node-resource-availability-to-server~1`

Status: approved

The AgentManager shall periodically measure the resources available on its node (CPU usage and free memory) and send them to the Ankaios server in an `AgentLoadStatus` message.

Rationale: A future workload scheduler in the Ankaios server will need to know the resources available on each node.

Tags:
- AgentManager

Needs:
- impl
- utest
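In the agent_manager.rs changes of this PR, the measurement is driven by a 2-second `tokio::time::interval` (`RESOURCE_MEASUREMENT_INTERVAL_TICK`) that is polled alongside the server commands in the AgentManager's `tokio::select!` loop. The std-only sketch below illustrates the same pattern with a thread and `mpsc` channels instead of tokio: it sends one report immediately (the first tick of a tokio interval also completes at once) and then one per period, until a stop command arrives. `AgentLoad`, `AgentLoadStatus` and `FromServer` here are simplified, hypothetical mirrors of the real types; `run_agent_loop` and the `measure` closure (standing in for sysinfo) are illustrative names only.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError, Sender};
use std::time::{Duration, Instant};

// Hypothetical, simplified mirror of the AgentLoad payload.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentLoad {
    pub cpu_usage: u32,   // hundredths of a percent, as produced by `* 100.0`
    pub free_memory: u64, // bytes
}

// Hypothetical, simplified mirror of the AgentLoadStatus message.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentLoadStatus {
    pub agent_name: String,
    pub agent_resources: AgentLoad,
}

// Hypothetical stand-in for the commands coming from the server.
pub enum FromServer {
    Stop,
}

/// Sends a load report right away and then once per `period`, handling
/// server commands in between, until a Stop arrives or a channel closes.
pub fn run_agent_loop(
    agent_name: &str,
    period: Duration,
    commands: Receiver<FromServer>,
    to_server: Sender<AgentLoadStatus>,
    mut measure: impl FnMut() -> AgentLoad,
) {
    let mut next_tick = Instant::now();
    loop {
        let now = Instant::now();
        if now >= next_tick {
            // Tick due: measure and forward, then schedule the next tick.
            let status = AgentLoadStatus {
                agent_name: agent_name.to_string(),
                agent_resources: measure(),
            };
            if to_server.send(status).is_err() {
                return; // the server side has gone away
            }
            next_tick += period;
            continue;
        }
        // Wait for a command, but no longer than until the next tick.
        match commands.recv_timeout(next_tick - now) {
            Err(RecvTimeoutError::Timeout) => {}
            // Stop requested, or the command channel was closed.
            Ok(FromServer::Stop) | Err(_) => return,
        }
    }
}
```

Keeping the next deadline in `next_tick` and blocking with `recv_timeout` lets commands be handled between reports, which is roughly what `tokio::select!` over a command receiver and an interval gives the async version.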

### Forwarding the Control Interface

The Ankaios Agent is responsible for forwarding Control Interface requests from a Workload to the Ankaios Server and Control Interface responses from the Ankaios Server back to the Workload.
81 changes: 78 additions & 3 deletions agent/src/agent_manager.rs
@@ -11,10 +11,12 @@
// under the License.
//
// SPDX-License-Identifier: Apache-2.0
use sysinfo::{CpuRefreshKind, MemoryRefreshKind, RefreshKind, System};

use common::{
commands::AgentLoadStatus,
from_server_interface::{FromServer, FromServerReceiver},
objects::WorkloadState,
objects::{AgentLoad, WorkloadState},
std_extensions::{GracefulExitResult, IllegalStateResult},
to_server_interface::{ToServerInterface, ToServerSender},
};
@@ -25,6 +27,9 @@ use crate::workload_state::workload_state_store::WorkloadStateStore;
#[cfg_attr(test, mockall_double::double)]
use crate::runtime_manager::RuntimeManager;
use crate::workload_state::WorkloadStateReceiver;

const RESOURCE_MEASUREMENT_INTERVAL_TICK: std::time::Duration = tokio::time::Duration::from_secs(2);

// [impl->swdd~agent-shall-use-interfaces-to-server~1]
pub struct AgentManager {
agent_name: String,
@@ -56,6 +61,9 @@ impl AgentManager {

pub async fn start(&mut self) {
log::info!("Awaiting commands from the server ...");

let mut interval = tokio::time::interval(RESOURCE_MEASUREMENT_INTERVAL_TICK);

loop {
tokio::select! {
// [impl->swdd~agent-manager-listens-requests-from-server~1]
@@ -67,15 +75,18 @@ impl AgentManager {
if self.execute_from_server_command(from_server).await.is_none() {
break;
}
}
},
// [impl->swdd~agent-manager-receives-workload-states-of-its-workloads~1]
workload_state = self.workload_state_receiver.recv() => {
let workload_state = workload_state
.ok_or("Channel to listen to own workload states closed.".to_string())
.unwrap_or_exit("Abort");

self.store_and_forward_own_workload_states(workload_state).await;
}
// [impl->swdd~agent-sends-node-resource-availability-to-server~1]
_ = interval.tick() => {
self.measure_and_forward_resource_availability().await;
}
}
}
}
@@ -192,6 +203,38 @@ impl AgentManager {
.await
.unwrap_or_illegal_state();
}

// [impl->swdd~agent-sends-node-resource-availability-to-server~1]
async fn measure_and_forward_resource_availability(&mut self) {
let mut sys = System::new_with_specifics(
RefreshKind::new()
.with_cpu(CpuRefreshKind::everything())
.with_memory(MemoryRefreshKind::everything()),
);

sys.refresh_all();

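// sysinfo reports CPU usage in percent (f32); scaling by 100 keeps two
// decimal places in the u32 sent to the server (hundredths of a percent).
let cpu_usage: u32 = (sys.global_cpu_usage() * 100.0) as u32;
// sysinfo returns memory sizes in bytes.
let free_memory = sys.free_memory();

log::trace!(
"Agent '{}' reports resource usage: CPU: {:.2}%, Free Memory: {} B",
self.agent_name,
cpu_usage as f32 / 100.0,
free_memory,
);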

self.to_server
.agent_load_status(AgentLoadStatus {
agent_name: self.agent_name.clone(),
agent_resources: AgentLoad {
cpu_usage,
free_memory,
},
})
.await
.unwrap_or_illegal_state();
}
}
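The `* 100.0` in `measure_and_forward_resource_availability` turns sysinfo's percentage into a fixed-point integer: the u32 sent in `AgentLoad` carries hundredths of a percent, so 42.5 % travels as 4250. Rust's `{:.2}` precision is ignored for integer arguments, so formatting the raw u32 prints 4250 rather than 42.50; the value has to be scaled back to a float first. The sketch below shows an encode/format pair under that reading. `CPU_USAGE_SCALE`, `encode_cpu_usage` and `format_cpu_usage` are hypothetical names, and unlike the diff (which truncates with `as u32`) the sketch rounds.

```rust
// Hypothetical name for the factor 100.0 used in the diff: one unit on the
// wire is one hundredth of a percent.
const CPU_USAGE_SCALE: f32 = 100.0;

/// Encodes a CPU usage percentage as hundredths of a percent.
fn encode_cpu_usage(percent: f32) -> u32 {
    // `clamp` keeps values in 0..=100; a NaN passes through `clamp`, but the
    // saturating float-to-int `as` cast then turns it into 0.
    (percent.clamp(0.0, 100.0) * CPU_USAGE_SCALE).round() as u32
}

/// Formats an encoded value for display, e.g. in a trace message.
fn format_cpu_usage(encoded: u32) -> String {
    // Convert back to a float first: `{:.2}` has no effect on integers.
    format!("{:.2}%", encoded as f32 / CPU_USAGE_SCALE)
}
```

For example, `format_cpu_usage(encode_cpu_usage(42.5))` yields `"42.50%"`, while `format!("{:.2}%", 4250u32)` yields `"4250%"`.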

//////////////////////////////////////////////////////////////////////////////
@@ -501,4 +544,36 @@ mod tests {
to_manager.stop().await.unwrap();
assert!(join!(handle).0.is_ok());
}

// [utest->swdd~agent-sends-node-resource-availability-to-server~1]
#[tokio::test]
async fn utest_agent_manager_sends_available_resources() {
let _guard = crate::test_helper::MOCKALL_CONTEXT_SYNC
.get_lock_async()
.await;

let mock_wl_state_store = MockWorkloadStateStore::default();
mock_parameter_storage_new_returns(mock_wl_state_store);

let (to_manager, manager_receiver) = channel(BUFFER_SIZE);
let (to_server, mut server_receiver) = channel(BUFFER_SIZE);
let (_workload_state_sender, workload_state_receiver) = channel(BUFFER_SIZE);
let mut mock_runtime_manager = RuntimeManager::default();
mock_runtime_manager.expect_handle_update_workload().never();

let mut agent_manager = AgentManager::new(
AGENT_NAME.to_string(),
manager_receiver,
mock_runtime_manager,
to_server,
workload_state_receiver,
);

let handle = tokio::spawn(async move { agent_manager.start().await });

assert!(server_receiver.recv().await.is_some());

to_manager.stop().await.unwrap();
assert!(join!(handle).0.is_ok());
}
}
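A possible accuracy issue in `measure_and_forward_resource_availability`: sysinfo derives CPU usage from the difference between two refreshes, and its documentation recommends leaving at least `sysinfo::MINIMUM_CPU_UPDATE_INTERVAL` between them. Because the function creates a new `System` on every tick (which already refreshes the requested data) and then calls `refresh_all()` right away, the two samples may be too close together to give a meaningful value; keeping one `System` across ticks would avoid this. The std-only sketch below shows the underlying delta computation on hypothetical cumulative busy/idle counters in the style of Linux `/proc/stat`. It is an illustration of the technique, not sysinfo's actual code, and `CpuTimes` and `cpu_usage_between` are made-up names.

```rust
/// Hypothetical cumulative CPU time counters (e.g. jiffies since boot),
/// split into time spent busy and time spent idle.
#[derive(Clone, Copy, Debug)]
struct CpuTimes {
    busy: u64,
    idle: u64,
}

/// CPU usage in percent between two samples of the cumulative counters.
/// Returns None if no time elapsed between the samples or a counter went
/// backwards (e.g. the samples were passed in the wrong order).
fn cpu_usage_between(prev: CpuTimes, curr: CpuTimes) -> Option<f32> {
    let busy = curr.busy.checked_sub(prev.busy)?;
    let idle = curr.idle.checked_sub(prev.idle)?;
    let total = busy + idle;
    if total == 0 {
        // Samples taken back to back: there is no interval to average over.
        return None;
    }
    Some(busy as f32 * 100.0 / total as f32)
}
```

With 300 busy and 700 idle ticks between two samples the result is 30 %; with identical samples there is nothing to measure and the function returns `None`.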
5 changes: 2 additions & 3 deletions agent/src/runtime_manager.rs
@@ -538,7 +538,7 @@ mod tests {
generate_test_control_interface_access,
generate_test_workload_spec_with_control_interface_access,
generate_test_workload_spec_with_dependencies, generate_test_workload_spec_with_param,
AddCondition, WorkloadInstanceNameBuilder, WorkloadState,
AddCondition, AgentAttributes, WorkloadInstanceNameBuilder, WorkloadState,
};
use common::test_utils::{
self, generate_test_complete_state, generate_test_deleted_workload,
@@ -1947,9 +1947,8 @@ mod tests {
});

complete_state.agents = Some(ank_base::AgentMap {
agents: HashMap::from([(AGENT_NAME.to_owned(), Default::default())]),
agents: HashMap::from([(AGENT_NAME.to_owned(), AgentAttributes::default().into())]),
});

let expected_response = ank_base::Response {
request_id,
response_content: Some(ResponseContent::CompleteState(complete_state)),
8 changes: 4 additions & 4 deletions ank/doc/swdesign/README.md
@@ -903,9 +903,9 @@ Status: approved

When the Ankaios CLI presents connected Ankaios agents to the user, the Ankaios CLI shall present the agents as rows in a table with the following content:

| NAME | WORKLOADS |
| ------------------------ | -------------------------------- |
| `<agent_name>` as text | `<assigned_workloads>` as number |
| NAME | WORKLOADS | CPU USAGE | FREE MEMORY |
| ------------------------ | -------------------------------- | ------------------------ | -------------------------------- |
| `<agent_name>` as text | `<assigned_workloads>` as number | `<cpu_usage>` as percent | `<free_memory>` as number |

Tags:
- CliCommands
@@ -921,7 +921,7 @@ Status: approved

When the user invokes the CLI with a request to provide the list of connected Ankaios agents, the Ankaios CLI shall:
* request the whole CompleteState of Ankaios server
* create a table row for each Ankaios agent listed inside the CompleteState's `agents` field with the agent name and the amount of workload states of its managed workloads
* create a table row for each Ankaios agent listed inside the CompleteState's `agents` field, containing the agent name, the number of workload states of its managed workloads, and the agent's resource availability (CPU usage and free memory)

Rationale:
Counting the workload states, rather than the workloads assigned to each agent in the desired state, ensures that the correct number of workloads is shown even if a workload has been deleted from the desired state but its actual deletion has not yet been scheduled.
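The counting rule in this requirement can be sketched as follows. `count_workloads_per_agent` is a hypothetical helper and workload states are simplified to `(agent_name, workload_name)` pairs. Starting every listed agent at zero (so that agents without workloads still get a row) and ignoring states of agents that are not in the `agents` field are assumptions of this sketch, not statements of the requirement.

```rust
use std::collections::HashMap;

/// Counts workload states per connected agent.
fn count_workloads_per_agent(
    agents: &[&str],
    workload_states: &[(&str, &str)], // (agent_name, workload_name)
) -> HashMap<String, u32> {
    // Every connected agent gets a row, even with zero workloads.
    let mut counts: HashMap<String, u32> =
        agents.iter().map(|name| (name.to_string(), 0)).collect();
    for (agent_name, _workload_name) in workload_states {
        // States of agents not listed in `agents` are ignored.
        if let Some(count) = counts.get_mut(*agent_name) {
            *count += 1;
        }
    }
    counts
}
```

Because the count comes from the reported workload states, a workload removed from the desired state keeps being counted until its state disappears, which matches the rationale above.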
4 changes: 4 additions & 0 deletions ank/src/cli_commands/agent_table_row.rs
@@ -21,4 +21,8 @@ pub struct AgentTableRow {
pub agent_name: String,
#[tabled(rename = "WORKLOADS")]
pub workloads: u32,
#[tabled(rename = "CPU USAGE")]
pub cpu_usage: u32,
#[tabled(rename = "FREE MEMORY")]
pub free_memory: f32,
}
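The new `CPU USAGE` (u32) and `FREE MEMORY` (f32) columns have to be filled from the agent's `AgentLoad`, whose `cpu_usage` is produced on the agent side as percent × 100 and whose `free_memory` comes from sysinfo in bytes. This part of the diff does not show that conversion, so the following is only a sketch: showing whole percent in the CPU column and MiB in the memory column, the `to_table_row` helper, and the mirrored struct definitions are all assumptions.

```rust
// Hypothetical mirror of AgentLoad; units follow the agent-side code.
struct AgentLoad {
    cpu_usage: u32,   // hundredths of a percent
    free_memory: u64, // bytes
}

// Mirror of AgentTableRow without the `tabled` attributes.
#[derive(Debug, PartialEq)]
struct AgentTableRow {
    agent_name: String,
    workloads: u32,
    cpu_usage: u32,   // whole percent in this sketch
    free_memory: f32, // MiB in this sketch
}

const BYTES_PER_MIB: f64 = 1024.0 * 1024.0;

fn to_table_row(agent_name: &str, workloads: u32, load: &AgentLoad) -> AgentTableRow {
    AgentTableRow {
        agent_name: agent_name.to_string(),
        workloads,
        // Integer division drops the two decimal places carried on the wire.
        cpu_usage: load.cpu_usage / 100,
        // Divide in f64 so large byte counts keep their precision, then
        // narrow to the f32 used by the table column.
        free_memory: (load.free_memory as f64 / BYTES_PER_MIB) as f32,
    }
}
```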