Background
Currently, the Beats framework uses a state store that is based on the filesystem (libbeat/statestore). There are additional implementations, such as entityanalytics/kvstore and cursor.StateStore. This state store is used by Filebeat to ensure data is not ingested twice, which is critical for accurate data ingestion and processing.
Until now, the Elastic Agent has relied on persistent storage in two main environments:
Running on an endpoint: where storage is available on the device.
Kubernetes Deployments: where a DaemonSet mounts the node’s volume as persistent storage.
In Kubernetes manifests, this is configured using a host path volume.
What is Agentless Data Ingestion?
Agentless data ingestion allows users to collect data from cloud services, SaaS applications, and public APIs without needing to install or maintain agents. This approach reduces the complexity and overhead involved in managing agents, including version updates and continuous monitoring, and also eliminates the need for additional payments for agent-based operations.
By removing the need for Elastic Agent, users benefit from easier data ingestion while reducing the operational burden.
The challenge in agentless
For agentless deployments, particularly on serverless platforms and ESS, running Elastic Agent on Kubernetes is necessary. However, using a DaemonSet or StatefulSet is not feasible in this environment. Instead, Elastic Agent is run as a Kubernetes Deployment.
Initially, we considered mounting a persistent volume (NFS) to the Elastic Agent deployment. However, this approach has limitations, especially regarding the number of volumes that can be attached to a single node (39 volumes on EKS). The agentless approach, which focuses on security and workload isolation, requires that each agent policy run a single integration. Each policy would therefore need its own volume, which increases the need for a non-filesystem-based persistence layer.
Use case
Many of the integrations maintained by the Security Integration team depend on state management for optimal performance. State is essential to avoid the re-ingestion of already processed data, which would negatively impact customer billing by processing duplicates.
For example, an integration fetching data from a cloud API needs to store a cursor or checkpoint to know which data has already been ingested. Without this state, the integration risks retrieving and processing the same data repeatedly.
This sheet outlines candidate integrations for running agentlessly, most of which require state to function efficiently.
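To make the cursor idea concrete, here is a minimal sketch of a poll loop that only ingests events newer than a stored checkpoint. All names (`Event`, `CursorStore`, `memStore`, `fetchSince`, `pollOnce`, `sampleEvents`) are hypothetical and are not taken from Beats; the in-memory store only stands in for whatever persistent layer ends up holding the state.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical record returned by a cloud API.
type Event struct {
	ID        string
	Timestamp time.Time
}

// CursorStore is a hypothetical minimal contract for persisting a checkpoint.
type CursorStore interface {
	Load() (time.Time, bool)
	Save(t time.Time)
}

// memStore keeps the cursor in memory; a real deployment needs durable storage.
type memStore struct {
	cursor time.Time
	set    bool
}

func (m *memStore) Load() (time.Time, bool) { return m.cursor, m.set }
func (m *memStore) Save(t time.Time)        { m.cursor, m.set = t, true }

// fetchSince stands in for an API call returning events newer than the cursor.
func fetchSince(all []Event, since time.Time) []Event {
	var out []Event
	for _, e := range all {
		if e.Timestamp.After(since) {
			out = append(out, e)
		}
	}
	return out
}

// pollOnce ingests only events after the stored cursor, then advances it.
func pollOnce(store CursorStore, api []Event) []Event {
	since, _ := store.Load() // zero time means "from the beginning"
	batch := fetchSince(api, since)
	for _, e := range batch {
		if e.Timestamp.After(since) {
			since = e.Timestamp
		}
	}
	if len(batch) > 0 {
		store.Save(since)
	}
	return batch
}

// sampleEvents returns two made-up events one minute apart.
func sampleEvents() []Event {
	base := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	return []Event{
		{ID: "a", Timestamp: base.Add(time.Minute)},
		{ID: "b", Timestamp: base.Add(2 * time.Minute)},
	}
}

func main() {
	api := sampleEvents()
	store := &memStore{}
	fmt.Println(len(pollOnce(store, api)))       // first poll ingests both events
	fmt.Println(len(pollOnce(store, api)))       // second poll finds nothing new
	fmt.Println(len(pollOnce(&memStore{}, api))) // state lost: both are ingested again
}
```

The last call shows the failure mode described above: when the checkpoint does not survive a restart, the same events are fetched and processed a second time.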
Proposal
We propose implementing a state store backed by Elasticsearch. Having an additional (and unified) state store has been discussed in #40748. In addition, Elasticsearch-Connector already uses the upstream ES to store configuration and state. By implementing Elasticsearch for the backend/statestore interface, we can unblock the release of more integrations and enhance the agentless experience.
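As a rough illustration of the idea (not the proposed implementation), the sketch below stores one document per state key using the Elasticsearch document index and get APIs over plain HTTP. The `esStore` type, its `Set`/`Get` methods, the `cursor` struct, the index name `agentless-state`, and the key name are all hypothetical and deliberately simpler than the libbeat backend interface. Authentication, retries, and iteration over keys are left out, and `newFakeES` is only an in-process stand-in so the example runs without a cluster.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sync"
)

// esStore is a hypothetical store keeping one document per key in an index.
// It is not the libbeat backend interface, only the same idea in miniature.
type esStore struct{ baseURL, index string }

func (s *esStore) docURL(key string) string {
	return s.baseURL + "/" + s.index + "/_doc/" + url.PathEscape(key)
}

// Set writes the value with the index API: PUT /<index>/_doc/<id>.
func (s *esStore) Set(key string, value any) error {
	body, err := json.Marshal(value)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPut, s.docURL(key), bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("set %q: unexpected status %s", key, resp.Status)
	}
	return nil
}

// Get reads the value with the get API and decodes the _source field.
func (s *esStore) Get(key string, into any) (bool, error) {
	resp, err := http.Get(s.docURL(key))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusNotFound {
		return false, nil // no state stored yet for this key
	}
	if resp.StatusCode >= 300 {
		return false, fmt.Errorf("get %q: unexpected status %s", key, resp.Status)
	}
	var doc struct {
		Source json.RawMessage `json:"_source"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&doc); err != nil {
		return false, err
	}
	return true, json.Unmarshal(doc.Source, into)
}

// newFakeES mimics only the two endpoints used above so the sketch runs
// without a cluster; it is a test stand-in, not Elasticsearch itself.
func newFakeES() *httptest.Server {
	var mu sync.Mutex
	docs := map[string]json.RawMessage{}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		if r.Method == http.MethodPut {
			body, _ := io.ReadAll(r.Body)
			docs[r.URL.Path] = body
			w.WriteHeader(http.StatusCreated)
			return
		}
		src, ok := docs[r.URL.Path]
		if !ok {
			w.WriteHeader(http.StatusNotFound)
			return
		}
		json.NewEncoder(w).Encode(map[string]any{"found": true, "_source": src})
	}))
}

// cursor is the kind of small state object an HTTP JSON input might keep.
type cursor struct {
	LastTimestamp string `json:"last_timestamp"`
}

func main() {
	srv := newFakeES()
	defer srv.Close()
	store := &esStore{baseURL: srv.URL, index: "agentless-state"}
	if err := store.Set("httpjson-example", cursor{LastTimestamp: "2024-01-01T00:00:00Z"}); err != nil {
		panic(err)
	}
	var c cursor
	found, err := store.Get("httpjson-example", &c)
	fmt.Println(found, err, c.LastTimestamp)
}
```

For a state object that is mostly a timestamp, each update is a single small document write, which is why this shape looks like a reasonable starting point for a feasibility check.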
Action Item
The Cloud Security team will run a POC to understand the feasibility and complexity of delivering this by the 8.17 release. The POC will focus on HTTP JSON-based integrations, where the state object is mostly a timestamp.
Concerns that were raised
1. Generic AWS-S3 filebeat input integration stores a reference to all the objects in a bucket. This can grow fast and requires a high rate of read/writes in the worst case.
2. Okta entity analytics integration uses a custom implementation of local bolt db as a state store where transactions are made against that db. Changes here might be more complex.
Effectively, the state in this case is a snapshot of all the fetched data plus some state values, and it has to be fetched and updated "atomically".
As far as I can see, Filebeat uses a similar approach to state for the other "entity analytics" inputs: active directory, azuread, jamf, in addition to okta.
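One possible way to think about the "atomic" requirement is optimistic concurrency: the whole snapshot is replaced in one write, and the write is rejected if the state changed since it was read. The sketch below shows that pattern in memory only. `snapshot`, `versionedStore`, and `errConflict` are hypothetical names, and this is not how the bolt-db-based store works today. In Elasticsearch, conditional writes using `if_seq_no` and `if_primary_term` could play the role of the version check, but whether that fits these inputs is an open question.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errConflict signals that the state changed since it was read.
var errConflict = errors.New("version conflict")

// snapshot is a hypothetical bundle of fetched entities and cursor values
// that must always be replaced together.
type snapshot struct {
	Users  []string
	Cursor string
}

// versionedStore holds one snapshot and a version that increments per write.
type versionedStore struct {
	mu      sync.Mutex
	version int
	data    snapshot
}

// Read returns the current snapshot and its version. Callers should treat
// the returned snapshot as read-only.
func (s *versionedStore) Read() (snapshot, int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data, s.version
}

// Write replaces the whole snapshot only if no other write happened since
// the caller read expectedVersion; otherwise nothing is changed.
func (s *versionedStore) Write(next snapshot, expectedVersion int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version != expectedVersion {
		return errConflict
	}
	s.data = next
	s.version++
	return nil
}

func main() {
	store := &versionedStore{}
	_, v := store.Read()

	// Two writers start from the same version; only the first one wins.
	first := store.Write(snapshot{Users: []string{"alice"}, Cursor: "page-1"}, v)
	second := store.Write(snapshot{Users: []string{"bob"}, Cursor: "page-1"}, v)
	fmt.Println(first, second)

	// The losing writer re-reads and retries against the new version.
	_, v = store.Read()
	fmt.Println(store.Write(snapshot{Users: []string{"alice", "bob"}, Cursor: "page-2"}, v))
}
```

The trade-off is that a conflicting writer has to re-read and retry, which is cheap for a small cursor but more costly when the snapshot contains all fetched entities.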