Proposal: Elasticsearch as State Store for Beats in Agentless Deployments #40985

olegsu · 2024-09-25T10:18:21Z

Background

Currently, the Beats framework uses a state store that is based on the filesystem (libbeat/statestore). There are additional implementations, such as entityanalytics/kvstore and cursor.StateStore. This state store is used by Filebeat to ensure data is not ingested twice, which is critical for accurate data ingestion and processing.

Until now, the Elastic Agent has relied on persistent storage in two main environments:

Running on an endpoint: where storage is available on the device.
Kubernetes Deployments: where a DaemonSet mounts the node’s volume as persistent storage.
In Kubernetes manifests, this is configured using a host path, as shown:

- name: elastic-agent-state
  hostPath:
    path: /var/lib/elastic-agent-managed/kube-system/state

What is Agentless Data Ingestion?

Agentless data ingestion allows users to collect data from cloud services, SaaS applications, and public APIs without needing to install or maintain agents. This approach reduces the complexity and overhead involved in managing agents, including version updates and continuous monitoring, and also eliminates the need for additional payments for agent-based operations.

By removing the need for Elastic Agent, users benefit from easier data ingestion while reducing the operational burden.

The challenge in agentless

For agentless deployments, particularly on serverless platforms and ESS, running Elastic Agent on Kubernetes is necessary. However, using a DaemonSet or StatefulSet is not feasible in this environment. Instead, Elastic Agent is run as a Kubernetes Deployment.

Initially, we considered mounting a persistent volume (NFS) to the Elastic Agent deployment. However, this approach has limitations, especially regarding the number of volumes that can be attached to a single node (39 volumes on EKS). In addition, the agentless approach, which focuses on security and workload isolation, requires that each agent policy run only one integration, so the number of agents (and therefore volumes) grows with the number of integrations. This increases the need for a non-filesystem-based persistent layer.

Use case

Many of the integrations maintained by the Security Integration team depend on state management for optimal performance. State is essential to avoid the re-ingestion of already processed data, which would negatively impact customer billing by processing duplicates.

For example, an integration fetching data from a cloud API needs to store a cursor or checkpoint to know which data has already been ingested. Without this state, the integration risks retrieving and processing the same data repeatedly.
This sheet outlines candidate integrations for running agentlessly; most of them require state to function efficiently.
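
To make the cursor idea above concrete, here is a minimal Python sketch. It is only an illustration, not Beats code: `poll_once`, `fake_api`, the event shape, and the dict used as the state store are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical event shape: (timestamp, payload). Not a Beats type.
Event = Tuple[int, str]


def poll_once(
    fetch_since: Callable[[Optional[int]], List[Event]],
    state: Dict[str, int],
    key: str = "cursor",
) -> List[Event]:
    """Fetch events newer than the stored cursor, then advance the cursor."""
    cursor = state.get(key)  # None on the very first run
    events = fetch_since(cursor)
    if events:
        # Persist the newest timestamp so the next poll skips these events.
        state[key] = max(ts for ts, _ in events)
    return events


# A stand-in for a cloud API that currently holds three events.
API_EVENTS: List[Event] = [(1, "login"), (2, "logout"), (3, "login")]


def fake_api(cursor: Optional[int]) -> List[Event]:
    return [e for e in API_EVENTS if cursor is None or e[0] > cursor]


persistent_state: Dict[str, int] = {}
first = poll_once(fake_api, persistent_state)   # all three events
second = poll_once(fake_api, persistent_state)  # nothing new

# Losing the state (as a restarted pod without persistent storage would)
# causes everything to be ingested again.
after_restart = poll_once(fake_api, {})
```

The last call shows the failure mode described above: with an empty state, the same three events are fetched a second time.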

Proposal

We propose implementing a state store backed by Elasticsearch. Having an additional (and unified) state store has been discussed in #40748. In addition, Elasticsearch-Connector already uses the upstream ES to store configuration and state. By implementing an Elasticsearch backend for the backend/statestore interface, we can unblock the release of more integrations and enhance the agentless experience.
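
As a rough sketch of what such a store could look like, the Python below keeps one Elasticsearch document per key, using the standard document endpoints (`PUT`, `GET`, and `DELETE` on `/<index>/_doc/<id>`). Everything else is an assumption made for illustration: the class name `ElasticsearchStateStore`, the index name, the `{"v": ...}` document layout, and the injected `transport` callable. The real Beats interface is written in Go and may differ. An in-memory fake transport stands in for the cluster so the example runs without a network.

```python
from typing import Any, Callable, Dict, Optional, Tuple
from urllib.parse import quote

# A transport takes (method, path, body) and returns (status_code, parsed_json).
# In a real deployment this would wrap an HTTP client pointed at Elasticsearch.
Transport = Callable[[str, str, Optional[dict]], Tuple[int, dict]]


class ElasticsearchStateStore:
    """Hypothetical key-value store that keeps one document per key."""

    def __init__(self, transport: Transport, index: str = "agentless-state-example"):
        self._transport = transport
        self._index = index  # the index name is an assumption

    def _path(self, key: str) -> str:
        # Percent-encode the key so it is safe to use as a document ID in a URL.
        return f"/{self._index}/_doc/{quote(key, safe='')}"

    def set(self, key: str, value: Any) -> None:
        status, _ = self._transport("PUT", self._path(key), {"v": value})
        if status not in (200, 201):
            raise RuntimeError(f"failed to store {key!r}: HTTP {status}")

    def get(self, key: str) -> Optional[Any]:
        status, body = self._transport("GET", self._path(key), None)
        if status == 404 or not body.get("found"):
            return None
        return body["_source"]["v"]

    def remove(self, key: str) -> None:
        self._transport("DELETE", self._path(key), None)


def make_fake_transport() -> Transport:
    """In-memory stand-in that mimics the document API responses used above."""
    docs: Dict[str, dict] = {}

    def transport(method: str, path: str, body: Optional[dict]) -> Tuple[int, dict]:
        if method == "PUT":
            created = path not in docs
            docs[path] = body or {}
            return (201 if created else 200), {"result": "created" if created else "updated"}
        if method == "GET":
            if path in docs:
                return 200, {"found": True, "_source": docs[path]}
            return 404, {"found": False}
        if method == "DELETE":
            existed = docs.pop(path, None) is not None
            return (200 if existed else 404), {"result": "deleted" if existed else "not_found"}
        raise ValueError(f"unsupported method {method}")

    return transport


store = ElasticsearchStateStore(make_fake_transport())
store.set("httpjson::cursor", {"last_ts": "2024-09-25T10:18:21Z"})
restored = store.get("httpjson::cursor")
```

Injecting the transport keeps the store logic testable and leaves authentication, retries, and the choice of HTTP client out of the sketch.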

References

Slack thread with the product requirements
[Filebeat] - Moving to a unified open source key-value store for state management #40748
Implement the key-value state store in V2 control protocol elastic-agent#2178
https://github.com/elastic/elasticsearch/pull/112556/files

Inform

Cloud Security @tehilashn @smriti0321 @oren-zohar @eyalkraft
Security Integration @norrietaylor @andrewkroh @aleksmaus
Beats @cmacknz


elasticmachine · 2024-09-25T11:30:53Z

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

olegsu · 2024-09-29T17:45:49Z

Update from Sep 26, discussion about this proposal
Participants @cmacknz, @andrewkroh, @aleksmaus and @olegsu

Action Item
The Cloud Security team will run a POC to understand the feasibility and complexity of delivering this by the 8.17 release. The POC will focus on an HTTP JSON-based integration where the state object is mostly a timestamp.

Concerns that were raised

1. The generic AWS-S3 Filebeat input integration stores a reference to every object in a bucket. This state can grow fast and, in the worst case, requires a high rate of reads and writes.
2. The Okta entity analytics integration uses a custom implementation of a local bolt db as a state store, and transactions are made against that db. Changes here might be more complex.

aleksmaus · 2024-09-30T14:04:13Z

2. Okta entity analytics integration uses a custom implementation of local bolt db as a state store where transactions are made against that db. Changes here might be more complex.

Effectively, the state in this case is a snapshot of all the data fetched plus some state values, and it has to be fetched and updated "atomically".

As far as I can see in Filebeat, a similar approach to state is used for the other "entity analytics" inputs: active directory, azuread, and jamf, in addition to okta.
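
One possible way to approximate the "atomic" fetch-and-update requirement on a single document is optimistic concurrency control, sketched below in Python. Elasticsearch exposes this through the `if_seq_no` and `if_primary_term` parameters on index requests. The `VersionedStore` class here is only an in-memory stand-in for that behaviour, and `update_atomically`, `add_user`, and the snapshot layout are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class ConflictError(Exception):
    """Raised when the document changed since it was read."""


class VersionedStore:
    """In-memory stand-in for a store that tracks a sequence number per key."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, Any]] = {}

    def get(self, key: str) -> Tuple[Optional[int], Any]:
        # Returns (sequence number, value), or (None, None) if the key is absent.
        return self._docs.get(key, (None, None))

    def put_if(self, key: str, value: Any, expected_seq: Optional[int]) -> int:
        current_seq, _ = self.get(key)
        if current_seq != expected_seq:
            raise ConflictError(f"{key!r}: expected seq {expected_seq}, found {current_seq}")
        new_seq = 0 if current_seq is None else current_seq + 1
        self._docs[key] = (new_seq, value)
        return new_seq


def update_atomically(
    store: VersionedStore,
    key: str,
    modify: Callable[[Any], Any],
    max_retries: int = 5,
) -> Any:
    """Read, modify, and write back; retry if another writer got there first."""
    for _ in range(max_retries):
        seq, value = store.get(key)
        new_value = modify(value)
        try:
            store.put_if(key, new_value, seq)
            return new_value
        except ConflictError:
            continue  # re-read the latest snapshot and try again
    raise ConflictError(f"{key!r}: gave up after {max_retries} attempts")


def add_user(name: str) -> Callable[[Any], Any]:
    # The snapshot and its cursor are updated together, in one write.
    def modify(snapshot: Any) -> Any:
        snapshot = snapshot or {"users": [], "cursor": 0}
        return {"users": snapshot["users"] + [name], "cursor": snapshot["cursor"] + 1}

    return modify


store = VersionedStore()
update_atomically(store, "okta", add_user("alice"))
stale_seq, _ = store.get("okta")
update_atomically(store, "okta", add_user("bob"))
```

This only covers a snapshot that fits in one document. State spread over many documents would need a different design, since the sketch offers no multi-document transaction.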

olegsu · 2024-11-03T09:59:38Z

The POC is in review https://github.com/elastic/security-team/issues/10714

botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 25, 2024

olegsu added the Team:Elastic-Agent Label for the Agent team label Sep 25, 2024

botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 25, 2024

olegsu self-assigned this Sep 25, 2024

olegsu closed this as completed Nov 3, 2024
