Proposal: Elasticsearch as State Store for Beats in Agentless Deployments #40985

Closed
olegsu opened this issue Sep 25, 2024 · 4 comments
Labels
Team:Elastic-Agent Label for the Agent team

olegsu commented Sep 25, 2024

Background

Currently, the Beats framework uses a state store that is based on the filesystem (libbeat/statestore). There are additional implementations, such as entityanalytics/kvstore and cursor.StateStore. This state store is used by Filebeat to ensure data is not ingested twice, which is critical for accurate data ingestion and processing.

Until now, the Elastic Agent has relied on persistent storage in two main environments:

  • Running on an endpoint: where storage is available on the device.
  • Kubernetes Deployments: where a DaemonSet mounts the node’s volume as persistent storage.
    In Kubernetes manifests, this is configured using a host path, as shown:

      - name: elastic-agent-state
        hostPath:
          path: /var/lib/elastic-agent-managed/kube-system/state

What is Agentless Data Ingestion?

Agentless data ingestion allows users to collect data from cloud services, SaaS applications, and public APIs without needing to install or maintain agents. This approach reduces the complexity and overhead involved in managing agents, including version updates and continuous monitoring, and also eliminates the need for additional payments for agent-based operations.

By removing the need for Elastic Agent, users benefit from easier data ingestion while reducing the operational burden.

The challenge in agentless

For agentless deployments, particularly on serverless platforms and ESS, running Elastic Agent on Kubernetes is necessary. However, using a DaemonSet or StatefulSet is not feasible in this environment. Instead, Elastic Agent is run as a Kubernetes Deployment.

Initially, we considered mounting a persistent volume (NFS) to the Elastic Agent deployment. However, this approach has limitations, especially the cap on the number of volumes that can be attached to a single node (39 volumes on EKS). Our approach, which focuses on security and workload isolation, requires that each agent policy run a single integration. That multiplies the number of agents, and therefore volumes, which increases the need for a non-filesystem-based persistent layer.

Use case

Many of the integrations maintained by the Security Integration team depend on state management for optimal performance. State is essential to avoid the re-ingestion of already processed data, which would negatively impact customer billing by processing duplicates.

For example, an integration fetching data from a cloud API needs to store a cursor or checkpoint to know which data has already been ingested. Without this state, the integration risks retrieving and processing the same data repeatedly.
This sheet outlines candidate integrations for running agentlessly, most of which require state to function efficiently.
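
To make the cursor example above concrete, here is a minimal sketch in Go. It is not taken from Beats: `CursorStore`, `memStore`, `fetchSince`, and `poll` are hypothetical names, the API is simulated by a slice, and the cursor is assumed to be an RFC 3339 UTC timestamp so that plain string comparison matches time order.

```go
package main

import "fmt"

// CursorStore is a hypothetical minimal interface for persisting a checkpoint.
type CursorStore interface {
	Get(key string) (string, bool)
	Set(key, cursor string)
}

// memStore is an in-memory stand-in for a real persistent store.
type memStore map[string]string

func (m memStore) Get(key string) (string, bool) { v, ok := m[key]; return v, ok }
func (m memStore) Set(key, cursor string)        { m[key] = cursor }

// Event is a record returned by the simulated cloud API.
type Event struct {
	Timestamp string // RFC 3339 in UTC (assumption), sorted ascending
	Message   string
}

// fetchSince simulates an API call that returns events newer than cursor.
func fetchSince(all []Event, cursor string) []Event {
	var out []Event
	for _, e := range all {
		if e.Timestamp > cursor {
			out = append(out, e)
		}
	}
	return out
}

// poll ingests new events and then advances the stored cursor.
func poll(store CursorStore, key string, api []Event) []Event {
	cursor, _ := store.Get(key) // an empty cursor means "from the beginning"
	events := fetchSince(api, cursor)
	if len(events) > 0 {
		store.Set(key, events[len(events)-1].Timestamp)
	}
	return events
}

func main() {
	store := memStore{}
	api := []Event{
		{Timestamp: "2024-09-25T10:00:00Z", Message: "a"},
		{Timestamp: "2024-09-25T11:00:00Z", Message: "b"},
	}
	fmt.Println(len(poll(store, "input-1", api))) // first poll sees both events
	fmt.Println(len(poll(store, "input-1", api))) // second poll sees nothing new
}
```

If the store is lost between polls (for example, when a Deployment pod is rescheduled without persistent storage), the cursor resets and the first poll is repeated, which is the duplicate ingestion described above.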

Proposal

We propose implementing a state store backed by Elasticsearch. Having an additional (and unified) state store has been discussed in #40748. In addition, Elasticsearch-Connector already uses the upstream ES to store configuration and state. By implementing an Elasticsearch backend for the backend/statestore interface, we can unblock the release of more integrations and enhance the agentless experience.
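
As a rough illustration of the idea, the sketch below stores each state key as one Elasticsearch document, using the document APIs (`PUT /<index>/_doc/<id>` and `GET /<index>/_doc/<id>`). It is an assumption-laden sketch, not the proposed implementation: `esStore`, its methods, the `agentless-state` index name, and the key format are hypothetical, it does not implement the actual libbeat interface, it omits authentication, and `fakeES` is an in-process stand-in that mimics only these two calls so the example runs without a cluster.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sync"
)

// esStore is a hypothetical store that keeps each key as one document.
type esStore struct {
	baseURL, index string // endpoint and (assumed) state index name
	client         *http.Client
}

func (s *esStore) docURL(key string) string {
	return s.baseURL + "/" + s.index + "/_doc/" + url.PathEscape(key)
}

// Set writes value as the document source via PUT /<index>/_doc/<id>.
func (s *esStore) Set(key string, value any) error {
	body, err := json.Marshal(value)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPut, s.docURL(key), bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := s.client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("set %q: unexpected status %s", key, resp.Status)
	}
	return nil
}

// Get reads GET /<index>/_doc/<id> and decodes the _source field into out.
func (s *esStore) Get(key string, out any) (bool, error) {
	resp, err := s.client.Get(s.docURL(key))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusNotFound {
		return false, nil // no state stored yet for this key
	}
	var doc struct {
		Source json.RawMessage `json:"_source"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&doc); err != nil {
		return false, err
	}
	return true, json.Unmarshal(doc.Source, out)
}

// fakeES is a test stand-in that mimics only the two calls used above.
func fakeES() *httptest.Server {
	var mu sync.Mutex
	docs := map[string][]byte{}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		if r.Method == http.MethodPut {
			docs[r.URL.Path], _ = io.ReadAll(r.Body)
			fmt.Fprint(w, `{"result":"updated"}`)
			return
		}
		src, ok := docs[r.URL.Path]
		if !ok {
			w.WriteHeader(http.StatusNotFound)
			return
		}
		fmt.Fprintf(w, `{"found":true,"_source":%s}`, src)
	}))
}

func main() {
	srv := fakeES()
	defer srv.Close()
	store := &esStore{baseURL: srv.URL, index: "agentless-state", client: srv.Client()}
	state := map[string]string{"timestamp": "2024-09-25T11:00:00Z"}
	if err := store.Set("httpjson-input-1", state); err != nil {
		panic(err)
	}
	var got map[string]string
	found, err := store.Get("httpjson-input-1", &got)
	fmt.Println(found, err, got["timestamp"])
}
```

One document per key keeps small cursor-style state cheap to read and write. Whether it suits inputs with large or fast-changing state is a separate question.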

References

Inform

botelastic bot added the needs_team label Sep 25, 2024
olegsu added the Team:Elastic-Agent label Sep 25, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

botelastic bot removed the needs_team label Sep 25, 2024
olegsu self-assigned this Sep 25, 2024
@olegsu
Author

olegsu commented Sep 29, 2024

Update from the Sep 26 discussion about this proposal
Participants: @cmacknz, @andrewkroh, @aleksmaus and @olegsu

Action Item
The Cloud Security team will run a POC to understand the feasibility and complexity of delivering this by the 8.17 release. The POC will focus on HTTP JSON-based integrations, where the state object is mostly a timestamp.

Concerns that were raised

  1. The generic AWS-S3 Filebeat input integration stores a reference to every object in a bucket. This state can grow fast and, in the worst case, requires a high rate of reads and writes.
  2. Okta entity analytics integration uses a custom implementation of local bolt db as a state store where transactions are made against that db. Changes here might be more complex.

@aleksmaus
Member

2. Okta entity analytics integration uses a custom implementation of local bolt db as a state store where transactions are made against that db. Changes here might be more complex.

Effectively, the state in this case is a snapshot of all the data fetched plus some state values, and it has to be fetched and updated "atomically".

As far as I can see, Filebeat uses a similar approach to state for the other "entity analytics" inputs: active directory, azuread, and jamf, in addition to okta.
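
One possible way to approach the "atomic" fetch-and-update requirement is optimistic concurrency control: read the state with its version, modify it, and write it back only if the version is unchanged, retrying on conflict. Elasticsearch exposes this on document writes through the `if_seq_no` and `if_primary_term` parameters. The Go sketch below only models that check in memory. `versionedStore`, `WriteIf`, and `update` are hypothetical names, and nothing here reflects how the entity analytics inputs or the POC actually work.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errConflict = errors.New("version conflict")

// versionedStore models a single document that rejects stale writes.
type versionedStore struct {
	mu    sync.Mutex
	seqNo int64
	doc   map[string]string
}

func cloneDoc(m map[string]string) map[string]string {
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// Read returns a copy of the document and the sequence number it was read at.
func (s *versionedStore) Read() (map[string]string, int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return cloneDoc(s.doc), s.seqNo
}

// WriteIf replaces the document only if seqNo still matches the stored one.
func (s *versionedStore) WriteIf(seqNo int64, doc map[string]string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if seqNo != s.seqNo {
		return errConflict
	}
	s.doc, s.seqNo = cloneDoc(doc), s.seqNo+1
	return nil
}

// update applies fn to a fresh copy and retries a few times on conflict.
func update(s *versionedStore, fn func(doc map[string]string)) error {
	for attempt := 0; attempt < 5; attempt++ {
		doc, seq := s.Read()
		fn(doc)
		if err := s.WriteIf(seq, doc); !errors.Is(err, errConflict) {
			return err // nil on success
		}
	}
	return errConflict
}

func main() {
	s := &versionedStore{}
	err := update(s, func(doc map[string]string) {
		// Snapshot data and cursor change together in one write.
		doc["user:alice"] = "active"
		doc["cursor"] = "2024-09-26T00:00:00Z"
	})
	fmt.Println(err)
	_, seq := s.Read()
	fmt.Println(s.WriteIf(seq-1, nil)) // a stale sequence number is rejected
}
```

This only covers state that fits in a single document. A snapshot spread across many documents would need a different scheme.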

@olegsu
Author

olegsu commented Nov 3, 2024

The POC is in review: https://github.com/elastic/security-team/issues/10714

olegsu closed this as completed Nov 3, 2024