
[Fleet] Dynamic data stream namespaces #134971

Closed
Tracked by #151898
axw opened this issue Jun 23, 2022 · 17 comments · Fixed by #154732
Labels: enhancement (New value added to drive a business result), Team:Fleet (Team label for Observability Data Collection Fleet team)

@axw (Member) commented Jun 23, 2022

Describe the feature:

Integrations should be able to produce dynamic data_stream.namespace values, rather than having this statically defined per policy.

Describe a specific use case for the feature:

Users sometimes would like to create separate data streams for their data -- e.g. split by APM service, service group, or service environment (dev/test/prod).

Splitting data streams like this enables users to apply different ILM or security policies depending on the application or group of applications. For example, one might wish to keep production logs for years for auditing or regulatory compliance; but logs for dev/test environments may be deleted in the order of days or weeks.

In APM, we would introduce configuration that would allow users to template the namespace. This would be a more restrictive form of the old output.elasticsearch.indices configuration in libbeat, where one could for example include %{[service.name]} to route events to service-specific indices.
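For reference, the legacy libbeat mechanism mentioned above looked roughly like the following (an illustrative sketch, not a copy of any shipped configuration; the host and index names are placeholders):

```yaml
output.elasticsearch:
  hosts: ["localhost:9200"]
  # Per-event routing rules: the first rule whose condition matches wins,
  # and field references like %{[service.name]} are resolved per event.
  indices:
    - index: "traces-%{[service.name]}"
      when.has_fields: ["service.name"]
  # Fallback index for events that match no rule.
  index: "traces-default"
```

The proposal here is deliberately narrower: only the namespace part of the data stream name would be templatable, rather than the entire index name.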

@axw added the enhancement (New value added to drive a business result) and Team:Fleet (Team label for Observability Data Collection Fleet team) labels on Jun 23, 2022
@elasticmachine (Contributor)

Pinging @elastic/fleet (Team:Fleet)

@felixbarny (Member)

Sounds like that would be like routing within APM Server. Have you considered whether document-based ingest routing would be suitable? While it's not available just yet, it might be a good fit.

@axw (Member, Author) commented Jun 23, 2022

@felixbarny the routing part doesn't need to be implemented in APM Server. I was thinking that we would have high level configuration, but that could translate either into some low level routing configuration in APM Server, or it could translate into ingest pipelines. We could also just send users to the ingest pipeline editor directly.

Either way, we need some changes in Fleet such that the namespace is not statically defined at the policy level.
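As a sketch of the ingest-pipeline variant (the pipeline name and source field are hypothetical), a set processor could copy a document field into data_stream.namespace. Note that for the document to actually land in a different data stream, the _index metadata field would also need to be rewritten; doing both in one step is what the later reroute processor provides:

```json
PUT _ingest/pipeline/apm-dynamic-namespace
{
  "description": "Copy service.environment into data_stream.namespace (illustrative)",
  "processors": [
    {
      "set": {
        "field": "data_stream.namespace",
        "copy_from": "service.environment",
        "ignore_empty_value": true
      }
    }
  ]
}
```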

@felixbarny (Member)

Makes sense.

This seems related to the allow_routing flag that was proposed in the package spec (elastic/package-spec#327), with the difference that this feature would only require a wildcard in the namespace part.

@joshdover (Contributor)

One concern about using the package-level allow_routing flag for this feature is that it would make upgrading to the APM package quite a disruptive change by widening the API privileges significantly in ways that may not be necessary or obvious to the user.

It's also not clear to me what should happen with the namespace parameter in the package policy.

Instead of a package level setting, I think we should allow for variables in the namespace field on the package policy. Fleet would then know how to interpret this as needing a wildcard in the API key privileges.

The APM package could then make the default for this namespace field something generally useful, like %{service.name}, so that new policies get this new behavior. Existing policies would already have a namespace set (default or something custom) and would not be affected unless the user explicitly opts in to this behavior by using a variable in this package policy field.

This route seems to avoid some of the UX issues around upgrades and makes the decision more explicit for the customer.
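To make the privilege widening concrete (a hedged sketch of the idea, not Fleet's actual key-generation code): a static namespace lets Fleet scope the output API key to an exact data stream name, whereas a variable in the namespace field would force a wildcard, e.g.:

```json
POST /_security/api_key
{
  "name": "apm-output-key-example",
  "role_descriptors": {
    "apm_writer": {
      "indices": [
        {
          "names": ["traces-apm-*"],
          "privileges": ["auto_configure", "create_doc"]
        }
      ]
    }
  }
}
```

With a fixed namespace, the names entry could instead be the exact "traces-apm-default"; that difference is what makes a variable an explicit, user-visible opt-in to broader privileges.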

@axw (Member, Author) commented Sep 8, 2022

@joshdover that would solve our immediate needs for routing data to different APM data streams.

Between when I opened this issue and now, there have been some conversations going on about potentially using APM Server's OTLP intake for receiving infrastructure metrics and logs. For that use case, we would ideally want to use document-based ingest routing mentioned by Felix above, to send logs/metrics to the most appropriate data stream.

For example, you might point the OpenTelemetry Collector's Host Metrics Receiver at APM Server. APM Server would translate the metrics to a common format, and send documents to metrics-generic-default. There would be a separate integration that knows how to deal with OTel host receiver metrics -- it would register a routing rule for itself, and define mappings and ingest pipelines.

We don't necessarily have to solve that right now, but we should keep it in mind as we choose between the options.

CC @ruflin @Mpdreamz

@Mpdreamz (Member) commented Sep 8, 2022

> We don't necessarily have to solve that right now, but we should keep it in mind as we choose between the options.

We might just need to tackle both explicitly as two distinct use cases.

> The APM package could then make the default for this namespace field something generally useful, like %{service.name}, so that new policies get this new behavior. Existing policies would already have a namespace set (default or something custom) and would not be affected unless the user explicitly opts in to this behavior by using a variable in this package policy field.

For APM this would unblock https://github.com/elastic/apm-dev/issues/801. We'd be able to upgrade our policy to write to %{service.namespace:default} for the following data streams:

traces-apm-
traces-apm.rum-
metrics-apm.service-
metrics-apm.internal-
metrics-apm.profiling-
metrics-apm.app.<service.name>-
logs-apm.error-

It will also unblock a ton of follow-up work I won't list here 🙏 Any work we can do to get this into 8.5/8.6 would be a tremendous enabler for us.

> Existing policies would already have a namespace set (default or something custom) and would not be affected unless the user explicitly opts in to this behavior by using a variable in this package policy field.

Is altering defaults for existing policies a no-go? E.g. we want to advertise that users can simply set ELASTIC_AGENT_NAMESPACE in 8.N to control where data ends up (with server-side allow-listing controlling data stream explosion). If we can't upgrade users automatically, we'd have to document this manual upgrade step for existing users.

> One concern about using the package-level allow_routing flag for this feature is that it would make upgrading to the APM package quite a disruptive change by widening the API privileges significantly in ways that may not be necessary or obvious to the user.

The allow_routing use case is definitely not going away; as @axw mentioned, we are increasingly becoming a generic entry point for metrics and logs. I have no worries about automatically widening APM Server's API permissions on upgrade, but I agree that if we solve this through a generic new flag, it needs special care.

In many ways the APM integration is already special. If this new feature were scoped only to APM (and maybe the Logs integration), would some of those privilege-widening worries go away?

Ideally we'd be able to upgrade existing users to smoother log onboarding, stack monitoring, etc. without putting up barriers to upgrade.

@ruflin (Member) commented Sep 9, 2022

> In many ways the APM integration is already special. If this new feature were scoped only to APM (and maybe the Logs integration), would some of those privilege-widening worries go away?

Having logic specific to one or two packages is something that concerns me. But I doubt we need it, as similar problems apply to all input packages.

@felixbarny (Member)

Seems like all the blockers have been resolved. I guess the next step is to implement this in Fleet? Do we need a separate issue, or is this one good enough to track the execution? How can we make sure the implementation gets prioritized?

@kpollich (Member)

I can add this issue to an upcoming sprint for the Fleet team. We'll need to do a little technical definition on our end to flesh this out, but I can take that on in the coming days. Once we have the implementation details sorted out we'll work on prioritizing this. Thanks all.

@felixbarny (Member)

@kpollich @joshdover could one of you provide an update on where we stand on this?

@axw (Member, Author) commented Jan 9, 2023

Due to a change in priorities, APM is unlikely to be making use of this in the near future. We still intend to provide the functionality eventually.

I think this is still generally relevant (APM is just one use case), but feel free to close if you prefer.

@felixbarny (Member)

I agree with what Andrew said. To add some more context, the need for routing traces to different namespaces didn't go away for APM, but we're considering solving it outside of Fleet in the new architecture.

However, this is still very much a blocker and a priority for routing logs and metrics in integrations. There are more downstream blockers on the Elasticsearch side (elastic/elasticsearch#63798), but this would unblock routing from the Fleet side.

Once this is resolved, we can also resume the work on the ECS logs integration: elastic/integrations#2972

@kpollich (Member) commented Jan 9, 2023

Since this has been identified as not being a priority at this time, I'm going to defer this from our current sprint. Thanks Andrew and Felix for providing some additional context and references here. We'll tackle some technical definition here in an upcoming sprint.

@felixbarny (Member)

@joshdover @kpollich we're targeting an experimental release of the reroute processor in 8.8 and want to make use of it in integrations in 8.9. While there's more work needed to fully support routing in Fleet, just having this issue resolved would go a long way toward making sure integrations can use the reroute processor within their own pipelines.

Could we please prioritize this issue soon?
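For context, using the reroute processor from within an integration's pipeline might look like the following (a sketch based on the processor's documented shape; the pipeline name and source field are examples). The namespace values are tried in order, falling back to the static default when the field reference resolves to a missing or empty value:

```json
PUT _ingest/pipeline/logs-myintegration-routing
{
  "processors": [
    {
      "reroute": {
        "tag": "route-by-environment",
        "namespace": [
          "{{service.environment}}",
          "default"
        ]
      }
    }
  ]
}
```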

@joshdover (Contributor)

@felixbarny we'll get it in for 8.8.

@joshdover (Contributor)

PR is ready for review: #154732
