[Discuss] Elastic Packages: Introduce schema for data providers #199

mtojek · 2021-07-09T07:36:39Z

In general I like the deterministic approach we are following here. My concern is the number of fields this will add to each dataset, which increases the size of all templates and mappings. This multiplies quickly with many datasets, and adding k8s fields to all integrations even if someone does not use them is not great.

There might be a partial way out here: dynamic mappings. Instead of configuring all k8s fields in the referenced fields, a dynamic mapping makes sure the fields are dynamically mapped correctly. Most are keywords anyway, so they likely do not even need the mapping to be set, as this is the default.
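A minimal sketch of what such a dynamic mapping could look like, written as a Python dict that serializes to an Elasticsearch `dynamic_templates` section. The template names and the `kubernetes.*` / `cloud.*` prefixes are assumptions chosen for illustration, not something agreed in this thread.

```python
import json


def keyword_dynamic_template(name: str, path_match: str) -> dict:
    """Build one dynamic template mapping matching string fields to keyword."""
    return {
        name: {
            "path_match": path_match,          # glob on the full field path
            "match_mapping_type": "string",    # only applies to string values
            "mapping": {"type": "keyword"},
        }
    }


# Hypothetical data provider prefixes; a real list would come from the providers.
PROVIDER_PREFIXES = ["kubernetes.*", "cloud.*"]

mappings = {
    "dynamic_templates": [
        keyword_dynamic_template(prefix.split(".")[0] + "_as_keyword", prefix)
        for prefix in PROVIDER_PREFIXES
    ]
}

print(json.dumps(mappings, indent=2))
```

With this shape, no individual k8s field is listed in the template; a field only gets a mapping once a document containing it is indexed.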

One completely different alternative is to use a more recent Elasticsearch feature that lets the mapping be sent as part of the request. This way the creation of these mappings would be delegated to Beats as part of the pull request. But it would have to be investigated whether this causes issues with permissions.
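One possible reading of this, sketched under assumptions: the bulk API accepts a `dynamic_templates` map in the action metadata that names which dynamic template to apply to which field. The index name, field name, and template name `k8s_keyword` below are hypothetical, and the named template would still have to exist in the index mappings.

```python
import json


def bulk_create_lines(index: str, doc: dict, field_templates: dict) -> str:
    """Return the two NDJSON lines (action + document) for one bulk create."""
    action = {
        "create": {
            "_index": index,
            # Map from full field name to the dynamic template to apply.
            "dynamic_templates": field_templates,
        }
    }
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"


body = bulk_create_lines(
    "logs-example-default",                                   # hypothetical data stream
    {"kubernetes.pod.name": "web-0", "message": "started"},  # hypothetical event
    {"kubernetes.pod.name": "k8s_keyword"},                  # hypothetical template name
)
print(body)
```

In this sketch the shipper decides per document which fields need a specific mapping, which is why the permissions question above matters.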

PS: I don't like that we keep mixing two discussions into a single issue. It keeps creating confusion. We should close this issue and have a separate one for the "current" discussion.

Originally posted by @ruflin in #63 (comment)

mtojek · 2021-07-09T07:46:56Z

@ruflin I think the same idea about dynamic templates was brought up by @jsoriano in #63 (comment).

I'm afraid we don't have the power to decide whether we can go that way. It probably isn't easy to introduce in a single iteration. Could you suggest somebody from the agent team who can explore the idea (if it's doable)?

In general I like the deterministic approach we are following here. My concern is the number of fields this will add to each dataset, which increases the size of all templates and mappings. This multiplies quickly with many datasets, and adding k8s fields to all integrations even if someone does not use them is not great.

Do you think we should pair with the Fleet team to consider a more flexible solution (user can select enabled data providers, e.g. kubernetes runtime)?

Side note:

It seems to be a never-ending thread with many objectives. It would be great to decide whether we want to solve this problem now or focus on something else.

ruflin · 2021-07-14T07:21:46Z

The part I would like to understand is: what breaks if we don't handle it right now? I ask because I think the default mappings will cover most providers pretty well.

I'm good with getting a short-term solution, but it must ensure we don't get ourselves into a position where we map just everything and are back to the problem of too many fields.

mtojek · 2021-07-14T08:04:18Z

Frankly speaking I like the approach with default mappings more if it's on the roadmap. This way we won't pollute indices with useless mappings (e.g. cloud fields in non-cloud environments).

We have to remember that it impacts the developer experience. Developers/devops:

won't be able to learn about all available fields
won't be able to depend on package tests for validation
will add some fields randomly to package fields ("oh there are some cloud fields emitted in the CI, I should probably add them too")

If there are technical difficulties or gaps on the agent/fleet side, I'm good with postponing this until later (but not indefinitely), but we need to take this decision cautiously.

cc @masci @andresrc @jsoriano

exekias · 2021-07-14T08:24:23Z

One thing that is not covered by default mappings is defining the meta for some of these fields. I'm specifically thinking about dimensions, that we need to flag in some cases. We could put these fields in the ES default mappings but I'm wondering if that's a good practice, as it ties the fields to the ES version.
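To make the point concrete, here is a small sketch of an explicit mapping that flags a field as a dimension, which a plain default mapping would not do. It assumes the Elasticsearch `time_series_dimension` mapping parameter is the target; the field names are hypothetical examples.

```python
def as_dimension(field_mapping: dict) -> dict:
    """Return a copy of a field mapping flagged as a time series dimension."""
    return {**field_mapping, "time_series_dimension": True}


# Hypothetical provider fields: only the first one is flagged as a dimension.
properties = {
    "kubernetes.pod.uid": as_dimension({"type": "keyword"}),
    "kubernetes.pod.ip": {"type": "ip"},
}

print(properties)
```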

jsoriano · 2021-07-14T11:35:41Z

default mappings

When talking about default mappings, which mappings do you refer to?

What breaks if we don't handle it right now?

Regarding this question, the main part we are lacking now is the mapping for fields added by processors or autodiscover. In principle, I think this currently affects only standalone agent, as it is the only way to include dynamic inputs or additional processors at the moment. With the current focus on the Fleet experience this may be a lower priority, but it is still something we need to improve for important use cases.

Also, some mappings don't break now only because some modules already include the mapping of many "data provider" fields, or other common fields. But this situation is not ideal: it requires bulk changes when something changes in these fields, and many fields are included in mappings even if they are never used in most deployments.

mtojek · 2021-07-19T07:55:38Z

When talking about default mappings, which mappings do you refer to?

I was thinking about dynamic mappings using the ES feature you mentioned. By "default" I meant hardcoded (a mapping with a default type) somewhere in Kibana, Agent, etc., but not in the integration.

mtojek self-assigned this Jul 9, 2021

mtojek mentioned this issue Jul 12, 2021

[Discuss] Avoiding duplication of ECS field definitions #63

Closed

This was referenced Jul 19, 2021

Add validation for supported inputs in a package #32

Open

Validate fields skipped in type assertion elastic/elastic-package#147

Open

ChrsMark mentioned this issue Sep 6, 2021

[Agent] Support labels dedot in k8s provider elastic/beats#27019

Closed

jsoriano added the 8.6-candidate label Sep 12, 2022

This was referenced Sep 12, 2022

Checklist for version 2.0.0 #399

Closed

[elastic_agent] agent.* fields are not mapped elastic/integrations#4191

Closed

jsoriano mentioned this issue Oct 18, 2022

Out of the box ECS field mappings for Custom Input packages elastic/integrations#4236

Open

10 tasks

jsoriano mentioned this issue Oct 26, 2022

[Change Proposal] Introduce an "Agent Common Schema" #441

Open
