[Discuss] Elastic Packages: Introduce schema for data providers #199

Open
mtojek opened this issue Jul 9, 2021 · 6 comments

@mtojek
Contributor

mtojek commented Jul 9, 2021

In general I like the deterministic approach we are following here. My concern is the number of fields this will add to each dataset, which increases the size of all templates and mappings. This multiplies quickly with many datasets, and adding k8s fields to all integrations even when someone does not use them is not great.

There might be a partial way out here: dynamic mappings. Instead of listing every k8s field in the referenced fields, a dynamic mapping would make sure the fields are mapped correctly when they first appear. Most of them are keywords anyway, so they likely don't even need an explicit mapping, as this is the default.
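
A minimal sketch of that idea, assuming a mappings body whose `dynamic_templates` section maps any string field under `kubernetes.*` to `keyword`. The helper `build_dynamic_mapping` and the template name it generates are hypothetical and not taken from any package:

```python
import json


def build_dynamic_mapping(prefix: str) -> dict:
    """Return a mappings body that maps string fields under `prefix` to keyword.

    Hypothetical helper for illustration only.
    """
    return {
        "dynamic_templates": [
            {
                # The template name is arbitrary; this one is made up.
                f"{prefix}_strings_as_keyword": {
                    "path_match": f"{prefix}.*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ]
    }


mappings = build_dynamic_mapping("kubernetes")
print(json.dumps(mappings, indent=2))
```

With something like this, no individual k8s field has to be listed in the template; a field only shows up in the mapping once a document actually contains it.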

A completely different alternative is to use a more recent Elasticsearch feature that allows the mapping to be sent as part of the request. This way, the creation of these mappings would be delegated to Beats as part of the pull request. However, we would have to investigate whether this causes permission issues.
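
If the feature meant here is the `dynamic_templates` parameter that the bulk API accepts in an action's metadata (this is an assumption), a request body could be built roughly as follows. The data stream name, the field, and the template name `keyword_dim` are made up for illustration, and the named template would still have to be defined in the index template:

```python
import json


def bulk_line_pair(index: str, doc: dict, field_templates: dict) -> str:
    """Build one action/source pair of an NDJSON bulk body.

    `field_templates` maps full field names to dynamic template names.
    """
    action = {"create": {"_index": index, "dynamic_templates": field_templates}}
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"


body = bulk_line_pair(
    "metrics-example-default",  # hypothetical data stream
    {"kubernetes.pod.name": "web-0"},
    {"kubernetes.pod.name": "keyword_dim"},  # hypothetical template name
)
print(body, end="")
```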

PS: I don't like that we keep mixing two discussions into a single issue. It keeps creating confusion. We should close this issue and have a separate one for the "current" discussion.

Originally posted by @ruflin in #63 (comment)

@mtojek mtojek self-assigned this Jul 9, 2021
@mtojek
Contributor Author

mtojek commented Jul 9, 2021

@ruflin I think the same idea about dynamic templates was brought up by @jsoriano in #63 (comment).

I'm afraid we don't have the power to decide whether we can go that way. It probably isn't easy to introduce in a single iteration. Could you suggest somebody from the agent team who can explore the idea (if it's doable)?

> In general I like the deterministic approach we are following here. My concern is the number of fields this will add to each dataset, which increases the size of all templates and mappings. This multiplies quickly with many datasets, and adding k8s fields to all integrations even when someone does not use them is not great.

Do you think we should pair with the Fleet team to consider a more flexible solution (user can select enabled data providers, e.g. kubernetes runtime)?

Side note:

This seems to be a never-ending thread with many objectives. It would be great to decide whether we want to solve this problem now or focus on something else.

@ruflin
Member

ruflin commented Jul 14, 2021

The part I would like to understand is: What breaks if we don't handle it right now? The reason I ask this is because I think the default mappings will cover most providers pretty well.

I'm good with getting a short-term solution, but it must ensure we don't get ourselves into a position where we map just everything and are back to the problem of too many fields.

@mtojek
Contributor Author

mtojek commented Jul 14, 2021

Frankly speaking I like the approach with default mappings more if it's on the roadmap. This way we won't pollute indices with useless mappings (e.g. cloud fields in non-cloud environments).

We have to remember that this approach impacts the developer experience. Developers and DevOps engineers:

  • won't be able to learn about all available fields
  • won't be able to depend on package tests in terms of validation
  • will add some fields randomly to package fields ("oh there are some cloud fields emitted in the CI, I should probably add them too")
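
To make the validation point concrete, here is a rough sketch of the kind of check package tests could only run if data-provider fields are declared somewhere. The function names, the declared field set, and the sample event are hypothetical and do not reflect how the actual tooling validates documents:

```python
def flatten(doc: dict, prefix: str = "") -> set:
    """Return the dotted paths of all leaf fields in a nested document."""
    paths = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths |= flatten(value, path + ".")
        else:
            paths.add(path)
    return paths


def undeclared_fields(doc: dict, declared: set) -> set:
    """Return fields present in the document but missing from the declared set."""
    return flatten(doc) - declared


declared = {"message", "host.name"}  # hypothetical package fields
event = {
    "message": "ok",
    "host": {"name": "node-1"},
    "cloud": {"provider": "gcp"},  # added by a data provider, not by the package
}
print(undeclared_fields(event, declared))
```

Without a schema for data providers, a test like this cannot tell whether `cloud.provider` is an expected field or a mistake.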

If there are technical difficulties or gaps on the agent/fleet side, I'm good with postponing this until later (though not indefinitely), but we need to make this decision carefully.

cc @masci @andresrc @jsoriano

@exekias
Contributor

exekias commented Jul 14, 2021

One thing that is not covered by default mappings is defining the meta for some of these fields. I'm specifically thinking about dimensions, that we need to flag in some cases. We could put these fields in the ES default mappings but I'm wondering if that's a good practice, as it ties the fields to the ES version.

@jsoriano
Member

> default mappings

When talking about default mappings, which mappings do you refer to?

> What breaks if we don't handle it right now?

Regarding this question, the main part we are lacking now is the mapping for fields added by processors or autodiscover. In principle, I think this currently affects only the standalone agent, as it is the only way to include dynamic inputs or additional processors at the moment. With the current focus on the Fleet experience this may be a lower priority, but it is still something we need to improve for important use cases.

Also, some mappings don't break now only because some modules already include the mappings for many "data provider" fields or other common fields. But this situation is not ideal: it requires bulk changes whenever something changes in these fields, and many fields are included in mappings even if they are never used in most deployments.

@mtojek
Contributor Author

mtojek commented Jul 19, 2021

> When talking about default mappings, which mappings do you refer to?

I was thinking about dynamic mappings using the ES feature you mentioned. By "default" I meant hardcoded (a mapping with a default type) somewhere in Kibana, Agent, etc., but not in the integration.
