[Change Proposal] Support packages with many fields #758

Open
jsoriano opened this issue Jun 5, 2024 · 0 comments
Labels
discuss Issue needs discussion

Comments

@jsoriano
Member

jsoriano commented Jun 5, 2024

There are packages whose number of fields grows with each version. At some point these packages will hit the current limit of 2048 fields per data stream. One example is the amazon_security_lake package, which includes many fields from OCSF.

This and other limits exist to keep some control over the size of packages along different dimensions. In the case of data stream fields, the limit exists to avoid performance issues or other problems with indices that have too many field mappings. See, for example, the warning about this in the Elasticsearch documentation (Mapping Limits docs).

The total number of fields allowed in a data stream (including dynamically mapped fields) can be configured in the data stream manifest (elasticsearch.index_template.settings.index.mapping.total_fields.limit).

Some options to explore:

  • Allow skipping validation of the number of fields. I would avoid this because it risks distributing problematic mappings.
  • Allow overriding the limit on the number of fields. If we do that, I think we should still have a hard limit that cannot be exceeded.
  • Refactor the affected packages to make more use of dynamic mappings. We can study the current case and provide a general recommendation to include in the docs. We can also be more flexible with the limits for definitions of dynamic mappings.
  • Refactor the affected packages, splitting the data stream. Not desirable, as it would be a breaking change in most cases.
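To make the second option more concrete, here is a minimal sketch of what an overridable limit with a hard ceiling could look like. It is not the actual package-spec validation: the function names, the `HARD_LIMIT` value of 4096, and the counting rule (only leaf definitions count, `group` entries contribute their children) are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

DEFAULT_LIMIT = 2048  # current per-data-stream limit mentioned in this issue
HARD_LIMIT = 4096     # hypothetical ceiling, chosen only for this sketch


def count_fields(fields: List[Dict[str, Any]]) -> int:
    """Count leaf field definitions in a nested list of field definitions.

    Assumption: a `group` entry with children is not counted itself,
    only its nested fields are.
    """
    total = 0
    for field in fields:
        children = field.get("fields")
        if field.get("type") == "group" and children:
            total += count_fields(children)
        else:
            total += 1
    return total


def effective_limit(override: Optional[int] = None) -> int:
    """Return the limit to enforce, rejecting overrides above the hard limit."""
    if override is None:
        return DEFAULT_LIMIT
    if override > HARD_LIMIT:
        raise ValueError(
            f"limit override {override} exceeds hard limit {HARD_LIMIT}"
        )
    return override


def validate_field_count(
    fields: List[Dict[str, Any]], override: Optional[int] = None
) -> None:
    """Raise ValueError if the data stream defines more fields than allowed."""
    limit = effective_limit(override)
    count = count_fields(fields)
    if count > limit:
        raise ValueError(f"data stream defines {count} fields, limit is {limit}")
```

With this shape, a package could opt in to a higher limit explicitly, while the hard ceiling would still keep unbounded mappings from being distributed.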

cc @mrodm @kpollich @ShourieG for thoughts about possible approaches.
