org.elasticsearch.cluster.metadata.MetadataIndexAliasesService#applyAliasActions can become very slow when adding aliases to large data streams #92609

Open
original-brownbear opened this issue Dec 30, 2022 · 5 comments
Labels: :Data Management/Data streams, >enhancement, Team:Data Management

Comments

@original-brownbear (Member)

When adding aliases that include filters to data streams, we validate the filter against every index in the data stream. This entails instantiating a temporary index service for every index (at least for every index not on the master node), which in turn means parsing each index's mapping and setting up a mapper instance.
This can take many seconds for larger data streams. Could we avoid validating the filter against every index and instead validate it only once per unique mapping?
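For illustration, a minimal sketch of that idea (hypothetical types and helpers, not the actual MetadataIndexAliasesService internals): key the validation on a hash of the mapping, so backing indices that share a mapping are only validated once.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for IndexMetadata: name, a hash of the mapping,
// and the mapping source itself.
record IndexMeta(String name, String mappingHash, String mappingSource) {}

public class AliasFilterValidation {

    // Validate the alias filter once per distinct mapping instead of once
    // per backing index.
    static void validateFilter(List<IndexMeta> backingIndices, String filter) {
        Set<String> seenMappings = new HashSet<>();
        for (IndexMeta index : backingIndices) {
            // Backing indices created from the same template share a mapping,
            // so add() returns false and we skip the expensive step.
            if (seenMappings.add(index.mappingHash())) {
                validateFilterAgainstMapping(filter, index.mappingSource());
            }
        }
    }

    // Placeholder for the expensive part: in Elasticsearch this is where a
    // temporary index service is instantiated and the mapping is parsed.
    static void validateFilterAgainstMapping(String filter, String mappingSource) {
        // ... parse mapping, set up a mapper, run the filter through it ...
    }
}
```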

Relates to #89924 and #77466

@original-brownbear added the >bug and :Data Management/Data streams labels on Dec 30, 2022
@elasticsearchmachine added the Team:Data Management label on Dec 30, 2022
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@DaveCTurner (Contributor)

I'm guessing that making a whole new Metadata each time round the loop, rebuilding the index abstraction lookup map, is also not very cheap?

@original-brownbear (Member, Author)

> I'm guessing that making a whole new Metadata each time round the loop, rebuilding the index abstraction lookup map, is also not very cheap?

You'd think so, but parsing something like a Beats mapping is so absurdly expensive that rebuilding the Metadata for 5k indices is probably still cheaper than parsing the mapping once. It makes sense if you think about it: parsing a mapping with 4k fields probably means something like 10k map puts, etc. :)
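As a back-of-envelope illustration of that claim (plain Java with made-up field counts, not real mapping-parsing code): merely materializing the field definitions of a 4k-field mapping into nested maps already costs on the order of 10k map puts, before any mapper objects are built.

```java
import java.util.HashMap;
import java.util.Map;

public class MappingParseCost {
    public static void main(String[] args) {
        int fields = 4_000;
        long puts = 0;
        Map<String, Object> properties = new HashMap<>();
        for (int i = 0; i < fields; i++) {
            // Even a minimal leaf field carries a type plus a parameter or
            // two; real Beats mappings are usually richer than this.
            Map<String, Object> fieldDef = new HashMap<>();
            fieldDef.put("type", "keyword");
            fieldDef.put("ignore_above", 1024);
            properties.put("field_" + i, fieldDef);
            puts += 3;
        }
        // ~12k puts for 4k fields, before any mapper construction happens
        System.out.println("map puts for " + fields + " fields: " + puts);
    }
}
```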

@DaveCTurner (Contributor)

Sure, that makes sense for the case we looked at, but the metadata-building cost scales with the total number of indices/aliases in the cluster, not just with the target data stream, and in another situation we could have 10x or more indices in total. We do see some folks using aliases very heavily.

@original-brownbear (Member, Author)

Right, that makes sense. If you have simple mappings, you might well bottleneck on the metadata rebuilding instead. That path has been heavily optimized lately, though, while mapping parsing hasn't, and I think rebuilding a metadata of 50k indices/data streams/aliases was about as expensive as parsing one Beats mapping.
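A minimal sketch of the batching idea implied above (hypothetical types, not the actual Metadata/Metadata.Builder API): apply all alias actions to one mutable view and pay for the lookup rebuild once, instead of once per action round the loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for alias actions and cluster metadata.
record AliasAction(String alias, String index) {}

class ClusterMeta {
    final Map<String, List<String>> aliasToIndices;

    ClusterMeta(Map<String, List<String>> aliasToIndices) {
        this.aliasToIndices = aliasToIndices;
        rebuildIndexAbstractionLookup(); // the expensive part: scales with cluster size
    }

    private void rebuildIndexAbstractionLookup() {
        // ... rebuild the name -> index abstraction lookup map ...
    }
}

public class BatchedAliasActions {

    // Apply every action to one mutable copy, then construct the new
    // metadata (and its lookup structures) exactly once.
    static ClusterMeta apply(ClusterMeta current, List<AliasAction> actions) {
        Map<String, List<String>> updated = new HashMap<>(current.aliasToIndices);
        for (AliasAction action : actions) {
            updated.computeIfAbsent(action.alias(), k -> new ArrayList<>()).add(action.index());
        }
        return new ClusterMeta(updated); // one rebuild, not one per action
    }
}
```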

@mattc58 added the >enhancement label and removed the >bug label on Aug 23, 2023