DataTiersUsageTransportAction is incredibly inefficient in large clusters #100230

Closed
Tracked by #77466
DaveCTurner opened this issue Oct 3, 2023 · 2 comments · Fixed by #101599
Labels
>bug :Data Management/Data streams Data streams and their lifecycles Team:Data Management Meta label for data/management team

Comments

@DaveCTurner
Contributor

DaveCTurner commented Oct 3, 2023

Today DataTiersUsageTransportAction executes an internal nodes stats action with all the trimmings:

client.admin()
    .cluster()
    .prepareNodesStats()
    .all()
    .setIndices(CommonStatsFlags.ALL)

In a large cluster this implementation may need hundreds of MiB of heap on the coordinating node to hold onto every statistic about every shard on every node (several kiB per shard) even though we use almost none of them. Worse, the coordinating node is always the elected master because that's how XPackUsageFeatureTransportAction derivatives work. It also burns a bunch of CPU and network bandwidth just transporting these stats around the cluster. AFAICT we could push this computation out to the individual nodes with a dedicated TransportNodesAction which computes the tiny TierSpecificStats on each node in a manner that allows the coordinating node to combine them.
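To make the "compute on each node, combine on the coordinator" idea concrete, here is a minimal sketch in plain Java. It does not use any Elasticsearch classes: `TierUsageSketch`, `TierSummary` and `combine` are hypothetical names, and the three counters are an assumed subset of what a per-tier summary might hold. It also assumes every field is a simple additive counter; anything non-additive would need a different merge strategy.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the code from the eventual fix.
final class TierUsageSketch {

    /** Small per-tier summary that each node could compute locally. */
    record TierSummary(int shardCount, long docCount, long sizeInBytes) {
        /** Plain addition is associative and commutative, so merge order does not matter. */
        TierSummary merge(TierSummary other) {
            return new TierSummary(
                shardCount + other.shardCount,
                docCount + other.docCount,
                sizeInBytes + other.sizeInBytes);
        }
    }

    /** Coordinator side: fold the small per-node maps into one map keyed by tier name. */
    static Map<String, TierSummary> combine(List<Map<String, TierSummary>> nodeResponses) {
        Map<String, TierSummary> totals = new HashMap<>();
        for (Map<String, TierSummary> response : nodeResponses) {
            response.forEach((tier, summary) -> totals.merge(tier, summary, TierSummary::merge));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two made-up node responses; the numbers are illustrative only.
        Map<String, TierSummary> node1 = Map.of("data_hot", new TierSummary(3, 1_000, 4_096));
        Map<String, TierSummary> node2 = Map.of(
            "data_hot", new TierSummary(2, 500, 2_048),
            "data_warm", new TierSummary(1, 200, 1_024));
        System.out.println(combine(List.of(node1, node2)));
    }
}
```

With this shape the coordinating node only ever holds one small record per tier per node, rather than every statistic for every shard.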

It also does not propagate cancellation down to the nodes stats task (addressed in #100253).
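As a toy illustration of why the parent/child link matters for cancellation, the sketch below models tasks in plain Java. It is not Elasticsearch's task framework: `TaskCancellationSketch`, `Task` and `spawnChild` are hypothetical names, and the model is single-threaded for brevity. A child created through its parent is cancelled along with it, while a task created independently keeps running.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; hypothetical names, no Elasticsearch classes, not thread-safe.
final class TaskCancellationSketch {

    static final class Task {
        private final String name;
        private final List<Task> children = new ArrayList<>();
        private boolean cancelled;

        Task(String name) {
            this.name = name;
        }

        /** Creates a child and records the link so cancellation can reach it. */
        Task spawnChild(String childName) {
            Task child = new Task(childName);
            children.add(child);
            return child;
        }

        /** Cancels this task and, recursively, every registered child. */
        void cancel() {
            cancelled = true;
            children.forEach(Task::cancel);
        }

        boolean isCancelled() {
            return cancelled;
        }

        String name() {
            return name;
        }
    }

    public static void main(String[] args) {
        Task usage = new Task("usage");
        Task linked = usage.spawnChild("nodes-stats (linked)");
        Task unlinked = new Task("nodes-stats (unlinked)");

        usage.cancel();

        for (Task task : List.of(linked, unlinked)) {
            System.out.println(task.name() + " cancelled: " + task.isCancelled());
        }
    }
}
```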

It also captures the cluster state when it's initiated and retains it until completion, which can represent another 100MiB+ of heap usage.

Relates #77466.

@DaveCTurner DaveCTurner added >bug :Data Management/Data streams Data streams and their lifecycles labels Oct 3, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Oct 3, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Oct 4, 2023
This action invokes a subsidiary action but does not set up the proper
parent/child relationship, so cancellations of the parent task do not
propagate to the child.

Relates elastic#100230
elasticsearchmachine pushed a commit that referenced this issue Oct 4, 2023
This action invokes a subsidiary action but does not set up the proper
parent/child relationship, so cancellations of the parent task do not
propagate to the child.

Relates #100230
@joegallo
Contributor

Some not entirely dissimilar prior art (along the lines of a "dedicated TransportNodesAction which computes") in #100092, in case somebody is thinking of picking this up.

@gmarouli gmarouli self-assigned this Oct 17, 2023
gmarouli added a commit to gmarouli/elasticsearch that referenced this issue Nov 10, 2023
elasticsearchmachine pushed a commit that referenced this issue Nov 10, 2023
…ividual nodes (#100230) (#101599)" (#102042)

Reverting because the new action is not properly handled in a mixed
cluster.
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this issue Nov 13, 2023
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this issue Nov 13, 2023
…ividual nodes (elastic#100230) (elastic#101599)" (elastic#102042)

Reverting because the new action is not properly handled in a mixed
cluster.
gmarouli added a commit to gmarouli/elasticsearch that referenced this issue Nov 14, 2023
elena-shostak added a commit to elastic/kibana that referenced this issue Jun 19, 2024
…186370)

## Summary

Calls to `/_xpack/usage` in Elasticsearch do not perform well on large
clusters. See elastic/elasticsearch#100230.
Some users have reported timeouts on this request path.

Added a filter_path to the `/_xpack/usage` ES call to optimize the call.


### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios


### For maintainers

- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

__Fixes: https://github.com/elastic/kibana/issues/169449__

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
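
For readers unfamiliar with `filter_path`, the following Java sketch shows how such a request URL could be built. The commit message above does not say which filter value was used, so the `security` value, the base URL, and the `UsageRequestSketch` and `usageUri` names are all assumptions for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the filter value is an assumption, not taken from the Kibana change.
final class UsageRequestSketch {

    /** Builds a /_xpack/usage URI whose response is limited to the given filter_path. */
    static URI usageUri(String baseUrl, String filterPath) {
        String encoded = URLEncoder.encode(filterPath, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/_xpack/usage?filter_path=" + encoded);
    }

    public static void main(String[] args) {
        System.out.println(usageUri("http://localhost:9200", "security"));
    }
}
```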
seanrathier pushed a commit to seanrathier/kibana that referenced this issue Jun 21, 2024
…lastic#186370)