Search Memory Tracking - track memory used during a shard search #1009
Comments
How about having a way to stop or deprioritise memory-heavy queries, similar to the way a query timeout works? This is different from the observability issue, but it makes sense to prevent these really intensive queries in the first place.
Nice proposal. Maybe we also need an extension for aggregation reduce phases on the coordinator (major contributors to memory), while being cautious about deserialisation overhead. @AmiStrn maybe we need special handling for query prioritization, for instance async searches should have a different priority than a usual search #1017. We might also need to track/estimate memory prior to the allocation so that a query can be terminated early. I guess both of the above can be tracked separately. Thoughts?
@AmiStrn Today a query execution can be stopped in scenarios like hitting the bucket limit or the parent circuit breakers. There is value in adding some notion of a memory sandbox and preempting the query on hitting a 'per query memory limit' as the next phase, and eventually improving the memory estimation (prior to executing). @Bukhtawar good point, this approach will not capture the reduce phase overhead and I will explore that as a follow-up.
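As a rough sketch of what such a per-query limit could look like (all names here are illustrative, not an existing OpenSearch API): each shard task reports its observed allocation into a shared budget, and the query is failed once the budget is exceeded.

```java
// Illustrative only: a per-query memory budget shared by the shard tasks of one
// query. Names (QueryMemoryBudget, addBytes) are hypothetical, not an existing API.
import java.util.concurrent.atomic.AtomicLong;

public final class QueryMemoryBudget {
    private final long limitBytes;
    private final AtomicLong allocatedBytes = new AtomicLong();

    public QueryMemoryBudget(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Record bytes observed by one shard task; fail fast once the budget is exceeded. */
    public void addBytes(long bytes) {
        long total = allocatedBytes.addAndGet(bytes);
        if (total > limitBytes) {
            throw new IllegalStateException(
                "query exceeded per-query memory limit: " + total + " > " + limitBytes + " bytes");
        }
    }
}
```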
Finally got some time to explore this more, and here are some thoughts.
Is your feature request related to a problem? Please describe.
There is limited visibility into how much memory is consumed by a query. In an ideal world, resource consumption details should be abstracted away from users and everything should auto-tune/auto-reject. But we are not there (yet!), and with every query treated equally, certain memory-heavy queries can end up tripping the memory breakers for all requests. It will be helpful to track and surface the memory consumed by a query. This visibility can help users tune their queries better.
Describe the solution you'd like
The plan is to make this generic and expose these stats via the tasks framework. The tasks framework already tracks latency and has some context about the query/work being done. The idea is to enhance it to start tracking additional stats for memory and CPU consumed per task. As tasks have a nice parent -> child hierarchy, this mechanism will allow tracking the cluster-wide resource consumption of a query. So the plan is to update Task to start tracking additional context + stats. When a task completes, this task info will be pushed to a sink; the sink can be logs or a system index to enable additional insights.
For search-side tracking of stats, the proposed solution is to leverage the single-threaded nature of searching within a shard. I plan to use ThreadMXBean.getCurrentThreadAllocatedBytes for tracking the memory consumption and exposing this in two forms.
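A minimal sketch of how the per-shard measurement could work, assuming the HotSpot-specific com.sun.management.ThreadMXBean extension is available (getCurrentThreadAllocatedBytes needs JDK 14+; older JDKs can use getThreadAllocatedBytes on the current thread id). The class name ShardSearchMemoryProbe is illustrative, not an existing OpenSearch type:

```java
// Sketch of measuring allocation around a single-threaded shard search, assuming the
// HotSpot-specific com.sun.management.ThreadMXBean extension is available.
// getCurrentThreadAllocatedBytes requires JDK 14+; on older JDKs,
// getThreadAllocatedBytes(Thread.currentThread().getId()) can be used instead.
import java.lang.management.ManagementFactory;
import com.sun.management.ThreadMXBean;

public final class ShardSearchMemoryProbe {
    private static final ThreadMXBean THREAD_MX_BEAN =
        (ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Returns the bytes allocated on the current thread while running the shard search. */
    public static long measureAllocatedBytes(Runnable shardSearch) {
        if (!THREAD_MX_BEAN.isThreadAllocatedMemorySupported()) {
            shardSearch.run();
            return -1L; // allocation accounting not available on this JVM
        }
        long before = THREAD_MX_BEAN.getCurrentThreadAllocatedBytes();
        shardSearch.run(); // shard search runs on a single thread, so this captures the whole cost
        return THREAD_MX_BEAN.getCurrentThreadAllocatedBytes() - before;
    }
}
```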
Based on some initial rally benchmarks on a POC, the overhead does not look high. Having said that, my plan is to gate this under a cluster setting, search.track_resources, that defaults to false (disabled).
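For illustration, the proposed flag could be declared as a dynamic cluster setting following the usual OpenSearch Setting conventions; the setting name comes from this proposal and does not exist yet, and the holder class is hypothetical:

```java
// Minimal sketch, assuming the standard OpenSearch Setting API; the setting name
// comes from this proposal and does not exist yet, the holder class is hypothetical.
import org.opensearch.common.settings.Setting;

public final class SearchResourceTrackingSettings {
    public static final Setting<Boolean> SEARCH_TRACK_RESOURCES =
        Setting.boolSetting(
            "search.track_resources",
            false,                      // disabled by default, as proposed
            Setting.Property.Dynamic,   // can be toggled without a node restart
            Setting.Property.NodeScope);
}
```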
Describe alternatives you've considered
An alternative considered was adding a search_stats section to the /_nodes/stats API that returns the top N expensive queries. However, this model is restricted to search and will require additional work to track, at a parent level, the cluster-wide impact of a query. Hence this alternative, while less work, is not as powerful.
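For concreteness, a hedged sketch of the per-node "top N expensive queries" bookkeeping this alternative implies (class and record names are illustrative only, and real integration into /_nodes/stats would need more plumbing):

```java
// Hypothetical per-node bookkeeping for a "top N expensive queries" view;
// all names here are illustrative only.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopNExpensiveQueries {
    record QueryCost(String description, long allocatedBytes) {}

    private final int capacity;
    private final PriorityQueue<QueryCost> heap =
        new PriorityQueue<>(Comparator.comparingLong(QueryCost::allocatedBytes));

    TopNExpensiveQueries(int capacity) {
        this.capacity = capacity;
    }

    synchronized void record(String description, long allocatedBytes) {
        heap.offer(new QueryCost(description, allocatedBytes));
        if (heap.size() > capacity) {
            heap.poll(); // drop the cheapest entry so only the N most expensive remain
        }
    }

    synchronized List<QueryCost> snapshot() {
        List<QueryCost> copy = new ArrayList<>(heap);
        copy.sort(Comparator.comparingLong(QueryCost::allocatedBytes).reversed());
        return copy;
    }
}
```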
Planning
Update Task to allow tracking additional context and stats. Expose interfaces to allow updating them (a task today is immutable). On completion, the task's resource usage stats are emitted to a 'sink'.
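An illustrative sketch only of what the mutable per-task stats and the 'sink' hand-off could look like; TaskResourceUsage and ResourceUsageSink are hypothetical names, not existing OpenSearch classes:

```java
// Illustrative sketch only: a mutable resource-usage holder a Task could expose,
// with completed stats handed to a pluggable sink (logs or a system index).
// TaskResourceUsage and ResourceUsageSink are hypothetical names.
import java.util.concurrent.atomic.AtomicLong;

final class TaskResourceUsage {
    final AtomicLong cpuTimeNanos = new AtomicLong();
    final AtomicLong memoryAllocatedBytes = new AtomicLong();

    void add(long cpuNanos, long allocatedBytes) {
        cpuTimeNanos.addAndGet(cpuNanos);
        memoryAllocatedBytes.addAndGet(allocatedBytes);
    }
}

interface ResourceUsageSink {
    /** Called once per task on completion; child usage rolls up into the parent task. */
    void onTaskCompleted(long taskId, long parentTaskId, TaskResourceUsage usage);
}
```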