[RFC] Admission Control mechanism for Cluster Manager APIs #7520

shwetathareja · 2023-05-11T09:35:01Z

Is your feature request related to a problem? Please describe.
Today, the Cluster Manager can be overwhelmed by too many requests, which can cause its memory and CPU usage to spike and keep its transport threads busy. This can have unwanted effects on the cluster: critical operations such as health checks can fail, and node-join/node-left processing can be delayed. There are circuit breakers that operate on heap memory usage and start rejecting requests once a certain threshold is breached. However, they can still admit a large number of incoming requests, because they account for the incoming request size, which is 0 for most GET requests. Also, APIs like _cluster/health and _cluster/state, which are critical for cluster functioning, do not trip the circuit breakers, yet their response payloads can be very large, potentially in the MBs. The circuit breakers also don't handle any prioritization.

OpenSearch already supports Indexing and Search Back Pressure with intelligent resource tracking. The proposal is to build smart admission control for Cluster Manager APIs (eventually back pressure).

Describe the solution you'd like
Cluster Manager availability is critical to the overall availability and stability of the cluster. The proposal here is to provide a more intelligent request rejection mechanism that, for example, takes into account the pending requests in the transport thread pool queue, considers other resources such as CPU, and handles prioritization during rejection.
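
To make the idea concrete for discussion, here is a minimal sketch of such a rejection check. It is not the OpenSearch implementation: the names `Priority`, `Limits`, and `AdmissionController`, the two priority tiers, and all threshold values are hypothetical and chosen only for illustration. It assumes each read request carries a priority and is rejected once CPU usage or the transport queue depth reaches the limit for that priority.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical tiers; the RFC does not define concrete priority levels.
    LOW = 0       # e.g. _cat or _nodes/stats style admin reads
    CRITICAL = 1  # e.g. _cluster/health


@dataclass(frozen=True)
class Limits:
    cpu_percent: float  # reject when CPU usage is at or above this value
    queue_depth: int    # reject when pending transport requests reach this value


class AdmissionController:
    """Illustrative admission check keyed on CPU usage and queue depth."""

    def __init__(self, limits_by_priority):
        self.limits_by_priority = limits_by_priority
        # Count rejections per priority so operators could observe them.
        self.rejected = {p: 0 for p in Priority}

    def admit(self, priority, cpu_percent, queue_depth):
        limits = self.limits_by_priority[priority]
        if cpu_percent >= limits.cpu_percent or queue_depth >= limits.queue_depth:
            self.rejected[priority] += 1
            return False
        return True


# Assumed thresholds: low-priority reads are shed earlier than critical ones.
controller = AdmissionController({
    Priority.LOW: Limits(cpu_percent=80.0, queue_depth=100),
    Priority.CRITICAL: Limits(cpu_percent=95.0, queue_depth=500),
})
```

With these assumed limits, a low-priority read arriving at 85% CPU would be rejected while a critical request at the same load would still be admitted, which is the prioritization behaviour that size-based circuit breakers lack.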

For write APIs, there is ClusterManager Task Throttling, which should protect against too many tasks, although a few tasks that spike resource usage could still cause impact. In the first phase, though, the plan is to focus on read APIs only.

In the future, there should also be a mechanism to cancel long-running read requests for admin operations such as the _cat, _nodes/stats, and _cluster APIs.

I am looking for feedback from the community to evolve this feature from an idea into a concrete proposal.

bbarani · 2024-02-06T19:18:36Z

@shwetathareja can you please confirm whether this change can be included in 2.x without breaking the existing API? Basically, can this change be added in a backward-compatible manner in the 2.x line?

We are evaluating whether this change requires the 3.0 release or can be included in the 2.x line, so we need your input.

shwetathareja · 2024-02-07T11:06:21Z

@bbarani this feature will be controlled using admission control settings and thresholds, and can be done in a backward-compatible manner in 2.x. We will not enable it by default, to prevent any breaking change for users in 2.x, and will do so once we have 3.0.

shwetathareja added the enhancement, untriaged, and idea labels May 11, 2023

andrross added the distributed framework label May 30, 2023

anasalkouz added the discuss and RFC labels and removed the untriaged label May 31, 2023

shwetathareja changed the title ~~[RFC] Back Pressure mechanism for Cluster Manager APIs~~ [RFC] Admission Control / Back Pressure mechanism for Cluster Manager APIs Jun 14, 2023

shwetathareja changed the title ~~[RFC] Admission Control / Back Pressure mechanism for Cluster Manager APIs~~ [RFC] Admission Control mechanism for Cluster Manager APIs Jun 14, 2023

bharath-techie mentioned this issue Jul 27, 2023

[RFC] Admission Controller framework for OpenSearch #8910

Open

bbarani mentioned this issue Jan 30, 2024

[PROPOSAL] Finalize 2024 release schedule for OpenSearch opensearch-project/.github#186

Closed

rajiv-kv mentioned this issue Feb 29, 2024

Integrate with CPU admission controller for cluster-manager Read API's. #12496

Merged

8 tasks

shwetathareja closed this as completed in #12496 Mar 21, 2024

This was referenced Mar 21, 2024

Integrate with CPU admission controller for cluster-manager Read API'… #12829

Merged

Integrate with CPU admission controller for cluster-manager Read API'… #12832

Merged

rwali-aws added this to Cluster Manager Project Board Apr 22, 2024

github-project-automation bot moved this to 🆕 New in Cluster Manager Project Board Apr 22, 2024

rwali-aws moved this from 🆕 New to ✅ Done in Cluster Manager Project Board Apr 22, 2024

rwali-aws added the v2.13.0 label Apr 22, 2024
