-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Onboarding of few task types to throttling #4542
Onboarding of few task types to throttling #4542
Conversation
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
@Override | ||
public String getClusterManagerThrottlingKey() { | ||
return "cluster-reroute-api"; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What prevents two actions from accidentally using the same string?. I would prefer a centralised place where all keys can be registered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a centralised place i.e. ClusterManagerThrottler. But, there is disconnect between the key stored in each executor vs ClusterManagerThrottler class. We can define constant in throttler class and use them in executor class. But the challenge would be to ensure key is unique per executor type and not executor object as lot of executors are defined inline and doesn't have a class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These all tasks have 1:1 mapping with corresponding TransportClusterManagerNodeActions
. May be we can leverage this and configure throttling keys against TransportClusterManagerNodeActions
which is responsible for the task. While initializing the task executor we can get the task type from this map. We can have validation against this map for the duplicate values.
This way we can also restrict the throttling to the actions which are following TransportClusterManagerNodeActions
route for submitting tasks to master and have the retry mechanism.
Thoughts?
I have made sample change for this for few of the tasks for ("put-mapping/create-index/delete-index/update-settings"). (commit 60ba56b). Will make necessary changes for other tasks based on the early feedback on approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dhwanilpatel : The question I have is
- can we ensure there is a transport action associated with all tasks.
- what about the inline cluster state submit tasks, could there be multiple tasks such created from same transport class?
Also we can register only core tasks in this manner. Have we also thought about how to provide register option to plugins in future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- can we ensure there is a transport action associated with all tasks.
Not with all tasks, but if task wants to use throttling framework they will need to have associated transport cluster manager action as it performs the retry on throttling.
- what about the inline cluster state submit tasks, could there be multiple tasks such created from same transport class?
For those task for which I had configured in this PR, they all submit single tasks only. But you have a good point, as in future some transport action may require to submit multiple tasks.
I have given more thought and proposing some change in solution.
- We can introduce static method
registerTaskForThrottling(String throttlingKey, boolean retryable)
.- This will have logic for detecting duplicate throttling keys as well.
- Also through this, throttling framework will get information about if for given throttling key if data node is performing retries or not. Many of the internal task types doesn't extend TransportClusterManagerNodeAction, so they might not have default retry capability on throttling. Based on this flag throttler will not throw throttling exception for them.
- We will call this method from constructor of the service/class which is going to submit tasks, and have them register for all the tasks they are submitting along with confirmation on if retry mechanism is there or not. Something similar to the way we register for various transport actions.
Also we can register only core tasks in this manner. Have we also thought about how to provide register option to plugins in future?
As per my understanding, plugins can access internal class (MasterService/ClusterService/etc) of Opensearch. They can call this new static method and register for tasks.
I have less expertise in plugin area, please correct me if my above understanding is not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shwetathareja I have made sample change for reference for above approach as well, please take a look at [6c67dc9]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dhwanilpatel : thanks for raising a draft PR. The proposal of registering with ClusterManagerTaskThrottler
looks good. One thing I am wondering is that instead of making the register method as static and exposing the throttler class out of ClusterManager package. One option is to give method in ClusterService for registering as it is accessible from everywhere and then in turn call MasterService -> Throttler class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can do that, as you mentioned we will need to do it via ClusterService -> MasterService -> Throttler class.
Since all other services submits the master tasks via cluster services, they will have access of cluster service for registering key as well.
Will make change for it.
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
/* | ||
* Task will get retried from { @link TransportClusterManagerNodeAction} | ||
*/ | ||
ClusterManagerTaskThrottler.registerThrottlingKey(UPDATE_SETTING_CLUSTER_MANAGER_TASK_KEY, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the value bool, when should it be false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If task is not extended by TransportClusterManagerNodeAction
and don't have explicit retries on throttling, then we can register that task key with false value. One example could be ShardStateAction
.
Idea is throttler will not allow to set the throttling threshold for such tasks, else data node can fail with throttling exception.
With this we can get aggregated count of those task from new map which we are maintaining in master side.
server/src/main/java/org/opensearch/cluster/service/ClusterManagerTaskThrottler.java
Outdated
Show resolved
Hide resolved
|
||
/* | ||
* Task will get retried from { @link TransportClusterManagerNodeAction} | ||
*/ | ||
ClusterManagerTaskThrottler.registerThrottlingKey(CREATE_INDEX_CLUSTER_MANAGER_TASK_KEY, true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we move the key's definition too in the
ClusterManagerTaskThrottler
class - Can we pass ClusterManagerTaskThrottler#registerThrottlingKey as a
Consumer
to this class, something like
private BiConsumer<String, Boolean> taskThrottlingRegistration
and then
taskThrottlingRegistration.accept(ClusterManagerTaskThrottler.CREATE_INDEX_CLUSTER_MANAGER_TASK_KEY, true);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are planing to expose the registering functionality via cluster service only, do you think we still need consumer? As it will be single and only way for registering throttling keys.
All the services has access to cluster service, so they can register it via cluster service. If we want to introduce the consumer, we will need to change constructor of almost all services to have this argument.
Thoughts?
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
...src/main/java/org/opensearch/action/admin/cluster/reroute/TransportClusterRerouteAction.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
server/src/main/java/org/opensearch/cluster/service/ClusterManagerThrottlingKeys.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dhwanilpatel . just minor comments.
One point: we should catch any core task not registering itself via assertion that throttling key shouldnt be null during submit task or something.
server/src/main/java/org/opensearch/cluster/service/ClusterManagerTaskThrottler.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/ClusterManagerTaskThrottler.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
server/src/main/java/org/opensearch/snapshots/SnapshotsService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/snapshots/SnapshotsService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/ClusterService.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
server/src/main/java/org/opensearch/cluster/service/ClusterManagerTaskThrottler.java
Show resolved
Hide resolved
server/src/test/java/org/opensearch/cluster/metadata/MetadataIndexTemplateServiceTests.java
Show resolved
Hide resolved
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
* @param taskThrottlingKey - throttling key for task | ||
* @param throttlingEnabled - if throttling is enabled or not i.e. data node is performing retry over throttling exception or not. | ||
*/ | ||
public ClusterManagerThrottlingKey(String taskThrottlingKey, boolean throttlingEnabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The very idea of this class was to make this an inner class Throttler with only privately accessible constructor so that the only way to get an instance was through registration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the constructor private, but we will need to expose the Default throttling key via static field from Throttler class as that will be needed for the tasks which are not on boarded on throttling framework. They will use this default key, and throttling will be bypassed for that default key.
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
server/src/main/java/org/opensearch/cluster/service/ClusterManagerTaskThrottler.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dhwanilpatel left a minor comment otherwise looks good.
Gradle Check (Jenkins) Run Completed with:
|
…' into onboarding-pr Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
ae35a6c
to
58824e2
Compare
Gradle Check (Jenkins) Run Completed with:
|
Codecov Report
@@ Coverage Diff @@
## feature/master-task-throttling #4542 +/- ##
====================================================================
- Coverage 70.71% 70.59% -0.13%
+ Complexity 57644 57562 -82
====================================================================
Files 4665 4666 +1
Lines 276614 276702 +88
Branches 40297 40300 +3
====================================================================
- Hits 195621 195350 -271
- Misses 64737 65090 +353
- Partials 16256 16262 +6
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
* Onboarding of few task types to throttling Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… #3527 Changes required in Master node to perform throttling. Master node changes for master task throttling #3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling #4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling #4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling #4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
* Add basic thorttler/exponential backoff policy for retry/Defination of throttling exception (#3856) * Corrected Java doc for Throttler * Changed the default behaviour of Throttler to return Optional * Removed generics from Throttler and used String as key * Ignore backport / autocut / dependabot branches for gradle checks on push * Master node changes for master task throttling (#3882) * Data node changes for master task throttling (#4204) * Onboarding of few task types to throttling (#4542) * Fix timeout exception and Add Integ test for Master task throttling (#4588) * Complete TODO for version change and removed unused classes(Throttler and Semaphore) (#4846) * Remove V1 version from throttling testcase Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… #3527 Changes required in Master node to perform throttling. Master node changes for master task throttling #3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling #4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling #4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling #4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Signed-off-by: Dhwanil Patel dhwanip@amazon.com
Description
Added few of the Cx driven tasks to cluster manager task throttling framework. These all tasks onboarded on this PR, extends the
TransportClusterManagerNodeAction
, so they will by default have the retry mechanism on the throttling from cluster manager node.Issues Resolved
#479
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.