Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Onboarding of few task types to throttling #4542

Conversation

dhwanilpatel
Copy link
Contributor

Signed-off-by: Dhwanil Patel dhwanip@amazon.com

Description

Added few of the Cx driven tasks to cluster manager task throttling framework. These all tasks onboarded on this PR, extends the TransportClusterManagerNodeAction, so they will by default have the retry mechanism on the throttling from cluster manager node.

Issues Resolved

#479

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@dhwanilpatel dhwanilpatel requested review from a team and reta as code owners September 19, 2022 04:21
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Comment on lines 244 to 248
@Override
public String getClusterManagerThrottlingKey() {
return "cluster-reroute-api";
}

Copy link
Collaborator

@Bukhtawar Bukhtawar Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What prevents two actions from accidentally using the same string?. I would prefer a centralised place where all keys can be registered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a centralised place i.e. ClusterManagerThrottler. But, there is disconnect between the key stored in each executor vs ClusterManagerThrottler class. We can define constant in throttler class and use them in executor class. But the challenge would be to ensure key is unique per executor type and not executor object as lot of executors are defined inline and doesn't have a class.

Copy link
Contributor Author

@dhwanilpatel dhwanilpatel Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These all tasks have 1:1 mapping with corresponding TransportClusterManagerNodeActions. May be we can leverage this and configure throttling keys against TransportClusterManagerNodeActions which is responsible for the task. While initializing the task executor we can get the task type from this map. We can have validation against this map for the duplicate values.

This way we can also restrict the throttling to the actions which are following TransportClusterManagerNodeActions route for submitting tasks to master and have the retry mechanism.

Thoughts?

I have made sample change for this for few of the tasks for ("put-mapping/create-index/delete-index/update-settings"). (commit 60ba56b). Will make necessary changes for other tasks based on the early feedback on approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhwanilpatel : The question I have is

  1. can we ensure there is a transport action associated with all tasks.
  2. what about the inline cluster state submit tasks, could there be multiple tasks such created from same transport class?

Also we can register only core tasks in this manner. Have we also thought about how to provide register option to plugins in future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. can we ensure there is a transport action associated with all tasks.

Not with all tasks, but if task wants to use throttling framework they will need to have associated transport cluster manager action as it performs the retry on throttling.

  1. what about the inline cluster state submit tasks, could there be multiple tasks such created from same transport class?

For those task for which I had configured in this PR, they all submit single tasks only. But you have a good point, as in future some transport action may require to submit multiple tasks.

I have given more thought and proposing some change in solution.

  • We can introduce static method registerTaskForThrottling(String throttlingKey, boolean retryable).
    • This will have logic for detecting duplicate throttling keys as well.
    • Also through this, throttling framework will get information about if for given throttling key if data node is performing retries or not. Many of the internal task types doesn't extend TransportClusterManagerNodeAction, so they might not have default retry capability on throttling. Based on this flag throttler will not throw throttling exception for them.
  • We will call this method from constructor of the service/class which is going to submit tasks, and have them register for all the tasks they are submitting along with confirmation on if retry mechanism is there or not. Something similar to the way we register for various transport actions.

Also we can register only core tasks in this manner. Have we also thought about how to provide register option to plugins in future?

As per my understanding, plugins can access internal class (MasterService/ClusterService/etc) of Opensearch. They can call this new static method and register for tasks.
I have less expertise in plugin area, please correct me if my above understanding is not correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shwetathareja I have made sample change for reference for above approach as well, please take a look at [6c67dc9]

Copy link
Member

@shwetathareja shwetathareja Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhwanilpatel : thanks for raising a draft PR. The proposal of registering with ClusterManagerTaskThrottler looks good. One thing I am wondering is that instead of making the register method as static and exposing the throttler class out of ClusterManager package. One option is to give method in ClusterService for registering as it is accessible from everywhere and then in turn call MasterService -> Throttler class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can do that, as you mentioned we will need to do it via ClusterService -> MasterService -> Throttler class.

Since all other services submits the master tasks via cluster services, they will have access of cluster service for registering key as well.

Will make change for it.

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

/*
* Task will get retried from { @link TransportClusterManagerNodeAction}
*/
ClusterManagerTaskThrottler.registerThrottlingKey(UPDATE_SETTING_CLUSTER_MANAGER_TASK_KEY, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the value bool, when should it be false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If task is not extended by TransportClusterManagerNodeAction and don't have explicit retries on throttling, then we can register that task key with false value. One example could be ShardStateAction.
Idea is throttler will not allow to set the throttling threshold for such tasks, else data node can fail with throttling exception.

With this we can get aggregated count of those task from new map which we are maintaining in master side.

Comment on lines 183 to 188

/*
* Task will get retried from { @link TransportClusterManagerNodeAction}
*/
ClusterManagerTaskThrottler.registerThrottlingKey(CREATE_INDEX_CLUSTER_MANAGER_TASK_KEY, true);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we move the key's definition too in the ClusterManagerTaskThrottler class
  2. Can we pass ClusterManagerTaskThrottler#registerThrottlingKey as a Consumer to this class, something like
private BiConsumer<String, Boolean> taskThrottlingRegistration

and then

taskThrottlingRegistration.accept(ClusterManagerTaskThrottler.CREATE_INDEX_CLUSTER_MANAGER_TASK_KEY, true);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are planing to expose the registering functionality via cluster service only, do you think we still need consumer? As it will be single and only way for registering throttling keys.

All the services has access to cluster service, so they can register it via cluster service. If we want to introduce the consumer, we will need to change constructor of almost all services to have this argument.

Thoughts?

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Member

@shwetathareja shwetathareja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dhwanilpatel . just minor comments.
One point: we should catch any core task not registering itself via assertion that throttling key shouldnt be null during submit task or something.

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Oct 4, 2022

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Oct 4, 2022

Gradle Check (Jenkins) Run Completed with:

* @param taskThrottlingKey - throttling key for task
* @param throttlingEnabled - if throttling is enabled or not i.e. data node is performing retry over throttling exception or not.
*/
public ClusterManagerThrottlingKey(String taskThrottlingKey, boolean throttlingEnabled) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The very idea of this class was to make this an inner class Throttler with only privately accessible constructor so that the only way to get an instance was through registration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the constructor private, but we will need to expose the Default throttling key via static field from Throttler class as that will be needed for the tasks which are not on boarded on throttling framework. They will use this default key, and throttling will be bypassed for that default key.

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Oct 6, 2022

Gradle Check (Jenkins) Run Completed with:

Copy link
Collaborator

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dhwanilpatel left a minor comment otherwise looks good.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2022

Gradle Check (Jenkins) Run Completed with:

…' into onboarding-pr

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2022

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

Codecov Report

Merging #4542 (58824e2) into feature/master-task-throttling (1212afd) will decrease coverage by 0.12%.
The diff coverage is 82.85%.

@@                         Coverage Diff                          @@
##             feature/master-task-throttling    #4542      +/-   ##
====================================================================
- Coverage                             70.71%   70.59%   -0.13%     
+ Complexity                            57644    57562      -82     
====================================================================
  Files                                  4665     4666       +1     
  Lines                                276614   276702      +88     
  Branches                              40297    40300       +3     
====================================================================
- Hits                                 195621   195350     -271     
- Misses                                64737    65090     +353     
- Partials                              16256    16262       +6     
Impacted Files Coverage Δ
...search/cluster/service/ClusterManagerTaskKeys.java 0.00% <0.00%> (ø)
...main/java/org/opensearch/script/ScriptService.java 71.65% <0.00%> (-0.45%) ⬇️
...oredscripts/TransportDeleteStoredScriptAction.java 55.55% <50.00%> (+5.55%) ⬆️
.../storedscripts/TransportPutStoredScriptAction.java 55.55% <50.00%> (+5.55%) ⬆️
...ing/delete/TransportDeleteDanglingIndexAction.java 9.58% <50.00%> (+1.13%) ⬆️
...dmin/indices/rollover/TransportRolloverAction.java 51.21% <50.00%> (-0.04%) ⬇️
...ster/metadata/MetadataCreateDataStreamService.java 61.53% <50.00%> (-0.31%) ⬇️
...arch/persistent/PersistentTasksClusterService.java 45.55% <50.00%> (-2.71%) ⬇️
...cluster/metadata/MetadataIndexTemplateService.java 82.84% <83.33%> (+1.00%) ⬆️
...h/cluster/service/ClusterManagerTaskThrottler.java 87.67% <87.09%> (-3.24%) ⬇️
... and 498 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@Bukhtawar Bukhtawar merged commit 178d21d into opensearch-project:feature/master-task-throttling Oct 9, 2022
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 2, 2022
* Onboarding of few task types to throttling

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Bukhtawar pushed a commit that referenced this pull request Nov 4, 2022
Basic Throttler Framework / Exponential Basic back off policy. 
Add basic thorttler/exponential backoff policy for retry/Defination o… #3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling #3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling #4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling #4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling #4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Bukhtawar pushed a commit that referenced this pull request Nov 4, 2022
* Add basic thorttler/exponential backoff policy for retry/Defination of throttling exception (#3856)

* Corrected Java doc for Throttler

* Changed the default behaviour of Throttler to return Optional

* Removed generics from Throttler and used String as key

* Ignore backport / autocut / dependabot branches for gradle checks on push

* Master node changes for master task throttling (#3882)

* Data node changes for master task throttling (#4204)

* Onboarding of few task types to throttling (#4542)

* Fix timeout exception and Add Integ test for Master task throttling (#4588)

* Complete TODO for version change and removed unused classes(Throttler and Semaphore) (#4846)

* Remove V1 version from throttling testcase

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Bukhtawar pushed a commit that referenced this pull request Nov 4, 2022
Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… #3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling #3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling #4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling #4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling #4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants