-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Master node changes for master task throttling #3882
Master node changes for master task throttling #3882
Conversation
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
server/src/main/java/org/opensearch/cluster/ClusterStateTaskExecutor.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/MasterThrottlingStats.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/MasterTaskThrottler.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/MasterService.java
Outdated
Show resolved
Hide resolved
} catch (Exception e) { | ||
masterTaskThrottler.release(masterThrottlingKey, tasks.size()); | ||
throw e; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acquire and release seems too complicated and seems a maintenance overhead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We acquire permits before task goes into queue and release the permit when task is removed from the queue.
We acquire permits at one place, while submitting the tasks.
We release from three places.
- If submit task is failed due to some exception (Since we already acquire permit and submit is failing so we need to release it).
- Release before execution (Task is removed from queue and went into execution).
- Task is timed out in queue(Task is timed out in queue, so master will throw exception to data nodes. Releasing as task is removed from queue)
I will add this logic of acquire/release in java doc as well for future reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @Bukhtawar if we can simplify this logic further without using permits and semaphores. @dhwanilpatel I have a suggestion, let me know if that makes sense
In the ClusterStateTaskExecutor you are already plumbing the throttling key and maintaining a ConcurrentHashMap for semaphores. We can change the ConcurrentHashMap with just count of tasks in the queue as value. You can use computeIfPresent to atomically update count and if the count exceeds the limit, then throw exception else update the count in the remapping function provided in computeIfPresent. The 3 places you mentioned, we still need them to reduce the count in this approach as well but we will not need semaphores.
The count map can also be used for publishing stats on total tasks waiting in queue per task type at any time (even when throttling has not kicked in).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion @shwetathareja ,
ConcurrentHashMap's ComputeIfPresent acquires lock on segment of the hashes, so it can create the congestion for multiple task type as for each submit call we will call this to increment the count. Submission of one task type can affect the other task type. This can happen even when throttling is disabled for those task type.
While for Semaphore, it will be on individual task type level only and only if throttling is enabled for that task type.
Do you see any concern with the Semaphores? As with either approach we will need to decrease count/permit from these three places.
Let me know your thoughts?
server/src/main/java/org/opensearch/cluster/service/MasterTaskThrottler.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/MasterThrottlingStats.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/opensearch/cluster/service/MasterServiceTests.java
Show resolved
Hide resolved
server/src/test/java/org/opensearch/cluster/service/MasterServiceTests.java
Show resolved
Hide resolved
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
server/src/main/java/org/opensearch/cluster/service/MasterThrottlingStats.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@Bukhtawar / @shwetathareja / @gbbafna I have made changes for maintaining
We can keep this Listener at the individual task level into To keep it simple now, I have kept the Listener at the I have made the changes for it, and updating the PR to get the early feedback on approach. |
Gradle Check (Jenkins) Run Completed with:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks much simpler. Thanks @dhwanilpatel .
server/src/main/java/org/opensearch/cluster/service/TaskBatcher.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/MasterTaskThrottler.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
*/ | ||
if (masterService.state().nodes().getMinNodeVersion().compareTo(Version.V_3_0_0) < 0) { | ||
throw new IllegalArgumentException("All the nodes in cluster should be on version later than or equal to 3.0.0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there are restriction on having all node above 3.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to make it to the version to which PR is going to be merged, as of now kept it as Current Version(i.e 3.0).
I have added TODO for changing the version as well.
Once we raise final PR against master branch will change the version in it and take care of TODO.
server/src/main/java/org/opensearch/cluster/service/MasterTaskThrottler.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/TaskBatcherListener.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/MasterTaskThrottler.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/MasterTaskThrottler.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/MasterService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/MasterTaskThrottler.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/service/MasterTaskThrottler.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/opensearch/cluster/service/MasterTaskThrottlerTests.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
start gradle check |
throttledTasksCount.get(type).inc(permits); | ||
} | ||
|
||
public long getThrottlingCount(String type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could use an API to get throttled metrics along with the task type. It would be really helpful to tune this up as an end user . However, we can do it once the stats API is present as well .
Gradle Check (Jenkins) Run Completed with:
|
…change-pr Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
…bat manager" (opensearch-project#4289) * [BUG] Update opensearch-service-x64.exe parameters to //ES for Execute Service. Update opensearch-service-mgr.exe parameters to //ES for Edit Service. Add code comments for the Apache Commons Daemon. Signed-off-by: Alex Burck <me@alexburck.com> * update changelog with pull request link Signed-off-by: Alex Burck <me@alexburck.com> Signed-off-by: Alex Burck <me@alexburck.com>
* Removing dead code in RecoveryTarget. This code in RecoveryTarget is not invoked, all of these methods are implemented by the parent ReplicationTarget with the same behavior. Signed-off-by: Marc Handalian <handalm@amazon.com> * PR Comments. Signed-off-by: Marc Handalian <handalm@amazon.com> Signed-off-by: Marc Handalian <handalm@amazon.com>
* Update changelog contribution guide Signed-off-by: Kunal Kotwani <kkotwani@amazon.com> * Fix reference to pull request Signed-off-by: Kunal Kotwani <kkotwani@amazon.com> Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
Signed-off-by: Kunal Kotwani <kkotwani@amazon.com> Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…vel client (opensearch-project#4064) * Create and delete PIT search rest layer changes Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
…search-project#4354) * Bump xmlbeans from 5.1.0 to 5.1.1 in /plugins/ingest-attachment Bumps xmlbeans from 5.1.0 to 5.1.1. --- updated-dependencies: - dependency-name: org.apache.xmlbeans:xmlbeans dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * Updating SHAs Signed-off-by: dependabot[bot] <support@github.com> * Update changelog Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
…nts (opensearch-project#4352) Overload `generateHistoryOnReplica` to be able to generate only a specific `Engine.Operation.TYPE` operations as required by the `testUpdateSegments` test Signed-off-by: Rabi Panda <adnapibar@gmail.com> Signed-off-by: Rabi Panda <adnapibar@gmail.com>
* Added bwc version 2.2.2 * Add changelog Signed-off-by: Kunal Kotwani <kkotwani@amazon.com> Signed-off-by: Kunal Kotwani <kkotwani@amazon.com> Co-authored-by: opensearch-ci-bot <opensearch-ci-bot@users.noreply.github.com> Co-authored-by: Kunal Kotwani <kkotwani@amazon.com>
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
server/src/main/java/org/opensearch/cluster/service/ClusterManagerTaskThrottler.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
…pensearch-project/OpenSearch into feature/master-task-throttling
Gradle Check (Jenkins) Run Completed with:
|
e231136
into
opensearch-project:feature/master-task-throttling
* Master node changes for master task throttling Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… #3527 Changes required in Master node to perform throttling. Master node changes for master task throttling #3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling #4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling #4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling #4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
…t#4986) Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527 Changes required in Master node to perform throttling. Master node changes for master task throttling opensearch-project#3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling opensearch-project#4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling opensearch-project#4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
* Add basic thorttler/exponential backoff policy for retry/Defination of throttling exception (#3856) * Corrected Java doc for Throttler * Changed the default behaviour of Throttler to return Optional * Removed generics from Throttler and used String as key * Ignore backport / autocut / dependabot branches for gradle checks on push * Master node changes for master task throttling (#3882) * Data node changes for master task throttling (#4204) * Onboarding of few task types to throttling (#4542) * Fix timeout exception and Add Integ test for Master task throttling (#4588) * Complete TODO for version change and removed unused classes(Throttler and Semaphore) (#4846) * Remove V1 version from throttling testcase Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Basic Throttler Framework / Exponential Basic back off policy. Add basic thorttler/exponential backoff policy for retry/Defination o… #3527 Changes required in Master node to perform throttling. Master node changes for master task throttling #3882 Changes required in Data node to perform retry on throttling. Data node changes for master task throttling #4204 Provide support for all task type in throttling framework. Onboarding of few task types to throttling #4542 Integration Tests (Fix timeout exception and Add Integ test for Master task throttling #4588 Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Master node changes for master task throttling
Signed-off-by: Dhwanil Patel dhwanip@amazon.com
Description
This is one of multiple PR planned for master task throttling. In this PR we are making changes in MasterTaskThrottler and MasterService.
MasterTaskThrottler: It contains the settings of throttling and extends Throttler class for the use case of task throttling.
MasterService: Plugged the MasterTaskThrottler on submitting new tasks.
We have introduced the throttling settings as well, using which we can set the throttling limits for different tasks. By default throttling will be disabled and will be enabled on those task for which it is being configured via dynamic setting. Below is sample on how settings can be updated for throttling.
Issues Resolved
Relates to : #479
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.