Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Master node changes for master task throttling #3882

Conversation

dhwanilpatel
Copy link
Contributor

@dhwanilpatel dhwanilpatel commented Jul 13, 2022

Master node changes for master task throttling

Signed-off-by: Dhwanil Patel dhwanip@amazon.com

Description

This is one of multiple PR planned for master task throttling. In this PR we are making changes in MasterTaskThrottler and MasterService.

MasterTaskThrottler: It contains the settings of throttling and extends Throttler class for the use case of task throttling.
MasterService: Plugged the MasterTaskThrottler on submitting new tasks.

We have introduced the throttling settings as well, using which we can set the throttling limits for different tasks. By default throttling will be disabled and will be enabled on those task for which it is being configured via dynamic setting. Below is sample on how settings can be updated for throttling.

curl -XPUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d' 
{ 
  "persistent": 
    { 
      "master.throttling.thresholds" : 
      { 
        "put-mapping" : 
        { 
          "value" : 10 
        }
      } 
    } 
}'

Issues Resolved

Relates to : #479

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@dhwanilpatel dhwanilpatel requested review from a team and reta as code owners July 13, 2022 14:53
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Comment on lines 209 to 212
} catch (Exception e) {
masterTaskThrottler.release(masterThrottlingKey, tasks.size());
throw e;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acquire and release seems too complicated and seems a maintenance overhead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We acquire permits before task goes into queue and release the permit when task is removed from the queue.

We acquire permits at one place, while submitting the tasks.
We release from three places.

  1. If submit task is failed due to some exception (Since we already acquire permit and submit is failing so we need to release it).
  2. Release before execution (Task is removed from queue and went into execution).
  3. Task is timed out in queue(Task is timed out in queue, so master will throw exception to data nodes. Releasing as task is removed from queue)

I will add this logic of acquire/release in java doc as well for future reference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @Bukhtawar if we can simplify this logic further without using permits and semaphores. @dhwanilpatel I have a suggestion, let me know if that makes sense

In the ClusterStateTaskExecutor you are already plumbing the throttling key and maintaining a ConcurrentHashMap for semaphores. We can change the ConcurrentHashMap with just count of tasks in the queue as value. You can use computeIfPresent to atomically update count and if the count exceeds the limit, then throw exception else update the count in the remapping function provided in computeIfPresent. The 3 places you mentioned, we still need them to reduce the count in this approach as well but we will not need semaphores.
The count map can also be used for publishing stats on total tasks waiting in queue per task type at any time (even when throttling has not kicked in).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion @shwetathareja ,

ConcurrentHashMap's ComputeIfPresent acquires lock on segment of the hashes, so it can create the congestion for multiple task type as for each submit call we will call this to increment the count. Submission of one task type can affect the other task type. This can happen even when throttling is disabled for those task type.

While for Semaphore, it will be on individual task type level only and only if throttling is enabled for that task type.

Do you see any concern with the Semaphores? As with either approach we will need to decrease count/permit from these three places.

Let me know your thoughts?

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@dhwanilpatel
Copy link
Contributor Author

dhwanilpatel commented Aug 9, 2022

@Bukhtawar / @shwetathareja / @gbbafna I have made changes for maintaining ConcurrentHashMap for maintaining count per task type and making throttling decision based on count.
I have created below Listener and MasterTaskThrottler will be implementing it and update the count per task type.

    void onSubmit(List<? extends TaskBatcher.BatchedTask> tasks);
    void onProcessed(List<? extends TaskBatcher.BatchedTask> tasks);
    void onSubmitFailure(List<? extends TaskBatcher.BatchedTask> tasks);

We can keep this Listener at the individual task level into BatchedTask, but I dont see any usecase where we would want to perform various activity whenever individual task gets submit/timeout/execute. If anyone has any use case in mind, please share it.

To keep it simple now, I have kept the Listener at the TaskBatcher level, task batcher will call relevant methods based on activity it performs on tasks.

I have made the changes for it, and updating the PR to get the early feedback on approach.
This will break few of the UTs, I will fix the UTs in upcoming revisions.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2022

Gradle Check (Jenkins) Run Completed with:

Copy link
Collaborator

@gbbafna gbbafna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks much simpler. Thanks @dhwanilpatel .

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Comment on lines 83 to 85
*/
if (masterService.state().nodes().getMinNodeVersion().compareTo(Version.V_3_0_0) < 0) {
throw new IllegalArgumentException("All the nodes in cluster should be on version later than or equal to 3.0.0");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there are restriction on having all node above 3.0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to make it to the version to which PR is going to be merged, as of now kept it as Current Version(i.e 3.0).
I have added TODO for changing the version as well.

Once we raise final PR against master branch will change the version in it and take care of TODO.

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@dhwanilpatel
Copy link
Contributor Author

start gradle check

throttledTasksCount.get(type).inc(permits);
}

public long getThrottlingCount(String type) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use an API to get throttled metrics along with the task type. It would be really helpful to tune this up as an end user . However, we can do it once the stats API is present as well .

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

…change-pr

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

burck1 and others added 5 commits August 24, 2022 11:27
…bat manager" (opensearch-project#4289)

* [BUG] Update opensearch-service-x64.exe parameters to //ES for Execute Service. Update opensearch-service-mgr.exe parameters to //ES for Edit Service. Add code comments for the Apache Commons Daemon.

Signed-off-by: Alex Burck <me@alexburck.com>

* update changelog with pull request link

Signed-off-by: Alex Burck <me@alexburck.com>

Signed-off-by: Alex Burck <me@alexburck.com>
* Removing dead code in RecoveryTarget.

This code in RecoveryTarget is not invoked, all of these methods are implemented by the parent ReplicationTarget with the same behavior.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* PR Comments.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Signed-off-by: Marc Handalian <handalm@amazon.com>
* Update changelog contribution guide

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>

* Fix reference to pull request

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
bharath-techie and others added 7 commits September 1, 2022 10:58
…vel client (opensearch-project#4064)

* Create and delete PIT search rest layer changes

Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
…search-project#4354)

* Bump xmlbeans from 5.1.0 to 5.1.1 in /plugins/ingest-attachment

Bumps xmlbeans from 5.1.0 to 5.1.1.

---
updated-dependencies:
- dependency-name: org.apache.xmlbeans:xmlbeans
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Updating SHAs

Signed-off-by: dependabot[bot] <support@github.com>

* Update changelog

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
…nts (opensearch-project#4352)

Overload `generateHistoryOnReplica` to be able to generate only a specific `Engine.Operation.TYPE` operations as required by the `testUpdateSegments` test

Signed-off-by: Rabi Panda <adnapibar@gmail.com>

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
* Added bwc version 2.2.2

* Add changelog

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
Co-authored-by: opensearch-ci-bot <opensearch-ci-bot@users.noreply.github.com>
Co-authored-by: Kunal Kotwani <kkotwani@amazon.com>
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Sep 2, 2022

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Sep 2, 2022

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 2, 2022

Gradle Check (Jenkins) Run Completed with:

@shwetathareja shwetathareja merged commit e231136 into opensearch-project:feature/master-task-throttling Sep 2, 2022
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 2, 2022
* Master node changes for master task throttling

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Bukhtawar pushed a commit that referenced this pull request Nov 4, 2022
Basic Throttler Framework / Exponential Basic back off policy. 
Add basic thorttler/exponential backoff policy for retry/Defination o… #3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling #3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling #4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling #4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling #4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
dhwanilpatel added a commit to dhwanilpatel/OpenSearch-1 that referenced this pull request Nov 4, 2022
…t#4986)

Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… opensearch-project#3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling opensearch-project#3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling opensearch-project#4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling opensearch-project#4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling opensearch-project#4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Bukhtawar pushed a commit that referenced this pull request Nov 4, 2022
* Add basic thorttler/exponential backoff policy for retry/Defination of throttling exception (#3856)

* Corrected Java doc for Throttler

* Changed the default behaviour of Throttler to return Optional

* Removed generics from Throttler and used String as key

* Ignore backport / autocut / dependabot branches for gradle checks on push

* Master node changes for master task throttling (#3882)

* Data node changes for master task throttling (#4204)

* Onboarding of few task types to throttling (#4542)

* Fix timeout exception and Add Integ test for Master task throttling (#4588)

* Complete TODO for version change and removed unused classes(Throttler and Semaphore) (#4846)

* Remove V1 version from throttling testcase

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Bukhtawar pushed a commit that referenced this pull request Nov 4, 2022
Basic Throttler Framework / Exponential Basic back off policy.
Add basic thorttler/exponential backoff policy for retry/Defination o… #3527
Changes required in Master node to perform throttling.
Master node changes for master task throttling #3882
Changes required in Data node to perform retry on throttling.
Data node changes for master task throttling #4204
Provide support for all task type in throttling framework.
Onboarding of few task types to throttling #4542
Integration Tests (Fix timeout exception and Add Integ test for Master task throttling #4588

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.