-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support task resource tracking in OpenSearch #2639
Support task resource tracking in OpenSearch #2639
Conversation
d023569
to
d4a085e
Compare
❌ Gradle Check failure d023569a4cd8ec1f4e820d56a141fcfd7dc47489 |
❌ Gradle Check failure d4a085e20f3c5a9a9aa91ed64f4a39446d651bec |
d4a085e
to
8ac708a
Compare
@dblock @andrross @Bukhtawar Could you please review this PR. Thanks |
❌ Gradle Check failure 8ac708a5d59beea160f13dd77ca26d0398b84341 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tushar. The changes look good overall. I'll continue with the full review in sometime but below are some early comments I have.
listener.onFailure(validationException); | ||
return; | ||
} | ||
ThreadContext.StoredContext storedContext = taskManager.addTaskIdInThreadContext(task); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add task in thread context centrally as a part of task creation, instead of doing it at multiple places after task creation, it could lead to maintenance overhead overtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done as per the ThreadContext's usage documentation which is something like :
- Clone/Stash existing thread context
- Add new fields to the new thread context.
- Execute the operation (or take task to a different thread)
- Restore the original thread context.
If we want to do it in register method then we would have to face challenges of making ThreadContext mutable which today throws exception and will also need to find a common place to remove the header from the thread context so that some other request doesn't use this just because we missed to remove it from thread context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets just call out this works for tasks that forks off on a separate thread(non-network thread)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tushar-kharbanda72 I would like to back @Bukhtawar concerns, manipulating thread context in a such away is not desirable (cause essentially someone down the line could override it again). The suggestion I have in mind is to introduce the distinction between Runnable
(no task context) and TaskRunnable extends Runnable
(with Task ID as final property). In this case, the taskId could be extracted by thread pool and baked into the thread local context from the start (by checking if Runnable
is instance of TaskRunnable
). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reta Agree with the problem you mentioned of Task Id being overridden. The solution for that I think could be to always preserve Task Id when a thread context stash is called and always make sure it makes it to the new context. Thoughts on this?
Yes having taskId available in TaskRunnable without ThreadContext is desirable but I don't seem to currently have a way to get that information for all use cases in OpenSearch without having changes at multiple places in codebase. I believe this essentially translates to having a task id available where ever we are submitting any runnable to an executor. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tushar-kharbanda72 yes, it is possible to manage task id transfer using stash(), but fundamental problem here (in my opinion) - the thread context should not be manipulated through TaskManager
from the outside. Yes, it will introduce larger scope of changes, the benefit - less dependency on thread context (black box), explicit link between task and its execution model, just my 5 cents, @Bukhtawar do you have an option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fundamental problem here (in my opinion) - the thread context should not be manipulated through TaskManager from the outside
I agree. This is essentially using the thread context to stash a global variable and is likely to lead to pain in the long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reta @andrross Currently coupling the thread context update with when a task gets registered in resource tracking service as both together made more sense.
Having task id set as a member in the runnable is an ideal case scenario but comes with it's own sets of challenges:
- Submitting a runnable will not be very flexible. We would need Task Id available as a function param at every point throughout the code base wherever we need to submit a runnable.
- If a blocking sync child action gets executed on the same thread - this would essentially mean updating the runnable's task id in some way and reverting it back as soon as it finishes.
- For new features folks would need to handle the overlapping case of making task id available and I see this would increase the number of params they handle in the function.
I see the benefits of the suggested approach but it also comes with the above mentioned downsides and these could be even more when we deep dive further.
Also, similar approach is used for request tracing X-Opaque-Id
in OpenSearch which I see a more critical feature for some users.
@Bukhtawar Would like to hear your thoughts on this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tushar-kharbanda72 sorry for the delay, I certainly agree it is challenging to have robust and reliable context propagation support but this is also very important to avoid any surprises by opening the gate for (arbitrary) context manipulations. The approach could be generalized by eliminating the focus on task to abstract runnable with contextual setup: in this case the thread context could be established by runnable (fe, associated with task or anything else) before hand.
There are quite a few examples of contextual runnables like ContextPreservingAbstractRunnable
and ContextPreservingRunnable
.
X-Opaque-Id
is also a good example of context propagation but it is passed through requestHeaders
, may be we could extend ThreadContextStruct
with non-headers key/value pairs, task id being one of them. I will try to prototype the possible implementation(s) shortly and share with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reta Created an issue to discuss further on this #2819 (comment). We can use this to close the final change required for making task id available to runnable.
I believe this can be added in a follow up PR as it's not changing the core logic and just the way runnable knows about the Task. Do let me know your thoughts.
Thanks for jumping in @Bukhtawar! Will leave this to you for now until you're happy with the PR and we can have more 👀 on it later. |
❌ Gradle Check failure ba1116110b2dad43043138c9f199dd2a17b1f693 |
❌ Gradle Check failure 61a6e76d079d0eed9e9ca669d4268a6427dd5805 |
61a6e76
to
8a18752
Compare
❌ Gradle Check failure 8a187522ac9a20b0f90831048e095649fa5adda8 |
❌ Gradle Check failure a0e291e588a3e4c6a76bc1551f97924d1ee230e1 |
server/src/main/java/org/opensearch/tasks/TaskResourceTrackingService.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/threadpool/RunnableTaskListenerFactory.java
Outdated
Show resolved
Hide resolved
❌ Gradle Check failure 36d2a5238399791e3e1bae339cecaea400a3c20b |
36d2a52
to
b92f5e3
Compare
❌ Gradle Check failure b92f5e3aea8b947cc35049b386167244e4c5e246 |
❌ Gradle Check failure da2220dc2b957a1ec3420a400612adc108d476fe |
da2220d
to
f6d43f9
Compare
❌ Gradle Check failure f6d43f905da4194e190fcbd35d6349adce96b9e4 |
f6d43f9
to
4b71cba
Compare
❌ Gradle Check failure 4b71cbaef54f81c9bf9298d5ecb1525a552210b4 |
❌ Gradle Check failure 8c59e526942cfc413733d3b6c5a6db7a2927417c |
❌ Gradle Check failure cbcf3c525bf516fb7164f0221491a7b25c1f96ec |
✅ Gradle Check success 4eb49c56c1c155b30f374e05bbd24efea5645283 |
This was reverted in #3046 |
@dblock can you point me to the build failure links? |
Looks like this is the only assertion that is tripping
|
@Bukhtawar There are a few recent mentions of this in issues linked off #3046 |
@dblock Will take a look at assertion failures and resolve the issue this week. |
* Add Task id in Thread Context Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Add resource tracking update support for tasks Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * List tasks action support for task resource refresh Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Handle task unregistration case on same thread Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Add lazy initialisation for RunnableTaskExecutionListener Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Segregate resource tracking logic to a separate service. Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Check for running threads during task unregister Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Moved thread context logic to resource tracking service Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * preserve task id in thread context even after stash Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Add null check for resource tracking service Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Tracking service tests and minor refactoring Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Preserve task id fix with test Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Minor test changes and Task tracking call update Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Fix Auto Queue executor method's signature Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Make task runnable task listener factory implement consumer Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Use reflection for ThreadMXBean Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Formatting Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Replace RunnableTaskExecutionListenerFactory with AtomicReference Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Revert "Use reflection for ThreadMXBean" This reverts commit cbcf3c525bf516fb7164f0221491a7b25c1f96ec. Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Suppress Warning related to ThreadMXBean Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Add separate method for task resource tracking supported check Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Enabled setting by default Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Add debug logs for stale context id Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Remove hardcoded task overhead in tests Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Bump stale task id in thread context log level to warn Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Improve assertions and logging Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> Co-authored-by: Tushar Kharbanda <tkharban@amazon.com>
Reopens changes from opensearch-project#2639 (reverted in opensearch-project#3046) to add a framework for task resource tracking. Currently, SearchTask and SearchShardTask support resource tracking but it can be extended to any other task. Changes since opensearch-project#2639: * Replaced the usage of AutoQueueAdjustingExecutorBuilder with ResizableExecutorBuilder * Resolved merge conflicts * Fixed broken tests Signed-off-by: Ketan Verma <ketan9495@gmail.com>
* Support task resource tracking in OpenSearch * Reopens changes from #2639 (reverted in #3046) to add a framework for task resource tracking. Currently, SearchTask and SearchShardTask support resource tracking but it can be extended to any other task. * Fixed a race-condition when Task is unregistered before its threads are stopped * Improved error handling and simplified task resource tracking completion listener * Avoid registering listeners on already completed tasks Signed-off-by: Ketan Verma <ketan9495@gmail.com>
* Support task resource tracking in OpenSearch * Reopens changes from opensearch-project#2639 (reverted in opensearch-project#3046) to add a framework for task resource tracking. Currently, SearchTask and SearchShardTask support resource tracking but it can be extended to any other task. * Fixed a race-condition when Task is unregistered before its threads are stopped * Improved error handling and simplified task resource tracking completion listener * Avoid registering listeners on already completed tasks Signed-off-by: Ketan Verma <ketan9495@gmail.com>
* [Backport 2.x] Support task resource tracking in OpenSearch * Reopens changes from #2639 (reverted in #3046) to add a framework for task resource tracking. Currently, SearchTask and SearchShardTask support resource tracking but it can be extended to any other task. * Fixed a race-condition when Task is unregistered before its threads are stopped * Improved error handling and simplified task resource tracking completion listener * Avoid registering listeners on already completed tasks Signed-off-by: Ketan Verma <ketan9495@gmail.com>
* Add Task id in Thread Context Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Add resource tracking update support for tasks Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * List tasks action support for task resource refresh Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Handle task unregistration case on same thread Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Add lazy initialisation for RunnableTaskExecutionListener Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Segregate resource tracking logic to a separate service. Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Check for running threads during task unregister Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Moved thread context logic to resource tracking service Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * preserve task id in thread context even after stash Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Add null check for resource tracking service Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Tracking service tests and minor refactoring Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Preserve task id fix with test Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Minor test changes and Task tracking call update Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Fix Auto Queue executor method's signature Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Make task runnable task listener factory implement consumer Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Use reflection for ThreadMXBean Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Formatting Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Replace RunnableTaskExecutionListenerFactory with AtomicReference Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Revert "Use reflection for ThreadMXBean" This reverts commit cbcf3c525bf516fb7164f0221491a7b25c1f96ec. Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Suppress Warning related to ThreadMXBean Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Add separate method for task resource tracking supported check Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Enabled setting by default Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Add debug logs for stale context id Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Remove hardcoded task overhead in tests Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Bump stale task id in thread context log level to warn Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> * Improve assertions and logging Signed-off-by: Tushar Kharbanda <tushar.kharbanda72@gmail.com> Co-authored-by: Tushar Kharbanda <tkharban@amazon.com>
Description
Issues Resolved
#1179
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.