Coordinator can return partial results after the timeout when allow_partial_search_results is true #16681
Conversation
❌ Gradle check result for 61d84d1: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
long leftTimeMills;
if (queryPhase) {
    // it's costly in query phase.
    leftTimeMills = task.queryPhaseTimeout() - (System.currentTimeMillis() - task.startTimeMills());
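The budget arithmetic in the snippet above can be illustrated in isolation. This is a minimal, hypothetical sketch: the class `TimeBudget`, its method names, and the 80% query-phase split are illustrative assumptions, not the PR's actual API.

```java
// Hypothetical sketch of a per-phase timeout budget (not the PR's real API).
public class TimeBudget {

    // Reserve a fraction of the overall search timeout for the query phase,
    // leaving the remainder as a backup for the fetch phase.
    public static long queryPhaseTimeout(long totalTimeoutMillis, double queryPhasePercentage) {
        return (long) (totalTimeoutMillis * queryPhasePercentage);
    }

    // Time still available in a phase: phase budget minus elapsed wall-clock time,
    // mirroring `task.queryPhaseTimeout() - (now - task.startTimeMills())` above.
    public static long remainingMillis(long phaseTimeoutMillis, long startMillis, long nowMillis) {
        return phaseTimeoutMillis - (nowMillis - startMillis);
    }

    public static void main(String[] args) {
        long total = 10_000;                                  // 10s overall timeout
        long queryBudget = queryPhaseTimeout(total, 0.8);     // 80% reserved for query phase
        System.out.println("query budget: " + queryBudget);   // prints: query budget: 8000
        System.out.println("left after 3s: " + remainingMillis(queryBudget, 0, 3_000)); // prints: left after 3s: 5000
    }
}
```

With an 80/20 split, a shard that exhausts the query-phase budget still leaves 2 seconds for the fetch phase, which is the reservation the discussion below argues for.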
What's the motivation behind the queryPhaseTimeoutPercentage
concept? I think it's going to depend on the query and the setup whether query or fetch phase takes longer and it doesn't seem super intuitive for a user to understand how to use this. For example a query that matches a lot of sparse documents using searchable snapshots might spend much longer in the fetch phase while a query that performs complex aggregations might spend a lot longer in the query phase.
I hope to reserve some time for the subsequent phase as a backup measure, ensuring each stage is allocated a certain amount of time. Of course, if an earlier stage finishes quickly, it does not reduce the time available to the later phases.
If no such reservation is made and a shard blocks in the query phase and uses up all the time, then even if it returns after the timeout there is no time left for the subsequent stages, and the timeout becomes meaningless.
Hmm should we have separate timeouts for the coordinator and the shard level search tasks then? I still think it's pretty unintuitive to use a % like this.
@jed326, thanks for your reply. How about introducing a new setting, coordinator_timeout?
org.opensearch.search.basic.SearchWithRandomExceptionsIT.testRandomExceptions #15828
❕ Gradle check result for 7e5b243: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report

@@ Coverage Diff @@
## main #16681 +/- ##
============================================
+ Coverage     72.05%   72.10%   +0.05%
- Complexity    65183    65201      +18
============================================
  Files          5318     5318
  Lines        303993   304030      +37
  Branches      43990    43997       +7
============================================
+ Hits         219028   219214     +186
+ Misses        67046    66889     -157
- Partials      17919    17927       +8
Description
In the query phase, the coordinator searches each shard concurrently. If any shard is blocked or responds very slowly, the coordinating node gets stuck even when a timeout is set.
This PR adds support for timeout waiting: if the timeout is exceeded, the coordinator treats the shard as failed and proceeds to the fetch phase.
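The idea can be illustrated outside of OpenSearch with plain `CompletableFuture`s. This is a simplified sketch under stated assumptions, not the PR's implementation: the class `PartialResults`, its `collect` method, and the string results are all hypothetical stand-ins for per-shard query-phase responses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: a coordinator that waits at most timeoutMillis overall
// and marks any shard that misses the deadline as failed instead of blocking.
public class PartialResults {

    public static List<String> collect(List<CompletableFuture<String>> shardFutures, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : shardFutures) {
            long left = deadline - System.nanoTime(); // shrinking shared budget
            try {
                results.add(f.get(Math.max(0, left), TimeUnit.NANOSECONDS));
            } catch (TimeoutException e) {
                // Shard did not respond in time: record a failure and move on,
                // so the search can continue to the fetch phase with partial results.
                results.add("shard failed: timed out");
            } catch (Exception e) {
                results.add("shard failed: " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<CompletableFuture<String>> shards = List.of(
            CompletableFuture.completedFuture("shard0 hits"),
            new CompletableFuture<>() // never completes: simulates a blocked shard
        );
        System.out.println(collect(shards, 100));
        // prints: [shard0 hits, shard failed: timed out]
    }
}
```

The single shared deadline (rather than a fresh timeout per shard) mirrors the coordinator-level timeout discussed above: slow shards consume the common budget, and whatever has arrived by the deadline is returned as a partial result.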
Related Issues
Resolves #817 (comment)
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.