Filter shards for sliced search at coordinator #16771
❌ Gradle check result for 2d2fd05: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 541979e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for b4aaa2f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 1680b9b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 8842514: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for eadaabd: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #16771 +/- ##
============================================
+ Coverage     72.19%   72.22%   +0.02%
- Complexity    65208    65217       +9
============================================
  Files          5297     5297
  Lines        303324   303351      +27
  Branches      43913    43926      +13
============================================
+ Hits         218999   219098      +99
+ Misses        66367    66261     -106
- Partials      17958    17992      +34
☔ View full report in Codecov by Sentry.
// Filter the returned shards for the given slice
CollectionUtil.timSort(indexIterators);
for (int i = 0; i < indexIterators.size(); i++) {
    if (slice.shardMatches(i, indexIterators.size())) {
I am wondering if we should be using the shard id instead of i? Do all shards always participate in search, or could some of them be filtered out earlier?
if (slice.shardMatches(indexIterators.get(i).shardId().id(), indexIterators.size())) {
So, the challenge here is that it's not necessarily the shard ID.
If you've specified routing preferences or something like _shards:2,4,8, then the shard iterators returned from computeTargetedShards will only contain the shards corresponding to the routing preference or the shard filter. The modulo logic for slice matching pretends those are the "full universe" of shards for slicing purposes.
Essentially, I'm mimicking the shard-level behavior that happens here:
OpenSearch/server/src/main/java/org/opensearch/search/slice/SliceBuilder.java
Lines 245 to 253 in 0d54c16
// remap the original shard id with its index (position) in the sorted shard iterator.
for (ShardIterator it : group) {
    assert it.shardId().getIndex().equals(request.shardId().getIndex());
    if (request.shardId().equals(it.shardId())) {
        shardId = ord;
        break;
    }
    ++ord;
}
shardIds).
IMO, the approach I used (where it's always ordinals, which may happen to be the same as shard IDs) is slightly less hacky.
I should add a comment to explain why it needs to be ordinals and not shard IDs, though.
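To make the ordinal-versus-ID point concrete, here is a minimal standalone sketch of the coordinator-side filter. It is not the code in this PR: the class OrdinalSliceFilter, the helper shardsForSlice, and the four-argument shardMatches are hypothetical stand-ins, plain integers replace shard iterators, and the modulo rule is an assumption about how slice matching behaves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OrdinalSliceFilter {

    // Hypothetical stand-in for the slice matching check. ASSUMPTION: when there are at
    // least as many slices as shards, each slice maps to one shard; otherwise each shard
    // maps to one slice.
    static boolean shardMatches(int sliceId, int maxSlices, int shardOrdinal, int numShards) {
        if (maxSlices >= numShards) {
            return sliceId % numShards == shardOrdinal;
        }
        return shardOrdinal % maxSlices == sliceId;
    }

    // Returns the shard IDs that the given slice should query. Matching is done on the
    // position (ordinal) in the sorted list of targeted shards, not on the shard ID.
    static List<Integer> shardsForSlice(List<Integer> targetedShardIds, int sliceId, int maxSlices) {
        List<Integer> sorted = new ArrayList<>(targetedShardIds);
        Collections.sort(sorted);
        List<Integer> selected = new ArrayList<>();
        for (int ordinal = 0; ordinal < sorted.size(); ordinal++) {
            if (shardMatches(sliceId, maxSlices, ordinal, sorted.size())) {
                selected.add(sorted.get(ordinal));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Shards left after a preference such as _shards:2,4,8
        List<Integer> targeted = List.of(2, 4, 8);
        for (int slice = 0; slice < 3; slice++) {
            System.out.println("slice " + slice + " -> shards " + shardsForSlice(targeted, slice, 3));
        }
    }
}
```

Under these assumptions, three slices over the targeted shards 2, 4, 8 each pick up exactly one shard. If the raw IDs were passed instead of ordinals, the check `sliceId % 3 == shardId` could never be true for shards 4 and 8, so no slice would ever search them.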
Oh I see, it makes sense, but SliceBuilder should be using something like shardIndex then? Cuz now it uses shardId explicitly:
public boolean shardMatches(int shardId, int numShards) {
Yeah, that's fair. I made it shardId before I properly understood the existing SliceBuilder logic (which overwrites the real shardId with ord).
While I'm there I'm going to rename the variable in the method quoted above -- using a variable called shardId when you really mean shardIndex is ugly.
❌ Gradle check result for c7abf62: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for f29e111: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Prior to this commit, a sliced search would fan out to every shard, then apply a MatchNoDocsQuery filter on shards that don't correspond to the current slice. This still creates a (useless) search context on each shard for every slice, though. For a long-running sliced scroll, this can quickly exhaust the number of available scroll contexts.

This change avoids fanning out to all the shards by checking at the coordinator if a shard is matched by the current slice. This should reduce the number of open scroll contexts to max(numShards, numSlices) instead of numShards * numSlices.

Signed-off-by: Michael Froh <froh@amazon.com>
❌ Gradle check result for 6a2de32: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
Prior to this commit, a sliced search would fan out to every shard, then apply a MatchNoDocsQuery filter on shards that don't correspond to the current slice. This still creates a (useless) search context on each shard for every slice, though. For a long-running sliced scroll, this can quickly exhaust the number of available scroll contexts.
This change avoids fanning out to all the shards by checking at the coordinator if a shard is matched by the current slice. This should reduce the number of open scroll contexts to max(numShards, numSlices) instead of numShards * numSlices.
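The context counts above can be checked with a small standalone sketch. It assumes the same modulo rule for slice matching as the discussion in this PR suggests; SliceContextCount and its methods are hypothetical names used only for illustration and are not part of OpenSearch.

```java
public class SliceContextCount {

    // ASSUMPTION: slices are spread over shards when there are at least as many slices
    // as shards, and shards are spread over slices otherwise.
    static boolean shardMatches(int sliceId, int numSlices, int shardOrdinal, int numShards) {
        if (numSlices >= numShards) {
            return sliceId % numShards == shardOrdinal;
        }
        return shardOrdinal % numSlices == sliceId;
    }

    // Old behavior: every slice opens a context on every shard.
    static int contextsWithFanOut(int numShards, int numSlices) {
        return numShards * numSlices;
    }

    // New behavior: a slice only opens a context on the shards it matches.
    static int contextsWithCoordinatorFilter(int numShards, int numSlices) {
        int count = 0;
        for (int slice = 0; slice < numSlices; slice++) {
            for (int ordinal = 0; ordinal < numShards; ordinal++) {
                if (shardMatches(slice, numSlices, ordinal, numShards)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int numShards = 5;
        int numSlices = 3;
        System.out.println("fan-out contexts: " + contextsWithFanOut(numShards, numSlices));
        System.out.println("filtered contexts: " + contextsWithCoordinatorFilter(numShards, numSlices));
    }
}
```

With 5 shards and 3 slices, the fan-out count is 15 while the filtered count is 5, which equals max(numShards, numSlices). Swapping the two numbers gives the same filtered count, since each slice then lands on exactly one shard.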
Related Issues
Related to #16289
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.