[BUG] Search requests with size=0 & terminate_after sometimes return incorrect hit count #10435
Comments
If |
@msfroh these tests are not flaky whenever Ref: OpenSearch/server/src/internalClusterTest/java/org/opensearch/search/simple/SimpleSearchIT.java Lines 301 to 311 in 489de2a
|
I have narrowed down the cause of this "issue" to a change introduced in this PR - #4043. (cc @reta) Specifically, this change: OpenSearch/server/src/main/java/org/opensearch/search/internal/ContextIndexSearcher.java Lines 306 to 308 in 9b7a9d0
That PR brought in a Lucene feature (https://issues.apache.org/jira/browse/LUCENE-10620) that allows you to take advantage of Now, what is really interesting about this change is that when combined with these three (I think ALL three) search conditions - 1)
And, of course, that's exactly the setup for this test. But it is not a bug. I would say it's just incompatible with what you might expect when you set As to the non-deterministic manifestation of this issue, there are two things that pertain to this feature that seem to contribute to non-determinism. One is that the test uses |
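To make the interaction described above concrete, here is a toy model in Python. It is not Lucene or OpenSearch code: `Segment`, `shortcut_count`, and `total_hits` are hypothetical names, and the model assumes only that a per-segment match count is sometimes available without visiting documents, and that a count obtained this way does not go through the `terminate_after` bookkeeping.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    matches: int                   # docs in this segment that match the query
    shortcut_count: Optional[int]  # precomputed match count, or None if unavailable (assumption)


def total_hits(segments: List[Segment], terminate_after: int) -> Tuple[int, int]:
    """Return (reported_hit_count, docs_visited) for a toy size=0 search."""
    reported = 0
    visited = 0
    for seg in segments:
        if visited >= terminate_after:
            break  # early termination: do not open another segment
        if seg.shortcut_count is not None:
            # Shortcut: take the segment-level count without visiting any doc,
            # so the terminate_after limit never sees these matches.
            reported += seg.shortcut_count
            continue
        for _ in range(seg.matches):
            if visited >= terminate_after:
                break
            visited += 1
            reported += 1
    return reported, visited


if __name__ == "__main__":
    print(total_hits([Segment(5, None)], terminate_after=2))
    print(total_hits([Segment(5, 5)], terminate_after=2))
    print(total_hits([Segment(4, 4), Segment(3, None)], terminate_after=2))
```

In this model, a single segment without a shortcut count reports exactly `terminate_after` hits, while the same segment with a shortcut count reports all 5 hits after visiting 0 docs. Mixing the two kinds of segment gives a result that depends on segment contents and order, which is one way to picture the non-determinism discussed here.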
Thanks a lot for digging into this one. The issue seems to manifest itself only during the concurrent search scenario, which probably makes sense since we may go over segments concurrently (depending on random distribution). UPD:
|
It has nothing to do with concurrent search and I don't know if concurrent search makes it more likely. It has more to do with what's in a segment and how (Interestingly, it was your concurrent search blog that drew me to this. I thought I'd help with a small contribution.) |
Right, I corrected myself in #10435 (comment). Sadly, we didn't have tests for this specific case before, so the issue went unnoticed |
@austintlee Thanks for looking into this!
I think there's some more discussion needed on what the correct "contract" for This is something that we've struggled with for concurrent search as well -- it's not straightforward to have concurrent search early terminate at the exact Curious to hear your thoughts on this too @sohami |
But it's not processing that many docs -- it's using the count returned from the
So, if the |
Sorry for the delayed response. @msfroh Even though the Let's assume there is a single leaf with 21 matches. So the collectors for this test query are of type: Now assume there is an Aggregator in the mix, like histo aggs; what will happen is the collector tree will be of the form: I guess the question is in case of
|
@sohami Did you run that query with size = 0? Also, keep in mind that the specific instance of this problem doesn't always occur so I am not sure what conclusion we can draw from the response you posted above... I think we should document this behavior as best we can. I don't think anyone is proposing we make further code changes to change this behavior. |
IMO, we should just clearly document that the purpose of |
@austintlee Yes, the query was with size=0. I just updated the same query in the test to include the aggregation. It will not occur if the optimization to fetch the hit count via
That's what I am discussing here: whether we need to handle this case separately instead of just keeping the current behavior with proper documentation. It seems the current alignment is to document it properly
I guess then in documentation we should be very clear that |
"Independently" would be mischaracterization as they are still correlated. How about: The latter (spelling correction) controls collecting/processing number of the matched document whereas former is count of matching docs and the actual values of |
Apologies for the super delayed response here. I'm also in agreement that the best path forward is to document this difference. In addition to the reasons previously discussed, this "issue" has been present since OpenSearch 2.2.0 and there don't seem to have been any complaints about it, so adding an explanation to the docs seems sufficient. I'm happy to close out this issue with PR #10836 from @austintlee -- I will create a docs issue to follow up on the documentation updates. |
Describe the bug
When both size=0 and terminate_after parameters are provided to a search request, the incorrect total hit count is sometimes returned. This was discovered as a part of the investigation for a flakey test, see #9946 (comment)
To Reproduce
Set size=0 in the search request in the below test.
Expected behavior
The above test should always pass
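For readers without the test at hand, a request combining the two parameters from the description could be built roughly as follows. This is only an illustrative sketch, not the test referenced above: the `match_all` query, the helper name `build_count_request`, and the value 10 are assumptions chosen for the example.

```python
import json


def build_count_request(terminate_after: int) -> dict:
    """Build a search body that asks for no hits, only the total count."""
    return {
        "size": 0,                           # return no documents, just metadata
        "terminate_after": terminate_after,  # stop collecting after this many docs
        "query": {"match_all": {}},          # hypothetical query for illustration
    }


if __name__ == "__main__":
    # The body would be sent to an index's _search endpoint.
    print(json.dumps(build_count_request(10), indent=2))
```

With a body like this, the thread's conclusion is that the reported total may exceed `terminate_after` in some cases, so the value should not be relied on as an exact cap on the hit count.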