SpanOrQuery uses IDFs of failed subqueries in score calculation. #13796

Open
tkarampAlpha opened this issue Sep 16, 2024 · 0 comments

Description

It seems that, for SpanOrQuery, the IDFs of terms in subqueries that do not match a given document still affect that document's score.

I observed this with a test index containing 3 documents:

doc1: 
    field: something
doc2:
    field: nothing
doc3: 
    field: anything

And I issue the following query:

spanOr([Contents:something, Contents:nothing])

If you check the score explanations, you will notice that the scores of both matching documents (doc1 and doc2) include the IDFs of both terms, even though each document matches only one of them.
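To make the difference concrete, here is a minimal, self-contained Java sketch (no Lucene dependency) that applies the IDF formula printed in the explanation to this three-document example. It contrasts the observed behavior (every clause term contributes its IDF) with the behavior this issue expects (only terms present in the document contribute). The class and method names are hypothetical, the per-document term sets and document frequencies are hard-coded rather than read from an index, and this is not how Lucene's span scoring is actually implemented.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SpanOrIdfDemo {
    // IDF as printed in the explanation: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1;
    }

    // Observed behavior: every clause term contributes, whether or not it occurs in the document.
    static double observedIdf(List<String> clauseTerms, Map<String, Long> docFreqs, long docCount) {
        double sum = 0;
        for (String term : clauseTerms) {
            sum += idf(docFreqs.get(term), docCount);
        }
        return sum;
    }

    // Hypothetical expected behavior: only clause terms present in the document contribute.
    static double matchingOnlyIdf(List<String> clauseTerms, Set<String> docTerms,
                                  Map<String, Long> docFreqs, long docCount) {
        double sum = 0;
        for (String term : clauseTerms) {
            if (docTerms.contains(term)) {
                sum += idf(docFreqs.get(term), docCount);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // spanOr([Contents:something, Contents:nothing])
        List<String> clauses = List.of("something", "nothing");
        Map<String, Long> docFreqs = Map.of("something", 1L, "nothing", 1L);
        long docCount = 3;
        // doc1, doc2, doc3 from the example above
        List<Set<String>> docs = List.of(Set.of("something"), Set.of("nothing"), Set.of("anything"));

        for (int i = 0; i < docs.size(); i++) {
            double expected = matchingOnlyIdf(clauses, docs.get(i), docFreqs, docCount);
            if (expected == 0) {
                // No clause term occurs in this document, so the query does not match it.
                System.out.printf("doc%d: no match%n", i + 1);
                continue;
            }
            System.out.printf("doc%d: observed idf=%.7f, matching-only idf=%.7f%n",
                    i + 1, observedIdf(clauses, docFreqs, docCount), expected);
        }
    }
}
```

With these inputs, doc1 and doc2 each get an observed IDF of about 3.96 but a matching-only IDF of about 1.98, while doc3 matches neither clause.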

Here is the score explanation for the first document (doc1):

3.9616547 = weight(spanOr([Contents:something, Contents:nothing]) in 0) [AsBM25Similarity], result of:
  3.9616547 = score(freq=1.0), computed as boost * idf * tf from:
    51.0 = boost
    3.9616585 = idf, sum of:
      1.9808292 = idf for term nothing , computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1 from:
        1 = docFreq
        3 = docCount
      1.9808292 = idf for term something , computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1 from:
        1 = docFreq
        3 = docCount
    0.019607842 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = phraseFreq=1.0
      50.0 = k1, term saturation parameter
      0.0 = b, length normalization parameter
      1.0 = dl, length of field
      2.0 = avgdl, average length of field
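As a sanity check, the numbers in this explanation can be reproduced by hand from the parameters it prints (docFreq = 1, docCount = 3, k1 = 50, b = 0, dl = 1, avgdl = 2, boost = 51, freq = 1), reading `log` as the natural logarithm, which is what matches the printed values. Because b = 0 and the boost equals 1 + k1, the boost and tf cancel, so the final score is exactly the summed IDF; had only the matching term's IDF been used, the score would be about 1.98 instead.

$$
\begin{aligned}
\mathrm{idf}(t) &= \ln\!\left(1 + \frac{3 - 1 + 0.5}{1 + 0.5}\right) + 1 = \ln\frac{8}{3} + 1 \approx 1.9808, \quad t \in \{\text{something}, \text{nothing}\}\\
\mathrm{idf}_{\text{sum}} &= \mathrm{idf}(\text{something}) + \mathrm{idf}(\text{nothing}) \approx 3.9617\\
\mathrm{tf} &= \frac{1}{1 + 50\left(1 - 0 + 0 \cdot \frac{1}{2}\right)} = \frac{1}{51} \approx 0.0196\\
\mathrm{score} &= 51 \cdot \mathrm{idf}_{\text{sum}} \cdot \frac{1}{51} = \mathrm{idf}_{\text{sum}} \approx 3.9617
\end{aligned}
$$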

Version and environment details

Lucene 9.7.0, used through Solr 9.3.0
