SpanOrQuery uses IDFs of failed subqueries in score calculation. #13796

tkarampAlpha · 2024-09-16T15:01:44Z

Description

It seems that, for SpanOrQuery, the IDFs of terms in subqueries that do not match a given document still affect that document's score.

I have observed this on an index containing 3 documents:

doc1: 
    field: something
doc2:
    field: nothing
doc3: 
    field: anything

And I issue the following query:

spanOr([Contents:something, Contents:nothing])

If you check the score explanation, you will notice that for both matching documents the IDFs of both terms contribute to the score, even though only one of the terms matches each document.
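A minimal plain-Java model of the example, for illustration only: it uses no Lucene classes, the class and method names are hypothetical, and the IDF formula is transcribed from the explanation (the trailing "+ 1" comes from the custom AsBM25Similarity, not Lucene's stock BM25Similarity). It contrasts a query-level IDF summed over every spanOr term with the IDF of the single term each matching document actually contains.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of the three-document example; not Lucene code.
class SpanOrIdfModel {

    // Per-term IDF as printed in the explanation (the "+ 1" is from AsBM25Similarity).
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1;
    }

    // Number of documents whose single field value equals the term.
    static long docFreq(Map<String, String> docs, String term) {
        return docs.values().stream().filter(term::equals).count();
    }

    // Query-level IDF: summed over every query term, regardless of which
    // term the scored document actually contains.
    static double summedIdf(Map<String, String> docs, List<String> queryTerms) {
        double sum = 0;
        for (String term : queryTerms) {
            sum += idf(docFreq(docs, term), docs.size());
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "something");
        docs.put("doc2", "nothing");
        docs.put("doc3", "anything");
        List<String> queryTerms = List.of("something", "nothing");

        double queryIdf = summedIdf(docs, queryTerms);
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            String term = doc.getValue();
            if (!queryTerms.contains(term)) {
                continue; // doc3 matches neither span, so it is not scored
            }
            double matchedIdf = idf(docFreq(docs, term), docs.size());
            System.out.printf("%s: summed idf = %.7f, matched-term idf = %.7f%n",
                    doc.getKey(), queryIdf, matchedIdf);
        }
    }
}
```

Running it prints the same summed IDF (about 3.96) for both doc1 and doc2, roughly twice the matched-term IDF (about 1.98) that a per-document computation would use.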

Here is the explanation of the first document's score:

3.9616547 = weight(spanOr([Contents:something, Contents:nothing]) in 0) [AsBM25Similarity], result of:
  3.9616547 = score(freq=1.0), computed as boost * idf * tf from:
    51.0 = boost
    3.9616585 = idf, sum of:
      1.9808292 = idf for term nothing , computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1 from:
        1 = docFreq
        3 = docCount
      1.9808292 = idf for term something , computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1 from:
        1 = docFreq
        3 = docCount
    0.019607842 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = phraseFreq=1.0
      50.0 = k1, term saturation parameter
      0.0 = b, length normalization parameter
      1.0 = dl, length of field
      2.0 = avgdl, average length of field
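The arithmetic in this explanation can be checked with a short Java sketch. The class and method names are hypothetical, and the formulas are transcribed from the explanation text, which belongs to the custom AsBM25Similarity rather than Lucene's BM25Similarity. Because b = 0, tf = 1 / (1 + 50) = 1/51, and with boost = 51 the product boost × tf is 1, so the score equals whatever IDF is used.

```java
// Recomputes the explanation for doc1; every input value is copied from it.
class ExplainArithmetic {

    // Per-term IDF as printed: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1;
    }

    // tf as printed: freq / (freq + k1 * (1 - b + b * dl / avgdl))
    static double tf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    // score = boost * idf * tf
    static double score(double boost, double idf, double tf) {
        return boost * idf * tf;
    }

    public static void main(String[] args) {
        double boost = 51.0;
        double tf = tf(1.0, 50.0, 0.0, 1.0, 2.0); // b = 0, so tf = 1 / (1 + 50)
        double somethingIdf = idf(1, 3);           // docFreq = 1, docCount = 3
        double nothingIdf = idf(1, 3);             // same statistics for "nothing"

        System.out.printf("score with summed idf:       %.7f%n",
                score(boost, somethingIdf + nothingIdf, tf));
        System.out.printf("score with matched-term idf: %.7f%n",
                score(boost, somethingIdf, tf));
    }
}
```

The first line reproduces the reported score of about 3.96 (up to float rounding in the explanation); the second shows the roughly 1.98 that doc1 would receive if only the IDF of the term it contains were counted.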

Version and environment details

Lucene 9.7.0, used through Solr 9.3.0


tkarampAlpha added the type:bug label Sep 16, 2024
