I encountered an issue with Lucene 9 reading indexes built by Lucene 8.
The exception is something along the lines of:
java.lang.IllegalStateException: unexpected docvalues type SORTED for field 'id' (expected=BINARY). Re-index with correct docvalues type.
The crux of the issue is the following:
In DefaultLuceneDocumentGenerator, we add the (external) docid as a DocValue:
// Store the collection docid.
document.add(new StringField(IndexArgs.ID, id, Field.Store.YES));
// This is needed to break score ties by docid.
document.add(new BinaryDocValuesField(IndexArgs.ID, new BytesRef(id)));
To break score ties by docid, SearchCollection uses a Sort:
From the SortField.STRING_VAL javadoc: "Sort using term values as Strings, but comparing by value (using String.compareTo) for all comparisons. This is typically slower than STRING, which uses ordinals to do the sorting."
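The difference between the two sort types can be illustrated with a toy model in plain Java (no Lucene dependency; every class and method name here is hypothetical). STRING_VAL compares the docid strings themselves, while STRING compares per-segment ordinals, i.e., positions in each segment's sorted term dictionary. Ordinal comparisons are cheap integer comparisons, but raw ordinals are only meaningful within one segment; Lucene's real comparator handles segment boundaries internally, which this sketch does not attempt.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Toy model of STRING (ordinal) vs STRING_VAL (value) comparison; not Lucene code.
public class OrdinalVsValue {
    // STRING_VAL-style: compare the docid strings directly with String.compareTo.
    static int compareByValue(String a, String b) {
        return a.compareTo(b);
    }

    // A segment's sorted, de-duplicated term dictionary for the 'id' field.
    static String[] buildDictionary(String... docIds) {
        return new TreeSet<>(Arrays.asList(docIds)).toArray(new String[0]);
    }

    // Ordinal = position of the term in that segment's sorted dictionary.
    static int ordinalOf(String[] dictionary, String term) {
        int ord = Arrays.binarySearch(dictionary, term);
        if (ord < 0) {
            throw new IllegalArgumentException("term not in dictionary: " + term);
        }
        return ord;
    }

    // STRING-style within one segment: compare two ints instead of two strings.
    static int compareByOrdinal(String[] dictionary, String a, String b) {
        return Integer.compare(ordinalOf(dictionary, a), ordinalOf(dictionary, b));
    }

    public static void main(String[] args) {
        String[] segA = buildDictionary("doc3", "doc1", "doc7"); // ords: doc1=0, doc3=1, doc7=2
        String[] segB = buildDictionary("doc2", "doc9");         // ords: doc2=0, doc9=1

        // Same segment: ordinal order agrees with value order.
        System.out.println(compareByOrdinal(segA, "doc1", "doc7") < 0); // true
        System.out.println(compareByValue("doc1", "doc7") < 0);         // true

        // Different segments: raw ordinals are not comparable.
        // doc9 has ordinal 1 in segB and doc7 has ordinal 2 in segA, yet "doc9" > "doc7".
        System.out.println(ordinalOf(segB, "doc9") < ordinalOf(segA, "doc7")); // true (misleading)
        System.out.println(compareByValue("doc9", "doc7") > 0);                // true
    }
}
```

The last two lines show why value comparison is the simpler, segment-independent choice for tie-breaking, at the cost of string comparisons.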
Addresses #1952: add a flag -lucene8 that abandons consistent tie-breaking, so retrieval doesn't need to touch the docvalues. In the regression script, a similar option --lucene8 allows the score matching to be more lenient.
+ Expose Lucene 8 backwards compatibility bindings in SimpleSearcher and SimpleImpactSearcher:
Basically, if we detect Lucene 8 indexes, we disable consistent tie-breaking, which depends on docvalues; see #1952
+ General cleanup (fixed code formatting in SimpleImpactSearcher)
+ Remove main in SimpleSearcher
+ Change to Python method names (snake_case)
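The compatibility switch described in the commit message above might look roughly like the following. This is a hypothetical sketch, not Anserini's code: the index's creation major version is passed in as a plain int. In practice it could presumably be read with Lucene's SegmentInfos (for example `SegmentInfos.readLatestCommit(directory).getIndexCreatedVersionMajor()`); verify that call against the Lucene version in use.

```java
// Hypothetical sketch of the Lucene 8 compatibility switch; not Anserini's implementation.
public class TieBreakPolicy {
    // Consistent tie-breaking reads the 'id' docvalues, which can fail on indexes
    // written by Lucene 8, so it is only enabled for indexes created by Lucene 9+.
    static boolean useConsistentTieBreaking(int indexCreatedVersionMajor) {
        return indexCreatedVersionMajor >= 9;
    }

    // Describes the ordering that would be used; a real implementation would build a Lucene Sort.
    static String ordering(int indexCreatedVersionMajor) {
        return useConsistentTieBreaking(indexCreatedVersionMajor)
            ? "score desc, then docid asc"
            : "score desc only (ties fall back to internal doc order)";
    }

    public static void main(String[] args) {
        System.out.println(ordering(8)); // score desc only (ties fall back to internal doc order)
        System.out.println(ordering(9)); // score desc, then docid asc
    }
}
```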
The reason we break score ties by docid is to ensure consistent tie-breaking, as outlined in this SIGIR 2019 paper.
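A toy illustration of what consistent tie-breaking buys, assuming (as we understand Lucene's default behavior) that score ties otherwise fall back to internal document order. The `Hit` record and both comparators are hypothetical. Two builds of the same collection can assign internal doc numbers differently (e.g., with multi-threaded indexing), which reorders tied documents unless ties are broken by the external docid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: ranking tied hits with and without a docid tie-break.
public class ConsistentTieBreaking {
    // A retrieved hit: internal doc number, external collection docid, score.
    record Hit(int luceneDoc, String docid, float score) {}

    // Score descending, ties broken by internal doc number (depends on how the index was built).
    static final Comparator<Hit> BY_SCORE_THEN_LUCENE_DOC =
        Comparator.comparingDouble((Hit h) -> h.score()).reversed()
            .thenComparingInt(Hit::luceneDoc);

    // Score descending, ties broken by external docid (String.compareTo, like STRING_VAL).
    static final Comparator<Hit> BY_SCORE_THEN_DOCID =
        Comparator.comparingDouble((Hit h) -> h.score()).reversed()
            .thenComparing(Hit::docid);

    // Returns the docids in ranked order.
    static List<String> rank(List<Hit> hits, Comparator<Hit> order) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(order);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.docid());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Same documents and scores; the two builds assigned internal doc numbers differently.
        List<Hit> buildA = List.of(new Hit(0, "docB", 1.5f), new Hit(1, "docA", 1.5f));
        List<Hit> buildB = List.of(new Hit(0, "docA", 1.5f), new Hit(1, "docB", 1.5f));

        System.out.println(rank(buildA, BY_SCORE_THEN_LUCENE_DOC)); // [docB, docA]
        System.out.println(rank(buildB, BY_SCORE_THEN_LUCENE_DOC)); // [docA, docB]
        System.out.println(rank(buildA, BY_SCORE_THEN_DOCID));      // [docA, docB]
        System.out.println(rank(buildB, BY_SCORE_THEN_DOCID));      // [docA, docB]
    }
}
```

Only the docid tie-break yields identical rankings for both builds, which is the property that the -lucene8 fallback gives up.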
@tteofili indicated that this is a breaking change between Lucene 8 and Lucene 9, due to this issue: "fix SortedDocValues to no longer extend BinaryDocValues".
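A simplified model of why that change breaks reading old indexes. As we understand it, a Lucene 8 lookup for BINARY doc values could be satisfied by SORTED values because SortedDocValues was a subclass of BinaryDocValues; once that relationship is removed, asking for BINARY on a SORTED field fails with an exception like the one reported at the top of this issue. The interfaces below are hypothetical stand-ins, not Lucene's real classes, and the exception message only mimics the reported one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for doc-values classes; NOT Lucene's real API.
public class DocValuesCompat {
    // Per-document binary values (e.g., the docid bytes).
    interface BinaryValues {
        byte[] binaryValue(int doc);
    }

    // Lucene 8 style: sorted values extend binary values, so they can be read as binary.
    interface SortedValuesV8 extends BinaryValues {}

    // Lucene 9 style: sorted values are a separate, unrelated type.
    interface SortedValuesV9 {}

    // Models a "give me BINARY doc values for this field" lookup.
    static BinaryValues getBinary(Object stored, String field) {
        if (stored instanceof BinaryValues binary) {
            return binary;
        }
        throw new IllegalStateException(
            "unexpected docvalues type for field '" + field + "' (expected=BINARY)");
    }

    public static void main(String[] args) {
        // Lucene 8 world: SORTED values satisfy the BINARY lookup.
        SortedValuesV8 v8 = doc -> ("doc" + doc).getBytes(StandardCharsets.UTF_8);
        byte[] bytes = getBinary(v8, "id").binaryValue(1);
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // doc1

        // Lucene 9 world: the same lookup on SORTED values fails.
        SortedValuesV9 v9 = new SortedValuesV9() {};
        try {
            getBinary(v9, "id");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // unexpected docvalues type for field 'id' (expected=BINARY)
        }
    }
}
```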
Reindexing with Lucene 9 fixes this issue.
Related, interesting tidbit: