Fuzziness works incorrectly when using boolean similarity #75652
Labels: >bug, priority:normal, :Search Relevance/Search, Team:Search Relevance
Elasticsearch version (bin/elasticsearch --version): 7.13.2
Plugins installed: []
JVM version (java -version): "16" 2021-03-16
OS version (uname -a if on a Unix-like system): Linux 7cf7d004f550 5.11.0-22-generic #23-Ubuntu SMP Thu Jun 17 00:34:23 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Fuzziness breaks boolean similarity scoring. Without fuzziness, a document scores 1 when one term matches perfectly. With fuzziness enabled, a perfect match of one term still scores 1 and a fuzzy match scores below 1, which is as expected. However, when a document contains two terms that both match after the fuzzy query is expanded, their scores are summed, so the document ranks above a document with a single term that matches perfectly. A perfect match (no typo corrected by fuzziness) should always rank higher. With boolean similarity, the score should be the best score among the rewritten terms, not the sum of the scores of all rewritten terms. In the example below, the perfect match euston should score above boston selston when querying for euston.
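To make the ranking problem concrete, here is a minimal Python sketch of the two ways of combining per-term scores. It does not call Elasticsearch. The per-term formula `1 - distance / min(length)` and the helper names `edit_distance`, `term_score`, and `score` are assumptions chosen for illustration; the real fuzzy boost may be computed differently, but any scheme where a fuzzy match scores above 0.5 shows the same effect.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def term_score(query: str, term: str, max_edits: int = 2) -> float:
    """Assumed per-term score: 1.0 for an exact match, lower for a
    fuzzy match, 0.0 when the term is more than max_edits away."""
    distance = edit_distance(query, term)
    if distance > max_edits:
        return 0.0
    return 1.0 - distance / min(len(query), len(term))


def score(query: str, doc: str, combine) -> float:
    """Score a whitespace-tokenized document, combining the scores of
    all matching terms with `combine` (sum or max)."""
    matches = [s for s in (term_score(query, t) for t in doc.split()) if s > 0]
    return combine(matches) if matches else 0.0


docs = ["euston", "boston selston"]

# Summing: two fuzzy matches (about 0.67 each) outrank one exact match (1.0).
by_sum = sorted(docs, key=lambda d: score("euston", d, sum), reverse=True)

# Taking the best term: the exact match ranks first, as the report expects.
by_max = sorted(docs, key=lambda d: score("euston", d, max), reverse=True)

print(by_sum)  # ['boston selston', 'euston']
print(by_max)  # ['euston', 'boston selston']
```

Under these assumptions both boston and selston are two edits away from euston, so the summed score of the second document is about 1.33, which is the ordering the report describes as incorrect.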
Steps to reproduce: