Fuzziness works incorrectly when using boolean similarity #75652

Open
yassenb opened this issue Jul 23, 2021 · 3 comments
Labels: >bug, priority:normal, :Search Relevance/Search, Team:Search Relevance

Comments

yassenb commented Jul 23, 2021

Elasticsearch version (bin/elasticsearch --version): 7.13.2

Plugins installed: []

JVM version (java -version): "16" 2021-03-16

OS version (uname -a if on a Unix-like system): Linux 7cf7d004f550 5.11.0-22-generic #23-Ubuntu SMP Thu Jun 17 00:34:23 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Fuzziness breaks boolean similarity scoring. Without fuzziness, a document scores 1 when one term matches exactly. With fuzziness enabled, an exact match on one term still scores 1 and a fuzzy match scores below 1, which is as expected. However, when a document contains two terms that both match after the fuzzy query is expanded, their scores are summed, so that document ranks above a document with a single exactly matching term. The exact match (no typo corrected by fuzziness) should always rank higher. Under boolean similarity, the score should be the best score among the rewritten terms, not the sum of the scores for all rewritten terms. In the example below, querying for euston should score the exact match euston above boston selston.
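
The difference between the observed and the expected scoring can be written out as a small worked example. This is only a sketch: E(q) stands for the set of terms the fuzzy query expands to, s_t for the score contributed by an expanded term t, and the value 0.67 is a hypothetical per-term fuzzy score chosen for illustration, not a number taken from Elasticsearch output.

```latex
% Observed: scores of all expanded terms found in document d are summed.
\mathrm{score}_{\text{observed}}(d) = \sum_{t \in E(q) \cap d} s_t
% Expected: only the best expanded term found in d counts.
\mathrm{score}_{\text{expected}}(d) = \max_{t \in E(q) \cap d} s_t

% Query "euston", with hypothetical s_{boston} = s_{selston} = 0.67 and s_{euston} = 1:
% d_1 = "euston":          observed = 1,                  expected = 1
% d_2 = "boston selston":  observed = 0.67 + 0.67 = 1.34, expected = \max(0.67, 0.67) = 0.67
% Observed ranking: d_2 (1.34) > d_1 (1).  Expected ranking: d_1 (1) > d_2 (0.67).
```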

Steps to reproduce:

PUT /locations
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "similarity": "boolean",
        "norms": false,
        "index_options": "docs"
      }
    }
  }
}

POST /locations/_doc
{
  "name": "euston"
}

POST /locations/_doc
{
  "name": "boston selston"
}

GET /locations/_search
{
  "query": {
    "match": {
      "name": {
        "query": "euston",
        "operator": "and", 
        "fuzziness": 2,
        "max_expansions": 10000
      }
    }
  }
}
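
To see how each score is assembled, the same search can be rerun with explain enabled. This is a sketch that only adds the standard "explain" flag to the request above. The explanation returned for each hit should list the contribution of every expanded term, which makes it possible to check whether the contributions for boston and selston are being summed.

```console
# Same query as above, with per-hit score explanations turned on.
GET /locations/_search
{
  "explain": true,
  "query": {
    "match": {
      "name": {
        "query": "euston",
        "operator": "and",
        "fuzziness": 2,
        "max_expansions": 10000
      }
    }
  }
}
```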

yassenb added the >bug and needs:triage labels on Jul 23, 2021
DJRickyB added the :Search/Search label and removed the needs:triage label on Jul 26, 2021
elasticmachine added the Team:Search label on Jul 26, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

dnhatn self-assigned this on Jul 26, 2021
dnhatn removed their assignment on May 1, 2024
benwtrent added the priority:normal label on Jul 10, 2024
@benwtrent
Member

I am not sure that a single exact match should score higher than a fuzzy match on two terms.

The relevance ranking described here is a matter of opinion, and I think the current behavior is appropriate, especially since an exact match can be boosted by wrapping the fuzzy query in a bool query that also has an exact-match should clause with a significant boost.
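
A minimal sketch of that workaround against the locations index from the report, assuming a bool query whose must clause is the original fuzzy match and whose should clause is a non-fuzzy match on the same text. The boost of 10 is a hypothetical value, not one recommended anywhere in this thread. It only needs to be larger than the highest summed fuzzy score a document can reach.

```console
# The must clause keeps the original fuzzy matching.
# The should clause adds an extra score only when the term matches exactly.
# The boost value 10 is an illustrative assumption.
GET /locations/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": {
            "query": "euston",
            "operator": "and",
            "fuzziness": 2,
            "max_expansions": 10000
          }
        }
      },
      "should": {
        "match": {
          "name": {
            "query": "euston",
            "boost": 10
          }
        }
      }
    }
  }
}
```

With this shape, a document containing euston receives the should clause's boost on top of its fuzzy score, while boston selston receives only the summed fuzzy score, so the exact match should rank first as long as the boost is large enough.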

javanna added the :Search Relevance/Search label and removed the :Search/Search label on Jul 17, 2024
elasticsearchmachine added the Team:Search Relevance label on Jul 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine removed the Team:Search label on Jul 17, 2024
Projects
None yet
Development

No branches or pull requests

7 participants