Suggester: Phrase suggest option to limit suggestions to exising phrases #3482

radev · 2013-08-12T10:08:05Z

When using phrase suggest API to provide "Did you mean ?" corrections it would be nice to include only suggestions that would return results.

So returned phrase must exist at least in one document in the index.

s1monw · 2013-08-12T11:02:24Z

Thanks for opening this. do you feel like attaching a pullrequest? I'd be happy to help you sketching out the functionality here!

nik9000 · 2013-08-12T13:04:22Z

So I had a similar problem that came from filters: I was getting phrases that exist in the index but were filtered out on subsequent searches with the same filter set. I've since mostly worked around the problem by splitting my index along the most common filter. So if it isn't that much more work to get it to include filters that'd be great. If OTOH, you do something like only return suggestions that match an n-gram then by all means just do that and ignore me. I'd still use it and it'd probably be faster in the end.

s1monw · 2013-08-12T13:23:26Z

folks, I don't think we can filter the process of drawing candidates etc. since performance will suffer badly. What I can imagine is to execute a match query with each suggestoin that is returned to make a decision to drop them or not. Yet, this will allow for filtering for sure but it will only be a helper to prune the result list. if this is ok for you guys I think we can certainly do that!

s1monw · 2013-08-29T12:20:33Z

@nik9000 do you wanna take a look at this. I won't be able to do in the near future.

nik9000 · 2013-08-29T13:40:40Z

I can take a look sometime in the next few days, yeah. We just deployed our elasticsearch software to a much larger group of users yesterday so I'm getting a bunch of high priority bugs that aren't (yet) this. What I'll do is file this issue as a medium priority one on my side and pick it up when I've cleared everything higher.

So what'd be most useful for me would be to have an api like this:

curl -XPOST 'localhost:9200/_search' -d {
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "field" : "body",
        "size" : 5,
        "shard_size": 10,
        "confidence": 2.0,
        "filter_replace_string": "{}",
        "filter": {
          "bool" : {
                "must" : {
                    "query": { "match_phrase" : { "body" : "{}", "slop": 3 } }
                },
                "must_not" : {
                    "range" : {
                        "age" : { "from" : 10, "to" : 20 }
                    }
                },
                "should" : [
                    {
                        "term" : { "tag" : "sometag" }
                    },
                    {
                        "term" : { "tag" : "sometagtag" }
                    }
                ]
            }
        }
      }
    }
  }
}

I can see how this could be a performance problem but caching should kick in and help with most of the filters. Also, if you set your confidence nice and high you might not have to do this too many time.

And another thing, you'd have to crank up the shard_size value or you might end up filtering out all the suggestions that come from the shards.

nik9000 · 2013-09-06T12:19:26Z

Now that I'm digging into this I like this API better

curl -XPOST 'localhost:9200/_search' -d {
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "field" : "body",
        "size" : 5,
        "shard_size": 10,
        "confidence": 2.0,
        "filter": {
          "bool" : {
                "must_not" : {
                    "range" : {
                        "age" : { "from" : 10, "to" : 20 }
                    }
                },
                "should" : [
                    {
                        "term" : { "tag" : "sometag" }
                    },
                    {
                        "term" : { "tag" : "sometagtag" }
                    }
                ]
            }
        }
      }
    }
  }
}```
Internally I'd build a bool filter containing a phrase_match against the field against which we generate suggestions and the filter you passes in.  This is less fiddly to code and allows some simple syntax shortcuts:
```bash
curl -XPOST 'localhost:9200/_search' -d {
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "field" : "body",
        "size" : 5,
        "shard_size": 10,
        "confidence": 2.0,
        "filter": "yes"
      }
    }
  }
}```
which would add the phrase_match and
```bash
curl -XPOST 'localhost:9200/twitter/_search?pretty=true' -d '
{
    "query" : {
        "term" : { "message" : "something" }
    },
    "filter" : {
        "term" : { "tag" : "green" }
    }
    "suggest" : {
        "text" : "Xor the Got-Jewel",
        "simple_phrase" : {
            "phrase" : {
                "field" : "body",
                "size" : 5,
                "shard_size": 10,
                "confidence": 2.0,
                "filter": "query"
            }
        }
    }
}'

which would go and get the filters from the top level query.

I'm still concerned about how slow this might be.

nik9000 · 2013-09-14T19:08:20Z

So I finally have something for this that kinda works. It doesn't fully work and I'd like some guidance. I can't post the code right now because I'm traveling. Such is life. Any way, I'd like giluidance on two things:

Since there is no good way to parse the request during the reduce phase I'm parsing what I need on the shards and sending it back. Is that right/normal?
I can make queries during the reduce phase and block, waiting for them to return pretty easy. That seems to work on when I start the server but badly in tests, presumably because they have a lower bound on tr thread pool. Is it OK to make blocking calls in the reduce step? Anything else seems like a pretty invasive change for this feature which doesn't have a ton of traction.

This implementation has a bunch of problems that'll need to be worked before it is a valid candidate for merging. I don't have time to rebase it right now but would still love the feedback on problem. The ones I remember: 1. It performs the filtering by blocking the suggesting thread. 2. Because there is no "exists" query type it uses a limit. I now know that isn't ass efficient as just using a count but it might be worth implementing an exists query type for it any way. 3. It feels like there are a lot of plumbing changes required for this feature. My guess is that is because I'm going about it wrong. This correlates with #1 pretty well. 4. I have to wrap the filter through the map nodes and parse it during the reduce step. That feels silly. Closes elastic#3482

s1monw · 2013-11-13T11:25:35Z

moved over to 0.90.8 I will look at this soon as well

timbunce · 2014-04-25T11:20:55Z

Any news on this, or the similar #2842?

s1monw · 2014-04-25T15:39:01Z

I think now that we have templates we can implement this much simpler - I think we should revisit it soon, thanks for pinging @timbunce

brupm · 2014-06-20T02:03:43Z

👍

brupm · 2014-06-27T01:36:54Z

This would be awesome to have. I ended up having to do it outside ES.

Get the suggestions
Perform another ES search to validate the existence
Present only existing phases to users

clintongormley · 2014-07-01T13:37:09Z

@areek something you could take a look at?

areek · 2014-07-01T19:17:20Z

@clintongormley already started looking into it!

areek · 2014-07-07T18:39:43Z

After looking into it, I have come up with the following API for the suggestion filter option:

curl -XPOST 'localhost:9200/_search' -d {
    "suggest": {
        "text": "Xor the Got-Jewel",
        "simple_phrase": {
            "phrase": {
                "field": "body",
                "size": 5,
                "shard_size": 10,
                "confidence": 2,
                "filter": {
                    "match": {
                        "body": "{{suggestion}}"
                    }
                }
            }
        }
    }
}

The filter option above is just a query template with the magic variable "suggestion", which will be populated once phrase suggestions are made.
For the first iteration, I intend to make local queries for the filter instead of hitting all the shards and also restrict the size of the suggestions to a maximum of 20, when filter option is used.

…ching any documents for a given query The newly added filter option will let the user provide a template query which will be executed for every phrase suggestions generated to ensure that the suggestion matches at least one document for the query. The filter query is only executed on the local node for now. When the new filter option is used, the size of the suggestion is restricted to 20. Closes elastic#3482

clintongormley · 2014-07-08T08:06:59Z

@areek Just a note: match is a query, not a filter.

areek · 2014-07-08T14:26:12Z

@clintongormley Currently the filter param takes queries! The reason the name is filter, is to indicate that the query is used to filter out the suggestions after being generated

nik9000 · 2014-07-08T14:44:11Z

Its confusing to take queries and call it filter, I think. Why not make it
take filters?

On Tue, Jul 8, 2014 at 10:26 AM, Areek Zillur notifications@github.com
wrote:

@clintongormley https://github.com/clintongormley Currently the filter
param takes queries! The reason the name is filter, is to indicate that the
query is used to filter out the suggestions after being generated

—
Reply to this email directly or view it on GitHub
#3482 (comment)
.

clintongormley · 2014-07-08T20:32:25Z

Agree with @nik9000 about confusing. filter means filters, not queries. In that case should be called query to be consistent with other APIs.

areek · 2014-07-08T23:44:07Z

It does seem confusing, the updated API looks like the following:

curl -XPOST 'localhost:9200/_search' -d {
    "suggest": {
        "text": "Xor the Got-Jewel",
        "simple_phrase": {
            "phrase": {
                "field": "body",
                "size": 5,
                "shard_size": 10,
                "confidence": 2,
                "filter": {
                    "template": {
                        "body": "{{suggestion}}"
                    },
                    "preference": "_only_local"
                }
            }
        }
    }
}

The change was mainly due to adding the preference param in the filter object. Calling it query would also confuse users at least with respect to the reason for providing such an option in the suggest API. Thoughts?

clintongormley · 2014-07-09T09:19:43Z

@areek I assume that any query can be run there? it's not limited to just template queries? Also, if the user wants to use a template query with other parameters, I assume they can just specify params and the suggestion value will be added to any manually specified params?

Either way, filter is the wrong name for that section. Wherever we have filter or *_filter, it means that we are in "filter context" and only filter clauses can be accepted.

I'd be perfectly happy just renaming it to query, but if you want something slightly more descriptive then why not use post_query? We use post_filter in the top level of search requests to say: "this filter will be applied to the hits array after the query has been run and after the aggregations have been calculated". This functionality seems similar.

s1monw · 2014-07-09T19:17:24Z

what about:

"collate" : {
  "filter|query" : { ... }   
  "preference": "_only_local"
}

then folks can pick if we should parse it as a query or not. The template engine doesn't care really it's just string replacements.... we can document it that it is passed through mustache.

areek · 2014-07-09T21:16:54Z

@s1monw I like collate, I will go ahead and change it to that format and document the use of mustache under the covers.

@clintongormley all your assumptions are correct, except that the params support is not there, the reason for the template is to allow for the magic suggestion variable.

clintongormley · 2014-07-10T11:39:29Z

@areek i'm thinking that users might want to use a template query here. Couldn't you just check if there are params already defined, and merge suggestion variable into them?

areek · 2014-07-10T17:32:01Z

@clintongormley I will end up doing that, thanks for suggesting.

areek · 2014-07-10T22:42:31Z

So the updated API looks like the following:

curl -XPOST 'localhost:9200/_search' -d {
    "suggest": {
        "text": "Xor the Got-Jewel",
        "simple_phrase": {
            "phrase": {
                "field": "body",
                "size": 5,
                "shard_size": 10,
                "confidence": 2,
                "collate": {
                    "query": {
                        "{{field_name}}": "{{suggestion}}"
                    },
                    "preference": "_primary",
                    "params": {"field_name": "title"}
                }
            }
        }
    }
}

collate can also take a filter instead of query (as suggested by @s1monw) and optional params can be used, to inject the template query or filter with additional params (as suggested by @clintongormley).

The newly added collate option will let the user provide a template query/filter which will be executed for every phrase suggestions generated to ensure that the suggestion matches at least one document for the filter/query. The user can also add routing preference `preference` to route the collate query/filter and additional `params` to inject into the collate template. Closes #3482

ghost assigned s1monw Aug 12, 2013

nik9000 mentioned this issue Oct 25, 2013

Add filter option to term suggester #2842

Closed

nik9000 mentioned this issue Nov 4, 2013

Post filter phrase suggestions #4072

Closed

s1monw added v1.2.0 and removed v1.1.0 labels Mar 20, 2014

s1monw added v1.3.0 and removed v1.2.0 labels May 12, 2014

s1monw assigned areek and unassigned s1monw Jun 27, 2014

areek mentioned this issue Jul 7, 2014

Phrase Suggester: Add option to filter out phrase suggestions #6773

Closed

s1monw added v1.4.0 and removed v1.3.0 labels Jul 14, 2014

areek closed this as completed in 7634389 Jul 14, 2014

s1monw added v1.3.0 and removed v1.4.0 labels Jul 14, 2014

areek mentioned this issue Jul 15, 2014

Add option to Count API to terminate early if a specified count is reached #6876

Closed

clintongormley added the release highlight label Jul 16, 2014

clintongormley changed the title ~~Phrase suggest option to limit suggestions to exising phrases~~ Suggester: Phrase suggest option to limit suggestions to exising phrases Jul 16, 2014

areek mentioned this issue Jul 18, 2014

PhraseSuggester: Collate option should allow returning phrases with no matching docs #6927

Closed

clintongormley added feature and removed enhancement labels Jul 21, 2014

gmarz mentioned this issue Jul 22, 2014

Phrase suggester collate support elastic/elasticsearch-net#824

Closed

clintongormley added the :Search Relevance/Suggesters "Did you mean" and suggestions as you type label Jun 6, 2015

javanna added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 12, 2024

ChrisHegarty unassigned areek Oct 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suggester: Phrase suggest option to limit suggestions to exising phrases #3482

Suggester: Phrase suggest option to limit suggestions to exising phrases #3482

radev commented Aug 12, 2013

s1monw commented Aug 12, 2013

nik9000 commented Aug 12, 2013

s1monw commented Aug 12, 2013

s1monw commented Aug 29, 2013

nik9000 commented Aug 29, 2013

nik9000 commented Sep 6, 2013

nik9000 commented Sep 14, 2013

s1monw commented Nov 13, 2013

timbunce commented Apr 25, 2014

s1monw commented Apr 25, 2014

brupm commented Jun 20, 2014

brupm commented Jun 27, 2014

clintongormley commented Jul 1, 2014

areek commented Jul 1, 2014

areek commented Jul 7, 2014

clintongormley commented Jul 8, 2014

areek commented Jul 8, 2014

nik9000 commented Jul 8, 2014

clintongormley commented Jul 8, 2014

areek commented Jul 8, 2014

clintongormley commented Jul 9, 2014

s1monw commented Jul 9, 2014

areek commented Jul 9, 2014

clintongormley commented Jul 10, 2014

areek commented Jul 10, 2014

areek commented Jul 10, 2014

Suggester: Phrase suggest option to limit suggestions to exising phrases #3482

Suggester: Phrase suggest option to limit suggestions to exising phrases #3482

Comments

radev commented Aug 12, 2013

s1monw commented Aug 12, 2013

nik9000 commented Aug 12, 2013

s1monw commented Aug 12, 2013

s1monw commented Aug 29, 2013

nik9000 commented Aug 29, 2013

nik9000 commented Sep 6, 2013

nik9000 commented Sep 14, 2013

s1monw commented Nov 13, 2013

timbunce commented Apr 25, 2014

s1monw commented Apr 25, 2014

brupm commented Jun 20, 2014

brupm commented Jun 27, 2014

clintongormley commented Jul 1, 2014

areek commented Jul 1, 2014

areek commented Jul 7, 2014

clintongormley commented Jul 8, 2014

areek commented Jul 8, 2014

nik9000 commented Jul 8, 2014

clintongormley commented Jul 8, 2014

areek commented Jul 8, 2014

clintongormley commented Jul 9, 2014

s1monw commented Jul 9, 2014

areek commented Jul 9, 2014

clintongormley commented Jul 10, 2014

areek commented Jul 10, 2014

areek commented Jul 10, 2014