-
-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor autocomplete analysis #109
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Travis reports the integration tests failing, this is an intermittent issue which depends on hardware, I tried to solve this in missinglink/elastictest#1 but it seems to still be an issue. all tests pass when run locally. |
This was referenced Mar 28, 2016
note: these analysers should be updated with the new tokenizers in #113 once it's merged the integration tests will also need to be updated, see |
missinglink
added a commit
to pelias/api
that referenced
this pull request
Mar 29, 2016
Merged
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR refactors the analyzers used by the
/v1/autocomplete
endpoint, with the goals of:/v1/search
endpoint making subsequent refactoring easier.Currently we use 3 different analyzers in the
/v1/autocomplete
endpoint:The
peliasPhrase
analyzer was originally intended to be used with/v1/search
and you can see above that the way it handles synonyms is mismatched with the way the other 2 analyzers handle the wordcenter
(for example). this is the cause of pelias/pelias#211new analyzers:
The new analyzers proposed in this PR are:
They produce the same tokens when given the abbreviated/contracted form "ctr":
directionals:
They also handle directional synonyms in a similar way:
Again, they produce the same tokens when given the abbreviated/contracted form "n":
note: there is a bit of a 'hack' in place for the above
peliasIndexTwoEdgeGram
analysis that is specific to directionals, you can see it adds a single gram 'n' in to a token stream which usually only contains grams of size 2+. This improves address matching and reduces 'jitter'.api/query changes:
All usages of existing analyzers in
/v1/autocomplete
must be updated:peliasOneEdgeGram
->peliasQueryPartialToken
peliasPhrase
->peliasQueryFullToken
Additionally the autocomplete queries should no longer need to use the
phrase.*
index, all queries can safely be performed against thename.*
index (if not already doing so).note: we can discuss removing the
phrase.*
index completely! this would greatly reduce the cluster disk/ram usage, it might be possible to achieve all the functionality of/v1/search
using the prefixGram index. let's discuss this in another issue.dataset importer changes:
nil
risks / expected acceptance test changes:
There is not much that can go wrong here, the only differences at index time are that:
peliasIndexOneEdgeGram
expands directionals whereaspeliasOneEdgeGram
does not.peliasIndexTwoEdgeGram
is the same and includes the 'hack' mentioned above.The differences at query time are:
I've left some other changes I would like to make for a future PR in order to reduce the amount of changes going in at the same time.
related:
closes #105
resolves pelias/pelias#211
related pelias/openaddresses#68