Updateability of search-time synonyms #29051

Closed
jpountz opened this issue Mar 14, 2018 · 11 comments

Labels: >enhancement, high hanging fruit, :Search Relevance/Analysis, Team:Search Relevance

Comments

@jpountz
Contributor

jpountz commented Mar 14, 2018

We need a way to update synonyms that:

  • allows changing search-time synonyms without reopening the index
  • does not allow updating index-time synonyms, which could effectively corrupt the index
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@byronvoorbach
Contributor

byronvoorbach commented Mar 14, 2018

+1

I solved this myself by creating a custom plugin that allows uploading new resources. Not ideal, but it works. It would be great if this could be built in! (Actually, for all query-time resources: stopwords, common grams, etc.)

@Tobsucht

Tobsucht commented May 2, 2018

+1

@byronvoorbach how do you do it without reopening the index? I'm also using search-time synonyms and I don't see a way to programmatically reload/flush my custom analyzer within my plugin. Or am I wrong?

@byronvoorbach
Contributor

@Tobsucht I have a custom implementation of the SynonymFilter, which allows me to update synonyms without closing/opening the index. :)

@Tobsucht

Tobsucht commented May 3, 2018

@byronvoorbach I thought there might be a "cleaner" solution ... :) Did it the same way now and it works. For everybody who wants to implement this using the ResourceWatcherService, this might help: https://discuss.elastic.co/t/instance-of-resourcewatcherservice/130285
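
For readers unfamiliar with the file-watching approach mentioned above, here is a minimal sketch of the idea in Python. It is not plugin code and does not use Elasticsearch's ResourceWatcherService; the class name `ReloadingSynonyms` and its `check` method are hypothetical, and the file format (one rule per line, `#` for comments) is an assumption. It only shows the pattern: remember the file's modification time and re-read the rules when it changes.

```python
import os


class ReloadingSynonyms:
    """Hypothetical holder that re-reads a synonyms file when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None  # modification time seen at the last successful load
        self.rules = []
        self.check()  # initial load

    def check(self):
        """Reload the file if it changed since the last check.

        Returns True when a reload happened, False otherwise.
        """
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._mtime:
            return False
        with open(self.path, encoding="utf-8") as fh:
            # Assumed format: one rule per line, blank lines and '#' comments skipped.
            self.rules = [
                line.strip()
                for line in fh
                if line.strip() and not line.startswith("#")
            ]
        self._mtime = mtime
        return True
```

A watcher would call `check()` periodically. As noted later in the thread, this only helps when the file is updated on every node.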

@byronvoorbach
Contributor

@Tobsucht Yeah, I was hoping for that too. Sadly, I couldn't use ResourceWatcherService because I had a requirement to post a new version of the synonyms to a new REST endpoint in ES, instead of updating the local files on all nodes. Nice that you got it to work!

@byronvoorbach
Contributor

@jpountz Is this something planned for the near future? :)

@jpountz
Contributor Author

jpountz commented Aug 14, 2018

It has been discussed, but not scheduled yet.

@telendt
Contributor

telendt commented Aug 14, 2018

@jpountz: Thanks for the update.

I know you folks have your priorities, but this feature has been brought up many times before. You can clearly see that there's a real need for it: just search GitHub for "elasticsearch synonymmap" and see how many repositories with ES plugins there are (most of them of really poor quality, though).

Don't want to point fingers, but "your biggest competitor" seems to support this use case via Managed Resources (unless I'm misinterpreting it).

I think it would be really cool if you could at least put it somewhere on your roadmap.

Cheers!

@cbuescher cbuescher self-assigned this Nov 26, 2018
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Mar 4, 2019
Currently token filter settings are treated as fixed once they are declared and
used in an analyzer. This is done to prevent changes in analyzers that are already
actively used to index documents, since changes to the analysis chain would
corrupt the index. However, it would be safe to allow updates to token
filters at search time ("search_analyzer"). This change introduces a new
property of token filters where they can now be declared to be "updateable". If
one token filter in an analyzer is updateable in this way, the analyzer itself inherits
this property and we can reject such analyzers if an attempt is made to use them at
index time. This change demonstrates this with the "synonym" token filter as a
first example.

Relates to elastic#29051
cbuescher pushed a commit that referenced this issue Mar 12, 2019
Currently token filter settings are treated as fixed once they are declared and
used in an analyzer. This is done to prevent changes in analyzers that are already
actively used to index documents, since changes to the analysis chain could
corrupt the index. However, it would be safe to allow updates to token
filters at search time ("search_analyzer"). This change introduces a new
property of token filters that allows marking them as only being usable at search
or at index time. Any analyzer that uses these token filters inherits that property
and can be rejected if it is used in other contexts. This is a first step towards
making specific token filters (e.g. synonym filter) updateable.

Relates to #29051
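
The inheritance rule described in this commit message can be sketched as follows. This is an illustrative Python model, not the Elasticsearch implementation: the names `AnalysisMode`, `merge`, `analyzer_mode` and `check_usable` are assumptions chosen for the example. Each filter declares where it may be used, the analyzer's mode is the combination of its filters' modes, and using the analyzer in the wrong context is rejected.

```python
from enum import Enum


class AnalysisMode(Enum):
    """Where a token filter (or an analyzer built from filters) may be used."""

    INDEX_TIME = "index time"
    SEARCH_TIME = "search time"
    ALL = "all"

    def merge(self, other):
        """Combine two modes; ALL is neutral, conflicting restrictions fail."""
        if self is AnalysisMode.ALL:
            return other
        if other is AnalysisMode.ALL or other is self:
            return self
        raise ValueError(f"cannot combine {self.value} and {other.value} filters")


def analyzer_mode(filter_modes):
    """An analyzer inherits the most restrictive mode of its filters."""
    mode = AnalysisMode.ALL
    for filter_mode in filter_modes:
        mode = mode.merge(filter_mode)
    return mode


def check_usable(mode, context):
    """Reject an analyzer used outside the context its filters allow."""
    if mode is not AnalysisMode.ALL and mode is not context:
        raise ValueError(
            f"analyzer is restricted to {mode.value} but was used at {context.value}"
        )


# A lowercase filter (usable anywhere) plus a search-time-only synonym filter.
mode = analyzer_mode([AnalysisMode.ALL, AnalysisMode.SEARCH_TIME])
check_usable(mode, AnalysisMode.SEARCH_TIME)  # accepted as a search analyzer
```

Calling `check_usable(mode, AnalysisMode.INDEX_TIME)` on the same analyzer raises, which is the guard that keeps an updateable filter from being used to index documents.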
@colings86 colings86 added the 7x label Apr 12, 2019
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue May 29, 2019
Currently, changing resources (like dictionaries, synonym files, etc.) of
search-time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters), which will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
marked as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search-time
analyzers.

Relates to elastic#29051
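
As a rough illustration of how the feature described in this commit message might be used from a client, the Python sketch below builds an index-creation body with an "updateable" `synonym` filter that is referenced only from a `search_analyzer`, plus the reload request. The commit message does not name the endpoint; the path `_reload_search_analyzers` is the one documented for Elasticsearch 7.3 and is treated here as an assumption. The filter, analyzer, field and index names (`my_synonyms`, `my_search_analyzer`, `title`, `products`) and both helper functions are hypothetical. Nothing is sent over the network.

```python
import json


def updateable_synonym_settings(synonyms_path):
    """Build a create-index body with an updateable search-time synonym filter."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_synonyms": {  # hypothetical filter name
                        "type": "synonym",
                        "synonyms_path": synonyms_path,
                        "updateable": True,  # restricts the filter to search time
                    }
                },
                "analyzer": {
                    "my_search_analyzer": {  # hypothetical analyzer name
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {  # hypothetical field
                    "type": "text",
                    "analyzer": "standard",  # index time: no synonyms
                    "search_analyzer": "my_search_analyzer",
                }
            }
        },
    }


def reload_request(base_url, index):
    """Return (method, url) for the reload call; the path is an assumption."""
    return "POST", f"{base_url.rstrip('/')}/{index}/_reload_search_analyzers"


body = json.dumps(updateable_synonym_settings("analysis/synonyms.txt"), indent=2)
method, url = reload_request("http://localhost:9200", "products")
```

After editing the synonyms file on every node, issuing the reload request replaces the close, change, re-open cycle described in the first paragraph of the commit message.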
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Jun 5, 2019
Relates to elastic#29051
cbuescher pushed a commit that referenced this issue Jun 14, 2019
Relates to #29051
cbuescher pushed a commit that referenced this issue Jun 27, 2019
Relates to #29051
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Jun 27, 2019
Relates to elastic#29051
cbuescher pushed a commit that referenced this issue Jun 28, 2019
Relates to #29051
@cbuescher cbuescher added v7.3.0 and removed 7x labels Jul 2, 2019
@jpountz jpountz removed the v7.3.0 label Jul 5, 2019
@cbuescher
Member

@jpountz do you think we can close this issue with #43313 coming up in the 7.3 release?

@jpountz
Contributor Author

jpountz commented Jul 23, 2019

+1

@jpountz jpountz closed this as completed Jul 23, 2019
@javanna javanna added the Team:Search Relevance label Jul 12, 2024

8 participants