Allow reloading of search time analyzers #43313

Merged 9 commits on Jun 27, 2019
@@ -730,8 +730,8 @@ public void testApiNamingConventions() throws Exception {
"indices.exists_type",
"indices.get_upgrade",
"indices.put_alias",
"scripts_painless_execute",
"render_search_template"
"render_search_template",
"scripts_painless_execute"
};
// These APIs are not required for high-level client feature completeness
String[] notRequiredApi = new String[] {
@@ -43,6 +43,8 @@ Additional settings are:
* `expand` (defaults to `true`).
* `lenient` (defaults to `false`). If `true`, exceptions are ignored while parsing the synonym configuration. Note that
only those synonym rules which cannot be parsed are ignored. For instance, consider the following request:



[source,js]
--------------------------------------------------
72 changes: 72 additions & 0 deletions docs/reference/indices/apis/reload-analyzers.asciidoc
@@ -0,0 +1,72 @@
[role="xpack"]
[testenv="basic"]
[[indices-reload-analyzers]]
== Reload Search Analyzers

experimental[]

Reloads search analyzers and their resources.

Synonym filters (both `synonym` and `synonym_graph`) can be declared as
updateable with the `updateable` flag, provided they are only used in
<<search-analyzer,search analyzers>>:

[source,js]
--------------------------------------------------
PUT /my_index
{
"settings": {
"index" : {
"analysis" : {
"analyzer" : {
"my_synonyms" : {
"tokenizer" : "whitespace",
"filter" : ["synonym"]
}
},
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms_path" : "analysis/synonym.txt",
"updateable" : true <1>
}
}
}
}
},
"mappings": {
"properties": {
"text": {
"type": "text",
"analyzer" : "standard",
"search_analyzer": "my_synonyms" <2>
}
}
}
}
--------------------------------------------------
// CONSOLE

<1> Mark the synonym filter as updateable.
<2> An analyzer that uses an updateable filter can only be declared as a `search_analyzer`.

NOTE: Trying to use the above analyzer as an index analyzer will result in an error.
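
For example, a request like the following would be rejected at index creation
time because it declares `my_synonyms` as an index-time analyzer. This is a
hypothetical request for illustration only, reusing the analysis settings from
the example above; it is expected to fail and is not meant to be runnable:

[source,js]
--------------------------------------------------
PUT /failing_index
{
  "settings": {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "my_synonyms" : {
            "tokenizer" : "whitespace",
            "filter" : ["synonym"]
          }
        },
        "filter" : {
          "synonym" : {
            "type" : "synonym",
            "synonyms_path" : "analysis/synonym.txt",
            "updateable" : true
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer" : "my_synonyms" <1>
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

<1> Declaring the updateable synonym analyzer at index time is what triggers the error.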

Using the <<indices-reload-analyzers,analyzer reload API>>, you can trigger reloading of the
synonym definition. The contents of the configured synonyms file will be reloaded and the
synonym definition the filter uses will be updated.

The `_reload_search_analyzers` API can be run on one or more indices and will trigger
reloading of the synonyms from the configured file.

NOTE: Reloading will happen on every node that holds shards of the index, so it's
important to update the synonym file contents on every data node (even the ones
that don't currently hold shard copies; shards might be relocated there in the
future) before calling reload, to ensure the new state of the file is reflected
everywhere in the cluster.

[source,js]
--------------------------------------------------
POST /my_index/_reload_search_analyzers
--------------------------------------------------
// CONSOLE
// TEST[s/^/PUT my_index\n/]
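
Assuming the API returns a standard broadcast shards summary, as other per-shard
index APIs do, a sketch of what a response might look like (illustrative only;
the exact response fields are an assumption, not shown in this change):

[source,js]
--------------------------------------------------
{
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  }
}
--------------------------------------------------
// NOTCONSOLE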
2 changes: 2 additions & 0 deletions docs/reference/rest-api/index.asciidoc
@@ -12,6 +12,7 @@ directly to configure and access {xpack} features.
* <<data-frame-apis,{dataframe-cap} APIs>>
* <<graph-explore-api,Graph Explore API>>
* <<freeze-index-api>>, <<unfreeze-index-api>>
* <<indices-reload-analyzers,Reload Search Analyzers API>>
* <<index-lifecycle-management-api,Index lifecycle management APIs>>
* <<licensing-apis,Licensing APIs>>
* <<ml-apis,Machine Learning APIs>>
@@ -35,4 +36,5 @@ include::{es-repo-dir}/rollup/rollup-api.asciidoc[]
include::{xes-repo-dir}/rest-api/security.asciidoc[]
include::{es-repo-dir}/indices/apis/unfreeze.asciidoc[]
include::{xes-repo-dir}/rest-api/watcher.asciidoc[]
include::{es-repo-dir}/indices/apis/reload-analyzers.asciidoc[]
include::defs.asciidoc[]
@@ -30,6 +30,7 @@
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.analysis.AbstractTokenFilterFactory;
import org.elasticsearch.index.analysis.Analysis;
import org.elasticsearch.index.analysis.AnalysisMode;
import org.elasticsearch.index.analysis.CharFilterFactory;
import org.elasticsearch.index.analysis.CustomAnalyzer;
import org.elasticsearch.index.analysis.TokenFilterFactory;
@@ -50,6 +51,7 @@ public class SynonymTokenFilterFactory extends AbstractTokenFilterFactory {
private final boolean lenient;
protected final Settings settings;
protected final Environment environment;
private final boolean updateable;

SynonymTokenFilterFactory(IndexSettings indexSettings, Environment env,
String name, Settings settings) {
@@ -65,9 +67,15 @@ public class SynonymTokenFilterFactory extends AbstractTokenFilterFactory {
this.expand = settings.getAsBoolean("expand", true);
this.lenient = settings.getAsBoolean("lenient", false);
this.format = settings.get("format", "");
this.updateable = settings.getAsBoolean("updateable", false);
this.environment = env;
}

@Override
public AnalysisMode getAnalysisMode() {
return this.updateable ? AnalysisMode.SEARCH_TIME : AnalysisMode.ALL;
}

@Override
public TokenStream create(TokenStream tokenStream) {
throw new IllegalStateException("Call createPerAnalyzerSynonymFactory to specialize this factory for an analysis chain first");
@@ -98,6 +106,11 @@ public TokenFilterFactory getSynonymFilter() {
// which doesn't support stacked input tokens
return IDENTITY_FILTER;
}

@Override
public AnalysisMode getAnalysisMode() {
return updateable ? AnalysisMode.SEARCH_TIME : AnalysisMode.ALL;
}
};
}

@@ -42,8 +42,9 @@
import org.elasticsearch.index.IndexService;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.analysis.AnalysisRegistry;
import org.elasticsearch.index.analysis.AnalyzerComponents;
import org.elasticsearch.index.analysis.AnalyzerComponentsProvider;
import org.elasticsearch.index.analysis.CharFilterFactory;
import org.elasticsearch.index.analysis.CustomAnalyzer;
import org.elasticsearch.index.analysis.NameOrDefinition;
import org.elasticsearch.index.analysis.NamedAnalyzer;
import org.elasticsearch.index.analysis.TokenFilterFactory;
@@ -261,18 +262,23 @@ private static AnalyzeAction.DetailAnalyzeResponse detailAnalyze(AnalyzeAction.R
}
}

CustomAnalyzer customAnalyzer = null;
if (analyzer instanceof CustomAnalyzer) {
customAnalyzer = (CustomAnalyzer) analyzer;
} else if (analyzer instanceof NamedAnalyzer && ((NamedAnalyzer) analyzer).analyzer() instanceof CustomAnalyzer) {
customAnalyzer = (CustomAnalyzer) ((NamedAnalyzer) analyzer).analyzer();
// maybe unwrap analyzer from NamedAnalyzer
Analyzer potentialCustomAnalyzer = analyzer;
if (analyzer instanceof NamedAnalyzer) {
potentialCustomAnalyzer = ((NamedAnalyzer) analyzer).analyzer();
}

if (customAnalyzer != null) {
// customAnalyzer = divide charfilter, tokenizer tokenfilters
CharFilterFactory[] charFilterFactories = customAnalyzer.charFilters();
TokenizerFactory tokenizerFactory = customAnalyzer.tokenizerFactory();
TokenFilterFactory[] tokenFilterFactories = customAnalyzer.tokenFilters();
if (potentialCustomAnalyzer instanceof AnalyzerComponentsProvider) {
AnalyzerComponentsProvider customAnalyzer = (AnalyzerComponentsProvider) potentialCustomAnalyzer;
// note: this is not field-name dependent in our cases so we can leave out the argument
int positionIncrementGap = potentialCustomAnalyzer.getPositionIncrementGap("");
int offsetGap = potentialCustomAnalyzer.getOffsetGap("");
AnalyzerComponents components = customAnalyzer.getComponents();
// divide charfilter, tokenizer tokenfilters
CharFilterFactory[] charFilterFactories = components.getCharFilters();
TokenizerFactory tokenizerFactory = components.getTokenizerFactory();
TokenFilterFactory[] tokenFilterFactories = components.getTokenFilters();
String tokenizerName = components.getTokenizerName();

String[][] charFiltersTexts = new String[charFilterFactories != null ? charFilterFactories.length : 0][request.text().length];
TokenListCreator[] tokenFiltersTokenListCreator = new TokenListCreator[tokenFilterFactories != null ?
@@ -298,7 +304,7 @@ private static AnalyzeAction.DetailAnalyzeResponse detailAnalyze(AnalyzeAction.R
// analyzing only tokenizer
Tokenizer tokenizer = tokenizerFactory.create();
tokenizer.setReader(reader);
tokenizerTokenListCreator.analyze(tokenizer, customAnalyzer, includeAttributes);
tokenizerTokenListCreator.analyze(tokenizer, includeAttributes, positionIncrementGap, offsetGap);

// analyzing each tokenfilter
if (tokenFilterFactories != null) {
@@ -308,7 +314,7 @@ private static AnalyzeAction.DetailAnalyzeResponse detailAnalyze(AnalyzeAction.R
}
TokenStream stream = createStackedTokenStream(request.text()[textIndex],
charFilterFactories, tokenizerFactory, tokenFilterFactories, tokenFilterIndex + 1);
tokenFiltersTokenListCreator[tokenFilterIndex].analyze(stream, customAnalyzer, includeAttributes);
tokenFiltersTokenListCreator[tokenFilterIndex].analyze(stream, includeAttributes, positionIncrementGap, offsetGap);
}
}
}
@@ -331,8 +337,8 @@ private static AnalyzeAction.DetailAnalyzeResponse detailAnalyze(AnalyzeAction.R
tokenFilterFactories[tokenFilterIndex].name(), tokenFiltersTokenListCreator[tokenFilterIndex].getArrayTokens());
}
}
detailResponse = new AnalyzeAction.DetailAnalyzeResponse(charFilteredLists, new AnalyzeAction.AnalyzeTokenList(
customAnalyzer.getTokenizerName(), tokenizerTokenListCreator.getArrayTokens()), tokenFilterLists);
detailResponse = new AnalyzeAction.DetailAnalyzeResponse(charFilteredLists,
new AnalyzeAction.AnalyzeTokenList(tokenizerName, tokenizerTokenListCreator.getArrayTokens()), tokenFilterLists);
} else {
String name;
if (analyzer instanceof NamedAnalyzer) {
@@ -343,8 +349,8 @@ private static AnalyzeAction.DetailAnalyzeResponse detailAnalyze(AnalyzeAction.R

TokenListCreator tokenListCreator = new TokenListCreator(maxTokenCount);
for (String text : request.text()) {
tokenListCreator.analyze(analyzer.tokenStream("", text), analyzer,
includeAttributes);
tokenListCreator.analyze(analyzer.tokenStream("", text), includeAttributes, analyzer.getPositionIncrementGap(""),
analyzer.getOffsetGap(""));
}
detailResponse
= new AnalyzeAction.DetailAnalyzeResponse(new AnalyzeAction.AnalyzeTokenList(name, tokenListCreator.getArrayTokens()));
@@ -414,7 +420,7 @@ private static class TokenListCreator {
tc = new TokenCounter(maxTokenCount);
}

private void analyze(TokenStream stream, Analyzer analyzer, Set<String> includeAttributes) {
private void analyze(TokenStream stream, Set<String> includeAttributes, int positionIncrementGap, int offsetGap) {
try {
stream.reset();
CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
@@ -437,8 +443,8 @@ private void analyze(TokenStream stream, Analyzer analyzer, Set<String> includeA
lastOffset += offset.endOffset();
lastPosition += posIncr.getPositionIncrement();

lastPosition += analyzer.getPositionIncrementGap("");
lastOffset += analyzer.getOffsetGap("");
lastPosition += positionIncrementGap;
lastOffset += offsetGap;

} catch (IOException e) {
throw new ElasticsearchException("failed to analyze", e);
@@ -738,4 +738,5 @@ public interface IndicesAdminClient extends ElasticsearchClient {
* Swaps the index pointed to by an alias given all provided conditions are satisfied
*/
void rolloverIndex(RolloverRequest request, ActionListener<RolloverResponse> listener);

}
@@ -523,5 +523,4 @@ public static DeleteSnapshotRequest deleteSnapshotRequest(String repository, Str
public static SnapshotsStatusRequest snapshotsStatusRequest(String repository) {
return new SnapshotsStatusRequest(repository);
}

}
@@ -519,7 +519,6 @@ public IndexAnalyzers build(IndexSettings indexSettings,
Map<String, TokenizerFactory> tokenizerFactoryFactories,
Map<String, CharFilterFactory> charFilterFactoryFactories,
Map<String, TokenFilterFactory> tokenFilterFactoryFactories) {

Map<String, NamedAnalyzer> analyzers = new HashMap<>();
Map<String, NamedAnalyzer> normalizers = new HashMap<>();
Map<String, NamedAnalyzer> whitespaceNormalizers = new HashMap<>();
@@ -561,9 +560,11 @@ public IndexAnalyzers build(IndexSettings indexSettings,
return new IndexAnalyzers(analyzers, normalizers, whitespaceNormalizers);
}

private static NamedAnalyzer produceAnalyzer(String name, AnalyzerProvider<?> analyzerFactory,
Map<String, TokenFilterFactory> tokenFilters, Map<String, CharFilterFactory> charFilters,
Map<String, TokenizerFactory> tokenizers) {
private static NamedAnalyzer produceAnalyzer(String name,
AnalyzerProvider<?> analyzerFactory,
Map<String, TokenFilterFactory> tokenFilters,
Map<String, CharFilterFactory> charFilters,
Map<String, TokenizerFactory> tokenizers) {
/*
* Lucene defaults positionIncrementGap to 0 in all analyzers but
* Elasticsearch defaults them to 0 only before version 2.0