The plan is for this to include any extra queries, filters, native scripts, score functions, and anything else we end up creating to make search nice for Wikimedia. It contains several different plugins:
The extra plugin contains utilities that are generally useful.
Queries:
- source_regex - An nGram-accelerated regular expression filter that is generally much faster than sequentially checking every document.
- token_count_router - A simple query wrapper that evaluates conditions on the number of tokens in the input query.
- simswitcher - A simple query wrapper that allows overriding similarity settings at query time (expert: use with caution).
- term_freq - Simple term query with filtering based on term frequency.
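To make the idea behind source_regex's nGram acceleration concrete, here is a minimal two-phase sketch in Python. It assumes trigrams and assumes the caller passes in the literal substrings every match must contain. The plugin itself works on an indexed nGram field and derives the required grams from the regex, so treat `build_trigram_index` and `regex_search` as hypothetical helpers, not its API.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of overlapping 3-character grams in text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_trigram_index(docs):
    """Map each trigram to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def regex_search(docs, index, pattern, required_literals):
    """Return sorted ids of docs matching pattern.

    required_literals: substrings every match must contain (supplied by the
    caller in this sketch; a real implementation extracts them from the regex).
    """
    # Phase 1: cheap candidate selection by intersecting posting sets.
    candidates = set(docs)
    for literal in required_literals:
        for gram in trigrams(literal):
            candidates &= index.get(gram, set())
    # Phase 2: run the real regex only on the surviving candidates.
    compiled = re.compile(pattern)
    return sorted(d for d in candidates if compiled.search(docs[d]))


docs = {1: "the quick brown fox", 2: "a slow brown dog", 3: "quick thinking"}
index = build_trigram_index(docs)
print(regex_search(docs, index, r"quick\s+brown", ["quick", "brown"]))  # [1]
```

Only document 1 reaches the regex check; documents 2 and 3 are rejected by the trigram intersection alone. Literals shorter than three characters contribute no grams, so in this sketch they fall back to scanning every document.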
Native Scripts:
- super_detect_noop - Like detect_noop, but supports configurable sloppiness. New in 1.5.0, 1.4.1, and 1.3.1.
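"Configurable sloppiness" means an update can be treated as a noop even when a value changed slightly, which avoids reindexing documents for insignificant changes. Below is a minimal Python sketch of that check, assuming a per-field tolerance given either as an absolute difference or as a fraction of the old value. `is_noop`, `apply_update` and the `(tolerance, percent)` tuples are hypothetical and are not super_detect_noop's actual parameters.

```python
def is_noop(old, new, tolerance, percent=False):
    """Return True if changing old to new is small enough to skip.

    tolerance is an absolute difference, or a fraction of old when
    percent is True (0.1 means "within 10%").
    """
    if old is None or new is None:
        return old == new  # a missing value only matches another missing value
    if percent:
        if old == 0:
            return new == 0  # no relative tolerance around zero
        return abs(new - old) / abs(old) <= tolerance
    return abs(new - old) <= tolerance


def apply_update(doc, updates, tolerances):
    """Apply updates to a copy of doc; return (new_doc, changed).

    Fields listed in tolerances use the sloppy check above; all other
    fields are compared exactly, like a plain noop detector.
    """
    new_doc = dict(doc)
    changed = False
    for field, value in updates.items():
        spec = tolerances.get(field)
        if spec is not None and is_noop(doc.get(field), value, *spec):
            continue  # close enough: keep the stored value
        if doc.get(field) != value:
            new_doc[field] = value
            changed = True
    return new_doc, changed


doc = {"popularity": 100, "title": "Foo"}
tolerances = {"popularity": (0.1, True)}  # within 10%
print(apply_update(doc, {"popularity": 105}, tolerances))  # ({'popularity': 100, 'title': 'Foo'}, False)
print(apply_update(doc, {"popularity": 150}, tolerances))  # ({'popularity': 150, 'title': 'Foo'}, True)
```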
Analysis:
- preserve_original - A token filter that wraps a filter chain to keep and emit the original term at the same position. New in 2.3.4.
- term_freq - A token filter to populate the term frequency from the input string. New in 5.5.2.6.
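The preserve_original idea (keep the unmodified term alongside the filtered one, at the same position) can be sketched in Python as below. This is a simplification under stated assumptions: tokens are `(term, position)` pairs, the wrapped chain is any function mapping one term to one term, and the order in which the two stacked tokens are emitted is arbitrary here. The real filter operates on a Lucene token stream; `preserve_original` and `fold` are hypothetical names.

```python
import unicodedata


def fold(term):
    """Example chain: lowercase and strip diacritics (e.g. "Café" -> "cafe")."""
    decomposed = unicodedata.normalize("NFD", term.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def preserve_original(tokens, chain):
    """Apply chain to each (term, position) token.

    When the chain changes a term, the original term is emitted as well,
    with the same position, so both forms can match at query time.
    """
    out = []
    for term, pos in tokens:
        filtered = chain(term)
        out.append((filtered, pos))
        if filtered != term:
            out.append((term, pos))  # same position: a "stacked" token
    return out


print(preserve_original([("Café", 0), ("bar", 1)], fold))
# [('cafe', 0), ('Café', 0), ('bar', 1)]
```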
This plugin contains a homoglyph normalizer.
Analysis:
- homoglyph_norm - A token filter that will provide additional single-script tokens for multi-script tokens that contain homoglyphs.
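A minimal sketch of the homoglyph idea in Python, assuming a tiny hand-made Cyrillic/Latin look-alike table (the real filter's tables and script handling are more extensive and are not reproduced here). A token that mixes scripts gets extra variants in which the look-alike letters are mapped into one script, and a variant is kept only if it ends up entirely single-script. `homoglyph_variants` and the tables are hypothetical.

```python
import unicodedata

# Hypothetical, deliberately tiny table of Cyrillic letters that look like Latin ones.
CYR_TO_LAT = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}


def script_of(ch):
    """Classify a character as Latin, Cyrillic, or None (anything else)."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "Latin"
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    return None


def scripts_in(text):
    """Return the set of known scripts used in text."""
    return {script_of(c) for c in text} - {None}


def homoglyph_variants(token):
    """Return extra single-script tokens for a mixed-script token."""
    if len(scripts_in(token)) < 2:
        return []  # already single-script: nothing to add
    variants = []
    for table in (CYR_TO_LAT, LAT_TO_CYR):
        converted = "".join(table.get(c, c) for c in token)
        if len(scripts_in(converted)) == 1:
            variants.append(converted)
    return variants


spoofed = "p\u0430ypal"  # Latin "p", Cyrillic "а", then Latin "ypal"
print(homoglyph_variants(spoofed))  # ['paypal']
print(homoglyph_variants("paypal"))  # []
```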
This plugin contains a Khmer syllable reordering filter.
Analysis:
- khmer_syll_reorder - A character filter that will replace deprecated Khmer characters and attempt to canonically reorder Khmer orthographic syllables.
This plugin contains a Slovak stemmer.
Analysis:
- slovak_stemmer - A token filter that provides stemming for the Slovak language. New in 5.5.2.4.
This plugin contains miscellaneous text mungers.
Analysis:
- acronym_fixer - A character filter that removes periods from acronym-like contexts.
- camelCase_splitter - A character filter that splits camelCase words.
- icu_token_repair - A token filter that rejoins tokens split asunder by the ICU tokenizer.
- limited_mapping - A character filter that is limited to changing or deleting single characters.
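Rough regex approximations of two of these character filters, to show the kind of rewrite each one makes on raw text before tokenization. These are simplified sketches under assumed rules (camelCase boundaries are ASCII-only; an acronym is two or more single letters each followed by a period, with an optional final letter), not the plugins' actual logic; `fix_acronyms` and `split_camel_case` are hypothetical names.

```python
import re

# Two or more "letter." pairs, optionally followed by one more letter,
# not glued to surrounding word characters: "N.A.S.A.", "U.S.A", "e.g."
ACRONYM = re.compile(r"(?<!\w)(?:[^\W\d_]\.){2,}[^\W\d_]?(?!\w)")

# Zero-width boundary between a lowercase and an uppercase ASCII letter.
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def fix_acronyms(text):
    """Remove the periods inside acronym-like runs."""
    return ACRONYM.sub(lambda m: m.group(0).replace(".", ""), text)


def split_camel_case(text):
    """Insert a space at each lowercase-to-uppercase transition."""
    return CAMEL_BOUNDARY.sub(" ", text)


print(fix_acronyms("N.A.S.A. launched it, see file.txt"))
# NASA launched it, see file.txt
print(split_camel_case("camelCaseSplitter"))  # camel Case Splitter
```

The acronym rule also drops a sentence-final period after an acronym ("in the U.S." becomes "in the US"), which is harmless for indexing but is the kind of edge case a real filter has to decide on.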
This plugin contains a better apostrophe filter for Turkish.
Analysis:
- better_apostrophe - A smarter version of the OpenSearch/Lucene apostrophe token filter for Turkish, which is much too aggressive for multilingual data. See the plugin documentation for more details.
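To see why the stock filter is too aggressive, here is a Python approximation of the behavior of Lucene's Turkish ApostropheFilter, which strips an apostrophe and everything after it. The function name is hypothetical, treating the typographic apostrophe ’ the same as ' is an assumption of this sketch, and better_apostrophe's own rules are not reproduced here.

```python
APOSTROPHES = ("'", "\u2019")  # ASCII apostrophe and right single quotation mark


def strip_after_apostrophe(term):
    """Truncate term at its first apostrophe, approximating Lucene's ApostropheFilter."""
    cut = min((term.find(a) for a in APOSTROPHES if a in term), default=len(term))
    return term[:cut]


for word in ["Türkiye'de", "O'Brien", "rock'n'roll", "Ankara"]:
    print(word, "->", strip_after_apostrophe(word))
# Türkiye'de -> Türkiye
# O'Brien -> O
# rock'n'roll -> rock
# Ankara -> Ankara
```

Dropping the suffix is what you want for Turkish words like Türkiye'de ("in Turkey"), but on multilingual data the same rule mangles names and words such as O'Brien and rock'n'roll.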
These filters are provided to allow unpacking the monolithic OpenSearch Ukrainian analyzer, which is a wrapper around the monolithic Lucene Ukrainian analyzer. This version of the Ukrainian stemmer uses a slightly newer version of the Morfologik Ukrainian stemming dictionary than the parallel version in OpenSearch/Lucene.
Analysis:
- ukrainian_stop - A stopword token filter for Ukrainian.
- ukrainian_stemmer - A token filter that provides stemming for the Ukrainian language.
Extra Queries and Filters Plugin | Elasticsearch / OpenSearch |
---|---|
1.3.19, master branch | OpenSearch 1.3.19 |
6.3.1.2 | Elastic 6.3.1 |
5.5.2.7 | Elastic 5.5.2 |
5.5.2 | Elastic 5.5.2 |
5.3.2 | Elastic 5.3.2 |
5.2.2 | Elastic 5.2.2 |
5.2.1 | Elastic 5.2.1 |
5.2.0 | Elastic 5.2.0 |
5.1.2 | Elastic 5.1.2 |
2.4.1, 2.4 branch | Elastic 2.4.1 |
2.4.0 | Elastic 2.4.0 |
2.3.5, 2.3 branch | Elastic 2.3.5 |
2.3.4 | Elastic 2.3.4 |
Install it like so for Elasticsearch or OpenSearch x.y.z:
Elastic 2.4.1 and earlier:
`./bin/plugin --install org.wikimedia.search/extra/x.y.z`
Elastic 5.1.2 and later:
`./bin/elasticsearch-plugin install org.wikimedia.search:extra:x.y.z`
`./bin/elasticsearch-plugin install org.wikimedia.search:extra-analysis-slovak:x.y.z`
OpenSearch 1.3.19 and later:
`./bin/opensearch-plugin install org.wikimedia.search:extra:x.y.z`
`./bin/opensearch-plugin install org.wikimedia.search:extra-analysis-slovak:x.y.z`
Spotbugs is run during the `verify` phase of the build to find common issues. The build will break if any issue is found, and the issues will be reported on the console.
To run just the check, use `mvn spotbugs:check` on a project that has already been compiled (`mvn compile`). `mvn spotbugs:gui` provides a graphical UI that might be easier to read.
Like all tools, spotbugs is much dumber than you. If you find a false positive, you can ignore it with the `@SuppressFBWarnings` annotation. You can provide a justification to document why the rule should be ignored in this specific case. Some rules don't make sense for this project, and they can be ignored via `src/dev-tools/spotbugs-excludes.xml`.