Commit

fix(specs): clarify decompounding limitations (#3227)
kai687 authored Jul 30, 2024
1 parent 3f622e5 commit a35e814
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion specs/common/schemas/IndexSettings.yml
@@ -144,6 +144,8 @@ baseIndexSettings:
 You can specify different lists for different languages.
 Decompounding is supported for these languages:
 Dutch (`nl`), German (`de`), Finnish (`fi`), Danish (`da`), Swedish (`sv`), and Norwegian (`no`).
+Decompounding doesn't work for words with [non-spacing mark Unicode characters](https://www.charactercodes.net/category/non-spacing_mark).
+For example, `Gartenstühle` won't be decompounded if the `ü` consists of `u` (U+0075) and `◌̈` (U+0308).
 default: {}
 x-categories:
 - Languages
@@ -527,10 +529,12 @@ indexSettingsAsSearchParams:
 decompoundQuery:
 type: boolean
 description: |
-Whether to split compound words into their building blocks.
+Whether to split compound words in the query into their building blocks.
 For more information, see [Word segmentation](https://www.algolia.com/doc/guides/managing-results/optimize-search-results/handling-natural-languages-nlp/in-depth/language-specific-configurations/#splitting-compound-words).
 Word segmentation is supported for these languages: German, Dutch, Finnish, Swedish, and Norwegian.
+Decompounding doesn't work for words with [non-spacing mark Unicode characters](https://www.charactercodes.net/category/non-spacing_mark).
+For example, `Gartenstühle` won't be decompounded if the `ü` consists of `u` (U+0075) and `◌̈` (U+0308).
 default: true
 x-categories:
 - Languages
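The limitation described in both hunks concerns the decomposed form of `ü`: `u` (U+0075) followed by the combining diaeresis (U+0308), which belongs to the Unicode category `Mn` (non-spacing mark). The following is a minimal Python sketch, using only the standard library, of how such characters could be detected and composed into a single code point with NFC normalization. The helper name `has_non_spacing_marks` is hypothetical, and whether normalizing text before indexing or querying is enough to make decompounding apply is an assumption that this commit does not confirm.

```python
import unicodedata


def has_non_spacing_marks(text: str) -> bool:
    """Return True if any character is in Unicode category Mn (non-spacing mark)."""
    return any(unicodedata.category(ch) == "Mn" for ch in text)


# Decomposed spelling: 'u' (U+0075) followed by combining diaeresis (U+0308).
decomposed = "Gartenstu\u0308hle"

# NFC normalization composes the pair into the single code point U+00FC.
composed = unicodedata.normalize("NFC", decomposed)

print(has_non_spacing_marks(decomposed))  # True
print(has_non_spacing_marks(composed))    # False
print(composed == "Gartenst\u00fchle")    # True
```

NFC composes only sequences that have a precomposed equivalent, so a check like `has_non_spacing_marks` can still return True after normalization for other marks.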
