[BUG] Autosuggest for german Umlaute not working in Ext:solr 11 #3096

Closed
Tracked by #3155
ulrike-cosmoblonde opened this issue Nov 14, 2021 · 4 comments

ulrike-cosmoblonde commented Nov 14, 2021

Hi,

I have been using the autosuggest feature with version 10 of the solr extension in several projects with German content, without any umlaut issues.
When I use the schema.xml from version 10 of the solr extension in my Solr instance, autosuggest works for umlauts.
But with the latest schema.xml shipped with version 11 of the solr extension, umlauts no longer work in autosuggest.
Entering "Künst" does not produce any suggestions, but entering "Kunst" does show e.g. Kunstler, kunstlich, etc.

Comparing the configuration of the field type textSpell (the recommended field type for autosuggest) in both schemas, the only difference, and hence the cause of the problem, is the additional filter "solr.ASCIIFoldingFilterFactory" in the index and query analyzers of version 11:

<!-- Setup simple analysis for spell checking -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
	<tokenizer class="solr.StandardTokenizerFactory"/>
	<filter class="solr.LowerCaseFilterFactory"/>

	<filter class="solr.DictionaryCompoundWordTokenFilterFactory"
		dictionary="german/german-common-nouns.txt"
		minWordSize="5"
		minSubwordSize="4"
		maxSubwordSize="15"
		onlyLongestMatch="false"
	/>

	<!-- no synonyms here because we do not want to add them as spell suggestion -->
	<filter class="solr.ManagedStopFilterFactory" managed="${solr.core.name}"/>
	<filter class="solr.ASCIIFoldingFilterFactory"/>

	<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
	<tokenizer class="solr.StandardTokenizerFactory" />

	<filter class="solr.LowerCaseFilterFactory"/>

	<filter class="solr.ManagedSynonymGraphFilterFactory" managed="${solr.core.name}" />
	<filter class="solr.ManagedStopFilterFactory" managed="${solr.core.name}"/>
	<filter class="solr.ASCIIFoldingFilterFactory"/>

	<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>

After removing this filter from both analyzers and re-indexing all data, umlauts work again in autosuggest.

Can you please remove that filter for the textSpell type in the German schema.xml (solr/Resources/Private/Solr/configsets/ext_solr_11_1_0/conf/german/schema.xml)?

Other languages may also need this filter definition removed.

Regards,
Ulrike

@dkd-kaehm
Collaborator

dkd-kaehm commented Nov 15, 2021

@dkd-friedrich

Strange thing:
IMHO it should work, but it doesn't, even though <filter class="solr.ASCIIFoldingFilterFactory"/> is also present in the query analyzer of "textSpell".
What do you think? Maybe the filter order is wrong there?

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
	<analyzer type="index">
		<tokenizer class="solr.StandardTokenizerFactory"/>
		<filter class="solr.LowerCaseFilterFactory"/>
		<filter class="solr.DictionaryCompoundWordTokenFilterFactory"
			dictionary="german/german-common-nouns.txt"
			minWordSize="5"
			minSubwordSize="4"
			maxSubwordSize="15"
			onlyLongestMatch="false"
		/>
		<!-- no synonyms here because we do not want to add them as spell suggestion -->
		<filter class="solr.ManagedStopFilterFactory" managed="${solr.core.name}"/>
		<filter class="solr.ASCIIFoldingFilterFactory"/>
		<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.StandardTokenizerFactory" />
		<filter class="solr.LowerCaseFilterFactory"/>
		<filter class="solr.ManagedSynonymGraphFilterFactory" managed="${solr.core.name}" />
		<filter class="solr.ManagedStopFilterFactory" managed="${solr.core.name}"/>
		<filter class="solr.ASCIIFoldingFilterFactory"/>
		<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
	</analyzer>
</fieldType>

@dkd-friedrich
Member

@ulrike-cosmoblonde is right. A suggest query behaves differently from a normal search query: the suggestions are facet values rather than regular search results.

However, I would suggest not removing the filter completely, but additionally setting its preserveOriginal option. With this option, both the original spelling and the simplified variant are kept, e.g. "künstler" and "kunstler". The advantage is that suggestions are also found for the simplified spelling, which matters especially for accents: typing "recherchee" then also yields suggestions for "recherchée".
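
As a rough sketch, the filter line in the textSpell analyzers might then look like the following. It assumes the preserveOriginal attribute that Solr's ASCIIFoldingFilterFactory offers; the exact form in the final patch may differ.

```xml
<!-- Sketch only: emit both the folded token and the original one,
     e.g. "kunstler" and "künstler" -->
<filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
```

As with any change to the index analyzer, the data would have to be re-indexed before the additional tokens become available.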

Since this variant can lead to duplicates, with both "recherchée" and "recherchee" being suggested, I would also add a separate spell field with the old simplified configuration, which can be selected via TypoScript (plugin.tx_solr.suggest.suggestField) if required.

I'll prepare pull requests for 11.5 and 11.1, probably before the end of the week.

dkd-friedrich added a commit to dkd-friedrich/ext-solr that referenced this issue Dec 6, 2021
The introduced ASCII folding filters or language-dependent
normalization filters lead to issues with the autosuggest function due
to the differing stemming behaviour.

To fix this issue, the original token is preserved, which allows e.g.
suggestions for search terms with and without accents. As this extension
might lead to unwanted duplicates, a new field textSpellExact is
introduced, which takes non-ASCII characters as given.

Resolves: TYPO3-Solr#3096
dkd-friedrich added a commit to dkd-friedrich/ext-solr that referenced this issue Dec 8, 2021
The introduced ASCII folding filters or language-dependent
normalization filters lead to issues with the autosuggest function due
to the differing stemming behaviour.

To fix this issue, the original token is preserved if possible, which
allows e.g. suggestions for search terms with and without accents. As
this extension might lead to unwanted duplicates, a new field
textSpellExact is introduced, which takes non-ASCII characters as given.

Resolves: TYPO3-Solr#3096
@dkd-friedrich
Member

@ulrike-cosmoblonde To fix this issue, I have adapted the Solr schema so that the original term is now retained. For the example here, this means that suggestions are displayed for both "kunst" and "künst".

The original, less flexible variant can still be used by switching to another field via plugin.tx_solr.suggest.suggestField = spellExact.

It would be helpful if you could test the adjustments as well.
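
For illustration, a field type along the following lines would leave non-ASCII characters untouched. This is a hypothetical sketch, not the definition shipped with the fix: only the names textSpellExact and spellExact come from this thread, while the analyzer chain and the field attributes are assumptions.

```xml
<!-- Hypothetical sketch: a spell field type without ASCII folding,
     so "künstler" keeps its umlaut in the index and in suggestions -->
<fieldType name="textSpellExact" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
	<analyzer>
		<tokenizer class="solr.StandardTokenizerFactory"/>
		<filter class="solr.LowerCaseFilterFactory"/>
		<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
	</analyzer>
</fieldType>

<!-- Assumed field declaration; the attributes are illustrative -->
<field name="spellExact" type="textSpellExact" indexed="true" stored="true" multiValued="true"/>
```

With a field like this, "kunst" would no longer match "künstler", which is the less flexible behaviour described above.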

dkd-kaehm pushed a commit that referenced this issue Dec 28, 2021
The introduced ASCII folding filters or language-dependent
normalization filters lead to issues with the autosuggest function due
to the differing stemming behaviour.

To fix this issue, the original token is preserved if possible, which
allows e.g. suggestions for search terms with and without accents. As
this extension might lead to unwanted duplicates, a new field
textSpellExact is introduced, which takes non-ASCII characters as given.

Resolves: #3096
@dkd-friedrich
Member

I'm closing this issue; the bugfix will be part of the upcoming versions 11.2 and 11.5.

dkd-friedrich added a commit to dkd-friedrich/ext-solr that referenced this issue Jan 13, 2022
We are happy to release EXT:solr 11.2.0.
The focus of this release has been on supporting the latest Apache Solr
version 8.11.1 and on optimizing the data update monitoring.

- [TASK] Upgrade to Apache Solr 8.11.1 (TYPO3-Solr#3155)
- [FEATURE] Improve data update handling (TYPO3-Solr#3153)
- [BUGFIX] Fix thrown exception in Synonym and StopWordParser
- [TASK] Configure CI matrix for release 11.2
- [BUGFIX:BP:11.1] Fix autosuggest with non-ascii terms (TYPO3-Solr#3096)
- [BUGFIX] Prevent unwanted filter parameters from being generated
(TYPO3-Solr#3126)
- [TASK] Add Czech translation (TYPO3-Solr#3132)
- [TASK] Replace mirrors for Apache Solr binaries on install-solr.sh
(TYPO3-Solr#3094)
- [BUGFIX:BP:11-1] routeenhancer with empty filters (TYPO3-Solr#3099)
- [TASK] Use Environment::getContext() instead of GeneralUtility
- [BUGFIX] Don't use jQuery.ajaxSetup() (TYPO3-Solr#2503)
- [TASK] Setup Github Actions :: Basics
- [TASK] Setup Dependabot to watch "solarium/solarium" (#)
- [BUGFIX] Filter within route enhancers (TYPO3-Solr#3054)
- [BUGFIX] Fix NON-Composer mod libs composer.json for composer v2
(TYPO3-Solr#3053)
dkd-friedrich added a commit to dkd-friedrich/ext-solr that referenced this issue Feb 2, 2022
We are happy to release EXT:solr 11.2.0.
The focus of this release has been on supporting the latest Apache Solr
version 8.11.1 and on optimizing the data update monitoring.

- [TASK] Upgrade Solarium to 6.0.4  (TYPO3-Solr#3178)
- [FEATURE] Improve data update handling (TYPO3-Solr#3153)
- [BUGFIX] Fix thrown exception in Synonym and StopWordParser
- [TASK] Upgrade to Apache Solr 8.11.1 (TYPO3-Solr#3155)
- [TASK] Configure CI matrix for release 11.2
- [BUGFIX:BP:11.1] Fix autosuggest with non-ascii terms (TYPO3-Solr#3096)
- [BUGFIX] Prevent unwanted filter parameters from being generated
(TYPO3-Solr#3126)
- [TASK] Add Czech translation (TYPO3-Solr#3132)
- [TASK] Replace mirrors for Apache Solr binaries on install-solr.sh
(TYPO3-Solr#3094)
- [BUGFIX:BP:11-1] routeenhancer with empty filters (TYPO3-Solr#3099)
- [TASK] Use Environment::getContext() instead of GeneralUtility
- [BUGFIX] Don't use jQuery.ajaxSetup() (TYPO3-Solr#2503)
- [TASK] Setup Github Actions :: Basics
- [TASK] Setup Dependabot to watch "solarium/solarium" (#)
- [BUGFIX] Filter within route enhancers (TYPO3-Solr#3054)
- [BUGFIX] Fix NON-Composer mod libs composer.json for composer v2
(TYPO3-Solr#3053)

Resolves: TYPO3-Solr#3155
dkd-friedrich added a commit to dkd-friedrich/ext-solr that referenced this issue Feb 3, 2022
We are happy to release EXT:solr 11.2.0.
The focus of this release has been on supporting the latest Apache Solr
version 8.11.1 and on optimizing the data update monitoring.

New in this release
- Apache Solr 8.11.1 support
- Improved data update monitoring and handling

Besides the major changes, we made several small improvements and bugfixes:

- [TASK] Upgrade Solarium to 6.0.4  (TYPO3-Solr#3178)
- [BUGFIX] Fix thrown exception in Synonym and StopWordParser
- [TASK] Configure CI matrix for release 11.2
- [BUGFIX:BP:11.1] Fix autosuggest with non-ascii terms (TYPO3-Solr#3096)
- [BUGFIX] Prevent unwanted filter parameters from being generated
(TYPO3-Solr#3126)
- [TASK] Add Czech translation (TYPO3-Solr#3132)
- [TASK] Replace mirrors for Apache Solr binaries on install-solr.sh
(TYPO3-Solr#3094)
- [BUGFIX:BP:11-1] routeenhancer with empty filters (TYPO3-Solr#3099)
- [TASK] Use Environment::getContext() instead of GeneralUtility
- [BUGFIX] Don't use jQuery.ajaxSetup() (TYPO3-Solr#2503)
- [TASK] Setup Github Actions :: Basics
- [TASK] Setup Dependabot to watch "solarium/solarium" (#)
- [BUGFIX] Filter within route enhancers (TYPO3-Solr#3054)
- [BUGFIX] Fix NON-Composer mod libs composer.json for composer v2
(TYPO3-Solr#3053)

Please read the release notes:
https://github.com/TYPO3-Solr/ext-solr/releases/tag/11.2.0

Resolves: TYPO3-Solr#3155
dkd-friedrich added a commit to dkd-friedrich/ext-solr that referenced this issue Feb 9, 2022
This is a bugfix-only release and the last release for EXT:solr 11.1.x.
Please update to 11.2 or even 11.5.

This release contains:

- [BUGFIX:BP:11.1] TER releases missing composer dependencies (TYPO3-Solr#3176)
- [TASK] Configure CI matrix for release 11.2
- [BUGFIX:BP:11.1] Fix autosuggest with non-ascii terms (TYPO3-Solr#3096)
- [BUGFIX] Prevent unwanted filter parameters from being generated
(TYPO3-Solr#3126)
- [TASK] Add Czech translation (TYPO3-Solr#3132)
- [TASK] Replace mirrors for Apache Solr binaries on install-solr.sh
(TYPO3-Solr#3094)
- [BUGFIX:BP:11-1] routeenhancer with empty filters (TYPO3-Solr#3099)
- [TASK] Use Environment::getContext() instead of GeneralUtility
- [BUGFIX] Don't use jQuery.ajaxSetup() (TYPO3-Solr#2503)
- [TASK] Setup Github Actions :: Basics
- [TASK] Setup Dependabot to watch "solarium/solarium"
- [BUGFIX] Filter within route enhancers (TYPO3-Solr#3054)
- [BUGFIX] Fix NON-Composer mod libs composer.json for composer v2
(TYPO3-Solr#3053)

Please read the release notes:
https://github.com/TYPO3-Solr/ext-solr/releases/tag/11.1.3

---

How to Get Involved

There are many ways to get involved with Apache Solr for TYPO3:

- Submit bug reports and feature requests on GitHub
- Ask questions or help answer them in our Slack channel
- Provide patches through pull requests, or review and comment on
existing pull requests
- Go to www.typo3-solr.com or call dkd to sponsor the ongoing development
of Apache Solr for TYPO3

Support us by becoming an EB partner:
https://shop.dkd.de/Produkte/Apache-Solr-fuer-TYPO3/

or call:
+49 (0)69 - 2475218 0
dkd-kaehm pushed a commit to dkd-kaehm/ext-solr that referenced this issue Feb 9, 2024
The introduced ASCII folding filters or language-dependent
normalization filters lead to issues with the autosuggest function due
to the differing stemming behaviour.

To fix this issue, the original token is preserved if possible, which
allows e.g. suggestions for search terms with and without accents. As
this extension might lead to unwanted duplicates, a new field
textSpellExact is introduced, which takes non-ASCII characters as given.

Ports: TYPO3-Solr#3117
Resolves: TYPO3-Solr#3096