Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "«", "»" to blacklist characters list #496

Merged
merged 1 commit into from
Sep 26, 2024

Conversation

SiarheiFedartsou
Copy link
Contributor

@SiarheiFedartsou SiarheiFedartsou commented Sep 26, 2024

👋

We faced an issue that our Pelias instance couldn't find ТРЦ «Respublika Park» (https://www.openstreetmap.org/way/293888317) via queries which don't contain "«", "»" symbols(e.g. I expect Respublika Park to be almost perfect match for this place). After some experimentations I narrowed it down to the fact that Pelias's ElasticSearch schema doesn't consider these symbols as blacklist ones, i.e. it becomes part of token (now we get «Respublika and Park» as separate tokens). So I propose to add it to list of blacklisted characters to fix the issue. Such symbols are quite popular in some regions, see this wiki article https://en.wikipedia.org/wiki/Guillemet


Here's the reason for this change 🚀

"«", "»" symbols are quite popular delimiters in some languages.


Here's what actually got changed 👏

Added "«", "»" symbols to blacklist characters list


Here's how others can test the changes 👀

I updated tests + it can be tested manually via bootstrapping ES instance.

@missinglink
Copy link
Member

Seems reasonable, we actually have those symbols listed as quote characters in https://github.com/pelias/parser/blob/7d28b15f3cebee2d5739266f82cb3f0daa0f17f4/tokenization/split_funcs.js#L10

@missinglink
Copy link
Member

Is this good to merge or would you like to add additional symbols?

@missinglink
Copy link
Member

I took the liberty to clean up your commits, add some additional symbols and clean up some code, hope you don»t mind 🙏

Thanks!

@SiarheiFedartsou
Copy link
Contributor Author

@missinglink I definitely don't, thanks!

@missinglink missinglink merged commit 629e0e6 into pelias:master Sep 26, 2024
6 checks passed
@missinglink
Copy link
Member

missinglink commented Sep 26, 2024

I screwed up the commit message so the docker images only just got published now after I fixed it 🤦‍♂️.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants