Revisit default full-text stopword list #176

Closed
RocketMan opened this issue Jul 30, 2020 · 1 comment
Labels
ops This is an operational issue

RocketMan commented Jul 30, 2020

The default full-text stopword list is quite large:

https://dev.mysql.com/doc/refman/8.0/en/fulltext-stopwords.html#fulltext-stopwords-stopwords-for-myisam-search-indexes

Unfortunately, it contains many words that are significant for identifying music-related material.

I propose we override the default stopword list with the much smaller stopword list that is used by default for InnoDB search indexes:

https://dev.mysql.com/doc/refman/8.0/en/fulltext-stopwords.html#fulltext-stopwords-stopwords-for-innodb-search-indexes

Given that zookeeper is not a massive database, the overhead to index the extra words should not present a storage or performance problem.
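To illustrate why the size of the stopword list matters for music searches, here is a minimal sketch of how stopword filtering drops words before they reach the index. It is a simplified model, not MySQL's actual tokenizer. The function name `indexed_tokens` and both word sets are hypothetical stand-ins, not the real MySQL lists (see the links above for those).

```python
import re

def indexed_tokens(text, stopwords, min_len=4):
    """Return the lowercase words of `text` that would survive filtering.

    Simplified model: a word is kept only if it is at least `min_len`
    characters long and is not in `stopwords`.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if len(w) >= min_len and w not in stopwords]

# Hypothetical stand-ins for a large and a small stopword list.
# These are NOT the real MySQL lists; they only illustrate the effect.
LARGE_LIST = {"the", "come", "together", "yes", "you", "first"}
SMALL_LIST = {"the", "of", "to"}

title = "Come Together"
print(indexed_tokens(title, LARGE_LIST))  # [] (nothing left to search on)
print(indexed_tokens(title, SMALL_LIST))  # ['come', 'together']
```

With the larger set, every word of the title is discarded, so a full-text search for it cannot match; with the smaller set, both words remain searchable.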

RocketMan added this to the v2.10.2 milestone Jul 30, 2020
RocketMan added the ops label Jul 31, 2020
RocketMan removed this from the v2.10.2 milestone Jul 31, 2020

RocketMan commented Jul 31, 2020

The kzsu production configuration has been updated and the indexes rebuilt.

The list of stopwords in kzsu prod as of 2020-07-31

In addition, the minimum full-text word length has been reduced from 4 characters to 3.
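For anyone wanting to reproduce a similar setup, the following is a rough sketch of one way to generate a stopword file and the matching server options. It is not the actual kzsu configuration. It assumes the search indexes are MyISAM (so the relevant MySQL variables are `ft_stopword_file` and `ft_min_word_len`), and the file path and helper names are hypothetical. The word list is the InnoDB default as documented at the link above.

```python
from pathlib import Path

# InnoDB default stopwords as listed in the MySQL manual
# ("the" appears twice there; duplicates are removed below).
INNODB_DEFAULT_STOPWORDS = [
    "a", "about", "an", "are", "as", "at", "be", "by", "com", "de",
    "en", "for", "from", "how", "i", "in", "is", "it", "la", "of",
    "on", "or", "that", "the", "this", "to", "was", "what", "when",
    "where", "who", "will", "with", "und", "the", "www",
]

def format_stopword_file(words):
    """Return file contents with one unique stopword per line."""
    unique = list(dict.fromkeys(words))  # drop duplicates, keep order
    return "\n".join(unique) + "\n"

def render_mysqld_options(stopword_path, min_word_len=3):
    """Return a [mysqld] option-file fragment for MyISAM full-text search."""
    return (
        "[mysqld]\n"
        f"ft_stopword_file={stopword_path}\n"
        f"ft_min_word_len={min_word_len}\n"
    )

if __name__ == "__main__":
    # Hypothetical location; the server must be able to read this file.
    path = Path("stopwords.txt")
    path.write_text(format_stopword_file(INNODB_DEFAULT_STOPWORDS), encoding="ascii")
    print(render_mysqld_options(path.resolve()))
```

Both options take effect at server startup, and existing FULLTEXT indexes have to be rebuilt afterwards (for MyISAM tables, `REPAIR TABLE tbl_name QUICK` is the documented way), which matches the rebuild mentioned above.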
