Can it support Chinese? (能够支持中文吗) #53

Closed

lierisme opened this issue May 5, 2017 · 3 comments

Comments

lierisme commented May 5, 2017

Can it support Chinese?

@weixsong (Owner)

Hi @lierisme, if you want to support Chinese you just need to update the tokenizer: either find a Chinese tokenizer, or simply tokenize Chinese one character at a time.
A Chinese tokenizer is a complex package, so if you want a good Chinese tokenizer, I think elasticlunr.js should not be run in the browser.
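
For the per-character option, here is a minimal sketch. It assumes elasticlunr picks up a replaced global elasticlunr.tokenizer when indexing and searching, and uses the \u4e00-\u9fff range as a rough approximation of common Chinese characters:

```js
var elasticlunr = require('elasticlunr');

// Replace the stock whitespace tokenizer with one that first puts
// spaces around every CJK ideograph, so each character becomes a token.
elasticlunr.tokenizer = function (str) {
  if (str === null || str === undefined) return [];
  if (Array.isArray(str)) {
    return str.map(function (t) { return t.toString().toLowerCase(); });
  }
  return str.toString()
    .replace(/[\u4e00-\u9fff]/g, ' $& ')   // "能够支持" -> " 能 够 支 持 "
    .trim()
    .toLowerCase()
    .split(/[\s\-]+/)                       // split on whitespace and hyphens
    .filter(function (t) { return t.length > 0; });
};
```

Per-character tokens give cruder matches than real word segmentation, but they keep everything running in the browser with no extra dependency.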

@caihaibin1991
I replaced the default tokenizer with jieba, but I don't know why the pipeline returns an empty array.
[screenshot attached]

hepezu commented Aug 14, 2022

@caihaibin1991

To support Chinese:

  1. Change the tokenizer, or segment the text before elasticlunr's default tokenizer runs. For example, "能够支持中文吗" -> "能够 支持 中文 吗"; after this preprocessing, the default tokenizer works by splitting on spaces. For segmentation you can use nodejieba, @node-rs/jieba, or a regex that splits out every character.

  2. [Your issue] Remove the default elasticlunr.trimmer with index.pipeline.remove(elasticlunr.trimmer). The default trimmer strips all non-English characters, which is what causes your empty array (a sketch combining points 1 and 2 follows this list).

  3. Further work: build your own pipeline to process the text, including a tokenizer, trimmer, stemmer, and stopword filter. Examples are in weixsong/lunr-languages.
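
Putting points 1 and 2 together, a minimal sketch using nodejieba (the fields and sample text are made-up placeholders; @node-rs/jieba exposes a similar cut function):

```js
var elasticlunr = require('elasticlunr');
var nodejieba = require('nodejieba');

// Point 1: pre-segment Chinese text so the default whitespace
// tokenizer can split it: "能够支持中文吗" -> "能够 支持 中文 吗".
function segment(text) {
  return nodejieba.cut(text).join(' ');
}

var index = elasticlunr(function () {
  this.addField('title');
  this.addField('body');
  this.setRef('id');
});

// Point 2: remove the default trimmer, which strips all
// non-English characters and would leave Chinese tokens empty.
index.pipeline.remove(elasticlunr.trimmer);

index.addDoc({
  id: 1,
  title: segment('能够支持中文吗'),
  body: segment('这是一个中文全文检索的例子。')
});

// Queries run through the same tokenizer, so segment them too.
console.log(index.search(segment('支持中文')));
```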
