This is a Lucene filter and filter factory (see http://lucene.apache.org ) that folds certain CJK characters to improve recall. Put it in your analysis chain BEFORE any ICU transform from Traditional->Simplified Han: it converts modern Japanese Kanji to their traditional equivalents, which that transform can then map to simplified forms.
- clone the project
git clone https://github.com/solrmarc/CJKFilterUtils.git
- run the maven installation
mvn clean install
- put the CJKFilterUtils*.jar file found in the target directory into your Solr lib directory
- utilize the Solr CJKFoldingFilterFactory in your schema.xml file.
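As a rough illustration of the schema.xml step, the fragment below sketches a field type that runs the folding filter before the Traditional->Simplified ICU transform. This is a sketch under assumptions, not this project's shipped configuration: the field type name text_cjk is hypothetical, the fully qualified class name of the factory is assumed (check the package inside the jar you built), and the surrounding tokenizer and bigram filter are only one plausible chain. The field name cjk_test is borrowed from the test queries in this README.

```xml
<!-- Hypothetical field type name; adapt to your own schema.xml. -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Assumed class name: verify the package in CJKFilterUtils*.jar. -->
    <!-- Folding must come BEFORE the Traditional->Simplified transform. -->
    <filter class="edu.stanford.lucene.analysis.CJKFoldingFilterFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
    <!-- Optional: index overlapping bigrams of CJK characters. -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>

<field name="cjk_test" type="text_cjk" indexed="true" stored="true"/>
```

The ICU tokenizer and transform factories ship with Solr's analysis-extras module, so those jars need to be on the classpath alongside the CJKFilterUtils jar.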
(Uses Ruby)
Install Ruby dependencies
$ bundle install
Set up Solr with CJKFilterUtils and config/schema
$ bundle exec rake setup_server
Run solr_wrapper
$ solr_wrapper
In another shell, index fixtures
$ bundle exec rake fixtures
Run some queries (these should return results):
$ curl "http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:呂思勉两晋南北朝&wt=json"
$ curl "http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:俞平伯红楼梦&wt=json"
$ curl "http://127.0.0.1:8983/solr/test/select?debugQuery=on&indent=on&q=cjk_test:南洋&wt=json"
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Added some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request