Arabic analysis plugin for Elasticsearch. It uses lucene-arabic-analyzer to extract the roots of Arabic tokens.
- Normalizes input text by removing diacritics and Hamza-like characters
- Extracts word roots.
arabic-root analyzer
GET _analyze
{
"analyzer": "arabic-root",
"text": "اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ"
}
// Result:
['هدن','هدي','صرط','قوم']
This plugin is preconfigured with built-in normalization, stop words, and a stemmer derived from lucene-arabic-analyzer.
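Because the analyzer is preconfigured, it can presumably be referenced by name in a field mapping without any custom analysis settings. The sketch below builds such a create-index request body; the index name `arabic-docs` and field name `content` are hypothetical and chosen only for illustration, and the mapping has not been verified against the plugin itself.

```python
import json

# Hypothetical names used only for this illustration; they are not part of the plugin.
INDEX_NAME = "arabic-docs"
FIELD_NAME = "content"


def build_index_body(field: str = FIELD_NAME) -> dict:
    """Return a create-index body that maps `field` to the arabic-root analyzer."""
    return {
        "mappings": {
            "properties": {
                # Assumption: the plugin registers "arabic-root" as a ready-to-use analyzer name.
                field: {"type": "text", "analyzer": "arabic-root"}
            }
        }
    }


# Print the request in the same console form used elsewhere in this README.
print(f"PUT {INDEX_NAME}")
print(json.dumps(build_index_body(), indent=2, ensure_ascii=False))
```

The printed request can be pasted into Dev Tools once the plugin is installed.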
- Build the plugin:
mvn clean package
- Run Elasticsearch and install the plugin inside a Docker container:
docker compose up
- Open http://localhost:5601/ and log in with elastic/elastic credentials.
- Go to Dev Tools and examine the plugin:
GET _analyze
{
"text": "اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ",
"analyzer": "arabic-root"
}
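The same request can also be sent from a script instead of Dev Tools. The following is a minimal sketch using only the Python standard library. It assumes that the Docker setup exposes Elasticsearch over plain HTTP at `http://localhost:9200` and that the elastic/elastic credentials also work for the REST API; neither is stated in this README, so adjust both to match your setup. The helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Elasticsearch is reachable here over plain HTTP; not confirmed by this README.
ES_URL = "http://localhost:9200"


def build_analyze_request(text, analyzer="arabic-root", base_url=ES_URL,
                          user="elastic", password="elastic"):
    """Build (but do not send) a POST request for the _analyze API."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    # HTTP Basic authentication header built from the assumed credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def extract_tokens(response_json):
    """Pull the token strings out of an _analyze response body."""
    return [t["token"] for t in response_json.get("tokens", [])]


def analyze(text):
    """Send the request to the running cluster and return the token strings."""
    with urllib.request.urlopen(build_analyze_request(text)) as resp:
        return extract_tokens(json.load(resp))
```

With the container running, calling `analyze("اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ")` should return the same roots as the Dev Tools request.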