Releases: sinaahmadi/klpt
Stemming and lemmatization fully covered for Sorani
In this version, the following changes were made:
- It is now possible to stem (e.g. "بڕاوە" becomes "بڕ") and lemmatize (e.g. "بردمنەوە" becomes "بردن") words of all parts of speech. Up to version 0.1.4, stemming was only possible for verbs.
- For stemming unknown words, a rule-based approach is provided.
- When using the morphological analyzer (in the stem module), prefixes and suffixes are now returned separately. Previously, they were merged.
- The tagged lexicon has been updated and enriched with more lexical entries, particularly proper nouns.
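
As a rough illustration of what rule-based stemming with separately reported affixes can look like, the sketch below peels a small, hand-picked set of affixes off a word. The affix lists, the `Analysis` type and the `split_affixes` helper are hypothetical and chosen only to cover the example word; they are not KLPT's actual rules or API.

```python
from typing import List, NamedTuple

# Illustrative affixes only (assumption): a tiny subset, not KLPT's rule set.
PREFIXES = ["دە"]
SUFFIXES = ["ەوە", "ین"]


class Analysis(NamedTuple):
    prefixes: List[str]
    stem: str
    suffixes: List[str]


def split_affixes(word: str) -> Analysis:
    """Strip known affixes repeatedly, always leaving at least one character as the stem."""
    prefixes: List[str] = []
    suffixes: List[str] = []
    changed = True
    while changed:
        changed = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) > len(p):
                prefixes.append(p)
                word = word[len(p):]
                changed = True
        for s in SUFFIXES:
            if word.endswith(s) and len(word) > len(s):
                # Suffixes are found from the outside in, so prepend to keep surface order.
                suffixes.insert(0, s)
                word = word[:-len(s)]
                changed = True
    return Analysis(prefixes, word, suffixes)


result = split_affixes("دەچینەوە")
print(result.prefixes, result.stem, result.suffixes)
```

Keeping prefixes and suffixes in separate lists, rather than one merged affix string, lets a caller tell which side of the stem each affix came from. A real stemmer would need a much larger affix inventory and checks against a lexicon to avoid over-stripping.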
Stemming of Sorani verbs added
In this version, in addition to the morphological analysis of Sorani and Kurmanji, it is now possible to stem Sorani verbs, as in:
کڕیومن ← کڕ
دەچینەوە ← چ
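
A minimal sketch of how the two verbs above might be stemmed from Python, assuming the `Stem` class in `klpt.stem` and its `stem` method as described in the project README. The `stem_sorani` wrapper is a hypothetical helper added here so the snippet degrades gracefully when the package is not installed.

```python
from typing import List, Optional


def stem_sorani(word: str) -> Optional[List[str]]:
    """Return KLPT's stems for a Sorani word in Arabic script, or None if klpt is missing."""
    try:
        from klpt.stem import Stem  # assumes `pip install klpt`
    except ImportError:
        return None
    # Dialect and script arguments, assumed to follow the KLPT README.
    stemmer = Stem("Sorani", "Arabic")
    return stemmer.stem(word)


for word in ["کڕیومن", "دەچینەوە"]:
    print(word, stem_sorani(word))
```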
Morphological analysis of Kurmanji added
v0.1.2 version 0.1.2
initial development release
Project setup and initial features