Assem's Arabic Light Stemmer is a Snowball-based stemming algorithm for Arabic, aimed mainly at improving search.

ibnmalik/arabicstemmer

Assem's Arabic Stemmer

This is an algorithm for Arabic stemming written in the Snowball framework language. It offers light stemming and text normalization.
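To make "light stemming and text normalization" concrete, here is a minimal Python sketch of the general idea: normalize the spelling, then strip at most one common prefix and one common suffix. It is not the repository's Snowball rule set. The names `normalize`, `light_stem`, `PREFIXES`, `SUFFIXES`, and `min_len` are hypothetical, and the affix lists are a small illustrative selection.

```python
import re

# Hypothetical, simplified rules for illustration only; the actual Snowball
# rules in this repository are more extensive.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")  # tashkeel marks and tatweel
HAMZA_ALEF = re.compile("[أإآ]")                  # alef variants carrying hamza/madda
PREFIXES = ("وال", "فال", "بال", "كال", "ال", "و", "ف")
SUFFIXES = ("كم", "هم", "ها", "تين", "تان", "ات", "ون", "ين", "ة")


def normalize(word):
    """Drop diacritics and tatweel, and unify alef variants to a bare alef."""
    word = DIACRITICS.sub("", word)
    return HAMZA_ALEF.sub("ا", word)


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    word = normalize(word)
    for prefix in PREFIXES:  # longer prefixes are listed first
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


print(light_stem("الطالب"))    # طالب
print(light_stem("فأطفالكم"))  # اطفال
```

The `min_len` guard is a design choice in this sketch: it keeps short words from being stripped down to fewer than three letters. A root-based stemmer goes further than this and also removes pattern letters inside the word.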

Requirements:

  • Download Snowball and the snowball-data Arabic vocabulary
    $ make download
  • Install Python requirements
    $ sudo pip install -r requirements.txt

or manually by:

  • extracting snowball into the root folder {Root}/snowball
  • extracting snowball-data/arabic/voc.txt.gz into {Root}/test_data/voc.txt
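For the manual route, the vocabulary archive can also be decompressed from Python. This is a minimal sketch using only the standard library. The helper name `extract_voc` is hypothetical, and the paths in the usage comment assume the script runs from the repository root with snowball-data already downloaded there.

```python
import gzip
import shutil
from pathlib import Path


def extract_voc(archive, target):
    """Decompress a .gz archive to target, creating parent folders as needed."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the bytes without loading the whole file
    return target


# Usage, following the manual steps (paths are assumptions about the local layout):
#   extract_voc("snowball-data/arabic/voc.txt.gz", "test_data/voc.txt")
```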

Build:

  • light stemming
      $ make build
  • root-based stemming
      $ make build_root_based_stemmer

Run:

  • Light Stemmer
      $ make run
      الطالب
      طالب
  • Root-Based Stemmer
      $ make run_root
      الطالب
      طلب

Test:

The tests are configured to run against the snowball-data Arabic sample.

  • time:
      $ make time
  • grouping effect:
      $ make grouping
  • all:
      $ make test
  • Test SAS with golden Arabic corpus:
      $ make test_arabicstemmer
  • Test ISRI Stemmer with golden Arabic corpus:
      $ make test_isri

Distributions:

  • distribute the light stemmer to the available languages:
    $ make dist
  • distribute the root-based stemmer to the available languages:
    $ make dist_rooter

Results:

Snowball Arabic (Stemmer & Rooter) Results

Word Stem Root
طفل طفل طفل
اطفال اطفال طفل
الاطفال اطفال طفل
اطفالكم اطفال طفل
فأطفالكم اطفال طفل
اطفالهم اطفال طفل
والاطفال اطفال طفل
فاطفالهم اطفال طفل
وطفل طفل طفل
الطفولة طفول طفل
والطفلتين طفل طفل
طفلتان طفل طفل
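One way to read the table is as a grouping effect: how many surface forms collapse onto the same stem or root. The sketch below copies the rows above and counts the groups. It is only one interpretation of "grouping" and is not necessarily what the `make grouping` target measures. The names `ROWS` and `group_words` are hypothetical.

```python
from collections import defaultdict

# (word, stem, root) rows copied from the results table above.
ROWS = [
    ("طفل", "طفل", "طفل"),
    ("اطفال", "اطفال", "طفل"),
    ("الاطفال", "اطفال", "طفل"),
    ("اطفالكم", "اطفال", "طفل"),
    ("فأطفالكم", "اطفال", "طفل"),
    ("اطفالهم", "اطفال", "طفل"),
    ("والاطفال", "اطفال", "طفل"),
    ("فاطفالهم", "اطفال", "طفل"),
    ("وطفل", "طفل", "طفل"),
    ("الطفولة", "طفول", "طفل"),
    ("والطفلتين", "طفل", "طفل"),
    ("طفلتان", "طفل", "طفل"),
]


def group_words(rows, column):
    """Map each stem (column=1) or root (column=2) to the words that produce it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[0])
    return dict(groups)


stem_groups = group_words(ROWS, 1)
root_groups = group_words(ROWS, 2)
print(len(ROWS), "words ->", len(stem_groups), "stems,", len(root_groups), "root")
# prints: 12 words -> 3 stems, 1 root
```

For search, the light stemmer keeps the singular, plural, and abstract-noun forms in separate groups, while the root-based stemmer merges all twelve words into a single group.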
