Turkish

Wahed Hemati edited this page Nov 28, 2017 · 2 revisions

Tool Zemberek-NLP

Source: https://github.com/ahmetaa/zemberek-nlp

UIMA: https://github.com/texttechnologylab/textimager-uima

Available Annotators:

  • TokenizerDefault
  • TokenizerAll
  • Sentence Boundary Detection
  • Lemmatizer
  • Stemmer
  • Part of Speech
  • Deasciifier
  • Spellchecker
  • Disambiguator

License: [Apache License](https://github.com/ahmetaa/zemberek-nlp/blob/master/LICENSE)


Tool Polyglot

Source: http://polyglot.readthedocs.io/en/latest/index.html

UIMA: https://github.com/texttechnologylab/textimager-uima

Available Annotators:

  • Tokenization (165 Languages)
  • Language detection (196 Languages)
  • Named Entity Recognition (40 Languages)
  • Part of Speech Tagging (16 Languages)
  • Sentiment Analysis (136 Languages)
  • Word Embeddings (137 Languages)
  • Morphological analysis (135 Languages)
  • Transliteration (69 Languages)

License: [GPLv3](http://polyglot.readthedocs.io/en/latest/)


Resha-Turkish-Stemmer

Source: https://github.com/hrzafer/resha-turkish-stemmer

UIMA: https://github.com/texttechnologylab/textimager-uima

Available Annotators:

  • Stemmer

License: [MIT License](https://github.com/hrzafer/resha-turkish-stemmer/blob/master/LICENSE)


Turkish-Deasciifier

Source: https://github.com/ahmetb/turkish-deasciifier-java

UIMA: https://github.com/texttechnologylab/textimager-uima

Available Annotators:

  • Deasciifier

License: [Apache License](https://github.com/ahmetb/turkish-deasciifier-java/blob/master/LICENSE)
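
For readers unfamiliar with the term, a deasciifier restores Turkish diacritics (ç, ğ, ı, ö, ş, ü) in text that was typed with plain ASCII letters. The snippet below is only a toy sketch of that idea using a dictionary lookup. It is not the method used by Turkish-Deasciifier or Zemberek-NLP, and the names `ASCII_FOLD`, `LEXICON`, `FOLDED_INDEX` and `deasciify` as well as the tiny word list are hypothetical and chosen for illustration.

```python
# Toy illustration only: this is NOT how the tools listed on this page work.
# The mini-lexicon is a hypothetical sample; only lowercase words are handled.

# Map each Turkish-specific letter to its ASCII look-alike.
ASCII_FOLD = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")

# Hypothetical mini-lexicon of correctly spelled Turkish words.
LEXICON = ["çocuk", "güzel", "ağaç", "şeker", "gün"]

# Index the lexicon by the ASCII-folded form of each word.
FOLDED_INDEX = {word.translate(ASCII_FOLD): word for word in LEXICON}


def deasciify(text: str) -> str:
    """Restore diacritics for words found in the lexicon; keep others as-is."""
    return " ".join(FOLDED_INDEX.get(token, token) for token in text.split())


if __name__ == "__main__":
    print(deasciify("guzel bir gun"))
```

A plain lookup like this cannot resolve ambiguous forms: the ASCII string "aci" could stand for either "acı" or "açı", so a practical deasciifier has to take context into account.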


Tool Turkish Natural Language Toolkit (https://github.com/aliok/trnltk-java)

Merged with Zemberek-NLP


  • Tokenizer
  • Segmenter
  • Lemmatizer
  • Stemmer
  • Part of Speech
  • Unknown word guesser
  • Hyphenation-Tool

License: [MIT License](https://github.com/coltekin/TRmorph/blob/master/LICENSE)

Requires foma and a C preprocessor


ITU Turkish Natural Language Processing Pipeline (http://tools.nlp.itu.edu.tr/)

Web API (Token Needed)

  • Tokenizer
  • Normalization
  • Deasciifier
  • Vowelizer
  • Spelling Corrector
  • isTurkish
  • Morphological Analyzer

Web API


Use of Zemberek StemFilter and TRmorph StemFilter


Requires Python and NLTK


UIMA: https://github.com/dkpro/dkpro-core/tree/master/dkpro-core-snowball-asl