This package uses the spaCy 2.0 extensions to add IWNLP-py as German lemmatizer directly into your spaCy pipeline.
Please report bugs with spacy-iwnlp as issue in IWNLP-py.
import spacy
from spacy_iwnlp import spaCyIWNLP
nlp = spacy.load('de')
iwnlp = spaCyIWNLP(lemmatizer_path='data/IWNLP.Lemmatizer_20181001.json')
nlp.add_pipe(iwnlp)
doc = nlp('Wir mögen Fußballspiele mit ausgedehnten Verlängerungen.')
for token in doc:
print('POS: {}\tIWNLP:{}'.format(token.pos_, token._.iwnlp_lemmas))
- Use pip to install spacy-iwnlp
pip install spacy-iwnlp
- Download the latest processed IWNLP dump from http://lager.cs.uni-duesseldorf.de/NLP/IWNLP/IWNLP.Lemmatizer_20181001.zip and unzip it.
Please include the following BibTeX if you use IWNLP in your work:
@InProceedings{liebeck-conrad:2015:ACL-IJCNLP,
author = {Liebeck, Matthias and Conrad, Stefan},
title = {{IWNLP: Inverse Wiktionary for Natural Language Processing}},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
year = {2015},
publisher = {Association for Computational Linguistics},
pages = {414--418},
url = {http://www.aclweb.org/anthology/P15-2068}
}