Lemmatizer

Lemmatization functionality for Georgian language. screeve-postagger is a small encoder-decoder (sequence-to-sequence) model trained on Georgian National Corpus for lemmatizing Georgian senteces. We provide you with two types of lemmatizer.

You can use basic Lemmatizer in the following manner:

from transformers import BartForConditionalGeneration, AutoTokenizer, pipeline

model = BartForConditionalGeneration.from_pretrained("Nargizi/screeve-lemmatizer")
tokenizer = AutoTokenizer.from_pretrained("Nargizi/screeve-lemmatizer")

tagger = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

ka_word = "დამიწერია"
result = tagger(ka_word)
print(result)

For the Lemmatizer with POS tags as additional input use the following code:

from transformers import BartForConditionalGeneration, AutoTokenizer, pipeline

model = BartForConditionalGeneration.from_pretrained("Nargizi/screeve-pos-lemmatizer")
tokenizer = AutoTokenizer.from_pretrained("Nargizi/screeve-pos-lemmatizer")

ka_word = "დამიწერია"
pos_tag = "V"

encoded_ka_word = tokenizer(ka_word, text_pair=pos_tag, return_tensors="pt")
generated_tokens = model.generate(**encoded_ka_word)
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)

print(result)

Label descriptions:

Abbreviation	Description
V	Verb
N	Noun
Pron	Pronoun
Interj	Interjection
A	Person’s name
Adv	Adverb
Cj	Conjunction
Punct	Punctuation
Num	Number
Pp	Preposition

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lemmatizer

About

Releases

Packages

License

screeve/lemmatizer

Folders and files

Latest commit

History

Repository files navigation

Lemmatizer

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages