Wikinflection Corpus

A multilingual inflectional corpus with morpheme-level annotations, covering 68 languages, 216K lemmas, and 5.4M words. It is based on the English Wiktionary (en.wiktionary.org), was generated by Wikinflection (Metheniti and Neumann, 2018), and was evaluated with UniMorph 2.0 (Kirov et al., 2018).

The list of languages and the size of each can be found in corpus_size.tsv.
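As a starting point for inspecting the size file, here is a minimal Python sketch that loads a tab-separated table using only the standard library. It assumes the file is the corpus_size.tsv listed in the repository and that its first non-empty line is a header row. The column names and their order are not documented here, so the helper (the name `read_size_table` is hypothetical) returns them as found rather than assuming any.

```python
import csv
from pathlib import Path


def read_size_table(path):
    """Read a tab-separated file and return (header, rows).

    Assumption: the first non-empty line is a header row. The real
    column names in corpus_size.tsv are not known here, so nothing
    is looked up by name.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        # csv.reader yields an empty list for blank lines; skip those.
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows:
        return [], []
    return rows[0], rows[1:]
```

Calling `read_size_table("corpus_size.tsv")` from the repository root should return the header cells and one list of strings per data row, which can then be converted to numbers once the actual columns are known.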

Paper

Metheniti, E. and Neumann, G. (2020). Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC2020), Marseille, France, May. European Language Resources Association (ELRA).

References

Kirov, C., Cotterell, R., Sylak-Glassman, J., Walther, G., Vylomova, E., Xia, P., Faruqui, M., Mielke, S., McCarthy, A., Kübler, S., Yarowsky, D., Eisner, J., and Hulden, M. (2018). UniMorph 2.0: Universal Morphology. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC2018), Miyazaki, Japan, May. European Language Resources Association (ELRA).

Metheniti, E. and Neumann, G. (2018). Wikinflection: Massive semi-supervised generation of multilingual inflectional corpus from Wiktionary. In Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), December 13–14, 2018, Oslo University, Norway, number 155, pages 147–161. Linköping University Electronic Press.

Folders and files

Language folders: ady, ang, ara, ast, bel, bod, bul, cat, ces, chu, cym, dan, deu, dsb, ell, eng, est, eus, fao, fas, fin, frm, fro, gle, gmh, gml, got, grc, hin, hun, hye, isl, izh, kal, kan, kat, kaz, kbd, kjh, lat, lav, lit, liv, mkd, mlt, nap, nds, nld, oci, osx, pol, por, pus, que, ron, sga, slv, sme, spa, sqi, swe, syc, tuk, tur, ukr, urd, vep, vot, xcl

Other files: .gitignore, LICENSE, README.md, corpus_size.tsv

lenakmeth/Wikinflection-Corpus
