Simple, easy-to-use and reasonably fast library for using Słowosieć (also known as PlWordNet), a lexico-semantic database of the Polish language. PlWordNet can also be browsed online.
I created this library because, since version 2.9, PlWordNet cannot easily be loaded into Python (for example with `nltk`), as it is only provided in a custom `plwnxml` format.
Load the wordnet from an XML file (this takes about 20 seconds) and print basic statistics.
```python
import plwordnet

wn = plwordnet.load('plwordnet_4_2.xml')
print(wn)
```
Expected output:
```
PlWordnet
  lexical units: 513410
  synsets: 353586
  relation types: 306
  synset relations: 1477849
  lexical relations: 393137
```
Find lexical units named `leśny` and print all relations in which that unit is in the subject/parent position.
```python
for lu in wn.find('leśny'):
    for s, p, o in wn.lexical_relations_where(subject=lu):
        print(p.format(s, o))
```
Expected output:
```
leśny.2 tworzy kolokację z polana.1
leśny.2 jest synonimem mpar. do las.1
leśny.3 przypomina las.1
leśny.4 jest derywatem od las.1
leśny.5 jest derywatem od las.1
leśny.6 przypomina las.1
```
Print all relation types and their ids:
```python
for rel_id, rel in wn.relation_types.items():
    print(rel_id, rel.name)
```
Expected output:
```
10 hiponimia
11 hiperonimia
12 antonimia
13 konwersja
...
```
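The `relation_by_name` attribute maps a human-readable relation name to relation ids, in the plural. A minimal sketch of how such a reverse index can be derived from an id-to-name mapping follows. It uses plain dicts instead of the library's `RelationType` objects, the helper name `invert_names` is hypothetical, and id 210 is made-up toy data, assuming that one name may be shared by more than one relation type.

```python
from collections import defaultdict


def invert_names(relation_names):
    """Group relation type ids by their name: name -> set of ids."""
    by_name = defaultdict(set)
    for rel_id, name in relation_names.items():
        by_name[name].add(rel_id)
    return dict(by_name)


# Ids 10-13 mirror the output above; id 210 is a hypothetical duplicate name.
relation_names = {
    10: "hiponimia",
    11: "hiperonimia",
    12: "antonimia",
    13: "konwersja",
    210: "hiponimia",
}

by_name = invert_names(relation_names)
print(sorted(by_name["hiponimia"]))  # [10, 210]
```

Returning a set per name, rather than a single id, keeps lookups unambiguous when names collide.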
See more usage examples in the examples notebook.
Note: `plwordnet` requires Python 3.7 or newer.
```bash
pip install plwordnet
```
This library should be able to read future versions of PlWordNet without modification, even if more relation types are added. Still, if you use it with a version of PlWordNet that is not listed below, please consider reporting whether it works. Supported versions:
- PlWordNet 4.2
- PlWordNet 4.0
- PlWordNet 3.2
- PlWordNet 3.0
- PlWordNet 2.3
- PlWordNet 2.2
- PlWordNet 2.1
See `plwordnet/wordnet.py` for the `RelationType`, `Synset` and `LexicalUnit` class definitions.
- `load(source)`: Reads PlWordNet, where `source` is a path to the wordnet XML file or to a pickled wordnet object. The path may point to a file compressed with gzip or lzma.
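`load` accepts gzip- or lzma-compressed files, and `dump` (described below) pickles the `Wordnet` object. The following sketch shows one way such a compressed pickle round trip can be done with only the standard library. It is not the library's code: choosing the codec from the file extension (`.gz`, `.xz`, `.lzma`) and the helper names `open_maybe_compressed`, `dump_object` and `load_object` are assumptions for illustration, and a plain dict stands in for a real `Wordnet` object.

```python
import gzip
import lzma
import os
import pickle
import tempfile


def open_maybe_compressed(path, mode="rb"):
    """Open `path`, picking gzip/lzma by file extension (an assumed convention)."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    if path.endswith((".xz", ".lzma")):
        # lzma.open writes .xz format by default and auto-detects on read.
        return lzma.open(path, mode)
    return open(path, mode)


def dump_object(obj, path):
    """Pickle `obj` to `path`, compressing according to the extension."""
    with open_maybe_compressed(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Unpickle an object from a plain, gzip or lzma file."""
    with open_maybe_compressed(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    toy = {"lexical_units": {1: "leśny.1"}}  # stand-in for a Wordnet object
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("wn.pkl", "wn.pkl.gz", "wn.pkl.xz"):
            path = os.path.join(tmp, name)
            dump_object(toy, path)
            assert load_object(path) == toy
    print("round trip ok")
```

Only unpickle files you trust, since `pickle.load` can execute arbitrary code.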
- `lexical_relations`: List of (subject, predicate, object) triples
- `synset_relations`: List of (subject, predicate, object) triples
- `relation_types`: Mapping from relation type id to object
- `relation_by_name`: Mapping from human-readable relation name to relation ids
- `lexical_units`: Mapping from lexical unit id to unit object
- `lexical_units_by_name`: Mapping from lexical unit name to a set of matching lexical unit ids
- `synsets`: Mapping from synset id to object
- `(lexical|synset)_relations_(s|o|p)`: Mapping from id of subject/object/predicate to a set of matching lexical unit/synset relation ids
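The `(lexical|synset)_relations_(s|o|p)` indexes make "where" queries cheap: each given subject, predicate or object narrows the candidates to a set of relation ids, and the sets are intersected. A minimal sketch of that idea on toy integer triples follows. It assumes a relation id is the triple's position in the list, and `build_indexes` and `relations_where` are hypothetical helpers, not the library's methods; the keyword is named `obj` to avoid shadowing Python's built-in `object`.

```python
from collections import defaultdict

# Toy (subject_id, predicate_id, object_id) triples with made-up ids.
lexical_relations = [
    (2, 100, 7),
    (3, 101, 7),
    (2, 102, 9),
]


def build_indexes(triples):
    """Map each subject/predicate/object id to the set of relation ids using it."""
    by_s, by_p, by_o = defaultdict(set), defaultdict(set), defaultdict(set)
    for rel_id, (s, p, o) in enumerate(triples):
        by_s[s].add(rel_id)
        by_p[p].add(rel_id)
        by_o[o].add(rel_id)
    return by_s, by_p, by_o


def relations_where(triples, indexes, subject=None, predicate=None, obj=None):
    """Return triples matching every given id; None means 'any'."""
    candidate_sets = [
        index.get(key, set())  # .get does not insert missing keys
        for index, key in zip(indexes, (subject, predicate, obj))
        if key is not None
    ]
    if not candidate_sets:
        return list(triples)
    matching = set.intersection(*candidate_sets)
    return [triples[i] for i in sorted(matching)]


indexes = build_indexes(lexical_relations)
print(relations_where(lexical_relations, indexes, subject=2, obj=7))  # [(2, 100, 7)]
```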
- `find(value)`: Returns a list of `LexicalUnit` objects whose name is equal to `value`. If given a specific variant (like `leśny.1`), this method returns either the `LexicalUnit` or `None`.
- `lexical_relations_where(subject, predicate, object)`: Returns lexical relation triples with a matching subject and/or predicate and/or object. The subject, predicate and object arguments can be integer ids or `LexicalUnit` and `RelationType` objects.
- `synset_relations_where(subject, predicate, object)`: Returns synset relation triples with a matching subject and/or predicate and/or object. The subject, predicate and object arguments can be integer ids or `Synset` and `RelationType` objects.
- `hypernyms(synset, interlingual=False)`: Returns hypernyms of a synset (`synset` can be an integer id or a `Synset` object).
- `hyponyms(synset, interlingual=False)`: Returns hyponyms of a synset (`synset` can be an integer id or a `Synset` object).
- `hypernym_paths(synset, full_search=False, interlingual=False)`: Returns a hypernym path to a synset with no hypernyms (or all possible paths if `full_search=True`).
- `dump(dst)`: Pickles the `Wordnet` object to the opened file `dst` or to a new file with path `dst`.
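`hypernym_paths` follows hypernym links upward until it reaches a synset with no hypernyms. The sketch below illustrates that search as a depth-first walk over a toy graph, returning the first path found or, with `full_search=True`, every path. It is an assumption-based illustration, not the library's implementation: the graph, the string ids, the path order (from the given synset up to the root) and the return types are all hypothetical, and the `interlingual` option is omitted.

```python
# Hypothetical toy graph: synset id -> list of direct hypernym ids.
HYPERNYMS = {
    "pies": ["ssak", "zwierzę_domowe"],
    "ssak": ["zwierzę"],
    "zwierzę_domowe": ["zwierzę"],
    "zwierzę": [],
}


def hypernym_paths(hypernyms, synset, full_search=False):
    """Depth-first search from `synset` up to synsets without hypernyms."""
    paths = []

    def walk(node, path):
        if node in path:  # skip cycles in malformed data
            return False
        path = path + [node]
        parents = hypernyms.get(node, [])
        if not parents:  # reached a root synset
            paths.append(path)
            return not full_search  # True stops the search early
        for parent in parents:
            if walk(parent, path):
                return True
        return False

    walk(synset, [])
    if full_search:
        return paths
    return paths[0] if paths else []


print(hypernym_paths(HYPERNYMS, "pies"))  # ['pies', 'ssak', 'zwierzę']
```

Stopping at the first root keeps the default call fast; a full search can grow quickly when synsets have several hypernyms.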
- `format(x, y, short=False)`: Substitutes `x` and `y` into the `RelationType` display format `display`. If `short` is true, `x` and `y` are instead separated by the short relation name `shortcut`.
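To make the two output modes of `format` concrete, here is a minimal stand-in class. The class name `ToyRelationType`, the relation name, the `przyp` shortcut and the `{x}`/`{y}` placeholder syntax of `display` are all assumptions; the real display format stored in PlWordNet may use a different template syntax.

```python
from dataclasses import dataclass


@dataclass
class ToyRelationType:
    """Hypothetical stand-in for RelationType; field names follow the README text."""

    name: str
    display: str  # assumed template with '{x}' and '{y}' placeholders
    shortcut: str

    def format(self, x, y, short=False):
        if short:
            # Short form: the two arguments separated by the relation shortcut.
            return f"{x} {self.shortcut} {y}"
        return self.display.format(x=x, y=y)


rel = ToyRelationType("example_relation", "{x} przypomina {y}", "przyp")
print(rel.format("leśny.3", "las.1"))              # leśny.3 przypomina las.1
print(rel.format("leśny.3", "las.1", short=True))  # leśny.3 przyp las.1
```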