
# Hierarchical evaluation measures

Implementation of the hierarchical F-measure (hF), hierarchical precision (hP), hierarchical recall (hR), and exact precision. The script was developed for evaluating type quality in DBpedia.
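The hierarchical measures are conventionally computed by first augmenting each instance's type set with all ancestors of those types in the ontology, then micro-averaging set overlap across instances. The sketch below illustrates that convention; the function and variable names (`augment`, `ancestors`, etc.) are illustrative and are not taken from `computeHmeasures.py`, whose exact conventions (e.g. how instances missing from the prediction are counted) may differ.

```python
def augment(types, ancestors):
    """Extend a set of types with all ancestors of each type in the hierarchy."""
    extended = set(types)
    for t in types:
        extended |= ancestors.get(t, set())
    return extended

def hierarchical_scores(gold, predicted, ancestors):
    """Micro-averaged hP, hR, hF over instances present in both datasets.

    gold, predicted: dict mapping instance -> set of type names
    ancestors: dict mapping type name -> set of its ancestor types
    """
    correct = pred_total = gold_total = 0
    for instance, gold_types in gold.items():
        if instance not in predicted:
            continue
        g = augment(gold_types, ancestors)
        p = augment(predicted[instance], ancestors)
        correct += len(g & p)      # overlap of augmented type sets
        pred_total += len(p)
        gold_total += len(g)
    hp = correct / pred_total
    hr = correct / gold_total
    hf = 2 * hp * hr / (hp + hr)   # harmonic mean of hP and hR
    return hp, hr, hf
```

For example, with the hierarchy `{"Dog": {"Animal"}, "Cat": {"Animal"}}`, a gold type `Dog` and a predicted type `Cat` share only the ancestor `Animal`, giving hP = hR = hF = 0.5.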

# Example execution

```
python computeHmeasures.py "en.lhd.core.2014.nt" "dbpedia_2014.owl" "gs3-toDBpedia2014.nt" "en.lhd.core.gs3.log"
```

# Input files - gold standard datasets

The datasets are described in our JWS paper.

Additional details can be found at http://ner.vse.cz/datasets/linkedhypernyms/

# Output

```
reading gs
reading predicted
finished reading input datasets
total instances in groundtruth:1033.0
total instances in intersection of groundtruth and prediction:402.0
hP:0.864357864358
hR:0.370553665326
hF:0.518726997186
Precision (exact):0.654228855721
```
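The reported hF is the harmonic mean of the reported hP and hR, which can be checked directly from the output above:

```python
# Recompute hF from the hP and hR values printed by the script.
hP = 0.864357864358
hR = 0.370553665326
hF = 2 * hP * hR / (hP + hR)  # harmonic mean, close to the reported 0.518726997186
```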