Gold standard records for Arabic event data
We provide two "event detection" datasets for Arabic-language event coding, one for ASSAULT events and one for PROTEST events. These events are coded using the PLOVER ontology, which is similar to the CAMEO ontology. These files can be used to test an automated coder's ability to recognize these two event types in Arabic text.
protest_gsr.csv and assault_gsr.csv each have the following columns:
- accept: the number of annotators who accepted the event label as true.
- event_type: "ASSAULT" or "PROTEST", depending on the file.
- id: the ID number of the sentence.
- label: one of "yes", "easy no", "difficult no", or "ambiguous", depending on the set of labels provided by annotators. "yes" is a unanimous accept, "easy no" is a unanimous reject, "difficult no" (the "hard no" case) is mostly reject with a dissenting accept, and "ambiguous" marks entries with insufficient labels to be sure.
- reject: the number of annotators who rejected the label.
- text: the text shown to the annotator, which should also be provided to the event detection system.
- total: the total number of annotations provided on the sentence.
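As a rough illustration of how these columns might be used to test a coder, here is a minimal Python sketch. It assumes the CSV files are UTF-8 encoded with a header row matching the column names above. `detect` is a hypothetical stand-in for your own event detector (any callable that takes a sentence and returns True or False), and skipping "ambiguous" rows is a design choice made here, not something the dataset prescribes.

```python
import csv


def load_gsr(path):
    """Read a *_gsr.csv file into a list of dicts keyed by column name.

    Assumes UTF-8 encoding and a header row (an assumption, not verified).
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def score_detector(rows, detect):
    """Compare a detector's yes/no decisions with the gold labels.

    `detect` is a hypothetical callable: text -> bool.
    Rows labelled "ambiguous" are skipped; every other label
    that is not "yes" is treated as a negative example.
    """
    tp = fp = fn = tn = 0
    for row in rows:
        label = row["label"].strip().lower()
        if label == "ambiguous":
            continue
        gold = label == "yes"
        pred = bool(detect(row["text"]))
        if pred and gold:
            tp += 1
        elif pred and not gold:
            fp += 1
        elif gold:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}
```

For example, `score_detector(load_gsr("protest_gsr.csv"), my_detector)` would return the confusion counts along with precision and recall for a detector named `my_detector`.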
Another set of files (assault_spans.json and protest_spans.json) includes information for the gold standard recognized events, consisting of the event verb, the source actor and target actor spans, and the resulting CAMEO actor codes for each.
source_gold, target_gold, and verb_gold report the common text identified by all coders as part of the span.
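The span files can be read with Python's standard `json` module. The sketch below is only a starting point: the exact layout of the JSON is not described here, so it assumes each file holds either a single JSON array or object, or one JSON object per line, and that each record is a dictionary that may contain the `source_gold`, `target_gold`, and `verb_gold` keys. The helper names `load_spans` and `gold_spans` are made up for this example.

```python
import json

# Field names taken from the description above.
GOLD_KEYS = ("source_gold", "target_gold", "verb_gold")


def load_spans(path):
    """Load a *_spans.json file into a list of records.

    Assumption: the file is either one JSON document (array or object)
    or line-delimited JSON with one object per line.
    """
    with open(path, encoding="utf-8") as f:
        raw = f.read()
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to parsing each non-empty line separately.
        data = [json.loads(line) for line in raw.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def gold_spans(record):
    """Return only the agreed-on gold fields that are present in a record."""
    return {key: record[key] for key in GOLD_KEYS if key in record}
```

Calling `gold_spans` on each record from `load_spans("protest_spans.json")` would give the gold source, target, and verb fields for comparison with a coder's output, provided the records are laid out as assumed.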
The two files are also available in XML format, suitable for use in UniversalPetrarch.