arabic_event_gsr

Gold standard records for Arabic event data

Event Detection

We provide two "event detection" datasets for Arabic language event coding, one for ASSAULT events and one for PROTEST events. These events are coded using the PLOVER ontology, which is similar to the CAMEO ontology. THese files can be used to test an automated coder's ability to recognize these two event types in Arabic text.

protest_gsr.csv and assault_gsr.csv each have the following columns:

accept: the number of annotators accepting the event label as true
event_type: "ASSAULT" or "PROTEST", depending on the file
id: the ID number of the sentence
label: one of "yes", "easy no", "difficult no", or "ambiguous", depending on the set of labels provided by annotators. "yes" is unanimous accept, "easy no" is unanimous reject, "hard no" is mostly reject with a dissenting accept, and "ambiguous" are entries with insufficent labels to be sure.
reject: the number of annotators who rejected the label.
text: the text shown to the annotator and that should be provided to the event detection system
total: the total number of annotations provided on the sentence.

Span Recognition and Labeling

Another set of files (assault_spans.json and protest_spans.json) include information for the gold standard recognized events, consisting of the event verb, the source actor and target actor spans, and resulting CAMEO actor codes for each.

source_gold, target_gold, and verb_gold report the common identified by all coders as part of the span.

Petrarch validation format

The two files are also available in XML format, suitable for use in UniversalPetrarch.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

arabic_event_gsr

Event Detection

Span Recognition and Labeling

Petrarch validation format

Files

README.md

Latest commit

History

README.md

File metadata and controls

arabic_event_gsr

Event Detection

Span Recognition and Labeling

Petrarch validation format