This repository contains the data and supplementary materials for Task 1 of the shared task on speaker attribution in German parliamentary debates, co-located with KONVENS 2023.
Important dates
- February 2023: Trial data release
- April 1, 2023: Training and development data release
- June 15, 2023: Test data release (blind)
- July 1, 2023: Submissions open
- July 31, 2023: Submissions for Task 1, Subtask 1 (full task) close
- August 3, 2023: Submissions for Task 1, Subtask 2 (roles only) close
- August 14, 2023: System descriptions due
- September 15, 2023: Camera-ready system paper deadline
- September 18, 2023: Workshop at KONVENS 2023
Sep 18, 2023 @ KONVENS 2023
Program schedule
- 15:00: Welcome & Shared Task Overview (shared task organisers)
- 15:30: Speaker Attribution in German Parliamentary Debates with QLoRA-adapted Large Language Models (Tobias Bornheim, Niklas Grieger, Patrick Gustav Blaneck and Stephan Bialonski)
- 16:00: Politics, BERTed: Automatic Attribution of Speech Events in German Parliamentary Debates (Anton Ehrmanntraut)
- 16:30: Discussion
- 17:00: Closing
The proceedings can be found here: pdf.
The data is available in JSON format, where each document (speech) is a separate JSON file.
The unit of analysis is the sentence (we changed the format from paragraphs to sentences).
The JSON dictionary includes a list of Sentences and a list of Annotations. Each item in the Sentences list is a dictionary with a SentenceID and a list of Tokens for this sentence. Each item in the Annotations list is a dictionary that includes the ids (in sentence:token form) of the cue word(s) that trigger a speech event and the ids of the roles that are realised for this cue.
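As an illustration, the sketch below constructs a minimal document in this structure and resolves the sentence:token ids back to surface tokens. Only `Sentences`, `SentenceID`, `Tokens`, and `Annotations` come from the description above; the keys `Cue` and `Roles`, the role names, and all values are hypothetical stand-ins, so please check the format pdf for the actual field names.

```python
import json

# Minimal hypothetical document; "Cue", "Roles" and all values are
# illustrative stand-ins, not taken from the real shared task data.
doc = json.loads("""
{
  "Sentences": [
    {"SentenceID": 0,
     "Tokens": ["Sie", "sagte", ",", "das", "sei", "wichtig", "."]}
  ],
  "Annotations": [
    {"Cue": ["0:1"],
     "Roles": {"Source": ["0:0"], "Message": ["0:3", "0:4", "0:5"]}}
  ]
}
""")

# Index sentences by their id so "sentence:token" references can be resolved.
sentences = {s["SentenceID"]: s["Tokens"] for s in doc["Sentences"]}

def resolve(refs):
    """Map a list of "sentence:token" ids to the corresponding tokens."""
    out = []
    for ref in refs:
        sent_id, tok_id = (int(part) for part in ref.split(":"))
        out.append(sentences[sent_id][tok_id])
    return out

for ann in doc["Annotations"]:
    print("Cue:", resolve(ann["Cue"]))        # e.g. ['sagte']
    for role, refs in ann["Roles"].items():
        print(f"{role}:", resolve(refs))      # e.g. Source: ['Sie']
```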
For a more detailed description of the data format (Task 1) and some examples, see this pdf. For more information on our annotation scheme, please refer to the annotation guidelines. Please note that the guidelines have not yet been finalised and might still contain some inconsistencies and errors, which we will fix in the next couple of weeks.
We tried to harmonise the data format for Task 1 and Task 2 as much as possible, which resulted in a file format where the annotations are separated from the text. This makes it a bit harder to inspect the data, so we also provide an alternative, more human-readable data format, described here. However, the official shared task format is the one described in the first document (Dataformat_Task1_a.pdf), and we do not provide evaluation scripts for the second format.
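For quick eyeballing of files in the official format, a small helper along the following lines can be used. It reuses the hypothetical `Cue` field from the sketch above and simply brackets cue tokens; it is a convenience sketch, not the documented alternative format.

```python
import json
import sys

def print_with_cues(path):
    """Print each sentence, marking cue tokens with [brackets].

    Assumes the hypothetical "Cue" key from the sketch above; adjust to
    the actual field names in Dataformat_Task1_a.pdf.
    """
    with open(path, encoding="utf-8") as f:
        doc = json.load(f)
    # Collect all cue references so they can be highlighted inline.
    cue_refs = set()
    for ann in doc["Annotations"]:
        cue_refs.update(ann["Cue"])
    for sent in doc["Sentences"]:
        words = []
        for i, tok in enumerate(sent["Tokens"]):
            ref = f'{sent["SentenceID"]}:{i}'
            words.append(f"[{tok}]" if ref in cue_refs else tok)
        print(" ".join(words))

if __name__ == "__main__":
    print_with_cues(sys.argv[1])
```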