This repository contains the data and supplementary materials for Task 1 of the shared task on speaker attribution in German parliamentary debates, co-located with KONVENS 2023.
Important dates
- February 2023: Trial data release
- April 1, 2023: Training and development data release
- June 15, 2023: Test data release (blind)
- July 1, 2023: Submissions open
- July 31, 2023: Submissions for Task 1, Subtask 1 (full task) close
- August 3, 2023: Submissions for Task 1, Subtask 2 (roles only) close
- August 14, 2023: System descriptions due
- September 15, 2023: Camera-ready system paper deadline
- September 18, 2023: Workshop at KONVENS 2023
Sep 18, 2023 @ KONVENS 2023
Program schedule
- 15:00: Welcome & Shared Task Overview (shared task organisers)
- 15:30: Speaker Attribution in German Parliamentary Debates with QLoRA-adapted Large Language Models (Tobias Bornheim, Niklas Grieger, Patrick Gustav Blaneck and Stephan Bialonski)
- 16:00: Politics, BERTed: Automatic Attribution of Speech Events in German Parliamentary Debates (Anton Ehrmanntraut)
- 16:30: Discussion
- 17:00: Closing
The proceedings can be found here: pdf.
The data is available in JSON format, where each document (speech) is a separate JSON file.
The unit of analysis is the sentence (we changed the format from paragraphs to sentences).
The JSON dictionary includes a list of Sentences and a list of Annotations. Each item in the Sentences list is a dictionary with a SentenceID and a list of Tokens for this sentence. Each item in the Annotations list is a dictionary that includes the ids (in sentence:token form) of the cue word(s) that trigger a speech event and the ids of the roles that are realised for this cue.
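As an illustration, the sketch below constructs a minimal document in this structure and resolves the sentence:token ids back to surface tokens. Only `Sentences`, `SentenceID`, `Tokens`, and `Annotations` come from the description above; the keys `Cue` and `Roles`, the role names, and all values are hypothetical stand-ins, so please check the format pdf for the actual field names.

```python
import json

# Minimal hypothetical document; "Cue", "Roles" and all values are
# illustrative stand-ins, not taken from the real shared task data.
doc = json.loads("""
{
  "Sentences": [
    {"SentenceID": 0,
     "Tokens": ["Sie", "sagte", ",", "das", "sei", "wichtig", "."]}
  ],
  "Annotations": [
    {"Cue": ["0:1"],
     "Roles": {"Source": ["0:0"], "Message": ["0:3", "0:4", "0:5"]}}
  ]
}
""")

# Index sentences by their id so "sentence:token" references can be resolved.
sentences = {s["SentenceID"]: s["Tokens"] for s in doc["Sentences"]}

def resolve(refs):
    """Map a list of "sentence:token" ids to the corresponding tokens."""
    out = []
    for ref in refs:
        sent_id, tok_id = (int(part) for part in ref.split(":"))
        out.append(sentences[sent_id][tok_id])
    return out

for ann in doc["Annotations"]:
    print("Cue:", resolve(ann["Cue"]))        # e.g. ['sagte']
    for role, refs in ann["Roles"].items():
        print(f"{role}:", resolve(refs))      # e.g. Source: ['Sie']
```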
For a more detailed description of the data format (Task 1) and some examples, see this pdf. For more information on our annotation scheme, please refer to the annotation guidelines. Please note that the guidelines have not yet been finalised and might still contain some inconsistencies and errors, which we will fix in the next couple of weeks.
We tried to harmonise the data format for Task 1 and Task 2 as much as possible, which resulted in a file format where the annotations are separated from the text. This makes it a bit harder to inspect the data, so we also provide an alternative, more human-readable data format, described here. However, the official shared task format is the one described in the first document (Dataformat_Task1_a.pdf), and we do not provide evaluation scripts for the second format.
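For quick eyeballing of files in the official format, a small helper along the following lines can be used. It reuses the hypothetical `Cue` field from the sketch above and simply brackets cue tokens; it is a convenience sketch, not the documented alternative format.

```python
import json
import sys

def print_with_cues(path):
    """Print each sentence, marking cue tokens with [brackets].

    Assumes the hypothetical "Cue" key from the sketch above; adjust to
    the actual field names in Dataformat_Task1_a.pdf.
    """
    with open(path, encoding="utf-8") as f:
        doc = json.load(f)
    # Collect all cue references so they can be highlighted inline.
    cue_refs = set()
    for ann in doc["Annotations"]:
        cue_refs.update(ann["Cue"])
    for sent in doc["Sentences"]:
        words = []
        for i, tok in enumerate(sent["Tokens"]):
            ref = f'{sent["SentenceID"]}:{i}'
            words.append(f"[{tok}]" if ref in cue_refs else tok)
        print(" ".join(words))

if __name__ == "__main__":
    print_with_cues(sys.argv[1])
```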