
Contributing to Ancient Linguistic Annotation Projects

francescomambrini edited this page Nov 22, 2022 · 7 revisions

SunoikisisDC Digital Classics, Autumn 2022

Session 4: Contributing to Ancient Linguistic Annotation Projects

Tuesday November 22, 2022, starting at 16:00 GMT = 17:00 CET (for 90 minutes)

Convenors: Francesca Dell’Oro (Université de Neuchâtel), Francesco Mambrini (Università Cattolica del Sacro Cuore, Milan)

YouTube link: https://youtu.be/tLBOuKyQuWo

Slides: tba

Outline

This session introduces two approaches to creating linguistic annotations of ancient texts. The first part gives an overview of Universal Dependencies (UD), the most important cross-linguistic standard for morphosyntactic annotation. This part includes a short introduction, followed by a longer practical session dedicated to treebanks for ancient languages, tools for automatic annotation, and tools for querying UD corpora.

The second part outlines the WoPoss project, which aims to describe modality in the Latin language, and presents the automatic and manual annotation of a Latin work (the Satyricon). First, we focus on the automatic annotation of lemmas, parts of speech (henceforth PoS) and morphological analysis. Then we outline the manual annotation of a modal passage according to a simplified version of the WoPoss Guidelines (Dell’Oro 2022). The search interface for querying the corpus will also be briefly presented (https://woposs.unine.ch/form.html).

Suggested readings

Other resources

Exercise

Exercise 1: WoPoss

There are two ways to contribute to the WoPoss corpus, both of which are covered by the exercises suggested in the session.

  1. You can correct the results of the automatic annotation (lemmas, PoS and morphological analysis). If you want to try it during or after the session, you will need to create a GitHub account (https://github.com/).
  2. You can also try to annotate a modal passage yourself. In this case, you will need an INCEpTION account. Just ask the WoPoss team to create one for you (write to francesca.delloro@unine.ch) before or after the session. Your contribution will be recognised in the file description.

Exercise 2: Universal Dependencies (UD)

You can replicate the workflow for automatically generating UD annotation with UDPipe 2, then review the annotation either by editing the raw CoNLL-U text or by using conllueditor.
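The automatic step of this workflow can also be scripted instead of going through the web form. The following is a minimal sketch, assuming the public UDPipe REST API hosted by LINDAT and its `process` method with the `data`, `model`, `tokenizer`, `tagger` and `parser` parameters. The function names `build_request` and `annotate` are hypothetical helpers introduced for illustration, and the model name you pass is an assumption that should be checked against the list of models offered by the service.

```python
import json
import urllib.parse
import urllib.request

# Public LINDAT endpoint of the UDPipe REST API (assumed to be unchanged).
UDPIPE_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_request(text, model):
    """Build (but do not send) a POST request asking for full UD annotation."""
    params = {
        "data": text,      # the raw text to annotate
        "model": model,    # assumption: must match a model offered by the service
        "tokenizer": "",   # an empty value runs the step with default options
        "tagger": "",
        "parser": "",
    }
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(UDPIPE_URL, data=body, method="POST")


def annotate(text, model):
    """Send the request (needs network access) and return the CoNLL-U text."""
    with urllib.request.urlopen(build_request(text, model)) as response:
        payload = json.load(response)
    # The service answers with a JSON object; the annotation is under "result".
    return payload["result"]
```

With network access, a call such as `annotate(open("excerpt.txt", encoding="utf-8").read(), "latin")` should return a CoNLL-U string that can be written to a file and reviewed as in the steps below; both the file name and the model name here are placeholders.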

  1. Select a text in any ancient (or modern) language you like; copy/paste or save an excerpt in a TXT file; optionally, pre-process it as you see fit (e.g. delete titles and paragraph or page numbers);
  2. Go to the UDPipe 2 web service;
  3. Select the language model by opening the drop-down selection tool and scrolling until you find a suitable model. If there is no model for the language of your text, you can always try a model for another language: the results will be ugly, but at least you will have a CoNLL-U file to play with;
  4. Save the CoNLL-U file, then review and edit it in one of the following ways:
  • use any text editor (SublimeText and Atom have plugins for the CoNLL-U format);
  • use this web page to visualize your sentence (by copy/pasting it);
  • or follow conllueditor's installation instructions and use conllueditor to edit your treebank.
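When editing the raw CoNLL-U text by hand, it is easy to break a tree, for example by pointing a HEAD at a token that does not exist. The sketch below reads a CoNLL-U string (ten tab-separated columns per token, `#` comment lines, a blank line between sentences) and runs two basic sanity checks on each sentence. The helper names `read_conllu` and `check_sentence` are hypothetical, and the sample sentence is a hand-written illustration, not the output of any tool. The checks are far less thorough than the official UD validation script.

```python
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Hand-written illustrative sentence (not produced by UDPipe).
SAMPLE = "\n".join([
    "# text = Puella rosam amat.",
    "\t".join(["1", "Puella", "puella", "NOUN", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["2", "rosam", "rosa", "NOUN", "_", "_", "3", "obj", "_", "_"]),
    "\t".join(["3", "amat", "amo", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"]),
    "",
])


def read_conllu(text):
    """Split CoNLL-U text into sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            continue  # sentence-level comment, e.g. "# text = ..."
        else:
            columns = line.split("\t")
            if len(columns) != len(FIELDS):
                raise ValueError(f"expected 10 tab-separated columns: {line!r}")
            current.append(dict(zip(FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


def check_sentence(tokens):
    """Return a list of problems found in one sentence (empty if none)."""
    problems = []
    # Ignore multiword-token ranges (1-2) and empty nodes (1.1).
    words = [t for t in tokens if t["id"].isdigit()]
    ids = {t["id"] for t in words}
    roots = [t for t in words if t["head"] == "0"]
    if len(roots) != 1:
        problems.append(f"{len(roots)} roots instead of 1")
    for t in words:
        if t["head"] != "0" and t["head"] not in ids:
            problems.append(f"token {t['id']} ({t['form']}): unknown head {t['head']}")
    return problems


for number, sentence in enumerate(read_conllu(SAMPLE), start=1):
    print(number, check_sentence(sentence) or "ok")
```

To check your own file, replace `SAMPLE` with the contents of the CoNLL-U file you saved, read with `encoding="utf-8"`.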

Have fun with UD!