How to add a new Annotator

RicardoUsbeck edited this page Oct 28, 2014 · 16 revisions

There are currently two ways to add a wrapper for a new annotator to GERBIL.

1. Find the correct category

First, you need to find the correct category for your annotator, following the classification in the paper by Cornolti et al. [1]:

  • C2W - The annotator adds a list of concepts to a given text (without their position inside the text)
  • Rc2W - The same as C2W but ranks the concepts by their importance for the text
  • Sc2W - The same as C2W but adds a score to every concept
  • D2W - The annotator gets a text with marked named entities and links these named entities to Wikipedia
  • A2W - The annotator gets a text, searches for named entities and links them to Wikipedia
  • Sa2W - The same as A2W but every named entity gets a score

[1] http://dl.acm.org/citation.cfm?id=2488411
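
The decision between the categories can be written down as a small helper. This is only a sketch: the class `CategoryChooser`, its enum and the four boolean parameters are hypothetical and are not part of GERBIL or the BAT-Framework. The category names follow Cornolti et al., where the scored variant of C2W is called Sc2W.

```java
/** Hypothetical helper, not part of GERBIL or the BAT-Framework. */
final class CategoryChooser {

    /** Category names as used by Cornolti et al.; the mixed case is kept on purpose. */
    enum Category { C2W, Rc2W, Sc2W, D2W, A2W, Sa2W }

    /**
     * Picks a category from four yes/no questions about the annotator.
     *
     * @param mentionsGiven    the input text already contains marked named entities
     * @param returnsPositions the annotator reports where in the text an entity occurs
     * @param scored           every result carries a score
     * @param ranked           the results are ordered by importance for the text
     */
    static Category choose(boolean mentionsGiven, boolean returnsPositions,
                           boolean scored, boolean ranked) {
        if (mentionsGiven) {
            return Category.D2W;
        }
        if (returnsPositions) {
            return scored ? Category.Sa2W : Category.A2W;
        }
        if (scored) {
            return Category.Sc2W;
        }
        return ranked ? Category.Rc2W : Category.C2W;
    }

    public static void main(String[] args) {
        // An annotator that finds entities itself and scores each of them.
        System.out.println(choose(false, true, true, false)); // prints "Sa2W"
    }
}
```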

2. Implementing an Adapter

First, you need to implement an adapter that is used to communicate with the annotator.

Solution A: a BAT-Framework adapter (deprecated)

This is the "old" way to add an annotator. You have to write an adapter for your system that implements the interface of its category, e.g., 'Sa2WSystem'. This requires a closer look at the BAT-Framework itself. The existing adapters, such as the ones for Spotlight or AGDISTIS, are good starting points.

Note: If you use this solution, you have to perform step 3.

Solution B: a NIF based web service

For this solution, you implement a simple NIF-based web service that acts as a wrapper around the annotator and with which GERBIL communicates. The GERBIL project has a branch named SpotWrapNifWS4Test that can be used as an example of such a web service. From this example, only the class org.aksw.gerbil.ws4test.SpotlightResource has to be copied and adapted in the following way.

        // ... this is only the parsing of an incoming document
        AnnotatedDocument document;
        try {
            document = parser.getDocumentFromNIFReader(inputReader);
        } catch (Exception e) {
            LOGGER.error("Exception while reading request.", e);
            return "";
        }
        // If your system is only for entity linking, the document object should already contain a list of annotations
        List<Annotation> annotations = document.getAnnotations();
        String text = document.getText();

        // Now we have the text and a list of annotations and can call your system to perform the entity linking task...

        // ... as a result, a list of DisambiguatedAnnotation objects should be created
        List<Annotation> disambigAnnotations = new ArrayList<Annotation>(annotations.size());
        disambigAnnotations.add(new DisambiguatedAnnotation( ... ));

        // ... this new list is added to the document and the document is sent back to GERBIL
        document.setAnnotations(disambigAnnotations);
        String nifDocument = creator.getDocumentAsNIFString(document);
        return nifDocument;

After deploying the web service, it can already communicate with GERBIL without performing the third step.
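
The entity linking call that the snippet above leaves open could look roughly like the following sketch. It is self-contained and does not use the GERBIL classes: `Mention`, `LinkedMention`, the lookup table and the `link` method are hypothetical stand-ins for the marked annotations, the disambiguated annotations and your own system. A real annotator would replace the table lookup with its own disambiguation logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the linking step; none of these types belong to GERBIL. */
final class LinkingSketch {

    /** Stand-in for a marked named entity: character offset and length in the text. */
    static final class Mention {
        final int start;
        final int length;

        Mention(int start, int length) {
            this.start = start;
            this.length = length;
        }
    }

    /** Stand-in for a disambiguated annotation: a mention plus the URI it is linked to. */
    static final class LinkedMention {
        final int start;
        final int length;
        final String uri;

        LinkedMention(int start, int length, String uri) {
            this.start = start;
            this.length = length;
            this.uri = uri;
        }
    }

    /** Toy lookup table that plays the role of the actual annotator. */
    private static final Map<String, String> DICTIONARY = new HashMap<>();
    static {
        DICTIONARY.put("Berlin", "http://dbpedia.org/resource/Berlin");
        DICTIONARY.put("Germany", "http://dbpedia.org/resource/Germany");
    }

    /** Links every mention whose surface form is known; unknown mentions are skipped. */
    static List<LinkedMention> link(String text, List<Mention> mentions) {
        List<LinkedMention> linked = new ArrayList<>(mentions.size());
        for (Mention mention : mentions) {
            String surfaceForm = text.substring(mention.start, mention.start + mention.length);
            String uri = DICTIONARY.get(surfaceForm);
            if (uri != null) {
                linked.add(new LinkedMention(mention.start, mention.length, uri));
            }
        }
        return linked;
    }

    public static void main(String[] args) {
        String text = "Berlin is the capital of Germany.";
        List<Mention> mentions = new ArrayList<>();
        mentions.add(new Mention(0, 6));   // "Berlin"
        mentions.add(new Mention(25, 7));  // "Germany"
        for (LinkedMention result : link(text, mentions)) {
            System.out.println(result.start + "+" + result.length + " -> " + result.uri);
        }
    }
}
```

In the web service, the results of such a call would be turned into DisambiguatedAnnotation objects and set on the document before it is serialized back to NIF.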

3. Adding the annotator permanently (optional)

If you chose solution B in the previous step, you do not need to perform this step, since you can already add your annotator as a NIF-based web service when configuring a GERBIL experiment. If you want to add it permanently, or if you chose solution A, you have to create an AnnotatorConfiguration. For this step, you can take a look at the existing configurations of Spotlight or AGDISTIS.

Afterwards, you have to add your annotator to the classes org.aksw.gerbil.utils.AnnotatorName2ExperimentTypeMapping and org.aksw.gerbil.utils.Name2AnnotatorMapping. We know that this solution is not ideal, and we plan to replace it with a better one in the coming weeks.
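
To illustrate why two mappings have to be extended, the following sketch models them as two lookup tables keyed by the annotator name: one for the experiment type and one for creating the annotator. All names in it (`AnnotatorRegistrySketch`, `Annotator`, `MyAnnotator`, `register`) are hypothetical. It does not show the actual code of the two GERBIL classes, which should be checked in the source before editing them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical model of the two name-based mappings; not the GERBIL implementation. */
final class AnnotatorRegistrySketch {

    /** Minimal stand-in for an annotator adapter. */
    interface Annotator {
        String getName();
    }

    /** Example adapter that would wrap your own system. */
    static final class MyAnnotator implements Annotator {
        @Override
        public String getName() {
            return "MyAnnotator";
        }
    }

    // Name -> experiment type, e.g. "D2W" (assumed role of the first mapping class).
    private final Map<String, String> nameToExperimentType = new LinkedHashMap<>();
    // Name -> factory for the annotator (assumed role of the second mapping class).
    private final Map<String, Supplier<Annotator>> nameToFactory = new LinkedHashMap<>();

    /** Registers the annotator in both tables so that they cannot get out of sync. */
    void register(String name, String experimentType, Supplier<Annotator> factory) {
        nameToExperimentType.put(name, experimentType);
        nameToFactory.put(name, factory);
    }

    /** Returns the registered experiment type, or null if the name is unknown. */
    String experimentTypeOf(String name) {
        return nameToExperimentType.get(name);
    }

    /** Creates a new annotator instance for a registered name. */
    Annotator create(String name) {
        Supplier<Annotator> factory = nameToFactory.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown annotator: " + name);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        AnnotatorRegistrySketch registry = new AnnotatorRegistrySketch();
        registry.register("MyAnnotator", "D2W", MyAnnotator::new);
        System.out.println(registry.experimentTypeOf("MyAnnotator"));
        System.out.println(registry.create("MyAnnotator").getName());
    }
}
```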