How to add a new Annotator
At the moment, there are two ways to add a wrapper for a new annotator to GERBIL.
Step 1: Find the category of your annotator
First, you need to find the correct category for your annotator according to the paper of Cornolti et al. [1]:
- C2W - The annotator returns a list of concepts for a given text (without their positions in the text)
- Rc2W - The same as C2W, but the concepts are ranked by their importance for the text
- Sc2W - The same as C2W, but every concept gets a score
- D2W - The annotator gets a text with marked named entities and links these named entities to Wikipedia
- A2W - The annotator gets a text, searches for named entities and links them to Wikipedia
- Sa2W - The same as A2W, but every named entity gets a score
[1] http://dl.acm.org/citation.cfm?id=2488411
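The choice of category essentially depends on four questions: does the annotator receive already marked entities, does it report where the entities occur, and does it score or rank its results? The following sketch encodes that decision. The enum `AnnotatorCategory` and its `choose` method are hypothetical helpers for illustration only; they are not part of GERBIL or the BAT-Framework.

```java
/** Hypothetical helper: the six problem categories of Cornolti et al. */
enum AnnotatorCategory {
    C2W, Rc2W, Sc2W, D2W, A2W, Sa2W;

    /**
     * Picks a category from the observable behaviour of an annotator.
     * inputHasMarkedEntities: the annotator receives already marked named entities
     * outputHasPositions:     the annotator reports where in the text each entity occurs
     * scored / ranked:        the annotator attaches a score / a ranking to its results
     */
    static AnnotatorCategory choose(boolean inputHasMarkedEntities, boolean outputHasPositions,
                                    boolean scored, boolean ranked) {
        if (inputHasMarkedEntities) {
            return D2W;                  // only links the mentions it was given
        }
        if (outputHasPositions) {
            return scored ? Sa2W : A2W;  // finds and links the mentions itself
        }
        if (ranked) {
            return Rc2W;                 // ranked list of concepts, no positions
        }
        return scored ? Sc2W : C2W;      // plain or scored list of concepts
    }
}
```

For example, a system that finds mentions on its own and returns a confidence value for each link would be classified as Sa2W by `choose(false, true, true, false)`.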
Step 2: Implement an adapter
Second, you need to implement an adapter that will be used to communicate with the annotator. There are two solutions for this step.
Solution A: BAT-Framework adapter
This is the "old" way to add an annotator. You have to write an adapter for your system that implements the interface of its category, e.g., 'Sa2WSystem'. For this implementation, you have to take a closer look at the BAT-Framework itself; to do so, you can examine existing adapters such as the ones for Spotlight or AGDISTIS.
Note: Using this solution requires you to perform step 3.
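At its core, such an adapter forwards the text to the annotator and translates the annotator's native output into the framework's annotation objects. The following sketch shows only that pattern. `ScoredMention`, `ScoredAnnotatorSketch` and `MyAnnotatorAdapter` are hypothetical stand-ins, not BAT-Framework classes, and the tab-separated response format is an assumption; the real interface to implement is the one used by the existing Spotlight or AGDISTIS adapters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a scored annotation (not the BAT-Framework class). */
record ScoredMention(int position, int length, String concept, float score) {}

/** Hypothetical stand-in for an Sa2W-style system interface (not BAT's Sa2WSystem). */
interface ScoredAnnotatorSketch {
    List<ScoredMention> annotate(String text);
}

/**
 * Adapter sketch: sends the text to the wrapped annotator and converts its
 * response into ScoredMention objects. Assumed response format: one mention
 * per line as "start<TAB>length<TAB>concept<TAB>score".
 */
class MyAnnotatorAdapter implements ScoredAnnotatorSketch {
    // The call to the real annotator, e.g. an HTTP request; injected so it can be replaced
    private final Function<String, String> annotatorCall;

    MyAnnotatorAdapter(Function<String, String> annotatorCall) {
        this.annotatorCall = annotatorCall;
    }

    @Override
    public List<ScoredMention> annotate(String text) {
        String response = annotatorCall.apply(text);
        List<ScoredMention> result = new ArrayList<>();
        for (String line : response.split("\n")) {
            if (line.isBlank()) {
                continue; // ignore empty lines in the response
            }
            String[] fields = line.split("\t");
            result.add(new ScoredMention(Integer.parseInt(fields[0]), Integer.parseInt(fields[1]),
                    fields[2], Float.parseFloat(fields[3])));
        }
        return result;
    }
}
```

Injecting the annotator call as a function keeps the conversion logic testable without a running annotator.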
Solution B: NIF-based web service
For this solution, a simple NIF-based web service is implemented that acts as a wrapper for the annotator; GERBIL communicates with this web service. The GERBIL project has a branch named SpotWrapNifWS4Test that can be used as an example of such a web service. Within this example, only the class org.aksw.gerbil.ws4test.SpotlightResource has to be copied and adapted as follows:
```java
// ... only the parsing of the incoming document is shown here;
// parser, creator, inputReader and LOGGER are provided by the copied SpotlightResource class
AnnotatedDocument document;
try {
    document = parser.getDocumentFromNIFReader(inputReader);
} catch (Exception e) {
    LOGGER.error("Exception while reading request.", e);
    return "";
}
// If your system only performs entity linking (D2W), the document already
// contains the list of annotations (the marked named entities)
List<Annotation> annotations = document.getAnnotations();
String text = document.getText();
// Now call your system with the text and the annotations to perform the entity linking task.
// From its result, create one DisambiguatedAnnotation object per annotation.
List<Annotation> disambigAnnotations = new ArrayList<Annotation>(annotations.size());
for (Annotation annotation : annotations) {
    disambigAnnotations.add(new DisambiguatedAnnotation( ... ));
}
// Add the new list to the document and send the document back to GERBIL as NIF
document.setAnnotations(disambigAnnotations);
String nifDocument = creator.getDocumentAsNIFString(document);
return nifDocument;
```
After deployment, the web service can already communicate with GERBIL; the third step is not necessary.
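The service around the adapted code can be pictured with the following minimal sketch. It uses only the JDK's built-in com.sun.net.httpserver.HttpServer rather than the framework of the example branch. The class `NifWrapperService`, the `/annotate` path and the Turtle content type are assumptions; the NIF processing (parsing, annotating, serializing, as in the snippet above) is passed in as a function from the incoming NIF string to the outgoing one.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

/** Hypothetical minimal NIF wrapper service: POSTed NIF in, annotated NIF out. */
class NifWrapperService {
    static HttpServer start(int port, UnaryOperator<String> handleNif) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/annotate", exchange -> {
            // Read the NIF document sent by GERBIL
            String requestBody;
            try (InputStream in = exchange.getRequestBody()) {
                requestBody = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            // Run the wrapped annotator and return the resulting NIF document
            byte[] response = handleNif.apply(requestBody).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/turtle");
            exchange.sendResponseHeaders(200, response.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(response);
            }
        });
        server.start();
        return server;
    }
}
```

The URL of such a service (here, for instance, http://localhost:8080/annotate when started on port 8080) is what would be entered as the NIF-based web service when configuring a GERBIL experiment.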
Step 3: Create an AnnotatorConfiguration
If you chose solution B in the previous step, you do not need to perform this step, since you can already add your annotator as a NIF-based web service when configuring a GERBIL experiment. If you want to add it permanently, or if you chose solution A, you have to create an AnnotatorConfiguration. For this step, you can take a look at the existing configurations of Spotlight or AGDISTIS.
Afterwards, you have to add your annotator to the classes org.aksw.gerbil.utils.AnnotatorName2ExperimentTypeMapping and org.aksw.gerbil.utils.Name2AnnotatorMapping. We know that this solution is not ideal, and we plan to replace it with a better one in the coming weeks.