How to add a new Annotator
At the moment, there are two ways to add a wrapper for a new annotator to GERBIL.
First, you need to find the correct category for your annotator according to the categories defined by Cornolti et al. [1]:
- C2W - The annotator adds a list of concepts to a given text (without their position inside the text)
- Rc2W - The same as C2W but ranks the concepts by their importance for the text
- Sc2W - The same as C2W but adds a score to every concept
- D2W - The annotator gets a text with marked named entities and links these named entities to Wikipedia
- A2W - The annotator gets a text, searches for named entities and links them to Wikipedia
- Sa2W - The same as A2W but every named entity gets a score
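The category follows from a few yes/no questions about your annotator's input and output. The Java sketch below encodes the list above as a small decision helper. The enum AnnotatorCategory and its choose method are hypothetical illustrations, not part of GERBIL or the BAT-Framework, and letting scores take precedence over ranks is an assumption of this sketch.

```java
/**
 * Hypothetical helper (not part of GERBIL or the BAT-Framework) that maps
 * properties of an annotator to one of the categories by Cornolti et al.
 */
enum AnnotatorCategory {
    C2W, RC2W, SC2W, D2W, A2W, SA2W;

    /**
     * @param expectsMarkedEntities the input already contains marked named entities
     * @param returnsPositions      the output states where each entity occurs in the text
     * @param scoresOutput          every returned concept or entity carries a score
     * @param ranksOutput           concepts are returned as a ranked list
     */
    static AnnotatorCategory choose(boolean expectsMarkedEntities, boolean returnsPositions,
                                    boolean scoresOutput, boolean ranksOutput) {
        if (expectsMarkedEntities) {
            return D2W; // only links entities that are already marked
        }
        if (returnsPositions) {
            // finds named entities itself and links them
            return scoresOutput ? SA2W : A2W;
        }
        // concepts without positions; scores take precedence over ranks (an assumption)
        if (scoresOutput) {
            return SC2W;
        }
        return ranksOutput ? RC2W : C2W;
    }
}
```

For example, an annotator that finds named entities on its own and attaches a confidence to each of them maps to SA2W.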
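[1] M. Cornolti, P. Ferragina, M. Ciaramita: A Framework for Benchmarking Entity-Annotation Systems. WWW 2013. http://dl.acm.org/citation.cfm?id=2488411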
Second, you need to implement an adapter that is used to communicate with the annotator. There are two solutions for this.
Solution A: This is the "old" way to add an annotator. You have to write an adapter for your system that implements the interface of its category, e.g., 'Sa2WSystem'. This requires a closer look at the BAT-Framework itself; the existing adapters, e.g., for Spotlight or AGDISTIS, are a good starting point.
Note: Using this solution requires you to perform step 3.
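The core of such an adapter is translating the annotator's native output into the types of its category. The sketch below assumes an annotator whose client returns hits with character offsets, a URI and a confidence. All types here (ScoredAnnotation, Sa2WAnnotator, NativeHit, MyAnnotatorAdapter) are hypothetical stand-ins; a real adapter implements the BAT-Framework interface of its category, e.g. Sa2WSystem, whose exact methods you should take from the framework itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a scored annotation: a text span, a concept and a score. */
final class ScoredAnnotation {
    final int position;
    final int length;
    final String concept;
    final float score;

    ScoredAnnotation(int position, int length, String concept, float score) {
        this.position = position;
        this.length = length;
        this.concept = concept;
        this.score = score;
    }
}

/** Hypothetical stand-in for the interface of the Sa2W category. */
interface Sa2WAnnotator {
    String getName();

    List<ScoredAnnotation> annotate(String text);
}

/** Hypothetical result type returned by the annotator's own client library. */
final class NativeHit {
    final int start;          // inclusive character offset
    final int end;            // exclusive character offset
    final String uri;
    final double confidence;

    NativeHit(int start, int end, String uri, double confidence) {
        this.start = start;
        this.end = end;
        this.uri = uri;
        this.confidence = confidence;
    }
}

/** Adapter: calls the annotator and translates its hits into the category's types. */
final class MyAnnotatorAdapter implements Sa2WAnnotator {
    private final Function<String, List<NativeHit>> client;

    MyAnnotatorAdapter(Function<String, List<NativeHit>> client) {
        this.client = client;
    }

    @Override
    public String getName() {
        return "MyAnnotator";
    }

    @Override
    public List<ScoredAnnotation> annotate(String text) {
        List<NativeHit> hits = client.apply(text);
        List<ScoredAnnotation> result = new ArrayList<>(hits.size());
        for (NativeHit hit : hits) {
            // convert [start, end) offsets into position + length
            result.add(new ScoredAnnotation(hit.start, hit.end - hit.start, hit.uri, (float) hit.confidence));
        }
        return result;
    }
}
```

Passing the client in as a function keeps the conversion logic testable without a running annotator.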
Solution B: Implement a simple NIF-based web service that acts as a wrapper around the annotator and with which the GERBIL system communicates.
First, you need the gerbil.nif.transfer library, which can be downloaded from http://139.18.2.164/mroeder/gerbil/gerbil.nif.transfer-0.0.1.zip. If you use Maven, extract the zip file and install the library locally by running the following command in the extracted directory:

```
mvn install:install-file -Dfile=target/gerbil.nif.transfer-0.0.1.jar -Dpackaging=jar -Djavadoc=target/gerbil.nif.transfer-0.0.1-javadoc.jar -Dsources=target/gerbil.nif.transfer-0.0.1-sources.jar -DpomFile=pom.xml
```
The GERBIL project has a branch named SpotWrapNifWS4Test that can be used as an example of such a NIF-based web service. In this example, only the class org.aksw.gerbil.ws4test.SpotlightResource has to be copied and adapted as follows:
```java
// ... this is only the parsing of an incoming document
Document document;
try {
    document = parser.getDocumentFromNIFReader(inputReader);
} catch (Exception e) {
    LOGGER.error("Exception while reading request.", e);
    return "";
}
// If your system only performs entity linking, the document object
// should already contain a list of markings
List<Marking> markings = document.getMarkings();
String text = document.getText();
// Now we have the text and a list of markings (this list could be
// empty or contain Span objects that mark the named entities
// inside the text) and can call your system to perform the
// entity linking task...
// ... as a result, a list of NamedEntity or ScoredNamedEntity objects
// should be created for the A2W or Sa2W task, respectively. For
// C2W, Rc2W or Sc2W you should create a list of Annotation or
// ScoredAnnotation objects
List<Marking> entities = new ArrayList<Marking>(markings.size());
entities.add(new NamedEntity( ... ));
// ... this new list is added to the document and the document is
// sent back to GERBIL
document.setMarkings(entities);
String nifDocument = creator.getDocumentAsNIFString(document);
return nifDocument;
```
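What happens at "call your system" depends entirely on your annotator. As an illustration under simplifying assumptions, the toy D2W linker below resolves each given span by looking up its surface form in a dictionary. SimpleSpan, SimpleEntity and DictionaryLinker are hypothetical; in the actual web service you would read the Span markings from the document and create NamedEntity objects from the gerbil.nif.transfer library instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical span marking: an entity mention given by offset and length. */
final class SimpleSpan {
    final int start;
    final int length;

    SimpleSpan(int start, int length) {
        this.start = start;
        this.length = length;
    }
}

/** Hypothetical linked entity: the same span plus the URI it is linked to. */
final class SimpleEntity {
    final int start;
    final int length;
    final String uri;

    SimpleEntity(int start, int length, String uri) {
        this.start = start;
        this.length = length;
        this.uri = uri;
    }
}

/** Toy D2W linker: looks up the surface form of every given span in a dictionary. */
final class DictionaryLinker {
    private final Map<String, String> surfaceFormToUri;

    DictionaryLinker(Map<String, String> surfaceFormToUri) {
        this.surfaceFormToUri = surfaceFormToUri;
    }

    List<SimpleEntity> link(String text, List<SimpleSpan> spans) {
        List<SimpleEntity> entities = new ArrayList<>(spans.size());
        for (SimpleSpan span : spans) {
            String surfaceForm = text.substring(span.start, span.start + span.length);
            String uri = surfaceFormToUri.get(surfaceForm);
            // spans the dictionary does not know are simply left out in this toy
            if (uri != null) {
                entities.add(new SimpleEntity(span.start, span.length, uri));
            }
        }
        return entities;
    }
}
```

Keeping the span offsets unchanged matters: GERBIL compares the returned markings with the gold standard, so each linked entity should cover exactly the span it was given.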
Once deployed, the web service can already communicate with GERBIL without performing step 3: simply enter its URL on the configuration screen of an experiment.
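How the example resource is deployed depends on the SpotWrapNifWS4Test branch. As a dependency-free alternative sketch, the JDK's built-in com.sun.net.httpserver can expose a POST endpoint that passes the request body (the NIF document) through a function and returns the result. The class NifEndpoint, the path /annotate, the port and the Content-Type header are assumptions, as is the premise that GERBIL sends the NIF document as the request body and reads the annotated document from the response body.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

/** Hypothetical minimal NIF web service skeleton on the JDK's built-in HTTP server. */
final class NifEndpoint {

    /**
     * Starts a server that passes the body of every POST request on {@code path}
     * through {@code annotate} and returns the result. Port 0 picks a free port.
     */
    static HttpServer start(int port, String path, UnaryOperator<String> annotate) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext(path, exchange -> {
            try {
                if (!"POST".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                    return;
                }
                String nifRequest = readBody(exchange.getRequestBody());
                byte[] nifResponse = annotate.apply(nifRequest).getBytes(StandardCharsets.UTF_8);
                // assumption: NIF is exchanged as Turtle
                exchange.getResponseHeaders().set("Content-Type", "text/turtle; charset=utf-8");
                // -1 signals an empty body (e.g. when parsing failed and "" was returned)
                exchange.sendResponseHeaders(200, nifResponse.length == 0 ? -1 : nifResponse.length);
                if (nifResponse.length > 0) {
                    try (OutputStream out = exchange.getResponseBody()) {
                        out.write(nifResponse);
                    }
                }
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }

    private static String readBody(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For example, `NifEndpoint.start(8080, "/annotate", nif -> nif)` starts an echo service; you would replace the function with the parse-annotate-serialize code of the resource above and enter `http://<host>:8080/annotate` in GERBIL.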
Third, if you chose solution B in the previous step, you can skip this step, since you can already add your annotator as a NIF-based web service while configuring a GERBIL experiment. If you want to add it permanently or you chose solution A, you have to create an AnnotatorConfiguration. For this, take a look at the existing configurations of Spotlight or AGDISTIS.
Afterwards, add your annotator configuration to the getInstance() method of the class org.aksw.gerbil.utils.AnnotatorMapping. We know that this solution is not ideal, and we plan to replace it with a better one in the coming weeks.
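To illustrate what registering a configuration in a getInstance() method amounts to, here is a much-simplified, hypothetical name-to-configuration mapping built lazily as a singleton. Annotator, SimpleAnnotatorConfig, SimpleAnnotatorMapping and the entry "MyAnnotator" are invented for this sketch; GERBIL's actual AnnotatorConfiguration and AnnotatorMapping classes have their own fields and methods.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical annotator abstraction used only for this sketch. */
interface Annotator {
    String annotate(String text);
}

/** Hypothetical configuration: a display name plus a factory for the annotator. */
final class SimpleAnnotatorConfig {
    final String name;
    final Supplier<Annotator> factory;

    SimpleAnnotatorConfig(String name, Supplier<Annotator> factory) {
        this.name = name;
        this.factory = factory;
    }
}

/** Hypothetical, simplified name-to-configuration mapping. */
final class SimpleAnnotatorMapping {
    private static SimpleAnnotatorMapping instance;

    private final Map<String, SimpleAnnotatorConfig> configs = new LinkedHashMap<>();

    private SimpleAnnotatorMapping() {
    }

    /** Builds the mapping on first use; this is where new annotators are registered. */
    static synchronized SimpleAnnotatorMapping getInstance() {
        if (instance == null) {
            SimpleAnnotatorMapping mapping = new SimpleAnnotatorMapping();
            // ... existing annotators would be registered here ...
            // hypothetical new entry; the placeholder annotator returns its input unchanged
            mapping.register(new SimpleAnnotatorConfig("MyAnnotator", () -> text -> text));
            instance = mapping;
        }
        return instance;
    }

    private void register(SimpleAnnotatorConfig config) {
        configs.put(config.name, config);
    }

    SimpleAnnotatorConfig get(String name) {
        return configs.get(name);
    }

    Map<String, SimpleAnnotatorConfig> all() {
        return Collections.unmodifiableMap(configs);
    }
}
```

Storing a factory rather than an annotator instance lets each experiment create a fresh adapter on demand.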