Paul Houle edited this page Oct 24, 2013 · 4 revisions

Parallel Super Eyeball 3

pse3 stands for Parallel Super Eyeball 3. It is the Infovore 2 version of the tool called ParallelSuperEyeball in Infovore 1. The "3" stands for "triple", not for version 3, so a hypothetical version for quads would be called pse4.

pse3 accepts triple-like rows, which we call PrimitiveTriples, and applies a Jena-based parser to validate and normalize them. The output is highly conformant to RDF standards. This matters because a single invalid triple often causes an RDF tool to fail when importing a file. pse3 also passes triples through a uniqification step, which removes duplicates so that your collection of triples is a set rather than a bag.
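To make the validate, normalize and uniqify sequence concrete, here is a minimal single-machine sketch in Python. It is not pse3's code and does not use Jena: the regular expressions are a hypothetical, simplified approximation of N-Triples syntax (for example, blank node labels are restricted to ASCII letters, digits and underscores), and the name clean_triples is illustrative. The point it shows is why normalization comes before deduplication: two lines that differ only in whitespace describe the same triple and should collapse into one.

```python
import re
from typing import Iterable, Iterator

# Hypothetical, simplified N-Triples term patterns: a toy stand-in for Jena.
IRI = r'<[^<>"{}|^`\\\s]*>'
BNODE = r'_:[A-Za-z0-9_]+'
LITERAL = r'"(?:[^"\\\n\r]|\\.)*"(?:@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*|\^\^' + IRI + r')?'
TRIPLE = re.compile(
    rf'^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.\s*$'
)


def clean_triples(lines: Iterable[str]) -> Iterator[str]:
    """Validate, normalize and deduplicate triple-like lines.

    Malformed lines are dropped; valid ones are rewritten with single spaces
    between terms and emitted only the first time they are seen.
    """
    seen: set[str] = set()
    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            continue  # invalid: would otherwise break a downstream RDF loader
        subj, pred, obj = match.groups()
        canonical = f"{subj} {pred} {obj} ."  # whitespace between terms normalized
        if canonical not in seen:  # set semantics, not bag semantics
            seen.add(canonical)
            yield canonical


if __name__ == "__main__":
    raw = [
        '<http://ex.org/a>  <http://ex.org/p>\t"hello"@en .',
        '<http://ex.org/a> <http://ex.org/p> "hello"@en.',  # same triple, other spacing
        'this is not a triple',
        '_:b1 <http://ex.org/p> <http://ex.org/c> .',
    ]
    for triple in clean_triples(raw):
        print(triple)
```

pse3 does this kind of work in parallel on Hadoop; an in-memory set like the one above presumably would not scale to a dump the size of Freebase, which is why a distributed uniqification step is useful.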

In theory, pse3 should accept any N-Triples file, but so far it has only been tested on the output of freebaseRDFPrefilter and on the published data dumps of DBpedia and DBpedia Live.

How to run it

hadoop jar bakemono-*-job.jar run pse3 inputFile/ outputFile/