This repository has been archived by the owner on Jul 10, 2019. It is now read-only.

Behemoth Modules

Julien Nioche edited this page Apr 8, 2015 · 10 revisions

External modules

  • CommonCrawl - Imports documents from the CommonCrawl dataset on Amazon S3 and converts them to the BehemothDocument format for further processing. Supports both the 2012 format and the older one.

  • Text Classification - Classifies Behemoth documents with a model generated by our Text Classification API.

  • ElasticSearch - Sends documents to ElasticSearch for indexing.
