Awesome Bible NLP

A curated list of resources dedicated to Biblical Natural Language Processing

Contribute your favorite Biblical NLP resource by raising a pull request! Please read the contribution guidelines first.

Machine Translation

Audio

  • Snow Mountain Dataset: Open-licensed and formatted dataset of audio recordings of the Bible in low-resource Indian languages.

Original Languages

  • Macula Hebrew | Greek: Open-licensed and curated dataset of the Bible in Hebrew and Greek with various connected meta resources (e.g., syntax trees, glosses, semantic roles).

Tokenizers

  • utoken: Universal tokenizer with a Python and a command-line interface, also tested on Biblical text.

Romanizers

  • uroman: Universal romanizer that can convert any Unicode script to Roman (Latin) script.

Toolkits

  • SIL Machine | Python version | JavaScript version: Toolkit for various NLP operations on Biblical content (in particular, support for Paratext projects).
  • Wildebeest: Investigate, repair, and normalize text for a wide range of character-level issues. Tested especially on Biblical content.