Skip to content

Latest commit

 

History

History
17 lines (11 loc) · 861 Bytes

README.md

File metadata and controls

17 lines (11 loc) · 861 Bytes

Perseus Greek Corpus (in Unicode)

Scripts for scraping the raw XML of texts in the Perseus Greek Collection. List correct as of December 2017.

Text is released under the Creative Commons Attribution-ShareAlike 3.0 United States (CC BY-SA 3.0 US)

Code is released under an MIT License.

To run, use the command python downloadTexts.py (using python 3).

Results are stored in the perseus_texts folder . A list of available texts is included in available.js, and each file contains a JSON Object with

  • The Author's name
  • The Text's name
  • The number of books
  • A list of books with the raw XML (or a concatenation of raw XMLs) for that book.

A zip file of the texts is includes as perseus_texts.zip.