Scripts for scraping the raw XML of texts in the Perseus Greek Collection

storey/perseus_greek_unicode

Perseus Greek Corpus (in Unicode)

Scripts for scraping the raw XML of texts in the Perseus Greek Collection. List correct as of December 2017.

Text is released under the Creative Commons Attribution-ShareAlike 3.0 United States (CC BY-SA 3.0 US) license.

Code is released under an MIT License.

To run, use the command python downloadTexts.py (requires Python 3).

Results are stored in the perseus_texts folder. A list of available texts is included in available.js. Each file in perseus_texts contains a JSON object with:

  • The Author's name
  • The Text's name
  • The number of books
  • A list of books with the raw XML (or a concatenation of raw XMLs) for that book.
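To check what a downloaded file actually contains, a minimal sketch along the following lines may help. The field names inside each JSON object are not documented here, so the helper (describe_text_file, a hypothetical name) makes no assumption about them and only reports whatever top-level keys it finds. It does assume that every file in perseus_texts is a single UTF-8 JSON object.

```python
import json
from pathlib import Path


def describe_text_file(path):
    """Return a {key: description} summary of one downloaded text file.

    Assumes the file holds a single JSON object; key names are read from
    the file rather than hard-coded, since they are not documented.
    """
    with open(path, encoding="utf-8") as handle:
        record = json.load(handle)
    summary = {}
    for key, value in record.items():
        if isinstance(value, list):
            # For example, the list of books holding raw XML strings.
            summary[key] = f"list of {len(value)} item(s)"
        else:
            summary[key] = type(value).__name__
    return summary


if __name__ == "__main__":
    # Print a summary of the first few files, if the folder exists.
    folder = Path("perseus_texts")
    for path in sorted(folder.glob("*"))[:3]:
        if path.is_file():
            print(path.name, describe_text_file(path))
```

Because a book entry may be a concatenation of several raw XML documents, it is probably safest to treat each entry as a plain string until you have confirmed that it parses as a single XML document.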

A zip file of the texts is included as perseus_texts.zip.
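If you would rather use the bundled archive than run the scraper, something like the sketch below should unpack it using only the standard library. The helper name extract_archive is hypothetical, and the internal layout of the archive is an assumption: it may or may not contain a top-level perseus_texts folder, so check the returned names before relying on a path.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, destination):
    """Extract every member of the archive into destination.

    Returns the list of member names so the caller can inspect the layout.
    """
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(destination)
        return archive.namelist()


if __name__ == "__main__":
    # Only runs when the archive is present in the current directory.
    if Path("perseus_texts.zip").exists():
        names = extract_archive("perseus_texts.zip", ".")
        print(f"extracted {len(names)} entries")
```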
