teoric/gutenget-de

Scripts to fetch texts from Gutenberg-DE

These scripts can be used to fetch/grab e-texts (= ‘books’) from Gutenberg-DE.

Texts are saved in subdirectories and contain XML that is mostly XHTML + a tag <fussnote> for footnotes, without the boilerplate that surrounds the HTML presentation.
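A saved text can be post-processed with any XML-aware tool. The following is a minimal Python sketch, not part of this repository, that separates footnotes from the running text. It assumes the saved file is well-formed XML with a single root element and no undeclared entities such as `&nbsp;`; the sample string and the function name `split_footnotes` are made up for illustration, and real files may be structured differently.

```python
import xml.etree.ElementTree as ET

# Made-up sample imitating the described format (XHTML plus <fussnote>).
# Files actually written by the scripts may look different.
SAMPLE = """<div>
<h3>Erstes Kapitel</h3>
<p>Ein Satz mit Anmerkung.<fussnote>Text der Fußnote.</fussnote> Danach geht es weiter.</p>
<p>Ein Absatz ohne Fußnote.</p>
</div>"""


def split_footnotes(xml_text):
    """Return (body_text, footnotes); footnotes are replaced by [n] markers."""
    root = ET.fromstring(xml_text)
    footnotes = []
    # Walk over a snapshot of all elements so children can be removed safely.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "fussnote":
                continue
            footnotes.append("".join(child.itertext()).strip())
            # The marker plus the text following the footnote must be kept.
            replacement = "[{}]".format(len(footnotes)) + (child.tail or "")
            index = list(parent).index(child)
            if index == 0:
                parent.text = (parent.text or "") + replacement
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + replacement
            parent.remove(child)
    body = "".join(root.itertext()).strip()
    return body, footnotes


if __name__ == "__main__":
    body, notes = split_footnotes(SAMPLE)
    print(body)
    for number, note in enumerate(notes, start=1):
        print("[{}] {}".format(number, note))
```

If a saved file turns out not to be strictly well-formed, a lenient HTML parser would be a better fit than `xml.etree.ElementTree`.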

Documentation inside the scripts is in German, as are the texts at Gutenberg-DE, but the code is simple enough to follow.

  • GutenbergDE.pm – Perl module, depends on Getopt::Long::Descriptive and HTML::TreeBuilder
  • get_gut_mod.pl – Script for getting a text. See help and code for details.
  • get_kant.pl – example script for getting Kant's texts.
  • get_may.pl – example script for getting Karl May's Orientzyklus texts.

Perl scripts for downloading texts/books from Gutenberg-DE.

The texts are saved as XHTML (plus a <fussnote> tag for footnotes); only the text area of the Gutenberg-DE pages is captured.
