A browsertrix crawler configuration for archiving stl-tsl.org

sul-dlss-labs/stl-tsl-archive

This repository contains a configuration for archiving the Special Tribunal for Lebanon website (stl-tsl.org) using browsertrix-crawler. It includes a custom behavior that automatically selects and downloads PDFs that are not fetched by browsertrix-crawler's standard behaviors or by Archive-It.

To run the crawl, install Docker and then run the run.sh script. The crawl should take about 8-9 hours. When it finishes, you should find a WACZ file at collections/stl-tsl/stl-tsl.wacz, which you can view in ArchiveWebPage. In the case of the SDR, the WACZ was accessioned using the WAS Registrar App.
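
The steps above can be wrapped in a small helper. This is only a sketch and not part of the repository: the function names `check_wacz` and `run_crawl` are hypothetical, and it assumes that run.sh sits in the current directory and writes its output to the path given above.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented steps; not part of this repository.

# Output path as stated in the README.
WACZ_PATH="collections/stl-tsl/stl-tsl.wacz"

# Succeed only if the given file exists and is non-empty.
check_wacz() {
    if [ -s "$1" ]; then
        echo "found: $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}

# Confirm Docker is available, run the crawl, then check for the WACZ.
# Assumes run.sh is in the current directory and is executable.
run_crawl() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "Docker is not installed or not on PATH" >&2
        return 1
    fi
    ./run.sh && check_wacz "$WACZ_PATH"
}
```

After sourcing the file, calling `run_crawl` from the repository root starts the crawl and reports whether the WACZ file was produced. Checking for Docker first avoids finding out about a missing dependency only after run.sh fails.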