Mat Kelly edited this page Jun 5, 2017 · 5 revisions

Crawl Information

Wail Heritrix Crawl Monitor

Detailed information about currently running and previous crawls is shown here.

For each crawl, the monitor lists the seed URL, the crawl status, the timestamp of the last update, the number of URLs found from the seed page(s) that have been processed or are awaiting processing, and the actions that can be performed.

Crawl Actions

Wail Heritrix Crawl Control Actions

The actions available for each crawl are shown in the accompanying image.

A crawl's configuration can be displayed with the view config option. Crawls are started, restarted, and terminated through this menu, and they can also be deleted from it. Deleting a crawl through this option permanently removes it from the file system but does not affect the WARCs the crawl produced.
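For readers who want to script the same kind of control outside the menu, Heritrix 3 exposes job actions through its REST API as form-encoded POST requests. The sketch below only builds such a request and does not send it. The engine URL, the job name `myjob`, and the helper `build_job_request` are assumptions for illustration, and nothing here is claimed to be how WAIL itself issues these actions.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumed address of a locally running Heritrix 3 engine; adjust as needed.
ENGINE_URL = "https://localhost:8443/engine"

# Job-level actions accepted by the Heritrix 3 REST API.
JOB_ACTIONS = {
    "build", "launch", "pause", "unpause",
    "checkpoint", "terminate", "teardown",
}


def build_job_request(job_name, action, engine_url=ENGINE_URL):
    """Build (but do not send) a POST request for one job action.

    `build_job_request` is a hypothetical helper, not part of WAIL or Heritrix.
    """
    if action not in JOB_ACTIONS:
        raise ValueError(f"unsupported job action: {action!r}")
    url = f"{engine_url}/job/{quote(job_name, safe='')}"
    body = urlencode({"action": action}).encode("ascii")
    # Supplying a body makes urllib treat the request as a POST.
    return Request(url, data=body, headers={"Accept": "application/xml"})


# Example: the request that would ask Heritrix to terminate a job named "myjob".
request = build_job_request("myjob", "terminate")
```

Sending the request additionally needs the credentials configured for the engine (Heritrix uses HTTP digest authentication) and handling of its self-signed certificate, both of which are left out of this sketch.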

Other Actions

A crawl's job directory can be rescanned by clicking the Rescan Job Directory button, located on the bottom toolbar.
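The Heritrix 3 REST API offers a comparable engine-level operation: posting `action=rescan` to the engine resource asks Heritrix to look through its jobs directory again. A minimal sketch that builds this request without sending it follows. The engine URL and the helper name `build_rescan_request` are assumptions, and whether the button does exactly this internally is not confirmed here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_rescan_request(engine_url="https://localhost:8443/engine"):
    """Build (but do not send) the engine-level rescan request.

    The default engine URL is an assumed local setup; this helper is hypothetical.
    """
    body = urlencode({"action": "rescan"}).encode("ascii")
    return Request(engine_url, data=body, headers={"Accept": "application/xml"})


rescan_request = build_rescan_request()
```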

The web UI provided by Heritrix can be viewed in the default browser by clicking the Launch Web UI button, also located on the bottom toolbar.