geturls

Extract all URLs from anchor and image tags within an HTML/XHTML page and its children.
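As a rough illustration of the extraction step, the following sketch collects `href` values from `<a>` tags and `src` values from `<img>` tags using Python's standard-library `html.parser`. It is not the script's actual code: the `LinkExtractor` class, the `extract_links` function and the sample markup are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)

    # XHTML self-closing tags such as <img ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so no override is needed.


def extract_links(html):
    """Return the raw (possibly relative) link targets found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p><a href="chapter1.html">Ch. 1</a><img src="img/cover.jpg" /></p>'
print(extract_links(sample))  # ['chapter1.html', 'img/cover.jpg']
```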

Relative paths are prefixed with the root of the URL provided, so full URLs are returned in all cases.
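One way to turn relative paths into full URLs is sketched below with `urllib.parse.urljoin`. This is an assumption about technique, not necessarily what the script does: `urljoin` performs standard relative-reference resolution, which, unlike plain string prefixing, also handles `../` segments and root-relative paths such as `/about`. The function name `absolutize` and the link values are hypothetical.

```python
from urllib.parse import urljoin


def absolutize(base_url, links):
    """Resolve each extracted href/src against the page it came from."""
    # urljoin resolves relative links against the directory of base_url
    # and leaves already-absolute URLs untouched.
    return [urljoin(base_url, link) for link in links]


base = "https://www.openbookpublishers.com/htmlreader/978-1-78374-388-9/main.html"
for url in absolutize(base, ["chapter1.html", "../style.css", "https://example.org/"]):
    print(url)
```

Here `chapter1.html` resolves to `https://www.openbookpublishers.com/htmlreader/978-1-78374-388-9/chapter1.html`, while `../style.css` climbs one directory to `https://www.openbookpublishers.com/htmlreader/style.css`.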

The URL provided must point to a file (such as main.html) rather than a directory, so that the script can recursively obtain all the linked URLs.
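The recursive step could look roughly like the sketch below. It is a hypothetical reconstruction, not the script's own logic: it assumes the crawl should only follow links to `.html`/`.htm`/`.xhtml` pages inside the directory that contains the starting file (so it never wanders across the wider web), and it drops `#fragment` parts so each page is fetched once. The `fetch` parameter is an assumption added so the crawl can be tested without network access; by default it downloads pages with `urllib.request.urlopen` and decodes them as UTF-8.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class _Links(HTMLParser):
    """Collect href of <a> and src of <img>."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        key = {"a": "href", "img": "src"}.get(tag)
        for name, value in attrs:
            if key and name == key and value:
                self.found.append(value)


def fetch_url(url):
    # Assumption: pages are UTF-8; undecodable bytes are replaced.
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_url):
    """Return every full URL reachable from start_url (hypothetical helper)."""
    root = urljoin(start_url, ".")  # directory containing the start file
    seen = {start_url}
    pages = [start_url]
    while pages:
        page = pages.pop()
        parser = _Links()
        parser.feed(fetch(page))
        parser.close()
        for link in parser.found:
            url, _fragment = urldefrag(urljoin(page, link))
            if url in seen:
                continue
            seen.add(url)
            # Only recurse into HTML pages under the starting directory.
            if url.startswith(root) and url.endswith((".html", ".htm", ".xhtml")):
                pages.append(url)
    return sorted(seen)
```

For example, with a small in-memory site (hypothetical pages) passed as `fetch=site.__getitem__`, the crawl follows `ch1.html` from `main.html` and reports the images and external link it finds along the way.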

Usage

Provide the URL of the page you would like to extract URLs from, e.g.:

geturls https://www.openbookpublishers.com/htmlreader/978-1-78374-388-9/main.html

The main purpose of this script at OBP is to obtain all the URLs needed to display our HTML books properly, so that they can be submitted to the Wayback Machine.
