archive.org_fileripper

Rips all files (by default .mp4 and .mkv videos) from achive.org/download/* pages through the commandline. No browser needed.

Use Case

Archive.org already provides a way to download multiple files by having them sent to a .zip file first.

However, in the case of collections that are too large (over 50GB) this option is not available and will fail due to filesize. It is also extremely slow to generate the .zip even if does work.

Furthermore, if a page also hads hundreds or thousands of files and you only want to download eg. videos, even if it works you will get a .zip with mixed contents and additional size.

Even if the .zip is created, Archive.org limits the download speed to between 500KB/s and 1.2MB/s per file. If your .zip is several gigabytes this can take a long time (hours or days) to download and will likely fail.

So I created this script. It will download whatever file extensions you choose (by default .mp4 & .mkv) on an archive.org page. It will download between 4 and 5 files at a time (the maximum allowed by archive.org) and at the fastest possible speed through the command line. It uses python3 and a few basic libraries. The current progress is displayed whilst the script is running including:

Filename of file being downloaded
Current speed
Current downloading time
Estimated remaining time

There is also basic verbose error logging.

Usage

Simply specify the DEFAULT_START_URL of the page with all of the links for that archive.org page. For example: https://archive.org/download/<your_page_of_links_here>
Also specify the DEFAULT_OUTPUT_DIR eg. ~/Downloads/archive_org_videos
Run the script.

The script requires:

Python3
Pip
BeautifulSoup
urljoin
unquote
tqdm

The script will fetch the files (currently .mp4 and .mkv) files from any of the links within the 'download-directory-listing' class on the page.

You can also change the file extensions you are interested in downloading, on line 62 (by default .mp4 and .mkv files) if link['href'].lower().endswith(('.mp4', '.mkv')):

Quit the script gacefully with "Q+ENTER" keyboard input.

Troubleshooting

If in the future this script breaks.

Check if your links are still within the above class.
Be sure that you are on the right page. This script is configured to only work on old HTML directory structure pages that contain a list of bare links to pages with direct links to files. eg. https://archive.org/download/<your_page_of_links_here>/<list_of_actual_files_here>
Start at the top page, which contains all the links to other pages which have the files. The script does not indefinitely follow links to try and find matching video files.
Ensure the page actually has a file with a matching file extension on it.
Open an Issue Report on Github

Absolutely no warranty is implied using this script.

Do not use this script to break Archive.org's ToS.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
archiveorg_fileripper.py		archiveorg_fileripper.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

archive.org_fileripper

Use Case

Usage

Troubleshooting

About

Releases

Packages

Languages

License

825i/archive.org_fileripper

Folders and files

Latest commit

History

Repository files navigation

archive.org_fileripper

Use Case

Usage

Troubleshooting

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages