Scripts that download the DNC, Podesta, and Clinton emails from Wikileaks in their original format, so they can be loaded into an email client for further perusal.
dncdownload.sh
- The original version, written for bash. Only supports DNC.
WikileaksEmailDownloader.py
- The second version, written for Python 3. Supports DNC + Podesta.
This repository contains pregenerated metalink files for each set of emails. Use aria2 to download them.
Emails will be written to their respective {dnc,podesta,clinton}-emails subdirectory. DNC and Podesta emails have their zero-padded ID prefixed to the file name, since some emails share the same name.
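To illustrate why the ID prefix prevents collisions, here is a hypothetical helper that builds such a file name. The pad width of 5 and the `_` separator are assumptions for illustration only; check the files the scripts actually produce for the real format.

```python
# Hypothetical helper: the pad width (5) and "_" separator are assumptions,
# not taken from the repository's scripts.
def prefixed_name(email_id: int, original_name: str, width: int = 5) -> str:
    """Prefix an email's original file name with its zero-padded ID."""
    return f"{email_id:0{width}d}_{original_name}"


if __name__ == "__main__":
    # Two emails that share an original name no longer collide on disk.
    print(prefixed_name(42, "meeting.eml"))    # 00042_meeting.eml
    print(prefixed_name(1337, "meeting.eml"))  # 01337_meeting.eml
```

Zero padding also means a plain lexicographic sort of the directory lists emails in ID order, as long as no ID has more digits than the pad width.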
$ aria2c \
--save-session=dnc.session.aria2 \
--save-session-interval=10 \
--continue=true \
--max-concurrent-downloads=50 \
--max-tries=0 \
--retry-wait=5 \
--allow-overwrite=true \
--always-resume=true \
--auto-file-renaming=false \
dnc-emails.metalink # or podesta-emails.metalink or clinton-emails.metalink
To resume an interrupted download, pass the saved session file to aria2c with -i:
$ aria2c \
--save-session=dnc.session.aria2 \
--save-session-interval=10 \
--continue=true \
--max-concurrent-downloads=50 \
--max-tries=0 \
--retry-wait=5 \
--allow-overwrite=true \
--always-resume=true \
--auto-file-renaming=false \
-i dnc.session.aria2
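If you run the downloads repeatedly, a small wrapper can choose between the two invocations above: start from the metalink on the first run and from the saved session afterwards. This is a minimal sketch, not part of the repository. The function name `aria2_command` and the per-set session file name (`<set>.session.aria2`) are assumptions; the commands above reuse `dnc.session.aria2` for every set.

```python
import os

# Same options as the aria2c commands above (minus --save-session,
# which is set per email set below).
ARIA2_OPTS = [
    "--save-session-interval=10",
    "--continue=true",
    "--max-concurrent-downloads=50",
    "--max-tries=0",
    "--retry-wait=5",
    "--allow-overwrite=true",
    "--always-resume=true",
    "--auto-file-renaming=false",
]


def aria2_command(name: str) -> list[str]:
    """Build the aria2c argv for one email set ("dnc", "podesta" or "clinton").

    Resumes from the saved session if one exists in the current directory,
    otherwise starts from the pregenerated metalink. The per-set session
    file name is an assumption, not the repository's convention.
    """
    session = f"{name}.session.aria2"
    cmd = ["aria2c", f"--save-session={session}", *ARIA2_OPTS]
    if os.path.exists(session):
        cmd += ["-i", session]
    else:
        cmd.append(f"{name}-emails.metalink")
    return cmd
```

For example, `subprocess.run(aria2_command("podesta"), check=True)` starts (or resumes) the Podesta download from the current directory.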
Use metagen.py <dnc|podesta|clinton> to generate the metalink files; it writes the metalink to standard output. This requires wikileaks.db to be complete. A compressed version is provided in the repository; see wikileaks.db.zst. If you'd like to generate the database from scratch, continue reading.
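Once wikileaks.db.zst is decompressed (for example with `zstd -d wikileaks.db.zst`, which writes wikileaks.db next to it), a quick sanity check before running metagen.py is to open the database and list its tables. This is a minimal sketch using Python's built-in sqlite3 module; `list_tables` is a hypothetical helper, and the table names it returns depend on schema.db.

```python
import os
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the table names in an existing SQLite database file."""
    # sqlite3.connect() would silently create a missing file, so check first.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    # closing() ensures the connection is closed; sqlite3's own context
    # manager only commits or rolls back the transaction.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `print(list_tables("wikileaks.db"))`. An empty list suggests the file was not decompressed correctly.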
This is a bit of a painful process:
- Create the database.
$ sqlite3 wikileaks.db < schema.db
- Scrape the email metadata (filenames, etc.).
$ ./urlscrape.mt2.py
This will take a while. Wikileaks frequently responds with 503/504 errors, so be patient. If interrupted, the script picks up where it left off.
- Generate "stage 1" metalinks.
$ ./metagen.py dnc > dnc.stage1.metalink
$ ./metagen.py podesta > podesta.stage1.metalink
$ ./metagen.py clinton > clinton.stage1.metalink
These are metalink files without file sizes or hashes, only URLs and names.
- Download the files. This is the most fragile step, since the stage 1 metalinks have no sizes or hashes to verify the downloads against.
$ aria2c \
--save-session=dnc.session.aria2 \
--save-session-interval=10 \
--continue=true \
--max-concurrent-downloads=50 \
--max-tries=0 \
--retry-wait=5 \
--allow-overwrite=true \
--always-resume=true \
--auto-file-renaming=false \
dnc.stage1.metalink # also do podesta and clinton
- Hash the downloaded files.
$ ./hash-files.py
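The hash algorithm hash-files.py uses and how it stores results in wikileaks.db are not described here. As a hedged illustration of the idea, the sketch below computes each file's size and SHA-256 digest, streaming the file in chunks so large attachments need not fit in memory. `size_and_sha256` and `hash_directory` are hypothetical names, and SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def size_and_sha256(path: Path, chunk_size: int = 1 << 16) -> tuple[int, str]:
    """Return (size in bytes, hex SHA-256 digest), reading the file in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return path.stat().st_size, h.hexdigest()


def hash_directory(directory: str) -> dict[str, tuple[int, str]]:
    """Map each file name in `directory` (e.g. dnc-emails) to its size and hash."""
    return {
        p.name: size_and_sha256(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }
```

Because the stage 1 downloads had nothing to verify against, a size of 0 in the output is a cheap hint that a file likely needs re-downloading.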
wikileaks.db should now contain all the information required to generate the completed metalink files.