Tool for archiving and exploring.
Built out of a need to get out of the walled gardens of Pinterest and (much less walled) Pinboard.
Alpha quality at best.
Archivist is built out of three interconnected parts (each package has its own readme file):
- archivist-cli - command line tool for configuration, fetching and querying the data
- archivist-ui - Electron UI built on top of archivist-cli
- archivist-* - various crawlers, "official" ones:
  - archivist-pinboard - API-based Pinboard archiving: screenshot and freeze-dry of the original website
  - archivist-pinterest-crawl - slowly crawl through Pinterest and archive pin images
npm install -g archivist-cli
- to archive Pinboard:
npm install -g archivist-pinboard
- to archive Pinterest:
npm install -g archivist-pinterest-crawl
archivist-ui is not on npm (it should probably be a downloadable dmg, but I didn't get around to it), so to generate the .app and put it in /Applications/ yourself:
- clone this repo
- cd archivist-ui && ./scripts/install.sh
$ archivist config
Config is a JSON object of shape:
{
"crawler-1": CRAWLER_1_OPTIONS,
"crawler-2": CRAWLER_2_OPTIONS,
...
}
Example config (assuming Pinboard and Pinterest backup):
{
"archivist-pinterest-crawl": {
"loginMethod": "cookies",
"profile": "szymon_k"
},
"archivist-pinboard": {
"apiKey": "API_KEY_FOR_PINBOARD"
}
}
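The shape described above can be checked before running a fetch. The following is a minimal sketch, not part of archivist itself: checkConfigShape is a hypothetical helper, and the rule that every crawler's options must be a JSON object is an assumption drawn only from the examples in this readme.

```javascript
// Hypothetical helper (not part of archivist-cli): checks the top-level
// shape of the config. Returns a list of problems; an empty list means
// the shape looks right.
function checkConfigShape(config) {
  if (config === null || typeof config !== "object" || Array.isArray(config)) {
    return ["config must be a JSON object keyed by crawler name"];
  }
  const problems = [];
  for (const [crawler, options] of Object.entries(config)) {
    // Assumption: each crawler's options are themselves a JSON object,
    // as in both examples from this readme.
    if (options === null || typeof options !== "object" || Array.isArray(options)) {
      problems.push(`options for "${crawler}" must be an object`);
    }
  }
  return problems;
}

// The example config from this readme.
const exampleConfig = JSON.parse(`{
  "archivist-pinterest-crawl": { "loginMethod": "cookies", "profile": "szymon_k" },
  "archivist-pinboard": { "apiKey": "API_KEY_FOR_PINBOARD" }
}`);

console.log(checkConfigShape(exampleConfig)); // prints [] for the example config
```

Returning a list of problems rather than throwing makes it easy to report every issue in one pass.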
archivist-pinterest-crawl supports two login methods: "cookies" (which uses cookies from the local Google Chrome installation) or "password", which requires a plaintext username and password:
"archivist-pinterest-crawl": {
"loginMethod": "password",
"username": "PINTEREST_USERNAME",
"password": "PINTEREST_PASSWORD",
"profile": "szymon_k"
},
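The two login methods imply different required fields. Here is a rough sketch of that rule, assuming "profile" is always needed (it appears in both examples, but this readme does not say so explicitly); checkPinterestOptions is a hypothetical name, not something the crawler exports.

```javascript
// Hypothetical helper (not part of archivist-pinterest-crawl): checks
// the options object for the Pinterest crawler against the two login
// methods described in this readme.
function checkPinterestOptions(options) {
  const problems = [];
  const isNonEmptyString = (value) => typeof value === "string" && value.length > 0;

  // Assumption: "profile" is required for both login methods.
  if (!isNonEmptyString(options.profile)) {
    problems.push('"profile" is missing');
  }

  if (options.loginMethod === "cookies") {
    // Cookies come from the local Google Chrome installation,
    // so no credentials are expected in the config.
  } else if (options.loginMethod === "password") {
    for (const key of ["username", "password"]) {
      if (!isNonEmptyString(options[key])) {
        problems.push(`"${key}" is required when loginMethod is "password"`);
      }
    }
  } else {
    problems.push('"loginMethod" must be "cookies" or "password"');
  }
  return problems;
}

console.log(checkPinterestOptions({ loginMethod: "cookies", profile: "szymon_k" }));
console.log(checkPinterestOptions({ loginMethod: "password", profile: "szymon_k" }));
```

The first call reports no problems; the second reports the missing username and password.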
archivist-pinboard requires an API token from https://pinboard.in/settings/password to run properly.
- backup data: archivist fetch (might take a long time depending on the size of the archive)
- list everything: archivist query
- find everything about keyboards: archivist query keyboard

query by default returns ndjson; normal JSON can be output using --json
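Since ndjson is one JSON value per line, the default query output can be consumed line by line. A small sketch of that parsing in Node follows; parseNdjson is a hypothetical helper, and the "title" field in the sample is made up for illustration, since the actual record fields are not described in this readme.

```javascript
// Hypothetical helper: turns ndjson text (one JSON value per line)
// into an array of parsed records. Blank lines are skipped.
function parseNdjson(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Made-up sample; real records will have whatever fields the crawlers store.
const sample = '{"title":"a keyboard"}\n{"title":"another keyboard"}\n';
const records = parseNdjson(sample);

console.log(records.length); // prints 2
```

The same function would work on text captured from the standard output of archivist query, whereas output produced with --json could presumably be passed to JSON.parse in one go.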
- kollektor - no-ui self-hosted Pinterest clone
- gwern on archiving URLs
- freeze-dry implementation notes