Mirroring Web to IPFS #94
Comments
It might be interesting to talk to https://archive.fo/ and https://archive.org, who might have already written something very similar.
Sure, I'd be happy to talk. dweb.archive.org doesn't do it for web pages (yet), but it does mirror some of the content accessed through dweb-gateway to the IPFS HTTP API. (Not all of it, because of the combination of IPFS losing data and no error result or fallback when it can't find something.) Note that we also use urlstore as our primary mirroring mechanism, because we have the opposite concern to you: we can't replicate 50 petabytes, so we just push the reference so that the most-used items get mirrored by IPFS. An upcoming version will also pull items via IPFS as an alternative to a direct fetch from the archive. I also wrote dweb.mirror, a crawler specialized for archive.org items (not the Wayback Machine yet) that mirrors everything to IPFS.
I'll be going to csv,conf next week. It will be another chance to talk more with @ikreymer, who is giving a talk on WARC files: https://csvconf.com/speakers/#ilya-kreymer
How about asking archive.org whether we could help them by cooperating? I'm sure they have issues with crawling capacity. Archive.org could provide data in IPFS when a given URL has been captured. If the capture is some days old, we could ask the user whether they would like to capture the URL (since they might be logged in, or personal information might currently be entered in a form or similar). If they agree, we share the snapshot in IPFS (somehow; I have no idea how this would technically work to make it locatable by URL and timestamp). archive.org could pin it or download it for display on their website.
Hi, I've just recently launched https://replayweb.page/ (https://github.com/webrecorder/replayweb.page), which is a full browser-based web archive replay system ('wayback machine') using service workers. The system can load web archives from a variety of locations and could be expanded to support IPFS. In fact, it can trivially work using an IPFS gateway already: It should be possible to extend to support
ReplayWeb.page is the latest tool from Webrecorder; here's also a blog post announcing it:
Relevant demo/status update of @ikreymer's work: https://www.youtube.com/watch?v=evcSETnTBf0
This proposal touches this topic: https://discuss.ipfs.io/t/ipfs-records-for-urn-uri-resolving-via-a-dht/10456/4
This is a meta-issue tracking related work and discussions (moved from ipfs/ipfs-companion#96).
Feasible
More Design Work Required
Saving reproducible snapshot of entire page load
Automatic mirroring of standard websites to IPFS as you browse them (ipfs/ipfs-companion#535)
- IMMUTABLE assets: very limited feasibility; so far only two types of immutable resources exist on the web:
  - Cache-Control: public, (..) immutable (mapping URL→CID)
- MUTABLE assets: what if we add every page to IPFS and store a mapping between URL and CID, so that if a page disappears, we could fall back to the IPFS version?
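
To make the URL→CID idea above a bit more concrete, here is a minimal Python sketch, not a design for the companion extension. Everything in it is an assumption made for illustration: the names `is_marked_immutable`, `placeholder_cid` and `UrlToCidMirror` are hypothetical, the "IPFS store" is just an in-memory dict, and the identifier is a SHA-256 hex digest standing in for a real CID (no IPFS node is contacted).

```python
import hashlib
from typing import Callable, Dict, Optional


def is_marked_immutable(cache_control: str) -> bool:
    """Return True if a Cache-Control header value carries the `immutable` directive."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    return "immutable" in directives


def placeholder_cid(content: bytes) -> str:
    # NOT a real IPFS CID: a SHA-256 hex digest stands in for the identifier
    # that adding the content to IPFS would return.
    return "sha256-" + hashlib.sha256(content).hexdigest()


class UrlToCidMirror:
    """Hypothetical URL→CID table with fallback to the mirrored copy."""

    def __init__(self) -> None:
        self.url_to_cid: Dict[str, str] = {}
        self.blocks: Dict[str, bytes] = {}  # in-memory stand-in for the IPFS store

    def record(self, url: str, content: bytes) -> str:
        # Store the content under its identifier and remember which URL it came from.
        cid = placeholder_cid(content)
        self.blocks[cid] = content
        self.url_to_cid[url] = cid
        return cid

    def load(self, url: str, fetch: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
        # Try the live page first; `fetch` returns None when the page is gone.
        content = fetch(url)
        if content is not None:
            self.record(url, content)  # refresh the mapping on every successful load
            return content
        # Page disappeared: fall back to the last mirrored version, if any.
        cid = self.url_to_cid.get(url)
        return self.blocks.get(cid) if cid is not None else None
```

For mutable pages the mapping simply points at the most recent snapshot, so the sketch leaves open how older versions, private content, or trust in someone else's mapping would be handled.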
Other notes
- webpackage: Save and share a web page (Use Case)
Prior art: existing browser extensions
Related Discussions
2016-03-26
IRC log about mirroring SRI2IPFS
2015+
2018-01-14
2018-03-08
2018-07-09
2018-07-23