Automatic mirroring of HTTP websites to IPFS as you browse them #535
You can use https://hash-archive.org/
No, I cannot use hash-archive.org, as it lacks the necessary API. Also, the implementation requires close cooperation with the site owner, and I currently don't see hash-archive doing this. I will drop them a note though.
API is not documented:
Sadly, the only "static" files are ones with an SRI hash or …
@skliarie Some notes on this can be found at #96 (comment), re-pasting them here:
To reiterate, mapping mutable content under some URL to an IPFS CID requires a centralized index, which introduces a huge vector for privacy leaks and a single point of failure. Not to mention the index would basically be DDoSed if our user base grows too fast, and if we base lookups on HTTP requests, that will degrade browsing performance even further. IMO it is not worth investing time given those known downsides, plus it sends a really mixed message of decentralizing by means of a centralized server. However, I like the fact that this idea (mapping URL2IPFS) comes back every few months, which means there is some potential to it. So what is needed to make it "right"?
There are still pubsub performance and privacy problems to solve (e.g. publishing banking pages), but at least we don't rely on an HTTP server anymore. :)
The whole http2ipfs ticket is essentially an exercise in trust: whom are you going to believe when accepting a pre-calculated hash of HTTP content. BTW, the same question arises when choosing the initial seed servers to connect to; maybe they have a solution we can reuse. I think we should adopt a self-hosted/trust-building hybrid approach. Something along these lines:
ipfs-companion changes:
I would like to add a couple of clarifications to the idea:
The "undefined" group: ipfs-companion should have a toggle (button?) controlling whether to treat the URLs in the "undefined" group as dynamic or "most likely static". This way users seeing bad rendering (as a result of stale content) can refresh the page "properly". IMHO this should be done on a per-site basis, as I doubt there are many such "dumb" sites (that use a static format for dynamic results). When this toggle is activated, the URL (or all "maybe static" URLs) of the site is marked as dynamic and published in the corresponding pubsub room(s). URLs in the undefined group are not "published" in pubsub rooms, only "dynamic" and "most likely static" ones. For security-sensitive sites, there should be a toggle to mark all URLs as "dynamic". The toggle might ask for confirmation, e.g. whether to "disable IPFS caching for the whole https://bankofamerica.com site".
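To make the grouping idea concrete, here is a minimal sketch of how such per-site state could be tracked. The group names follow this comment; the storage shape, function names, and the "forceDynamic" flag are assumptions for illustration, not an existing ipfs-companion API.

```ts
// Hypothetical sketch: per-site URL classification for the mirroring experiment.
type UrlGroup = 'dynamic' | 'most-likely-static' | 'undefined'

interface SiteState {
  // When true, every URL on the site is treated as dynamic (the "bank site" toggle).
  forceDynamic: boolean
  // Per-URL overrides made by the user via the toggle button.
  overrides: Map<string, UrlGroup>
}

const siteStates = new Map<string, SiteState>()

// Classify a URL, honouring the per-site toggle first.
function classify(url: URL, defaultGroup: UrlGroup): UrlGroup {
  const site = siteStates.get(url.hostname)
  if (site?.forceDynamic) return 'dynamic'
  return site?.overrides.get(url.href) ?? defaultGroup
}

// The confirmation toggle from the comment: disable IPFS caching for a whole site.
function disableMirroringForSite(hostname: string): void {
  const site = siteStates.get(hostname) ?? { forceDynamic: false, overrides: new Map() }
  site.forceDynamic = true
  siteStates.set(hostname, site)
  // Only 'dynamic' and 'most-likely-static' URLs would be announced via pubsub;
  // 'undefined' ones stay local, as described above.
}
```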
I think you meant
Statically-generated blogs are a thing, and there are basically "static" websites generated on the fly by PHP or ASP. And a lot of websites hide … The risk of breaking websites by over-caching is extremely high, and writing custom heuristics is a maintenance hell prone to false positives. I feel the safe way to do it is to just follow the semantics of …
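The last sentence above is truncated; assuming it refers to standard HTTP caching semantics (Cache-Control and related response headers), a deliberately conservative check might look like the sketch below. The function name and thresholds are invented here, and anything ambiguous is treated as not cacheable.

```ts
// Hypothetical, conservative "is this safe to mirror?" check based on standard
// HTTP response headers. Anything ambiguous is treated as NOT cacheable.
function isSafeToMirror(res: Response): boolean {
  const cc = (res.headers.get('cache-control') ?? '').toLowerCase()
  if (cc.includes('no-store') || cc.includes('no-cache') || cc.includes('private')) {
    return false
  }
  // Personalised responses (Vary: Cookie / Authorization) are never mirrored.
  const vary = (res.headers.get('vary') ?? '').toLowerCase()
  if (vary.includes('cookie') || vary.includes('authorization')) {
    return false
  }
  // Require an explicit freshness lifetime; heuristics are where over-caching bites.
  const maxAge = /max-age=(\d+)/.exec(cc)
  return maxAge !== null && Number(maxAge[1]) > 0
}
```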
Agreed, there should always be a way to exclude a website from the mirroring experiment.
I am afraid a manual opt-out is not a safe way to do it. Some open problems related to security:
Let's clarify the groups according to the "shareability" attribute:
After that point our concern is only with content that many users (more than 20) have access to, but which is still sensitive (protected by a password) or restricted by network (see below). For that, the "dynamic" toggle button I mentioned above is used. Once clicked, it should prompt the user with a selectable list of the top-level domains seen on the page. The user then selects the domains that must be marked as "dynamic". This will open pubsub rooms keyed by the hash of each domain and publish those domains as "dynamic" and thus never cacheable. We need to think about what to do with evil actors who want to disable the IPFS cache for a foreign (competitor's?) site, or with someone who does not have a clue and selects all of them... With such a conservative caching approach, I don't see possible harm being done, do you?
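For illustration, a rough sketch of the "mark domains as dynamic" step, assuming a js-ipfs instance with pubsub enabled. The topic naming scheme and message format are invented here; note this sketch does nothing about the abuse concern raised above (anyone could publish someone else's domain).

```ts
// Hypothetical sketch: publish user-selected domains as "dynamic" into pubsub
// rooms derived from a hash of the domain. Topic naming is invented for this example.
async function sha256Hex(text: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(text))
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('')
}

async function markDomainsDynamic(ipfs: any, domains: string[]): Promise<void> {
  for (const domain of domains) {
    const topic = `x-url2ipfs-${await sha256Hex(domain)}` // one "room" per domain
    const msg = new TextEncoder().encode(JSON.stringify({ domain, state: 'dynamic' }))
    // js-ipfs pubsub API: publish(topic, data)
    await ipfs.pubsub.publish(topic, msg)
  }
}
```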
I am a bit skeptical about finding an easy fix that protects users from sharing secrets or from being fed invalid content by bad actors who can spawn multiple nodes. Those are complex problems that should not be underestimated. We should play it safe and start with a simple opt-in experiment that has a smaller number of moving pieces and is easier to reason about:
That being said, some potential blockers:
Overall, I am 👍 for shipping this opt-in experiment with Companion, but it won't happen this quarter (due to the state of pubsub, js-ipfs, and other priorities). Of course, PRs are welcome :)
The only time-critical moment is checking whether a URL can be retrieved over IPFS. This is simple: hash the URL, then look up in the "local node directory" whether it has metadata for that URL's content hash and has the content itself. This should be quick enough to be done on the fly, during page load. Is there an API to see whether the tab is open (e.g. the user is waiting for the page to load)? If we can see that it is not, we can hint to IPFS that it can take some more time to retrieve content for "static" URLs. In the same vein, URLs referenced in the page could be proactively "ipfs-tested" in the background, maybe even pre-fetched. Regarding WebExtension API limitations, I see two solutions:
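A minimal sketch of the lookup step described above. The local index shape and function names are assumptions; the tabs.query call is real WebExtension API, used here to guess whether a user is actively waiting for the page.

```ts
// Hypothetical local index of URL -> { cid, expires } metadata held by the local node.
// In practice the key would likely be a hash of the URL rather than the URL itself.
interface UrlRecord { cid: string; expires: number }
const localIndex = new Map<string, UrlRecord>()

declare const browser: any // provided by the WebExtension runtime (e.g. webextension-polyfill)

// Time-critical path: decide during page load whether a URL can come from IPFS.
function lookupCid(url: string): string | null {
  const rec = localIndex.get(url)
  return rec && rec.expires > Date.now() ? rec.cid : null
}

// WebExtension API: check whether any tab is still loading this URL, i.e. a user is
// actively waiting; if not, background "ipfs-testing" / prefetching can take its time.
async function userIsWaitingFor(url: string): Promise<boolean> {
  const tabs = await browser.tabs.query({ url, status: 'loading' })
  return tabs.length > 0
}
```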
BTW, on an HTTPS site, loading images over HTTP is allowed and will not cause a "mixed content" warning.
I wrote a proposal which would make this possible. It would allow users to archive websites on their nodes and sign them with different methods. Other users would be able to find them and select an entity they trust, for example the Internet Archive. https://discuss.ipfs.io/t/ipfs-records-for-urn-uri-resolving-via-a-dht/10456/4
By design, IPFS provides an excellent caching and acceleration mechanism. It would be nice to use IPFS as an HTTP accelerator. Obviously, it could be used for static files only.
My proposal is as follows:
1.1 Have an API that takes a URL and returns an IPFS hash and expiration time.
1.2 Have "workers" to retrieve unknown or expired URLs, add them to IPFS, and keep the hashes in an internal DB.
2.1 For each never-seen-before URL, do this (with a 5 s timeout):
2.1.1 Go to the "IPFS2HTTP" site, pass the URL, and see if it has a hash for that URL. If so, get the hash, fetch the content over IPFS, and return it to the user. Fall back to HTTP if there is no such hash.
2.2 If the URL has already been seen, check its time to live (as received from the IPFS2HTTP site) and use the IPFS hash if possible. Otherwise retrieve a new hash as in 2.1.1.
Initially we might use the acceleration for obviously static files, such as .iso, .mp3, images... (a rough sketch of the browser-side flow follows below).
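To make part 2 concrete, here is a minimal sketch of steps 2.1 to 2.2, assuming a hypothetical IPFS2HTTP lookup endpoint that returns a hash plus TTL; the endpoint URL, response shape, and local gateway address are all assumptions for illustration.

```ts
// Hypothetical sketch of part 2 (the browser side). The IPFS2HTTP endpoint,
// its response shape and the local gateway address are assumptions.
interface LookupResult { hash: string; ttlSeconds: number }
const seen = new Map<string, { hash: string; expires: number }>()

async function fetchAccelerated(url: string): Promise<Response> {
  // 2.2: URL already seen and TTL still valid -> use the IPFS hash via a local gateway.
  const cached = seen.get(url)
  if (cached && cached.expires > Date.now()) {
    return fetch(`http://127.0.0.1:8080/ipfs/${cached.hash}`)
  }
  try {
    // 2.1.1: ask the IPFS2HTTP service for a hash, with a 5 s timeout.
    const res = await fetch(
      `https://ipfs2http.example/api/lookup?url=${encodeURIComponent(url)}`,
      { signal: AbortSignal.timeout(5000) } // requires a recent browser
    )
    if (res.ok) {
      const { hash, ttlSeconds } = (await res.json()) as LookupResult
      seen.set(url, { hash, expires: Date.now() + ttlSeconds * 1000 })
      return fetch(`http://127.0.0.1:8080/ipfs/${hash}`)
    }
  } catch {
    // Lookup failed or timed out: fall through to plain HTTP.
  }
  return fetch(url) // fallback: plain HTTP
}
```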
I volunteer to do part 1. Can you help with part 2?
What do you think?