Mirroring Web to IPFS #96

Closed · 2 of 3 tasks
lidel opened this issue Mar 26, 2016 · 4 comments
Labels
  • help wanted: Seeking public contribution on this issue
  • kind/discussion: Topical discussion; usually not changes to codebase
  • kind/enhancement: A net-new feature or improvement to an existing feature
  • status/ready: Ready to be worked

Comments

@lidel (Member) commented Mar 26, 2016

The meta-issue tracking related work and discussions has moved to ipfs/in-web-browsers#94

Historical notes from before the move:

Ready to Implement

  • Integrate the js-ipfs library to handle the multipart upload to the API (a minimal sketch follows this list)
    • there are issues with the browserified version that need to be resolved first: the os module is missing, and when all shims are enabled, global.XMLHttpRequest is missing
  • Image Rehosting via HTTP API (Image Rehosting via HTTP API #59)
  • Save whole page to IPFS (creating a one-time shareable mirror/snapshot) (Save entire Web page to IPFS #91)
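
A minimal sketch of what the upload step could look like once the library issues are sorted out, assuming a local daemon API on 127.0.0.1:5001 and the standalone ipfs-http-client package rather than the embedded js-ipfs build; names and addresses are illustrative only:

```js
// Minimal sketch of uploading content to a local IPFS node over its HTTP API.
// Assumes the ipfs-http-client package and a daemon listening on 127.0.0.1:5001;
// the original plan was to embed js-ipfs directly, so treat this as illustrative.
import { create } from 'ipfs-http-client'

const ipfs = create({ url: 'http://127.0.0.1:5001/api/v0' })

async function addPageToIpfs (filename, htmlString) {
  // add() performs the multipart POST to /api/v0/add under the hood
  const { cid } = await ipfs.add({ path: filename, content: htmlString })
  return cid.toString()
}

addPageToIpfs('example.html', '<html><body>hello</body></html>')
  .then(cid => console.log(`added as /ipfs/${cid}`))
  .catch(console.error)
```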

More Design Work Required

  • Automatic mirroring of standard websites to IPFS as you browse them
    • IMMUTABLE assets: very limited feasibility; so far only two kinds of immutable resources exist on the web:
      • JS, CSS, etc. marked with an SRI hash (Subresource Integrity), mapping SRI→CID (see the discussion from 2016-03-26 below and the sketch after that log)
      • URLs explicitly marked as immutable via Cache-Control: public, (..) immutable, mapping URL→CID
    • MUTABLE assets: what if we add every page to IPFS and store a mapping between URL and CID, so that if a page disappears we could fall back to the IPFS version?
      • a can of worms: a safe version would be like web.archive.org, but limited to the local machine. Sharing the cache with other people would require a centralized mapping service (a single point of failure and a vector for privacy leaks)
      • So what is needed to make it "right"?
        • keep it simple but robust: no http, no centralization, no single point of failure
        • Ideally, URL2IPFS lookups would not rely on centralized index.
            • rough idea (Automatic mirroring of HTTP websites to IPFS as you browse them #535 (comment)): what if we create a pubsub-based room per URL? For example (a rough sketch follows this list):
              • When you open a website, you subscribe to a pubsub room unique to that URL
              • If the pubsub room has entries under the "keepalive" threshold, grab the latest one
              • If the room is empty or the keepalive timeout is hit, fall back to HTTP, but in the background add the HTTP page to IPFS and announce the updated hash on pubsub (with a new timestamp) for the next visitor
              • There are still pubsub performance and privacy problems to solve (e.g. publishing banking pages), but at least we no longer rely on the HTTP server.
    • Other notes
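
A rough sketch of the pubsub-room-per-URL lookup described above. The topic naming scheme, message format, keepalive threshold, and fallback timeout are all assumptions rather than an agreed protocol; it assumes an `ipfs` instance (js-ipfs or ipfs-http-client) with pubsub enabled:

```js
// Rough sketch of the pubsub-room-per-URL lookup described above.
// Topic naming, message format, keepalive threshold and timeout are assumptions,
// not an agreed protocol; `ipfs` is a js-ipfs / ipfs-http-client instance with pubsub enabled.
const KEEPALIVE_MS = 10 * 60 * 1000 // arbitrary "keepalive" threshold

function resolveUrlViaPubsub (ipfs, url, { timeoutMs = 3000 } = {}) {
  const topic = `url2ipfs/${url}` // hypothetical topic naming scheme
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(null), timeoutMs) // no answer: caller falls back to HTTP
    ipfs.pubsub.subscribe(topic, (msg) => {
      const { cid, timestamp } = JSON.parse(new TextDecoder().decode(msg.data))
      // only trust announcements newer than the keepalive threshold
      if (Date.now() - timestamp < KEEPALIVE_MS) {
        clearTimeout(timer)
        resolve(cid)
      }
    }).catch(() => resolve(null))
  })
}

// After falling back to HTTP and adding the page, announce it for the next visitor
async function announceSnapshot (ipfs, url, cid) {
  const payload = JSON.stringify({ cid: cid.toString(), timestamp: Date.now() })
  await ipfs.pubsub.publish(`url2ipfs/${url}`, new TextEncoder().encode(payload))
}
```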

Related Discussions

2016-03-26

IRC log about mirroring SRI2IPFS
165958           geir_ │ lgierth: The web sites would have to link to ipfs content for this plugin to work. What i propose is a proxy that works like a transparent proxy and puts content into ipfs if it's not already there
170124            ed_t │ anyone know anything about ipfs-boards
170141            ed_t │ it keeps telling me I am in limited mode
170202            ed_t │ a full ipfs 0.40-rc3 node is running on localhost:5001
170217            ed_t │ but it does not seem to see it using the demo link
170228        +lgierth │ geir_: ah got what you wanna do -- i'm not sure you can easily just rewrite anything
170253        +lgierth │ for completely static pages, yes, but for slightly more dynamic stuff?
170303        +lgierth │ i'll be back in a bit, getting some coffee
170422           geir_ │ lgierth: I mean only for the static stuff like images, libs and so on. Should be pretty strait forward to implement. And a big bandwidth save for big networks
171542           lidel │ geir_, we are planning to add "host to ipfs" feature to the addon
171614           lidel │ when that is done, it should be easy to add option to automatically add every visited page
171634           lidel │ not sure how addon would do lookups tho
171734           lidel │ (meaning, how do i know the multihash of the page, how do we handle ipfs-cache expiration when page gets updated, etc)
171831           geir_ │ lidel: I see, thanks for the info. I still like the idea of a transparent proxy so every user/device on the network will use the "cdn" automatically
171852           lidel │ perhaps we could start with mirroring static assets that have SRI hash (https://www.srihash.org/)
171920           lidel │ and come up with a way for doing SRI2IPFS lookups
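
One narrow case where an SRI2IPFS lookup needs no index at all: if the asset is stored in IPFS as a single raw block (e.g. added with `ipfs add --cid-version 1 --raw-leaves` and small enough to fit in one chunk), the sha256 SRI digest and the CID's multihash digest are the same bytes, so the CID can be derived directly. A minimal sketch using the multiformats library, under those assumptions:

```js
// Sketch: derive a CID straight from a sha256 SRI hash, with no lookup index.
// This only covers assets stored in IPFS as a single raw block (e.g. added with
// `ipfs add --cid-version 1 --raw-leaves` and small enough for one chunk); only then
// do the SRI digest and the CID's multihash digest contain the same bytes.
import { CID } from 'multiformats/cid'
import * as Digest from 'multiformats/hashes/digest'
import { sha256 } from 'multiformats/hashes/sha2'

const RAW_CODEC = 0x55 // multicodec code for raw blocks

function sriToCid (sri) {
  const [algo, b64digest] = sri.split('-')
  if (algo !== 'sha256') throw new Error('only sha256 SRI is handled in this sketch')
  // decode the standard-base64 SRI digest into raw bytes
  const digestBytes = Uint8Array.from(atob(b64digest), c => c.charCodeAt(0))
  return CID.createV1(RAW_CODEC, Digest.create(sha256.code, digestBytes))
}

// e.g. the SRI of an empty file maps to the CID of the empty raw block
console.log(sriToCid('sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=').toString())
```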

2018-01-14

2018-03-08

2018-07-09

2018-07-23

@lidel added the kind/enhancement, kind/discussion, and help wanted labels on Mar 26, 2016
@lidel added the status/blocked/missing-api label and removed the help wanted label on Aug 3, 2016
@lidel removed the status/blocked/missing-api label on Oct 2, 2017
@lidel added the help wanted label on Jan 14, 2018
@lidel added the status/ready label on Mar 7, 2018
@victorb (Member) commented Mar 16, 2018

I don't think it's necessary to automatically mirror any scripts with ipfs-companion.

Instead, we can have a script that checks all script tags with an integrity attribute across the Alexa Top 1000 (or a similar list), adds those assets to IPFS, maps their SRI hashes to IPFS hashes, and ships that index with ipfs-companion (a rough sketch of the index builder follows below).

That way, when ipfs-companion hits a resource with an integrity attribute, there is a good chance it is already available on IPFS.
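
A rough sketch of how such an index could be produced offline; the site list, regex-based extraction, add options, and output file name are placeholders, and it assumes Node 18+ (global fetch) plus a local daemon reachable through ipfs-http-client:

```js
// Rough sketch of building the SRI -> CID index described above.
// The site list, regex-based extraction and output file are placeholders; assumes
// Node 18+ (global fetch) and a local daemon reachable via ipfs-http-client.
import { create } from 'ipfs-http-client'
import { writeFile } from 'node:fs/promises'

const ipfs = create({ url: 'http://127.0.0.1:5001/api/v0' })
const sites = ['https://example.com/'] // e.g. seeded from a popular-sites list

async function buildIndex () {
  const index = {}
  for (const site of sites) {
    const html = await (await fetch(site)).text()
    // naive extraction of <script src=... integrity=...> pairs; a real crawler would parse the DOM
    for (const [, src, sri] of html.matchAll(/<script[^>]+src="([^"]+)"[^>]+integrity="([^"]+)"/g)) {
      const body = new Uint8Array(await (await fetch(new URL(src, site))).arrayBuffer())
      const { cid } = await ipfs.add(body, { cidVersion: 1, rawLeaves: true })
      index[sri] = cid.toString()
    }
  }
  await writeFile('sri2cid.json', JSON.stringify(index, null, 2))
}

buildIndex().catch(console.error)
```

The resulting sri2cid.json could then ship with ipfs-companion and be consulted whenever a request carries a matching integrity attribute.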

@timthelion commented:

I'm not really sure I fully understand this idea. Are you trying to automatically cache web pages using IPFS, or to fall back to IPFS when a page is down?

I came across this issue because I want to be able to "print a webpage to IPFS". I do a lot of visual programming research, and most things I find are on short-lived personal sites. When I then cite or link to these articles, I find that a few years later the web page is gone. I'd like to be able to click a button like "print to IPFS" which would generate an IPFS link that I could cite, or something like that. Not automatic, but manual.

@lidel (Member, Author) commented Jul 2, 2018

@timthelion automatic mirroring is a hard problem (for reasons noted in #96 (comment)), but creating shareable snapshots of webpages could be implemented as an on-demand action.

Actually, "print a webpage to IPFS" is something we want to add to the browser extension. If you have ideas on how it should work, check or comment on the initial vision in #91 (comment). A rough sketch of the on-demand flow follows below.
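
A rough sketch of what that on-demand action might look like from a WebExtension background script (Manifest V2 style, promise-based browser.* API); it only captures the serialized DOM, while a real implementation would also need to inline or rewrite subresources, which is what #91 discusses:

```js
// Rough sketch of an on-demand "print this page to IPFS" action in a WebExtension
// background script (Manifest V2 style, promise-based browser.* API). It captures only
// the serialized DOM; inlining or rewriting subresources is out of scope here.
async function snapshotActiveTab (ipfs) {
  // `ipfs` is an ipfs-http-client instance, created as in the earlier sketch
  const [tab] = await browser.tabs.query({ active: true, currentWindow: true })
  // serialize the rendered DOM of the current page
  const [html] = await browser.tabs.executeScript(tab.id, {
    code: 'new XMLSerializer().serializeToString(document)'
  })
  const { cid } = await ipfs.add(html)
  // a citable, shareable address for the snapshot
  return `https://ipfs.io/ipfs/${cid}`
}
```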

@lidel (Member, Author) commented Jul 24, 2018

FYSA: the meta-issue tracking related work and discussions has moved to ipfs/in-web-browsers#94

@lidel closed this as completed on Jul 24, 2018