bitswap: opportunistic block fetch over /http* #608

Open
lidel opened this issue May 6, 2024 · 0 comments
Labels
need/triage Needs initial labeling and prioritization topic/bitswap topic/gateway Issues related to HTTP Gateway

lidel commented May 6, 2024

Posting this early for feasibility feedback. The main use for this would be ipfs/rainbow#125 and fetching from big providers that currently pay a high price for supporting bitswap.

TLDR

Instead of making HTTP retrieval an alternative to bitswap sessions, try a simpler approach that lets both systems improve each other in a backward-compatible way.

This is a proposal to enhance the Bitswap system with the ability to retrieve blocks over HTTP instead of libp2p. The opportunistic HTTP retrieval would happen when a peer announces support for it via an /http* multiaddr.

Why

Serving data over HTTP is less expensive for multiple reasons (caching, billing). If IPFS nodes were able to leverage HTTP retrieval ("bitswap over HTTP") instead of "bitswap over libp2p", that would be a net positive for the ecosystem.

On the HTTP side:

On the libp2p side:

  • we already have a swarm that speaks bitswap 1.2.0, and peers are able to ask over bitswap whether another peer has the data, without retrieving it
  • /http* multiaddrs are ignored by existing clients and should be future-proof thanks to the libp2p HTTP spec (the spec itself is backward-compatible with plain HTTP trustless gateways mounted on /ipfs).

Together, we have building blocks to implement and ship an incremental improvement to the public IPFS swarm.

Why do this in the bitswap client?

  • Evolution over revolution.

    • Bitswap is how IPFS swarms exchange data. Asynchronous fetch of a block over HTTP is a very simple protocol that maps closely to existing bitswap abstractions
    • Surgical improvement: we don't have to refactor how the content retrieval system works at a high level, which limits the risk of unexpected bugs and regressions, including counter-intuitive ones (example)
  • Ease of adoption

    • If HTTP retrieval support is in the bitswap client and enabled by default in Kubo and Rainbow, storage providers do not need to invest much to benefit from it. All they need to do is announce an additional multiaddr to roll out HTTP retrieval endpoints in a backward- and future-compatible way.
    • Reusing bitswap and multiaddrs maximizes interop and enables opportunistic HTTP retrieval everywhere, including private swarms that use neither the DHT nor IPNI
  • Ease of implementation

    • I don't feel we can afford to rewrite the block exchange and routing systems right now, nor that it could happen in a reasonable time without risk of regressions.

How

Details are TBD; I am posting this to gather feasibility feedback. The broad-strokes idea is:

  • I strongly believe this should be done client-side (the client makes the decision to use HTTP) and should not require any change in the behavior of existing bitswap servers or protocols.
  • High level mechanics:
    • When the bitswap client receives an "I have" response from a peer that announces an /http* multiaddr, it does not ask for the block over libp2p. Instead, it fetches the block over HTTP from a URL derived from that multiaddr (bitswap over HTTP) and notifies the rest of the system just as it would for a block fetched over libp2p.
    • (TBD, but it feels like a good way to minimize risk even further) If HTTP retrieval from the peer fails, retry with regular bitswap over libp2p.
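  • Add a configuration option to boxo/bitswap/options.go:
    // OpportunisticHTTPRetrieval toggles fetching blocks over HTTP from peers
    // that announce an /http* multiaddr (proposed option, does not exist yet).
    func OpportunisticHTTPRetrieval(enabled bool) Option {
      return Option{client.WithOpportunisticHTTPRetrieval(enabled)}
    }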
    • Initially it would be disabled by default, and we would enable it only in Rainbow. Once we are happy with it, we would enable it by default in boxo/bitswap, and other Go-based implementations like Kubo would get it as well.
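
The mechanics above (try HTTP first, fall back to libp2p on failure) can be sketched roughly as follows. This is only an illustration under assumptions, not boxo code: `BlockFetcher`, `HTTPFetcher`, `WithFallback` and `demoFetch` are hypothetical names, the libp2p path is a stub, and the gateway is a local test server. The request shape (`/ipfs/{cid}?format=raw` with `Accept: application/vnd.ipld.raw`) follows the trustless gateway convention for raw blocks. A real client would also have to verify that the returned bytes hash to the requested CID, which is left out here to keep the sketch stdlib-only.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// BlockFetcher is a hypothetical abstraction: fetch the raw bytes of one block.
type BlockFetcher func(ctx context.Context, cid string) ([]byte, error)

// HTTPFetcher fetches a raw block from a trustless gateway at baseURL.
// NOTE: verifying that the bytes match the CID is omitted in this sketch.
func HTTPFetcher(client *http.Client, baseURL string) BlockFetcher {
	return func(ctx context.Context, cid string) ([]byte, error) {
		url := baseURL + "/ipfs/" + cid + "?format=raw"
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return nil, err
		}
		req.Header.Set("Accept", "application/vnd.ipld.raw")
		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return nil, fmt.Errorf("gateway returned status %d", resp.StatusCode)
		}
		return io.ReadAll(resp.Body)
	}
}

// WithFallback tries primary first and uses secondary only if primary fails.
func WithFallback(primary, secondary BlockFetcher) BlockFetcher {
	return func(ctx context.Context, cid string) ([]byte, error) {
		data, err := primary(ctx, cid)
		if err == nil {
			return data, nil
		}
		data, err2 := secondary(ctx, cid)
		if err2 != nil {
			return nil, errors.Join(err, err2)
		}
		return data, nil
	}
}

// demoFetch runs the combined fetcher against a local test gateway. When
// gatewayUp is false the gateway answers 502, so the libp2p stub is used.
func demoFetch(gatewayUp bool) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !gatewayUp {
			http.Error(w, "bad gateway", http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Type", "application/vnd.ipld.raw")
		io.WriteString(w, "block-via-http")
	}))
	defer srv.Close()

	// Stand-in for the existing bitswap-over-libp2p path.
	libp2p := func(ctx context.Context, cid string) ([]byte, error) {
		return []byte("block-via-libp2p"), nil
	}
	fetch := WithFallback(HTTPFetcher(srv.Client(), srv.URL), libp2p)
	data, err := fetch(context.Background(), "bafkqaaa") // example CID string
	return string(data), err
}

func main() {
	for _, up := range []bool{true, false} {
		got, err := demoFetch(up)
		fmt.Println(got, err)
	}
}
```

Keeping the fallback as a wrapper around two fetchers means the libp2p path stays untouched, which matches the goal of not changing existing bitswap behavior.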

Integration gotchas

Chicken-and-egg problem: we don't have bitswap peerids that also announce /http* multiaddrs.

How would the ecosystem roll out "bitswap with opportunistic HTTP retrieval"?

  • DHT: storage providers running IPFS Cluster / Kubo could announce HTTP retrieval capability via opt-in configuration options
    • The existing AppendAnnounce option could be where providers add an /http* multiaddr pointing at their trustless gateway, and bitswap clients could pick it up via peer routing. This is a one-line config change that providers make to signal they prefer clients to use HTTP retrieval
  • IPNI:
    • The main case is web3.storage, which announces under two different peerids, one for bitswap and one for HTTP:
      • bitswap peer QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC listening at /dns4/elastic.dag.house/tcp/443/wss
      • fake peerid QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp which has multiaddr /dns4/dag.w3s.link/tcp/443/https
      • Perhaps this is an opportunity to clean this up and remove the need for bogus HTTp peerids? At minimum, we could ask web3.storage to start announcing /dns4/dag.w3s.link/tcp/443/https under bitswap peer QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC, making their announcements compatible with this proposal.
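
To make the announcement side concrete, here is a rough sketch of how a client might turn an announced /http* multiaddr into a base URL for HTTP retrieval and skip addresses that are not HTTP. It is a deliberately simplified, stdlib-only string parser written for illustration: `HTTPMultiaddrToURL` and `FirstHTTPURL` are hypothetical names, only the `/<dns|dns4|dns6|ip4|ip6>/<host>/tcp/<port>/<http|https|tls/http>` shapes are handled, and a real implementation would presumably rely on the go-multiaddr library rather than splitting strings.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// HTTPMultiaddrToURL converts a textual /http* multiaddr into a base URL.
// Simplified on purpose: variants such as /tls/sni/<name>/http are rejected.
func HTTPMultiaddrToURL(ma string) (string, error) {
	parts := strings.Split(strings.Trim(ma, "/"), "/")
	if len(parts) < 5 {
		return "", fmt.Errorf("not an HTTP multiaddr: %q", ma)
	}
	switch parts[0] {
	case "dns", "dns4", "dns6", "ip4", "ip6":
	default:
		return "", fmt.Errorf("unsupported address protocol %q", parts[0])
	}
	if parts[2] != "tcp" {
		return "", fmt.Errorf("expected /tcp after host in %q", ma)
	}
	host, port := parts[1], parts[3]

	var scheme string
	switch strings.Join(parts[4:], "/") {
	case "http":
		scheme = "http"
	case "https", "tls/http":
		scheme = "https"
	default:
		return "", fmt.Errorf("no /http or /https suffix in %q", ma)
	}
	// JoinHostPort also brackets IPv6 literals correctly.
	return scheme + "://" + net.JoinHostPort(host, port), nil
}

// FirstHTTPURL returns the first announced address usable for HTTP retrieval.
func FirstHTTPURL(addrs []string) (string, bool) {
	for _, a := range addrs {
		if u, err := HTTPMultiaddrToURL(a); err == nil {
			return u, true
		}
	}
	return "", false
}

func main() {
	// Addresses from this issue, as if both were announced under one peerid.
	addrs := []string{
		"/dns4/elastic.dag.house/tcp/443/wss",
		"/dns4/dag.w3s.link/tcp/443/https",
	}
	u, ok := FirstHTTPURL(addrs)
	fmt.Println(u, ok)
}
```

If a peer announces no usable /http* address, `FirstHTTPURL` reports `false` and the client would simply keep using bitswap over libp2p for that peer.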

Feedback welcome.
