feat: add sessions to trustless gateways #459

achingbrain · 2024-03-01T14:58:10Z

Implements blockstore sessions for trustless gateways.

Queries the Helia routing for block providers
Creates a set of trustless gateways from routing results
Uses only these gateways to fetch session blocks

There are no implementations yet but the usage pattern will be something like: ```javascript // unixfs cat command export async function * cat (cid: CID, blockstore: Blocks, options: Partial<CatOptions> = {}): AsyncIterable<Uint8Array> { // create a session for the CID if support is available const blocks = await (blockstore.createSession != null ? blockstore.createSession(cid) : blockstore) const opts: CatOptions = mergeOptions(defaultOptions, options) // resolve and export using the session, if created, otherwise fall back to regular blockstore access const resolved = await resolve(cid, opts.path, blocks, opts) const result = await exporter(resolved.cid, blocks, opts) if (result.type !== 'file' && result.type !== 'raw') { throw new NotAFileError() } if (result.content == null) { throw new NoContentError() } yield * result.content(opts) } ```

…ifferent metadata

Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>

…interface

Implements blockstore sessions for trustless gateways. - Queries the Helia routing for block providers - Creates a set of trustless gateways from routing results - Uses only these gateways to fetch session blocks

lidel · 2024-03-01T18:47:33Z

packages/block-brokers/src/trustless-gateway/broker.ts

+      for await (const provider of this.routing.findProviders(root, options)) {
+        if (provider.protocols == null || !provider.protocols.includes('transport-ipfs-gateway-http')) {
+          continue


💭 I worry that as things are today, filtering by transport-ipfs-gateway-http will be harmful for decentralization in the long term.

AFAIK it is something invented by IPNI at cid.contact for Rhea/Saturn last year to allow nft.storage to avoid bitswap bills. It only works with IPNI PUTs and was not designed with p2p in mind, Amino DHT peers who in the future will announce /https multiaddrs with trustless gateway as alternative to bitswap will not be discoverable this way.

@achingbrain It may be more future-proof to look at all results, and keep ones that have Addrs with multiaddr that has /https or /tls/http.
This will be compatible with https://github.com/libp2p/specs/blob/master/http/transport-component.md and will also filter-out useless gateways without TLS set up (can't use them in browser due to mixed-content error).

If we look at Addrs instead, when IPNI+DHT proxy at https://delegated-ipfs.dev/routing/v1/providers/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi starts returning DHT results for Kubo nodes with Schema: peer and Addrs that have /https multiaddr, it will "just work", and helia will be able to act on HTTP transport with these peers without having to change anything.

This is quite sad-face-making. I hadn't realised that transport-ipfs-gateway-http was added to the spec due to a commercial concern for nft.storage rather than being something you can rely on.

when IPNI+DHT proxy starts returning DHT results with /https multiaddr, it will "just work"

As I read the above, the HTTP transport being added to Kubo is what will cause HTTP multiaddrs to appear. Unless I'm missing something it doesn't mean the peer is running a path or subdomain gateway.

That is, the remote could have a HTTP transport, but also have the gateway config disabled, just the presence of the transport isn't enough to say one way or the other.

It seems like the only way is to make a request and see what you get back? This could be a recipe for DDoSing if you created nodes that respond to provider queries for certain CIDs?

I think a possible solution here might be for the peer to indicate it's capabilities in it's signed peer record? The HTTP routers could return the peer record in the GET /providers response?

I mean FWIW, if it is the case that it was added exclusively for nft.storage then I wasn't aware of that.

As far as I knew the trustless gateway was already a concept, and we implemented it for Rhea when we were asked. I think there was probably a cost saving for us there due to where we were hosting our bitswap peer at the time, but that's not the case now, and shouldn't have been a convincing argument then since infra costs and locations can easily be changed.

@alanshaw HTTP is good! Trustless gateways as HTTP transport are good, and .storage implementing and exposing it as alternative to bitswap is great for ecosystem, especially our browser work (ipfs-chromium, service worker, verified-fetch, Brave etc). We do want to keep it! ❤️

The only problem is that IPNI implementation informed how HTTP-only provider is represented on /routing/v1, and unfortunately, DHT peers were not included in the design.

The transport-ipfs-gateway-http codec was registered, afaik, only because IPNI already used codes for bitswap and graphsync and needed another one internally (inferring that from discussion at multiformats/multicodec#321 which I missed an arrived after it was merged already :(). It was a provisional solution, so there was no spec, no design doc how it is supposed to work outside IPNI. So we ended up with something that works for cid.contact and ignores the rest of public IPFS swarm.

Since then, I've been trying to find a way for /routing/v1 to represent Amino DHT peers that provide trustless gateway endpoint, ensuring these can be found (once we have them), not just IPNI ones.

@achingbrain yes, looking for /tls/http or /https and sending a probe and marking endpoint as gateway or not is the only reliable way I see right now.
A basic probe could be GET for small, inlined CID: send GET /ipfs/bafkqablimvwgy3y?format=raw to HTTP port and expect hello. Fast and reliable, works with legacy and future gateways too.

FYSA there are proposals of more declarative ways of discovering if HTTP port supports IPFS Gateway functionality, but they are no better than the above HTTP probe.
Things I am aware of:

libp2p/specs#508 proposes probing for manifest at .well-known/libp2p, but this is not implemented anywhere and in practice, no better test than GET /ipfs/bafkqablimvwgy3y?format=raw.

ipfs/specs#425 proposes returning headers on OPTIONS response, but it won't work on older gateways, and no better than GET /ipfs/bafkqablimvwgy3y?format=raw

There were discussion about publishing protocol list from libp2p identify with the peer record o Amino DHT, and then exposing this on /routing/v1 responses (but this requires network upgrade and time).

My point is that if you send me a request, and I send you a response knowing it will trigger you to send a request to a specific third party, that is an attack vector.

If I send you a single response with 10x provider records, and you make 10x requests, that is a vector for an amplification attack because I have got you to do more work than the work it took me to get you to do the work, though hopefully it's not all bad because the provider hosts should have to be distinct.

As long as we recognise this and are ok with it.

Thats a great catch @achingbrain

We should absolutely have some way to know the returned providers are valid ipfs node.

I guess as long as we dont modify the delegated routers from a set of known good actors it shouldnt be an issue, but thats centralizing on known delegated providers.

I imagine we would want any peer to be able to provide routing responses in the future though (ambient peer discovery?) where this could become a huge problem. I believe the amplification attack was discussed there as well

If we get a list of providers from some delegated-router and all those providers turn out bad, we could at least mark that delegated-router as a bad actor and limit any future requests to it

My point is that if you send me a request, and I send you a response knowing it will trigger you to send a request to a specific third party, that is an attack vector.

True, although to some extent this is true regardless. There is currently no validation anywhere in use (e.g. DHT, IPNI, etc.) that a peer advertising /ip4/1.2.3.4/tcp/4001/p2p/<peerID> actually controls that IP address so there's already some potential vector there. This could be done by trying each advertised address, although this assumes the routing system supports all the protocols advertised (e.g. ipv4/ipv6, tcp/quic/webtransport/wss/...) which might not be true (or great for upgradability).

To be fair it is likely more work to setup HTTPS for a bad request than trying to establish just a libp2p connection and on the libp2p side we can be somewhat protected by the peerIDs/public keys (e.g. the public key can sign what's a valid advertisement with its key which will cause the security negotiation to fail otherwise ... although that will result in a wasted dial which is a separate issue in and of itself).

While we're here there's a slightly larger attack surface here for "webseeds-like" behavior (e.g. pointing at https://some-ubuntu-mirror.com/ubuntu-v.1.2.3.iso or (.car)) since more bytes can be transferred. This problem gets drastically worse if the incremental verifiability window gets larger.

If we get a list of providers from some delegated-router and all those providers turn out bad, we could at least mark that delegated-router as a bad actor and limit any future requests to it

Hopefully as explained above this isn't a reasonable thing to do unless there's actually responsibility on the underlying routing system to do these checks (or some delegated routers are allowed to advertise via something like HTTP OPTIONS or .well-known to do extra work you can punish them for not doing correctly). Mostly what you're left with to punish them on is do they give about the same results as other delegated-routers (for the same underlying routing systems) or as querying the underlying systems themselves (likely requires something similar to ipfs/specs#388).

filtering by transport-ipfs-gateway-http will be harmful for decentralization in the long term.

So this is true, just like filtering for bitswap is also harmful. These protocol advertisements are only supported by IPNI, for Amino (as well as any new protocol advertised by nodes without a new codec landing and all IPNI nodes supporting the code) using identify-like behavior (e.g. libp2p identify or leveraging .well-known) is required.

Since then, I've been trying to find a way for /routing/v1 to represent Amino DHT peers that provide trustless gateway endpoint, ensuring these can be found (once we have them), not just IPNI ones.

For trustless-gateway over libp2p .well-known can be used and it's fine. For plain trustless-gateway there's currently no way to advertise that in the Amino DHT because the mappings are multihash -> peerID. This is very similar (although the underlying use case wasn't exactly the same) as the problem expressed here libp2p/notes#11 and ipld/ipld#57 (comment) (note: that's from 2019 and my understanding of the situation has improved a bit since then 🙃 which is part of why there's more detail in the message above than there). AFAIK this would require a protocol change to support with Amino.

SgtPooki

this looks good to me, but based on prior comments it needs a few changes before merging

SgtPooki · 2024-03-01T18:39:47Z

packages/block-brokers/src/trustless-gateway/broker.ts

+   * @default only-if-cached
+   */
+  cacheControl?: string


I'm worried about defaulting to asking gateways to only return content they have. will read more in the PR

This is to stop them doing what gateways do, e.g. fetch content on your behalf. Otherwise it defeats the purpose of having a session.

packages/block-brokers/src/trustless-gateway/trustless-gateway.ts

…teway-sessions

achingbrain and others added 14 commits February 6, 2024 08:58

chore: pr comments

5836dcd

chore: linting

3d9cb49

chore: simplify options

39b6f7a

chore: createSession is not optional on blocks

d1228b9

chore: update utils

32713ab

chore: create block brokers last to access components

0c28d66

chore: do not dedupe at the routing level as different impls return d…

d0610c9

…ifferent metadata

chore: add defaults to interface

9aeded1

chore: remove unused dep

8e6051a

chore: update constants

ca1c458

chore: apply suggestions from code review

d7d6334

Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into feat/add-sessions-to-…

8c42ec5

…interface

feat: add sessions to trustless gateways

8997d41

Implements blockstore sessions for trustless gateways. - Queries the Helia routing for block providers - Creates a set of trustless gateways from routing results - Uses only these gateways to fetch session blocks

achingbrain requested a review from a team as a code owner March 1, 2024 14:58

chore: close server

60365d2

lidel reviewed Mar 1, 2024

View reviewed changes

achingbrain mentioned this pull request Mar 4, 2024

feat: add explicit support for subdomain gateways #439

Open

3 tasks

SgtPooki approved these changes Mar 8, 2024

View reviewed changes

chore: filter by http address

ac06ff6

Base automatically changed from feat/add-sessions-to-interface to main April 4, 2024 10:27

Merge remote-tracking branch 'origin/main' into feat/add-trustless-ga…

263bc1c

…teway-sessions

achingbrain merged commit 6ddefb0 into main Apr 4, 2024
18 checks passed

achingbrain deleted the feat/add-trustless-gateway-sessions branch April 4, 2024 10:54

achingbrain mentioned this pull request Apr 4, 2024

chore: release main #483

Merged

SgtPooki mentioned this pull request Apr 9, 2024

docs: fixing changelogs SgtPooki/helia#1

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add sessions to trustless gateways #459

feat: add sessions to trustless gateways #459

achingbrain commented Mar 1, 2024

lidel Mar 1, 2024 •

edited

Loading

achingbrain Mar 4, 2024 •

edited

Loading

alanshaw Mar 4, 2024

lidel Mar 7, 2024 •

edited

Loading

achingbrain Mar 8, 2024

SgtPooki Mar 8, 2024 •

edited

Loading

aschmahmann Mar 14, 2024

aschmahmann Mar 14, 2024

SgtPooki left a comment

SgtPooki Mar 1, 2024

achingbrain Mar 14, 2024

feat: add sessions to trustless gateways #459

feat: add sessions to trustless gateways #459

Conversation

achingbrain commented Mar 1, 2024

lidel Mar 1, 2024 • edited Loading

Choose a reason for hiding this comment

achingbrain Mar 4, 2024 • edited Loading

Choose a reason for hiding this comment

alanshaw Mar 4, 2024

Choose a reason for hiding this comment

lidel Mar 7, 2024 • edited Loading

Choose a reason for hiding this comment

achingbrain Mar 8, 2024

Choose a reason for hiding this comment

SgtPooki Mar 8, 2024 • edited Loading

Choose a reason for hiding this comment

aschmahmann Mar 14, 2024

Choose a reason for hiding this comment

aschmahmann Mar 14, 2024

Choose a reason for hiding this comment

SgtPooki left a comment

Choose a reason for hiding this comment

SgtPooki Mar 1, 2024

Choose a reason for hiding this comment

achingbrain Mar 14, 2024

Choose a reason for hiding this comment

lidel Mar 1, 2024 •

edited

Loading

achingbrain Mar 4, 2024 •

edited

Loading

lidel Mar 7, 2024 •

edited

Loading

SgtPooki Mar 8, 2024 •

edited

Loading