-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add sessions to trustless gateways #459
Conversation
There are no implementations yet but the usage pattern will be something like: ```javascript // unixfs cat command export async function * cat (cid: CID, blockstore: Blocks, options: Partial<CatOptions> = {}): AsyncIterable<Uint8Array> { // create a session for the CID if support is available const blocks = await (blockstore.createSession != null ? blockstore.createSession(cid) : blockstore) const opts: CatOptions = mergeOptions(defaultOptions, options) // resolve and export using the session, if created, otherwise fall back to regular blockstore access const resolved = await resolve(cid, opts.path, blocks, opts) const result = await exporter(resolved.cid, blocks, opts) if (result.type !== 'file' && result.type !== 'raw') { throw new NotAFileError() } if (result.content == null) { throw new NoContentError() } yield * result.content(opts) } ```
…ifferent metadata
Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>
Implements blockstore sessions for trustless gateways. - Queries the Helia routing for block providers - Creates a set of trustless gateways from routing results - Uses only these gateways to fetch session blocks
for await (const provider of this.routing.findProviders(root, options)) { | ||
if (provider.protocols == null || !provider.protocols.includes('transport-ipfs-gateway-http')) { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 I worry that as things are today, filtering by transport-ipfs-gateway-http
will be harmful for decentralization in the long term.
AFAIK it is something invented by IPNI at cid.contact for Rhea/Saturn last year to allow nft.storage to avoid bitswap bills. It only works with IPNI PUTs and was not designed with p2p in mind, Amino DHT peers who in the future will announce /https
multiaddrs with trustless gateway as alternative to bitswap will not be discoverable this way.
@achingbrain It may be more future-proof to look at all results, and keep ones that have Addrs
with multiaddr that has /https
or /tls/http
.
This will be compatible with https://github.com/libp2p/specs/blob/master/http/transport-component.md and will also filter-out useless gateways without TLS set up (can't use them in browser due to mixed-content error).
If we look at Addrs
instead, when IPNI+DHT proxy at https://delegated-ipfs.dev/routing/v1/providers/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi starts returning DHT results for Kubo nodes with Schema: peer
and Addrs
that have /https
multiaddr, it will "just work", and helia will be able to act on HTTP transport with these peers without having to change anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite sad-face-making. I hadn't realised that transport-ipfs-gateway-http
was added to the spec due to a commercial concern for nft.storage
rather than being something you can rely on.
when IPNI+DHT proxy starts returning DHT results with /https multiaddr, it will "just work"
As I read the above, the HTTP transport being added to Kubo is what will cause HTTP multiaddrs to appear. Unless I'm missing something it doesn't mean the peer is running a path or subdomain gateway.
That is, the remote could have a HTTP transport, but also have the gateway config disabled, just the presence of the transport isn't enough to say one way or the other.
It seems like the only way is to make a request and see what you get back? This could be a recipe for DDoSing if you created nodes that respond to provider queries for certain CIDs?
I think a possible solution here might be for the peer to indicate it's capabilities in it's signed peer record? The HTTP routers could return the peer record in the GET /providers
response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean FWIW, if it is the case that it was added exclusively for nft.storage then I wasn't aware of that.
As far as I knew the trustless gateway was already a concept, and we implemented it for Rhea when we were asked. I think there was probably a cost saving for us there due to where we were hosting our bitswap peer at the time, but that's not the case now, and shouldn't have been a convincing argument then since infra costs and locations can easily be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alanshaw HTTP is good! Trustless gateways as HTTP transport are good, and .storage implementing and exposing it as alternative to bitswap is great for ecosystem, especially our browser work (ipfs-chromium, service worker, verified-fetch, Brave etc). We do want to keep it! ❤️
The only problem is that IPNI implementation informed how HTTP-only provider is represented on /routing/v1
, and unfortunately, DHT peers were not included in the design.
The transport-ipfs-gateway-http
codec was registered, afaik, only because IPNI already used codes for bitswap and graphsync and needed another one internally (inferring that from discussion at multiformats/multicodec#321 which I missed an arrived after it was merged already :(). It was a provisional solution, so there was no spec, no design doc how it is supposed to work outside IPNI. So we ended up with something that works for cid.contact and ignores the rest of public IPFS swarm.
Since then, I've been trying to find a way for /routing/v1
to represent Amino DHT peers that provide trustless gateway endpoint, ensuring these can be found (once we have them), not just IPNI ones.
@achingbrain yes, looking for /tls/http
or /https
and sending a probe and marking endpoint as gateway or not is the only reliable way I see right now.
A basic probe could be GET for small, inlined CID: send GET /ipfs/bafkqablimvwgy3y?format=raw
to HTTP port and expect hello
. Fast and reliable, works with legacy and future gateways too.
FYSA there are proposals of more declarative ways of discovering if HTTP port supports IPFS Gateway functionality, but they are no better than the above HTTP probe.
Things I am aware of:
- libp2p/specs#508 proposes probing for manifest at
.well-known/libp2p
, but this is not implemented anywhere and in practice, no better test thanGET /ipfs/bafkqablimvwgy3y?format=raw
. - ipfs/specs#425 proposes returning headers on
OPTIONS
response, but it won't work on older gateways, and no better thanGET /ipfs/bafkqablimvwgy3y?format=raw
- There were discussion about publishing protocol list from libp2p identify with the peer record o Amino DHT, and then exposing this on /routing/v1 responses (but this requires network upgrade and time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that if you send me a request, and I send you a response knowing it will trigger you to send a request to a specific third party, that is an attack vector.
If I send you a single response with 10x provider records, and you make 10x requests, that is a vector for an amplification attack because I have got you to do more work than the work it took me to get you to do the work, though hopefully it's not all bad because the provider hosts should have to be distinct.
As long as we recognise this and are ok with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a great catch @achingbrain
We should absolutely have some way to know the returned providers are valid ipfs node.
I guess as long as we dont modify the delegated routers from a set of known good actors it shouldnt be an issue, but thats centralizing on known delegated providers.
I imagine we would want any peer to be able to provide routing responses in the future though (ambient peer discovery?) where this could become a huge problem. I believe the amplification attack was discussed there as well
If we get a list of providers from some delegated-router and all those providers turn out bad, we could at least mark that delegated-router as a bad actor and limit any future requests to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that if you send me a request, and I send you a response knowing it will trigger you to send a request to a specific third party, that is an attack vector.
True, although to some extent this is true regardless. There is currently no validation anywhere in use (e.g. DHT, IPNI, etc.) that a peer advertising /ip4/1.2.3.4/tcp/4001/p2p/<peerID>
actually controls that IP address so there's already some potential vector there. This could be done by trying each advertised address, although this assumes the routing system supports all the protocols advertised (e.g. ipv4/ipv6, tcp/quic/webtransport/wss/...) which might not be true (or great for upgradability).
To be fair it is likely more work to setup HTTPS for a bad request than trying to establish just a libp2p connection and on the libp2p side we can be somewhat protected by the peerIDs/public keys (e.g. the public key can sign what's a valid advertisement with its key which will cause the security negotiation to fail otherwise ... although that will result in a wasted dial which is a separate issue in and of itself).
While we're here there's a slightly larger attack surface here for "webseeds-like" behavior (e.g. pointing at https://some-ubuntu-mirror.com/ubuntu-v.1.2.3.iso
or (.car)) since more bytes can be transferred. This problem gets drastically worse if the incremental verifiability window gets larger.
If we get a list of providers from some delegated-router and all those providers turn out bad, we could at least mark that delegated-router as a bad actor and limit any future requests to it
Hopefully as explained above this isn't a reasonable thing to do unless there's actually responsibility on the underlying routing system to do these checks (or some delegated routers are allowed to advertise via something like HTTP OPTIONS or .well-known to do extra work you can punish them for not doing correctly). Mostly what you're left with to punish them on is do they give about the same results as other delegated-routers (for the same underlying routing systems) or as querying the underlying systems themselves (likely requires something similar to ipfs/specs#388).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filtering by transport-ipfs-gateway-http will be harmful for decentralization in the long term.
So this is true, just like filtering for bitswap
is also harmful. These protocol advertisements are only supported by IPNI, for Amino (as well as any new protocol advertised by nodes without a new codec landing and all IPNI nodes supporting the code) using identify-like behavior (e.g. libp2p identify or leveraging .well-known) is required.
Since then, I've been trying to find a way for /routing/v1 to represent Amino DHT peers that provide trustless gateway endpoint, ensuring these can be found (once we have them), not just IPNI ones.
For trustless-gateway over libp2p .well-known can be used and it's fine. For plain trustless-gateway there's currently no way to advertise that in the Amino DHT because the mappings are multihash -> peerID. This is very similar (although the underlying use case wasn't exactly the same) as the problem expressed here libp2p/notes#11 and ipld/ipld#57 (comment) (note: that's from 2019 and my understanding of the situation has improved a bit since then 🙃 which is part of why there's more detail in the message above than there). AFAIK this would require a protocol change to support with Amino.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good to me, but based on prior comments it needs a few changes before merging
* @default only-if-cached | ||
*/ | ||
cacheControl?: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried about defaulting to asking gateways to only return content they have. will read more in the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to stop them doing what gateways do, e.g. fetch content on your behalf. Otherwise it defeats the purpose of having a session.
Implements blockstore sessions for trustless gateways.