Subdomain support for CIDs longer than 63 #7318
Comments
It's a bit unfortunate that keys are so overly verbose: https://cid.ipfs.io/#bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk It looks like we have an actual protobuf construct inside the raw bytes. Is this... something we need to do? If we shave off 2 bytes, nothing extra needs to be done... |
I am afraid that even if we find a hacky workaround for ED25519 libp2p-keys, the problem remains for CIDs that use hash functions with longer digests than sha256. |
For context, we're trying to encode 40 bytes into 62 characters (with one character for the multibase prefix). I believe base36 would work, if that's an option. That should give us exactly 63 characters. We could change how we encode these peer IDs in text and use an ed25519 specific codec ( |
But I agree we should support longer keys regardless. But will this be a problem for TLS certs? Can we get a double-star cert? |
@aschmahmann and I discussed this and it is possible to shrink ed25519 pids, but it's painful and requires coordination with all libp2p implementations. To shrink ed25519 keys, we need to:
Unfortunately, if we want to get 1 in the near future, we'd make it significantly harder to get 2. Basically, if we start using the new ed25519 pid encoding now, we'd have to convert back to the normal pid binary format (raw multihash) when decoding. However, if/when we decide to use CIDs as the binary pid format, we'd have trouble round-tripping. That is, in the ideal world, if we encounter a text-based PID as a CID:
However, if we implement 1 before 2, we'd have to encode legacy keys in this new CID format. When converting back, we'd end up with the wrong "on the wire" format. |
This seems to not be possible: https://serverfault.com/a/946120 |
^ might have been closed a bit eagerly by GitHub. So am I correct to assume that multi-subdomain is no longer being considered? That'd be nice, as it would be a pain to host with TLS due to the certificate limitation. |
@MichaelMure yeah, github is too eager indeed. Yes, this is precisely why we went with b36 - to keep TLS possible for the time being. |
We've met yesterday and came up with next steps to Notes at: ipfs/team-mgmt#1159 – early feedback / questions appreciated! |
Could you explain what (2) is in more detail? This document mainly discusses IPNS. |
@MichaelMure see ipfs/team-mgmt#1159 (comment) |
Alright. Due to the TLS problem, Infura is unlikely to support that, but I suppose that's sort of OK as it should be a very rare use case. |
Well, the hope is that use of companion and/or native IPFS support is wide-spread before that ever becomes an issue... |
This adds subdomain gateway support for CIDs longer than 63 characters. The CID is split after reaching the 63-character limit, counting from right to left. Requests made with random splits are redirected to the canonical split version to ensure every CID gets exactly one Origin.
Ref.
- https://tools.ietf.org/html/rfc1034#page-7
- #7318
License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
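The splitting rule described in this commit message can be sketched roughly as follows. This is only an illustration in Python, not the go-ipfs code from the PR; the function names canonical_split and redirect_target are made up for this example.

```python
MAX_LABEL = 63  # maximum DNS label length (RFC 1034)


def canonical_split(cid: str) -> str:
    """Split a CID into DNS labels of at most 63 characters.

    Counting starts from the right, so the rightmost label is always full
    and any shorter remainder ends up on the left.
    """
    labels = []
    end = len(cid)
    while end > 0:
        start = max(0, end - MAX_LABEL)
        labels.append(cid[start:end])
        end = start
    return ".".join(reversed(labels))


def redirect_target(requested: str):
    """Return the canonical split if the requested one differs, else None.

    Labels are merged first, so a split at any position maps to the same
    canonical form (hypothetical helper, for illustration only).
    """
    canonical = canonical_split(requested.replace(".", ""))
    return None if requested == canonical else canonical
```

With a 110-character CID, canonical_split yields a 47-character label followed by a 63-character one, and any other split of the same CID is redirected to that form.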
I found an interesting Proposed Standard https://tools.ietf.org/html/rfc4343#section-2.2 |
Leveraging RFC4343 is a no-go: no browser support afaik. FYSA I talked with @Stebalien last week, and we are re-evaluating. None of us is happy with the ramifications of splitting into multiple DNS labels, originally proposed in #7358. It will cause us trouble with TLS in the future, and the ultimate goal of subdomain gateways is seamless UX in web browsers. We decided to look into an alternative approach that prioritizes UX in user agents and removes the problem of TLS errors caused by more than one level of wildcards: #7441 |
@lidel can we close this? |
No, we need to solve this in a way that enables people to load all CIDs, no matter what gateway type is used. Right now, subdomains are limited to a subset of CIDs: https://dweb.link/ipfs/bafkriqdv2ut4g2hs57uer3hwwbz2gz3hqaeal2po6kyyk7k7tbhqg3vw36er25pxfwnrkriyyhgvra2sq3i5vgry325d32mlljj6l3lyvbexm → Hot take: our options are limited here. It could be that longer CIDs end up on a separate subdomain with the same sandboxing / local storage / API limitations as the ones proposed for the path gateway (ipfs/in-web-browsers#157). Those would not work as website roots, but would be fine for loading other types of content. |
We can't use dweb.link as the default until ipfs/kubo#7318 is open. Default gateway should be able to open all CIDs, and dweb.link is limited to 63char ones max.
Just wanted to add to this discussion with an idea, what if you used queries to hold the ID of the CID, e.g.
Only CIDs for the same content can share the multihash subdomain, so subdomain isolation should be maintained. (unless I'm missing something major, in which case correct me) (Also, I think |
As another option, using CIDv2 (ipfs/specs#305) may allow for "case-insensitive" CIDs which are actually case-sensitive when parsed. The difference between CIDv1 doesn't fit, but is case-insensitive: The advantage is that the CID metadata changes the CID slightly, so each CID will still have Origin Isolation. But, if the metadata itself gets too long, then extremely long CID strings will still be too big, however, encoding the case-binary efficiently to take the minimal space should make the limit pretty high in theory. |
@Winterhuman how you can fit sha512 in proposed CIDv2 and have no more than 63 characters? |
No, that's already described in option C. As in encode a SHA512 CID using a case-sensitive encoding, like base58btc. Then, you store the casing of the characters as metadata, e.g.
Could be |
Either that or you could nest a case-sensitive CIDv1 inside a multibase-esque multiformat so it's constructed like:
That'd get around having to encode the casing metadata inside the case-sensitive encoding itself, but, requires making a new multiformat or modifying multibase significantly |
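To make the "store the casing as metadata" idea from the comments above more concrete, here is a minimal Python sketch. It is not the CIDv2 proposal itself: how the mask would be encoded into the CID is left open, and split_case / restore_case are hypothetical names used only for this example.

```python
def split_case(text: str):
    """Separate a case-sensitive string into a lowercase string and a case mask.

    The mask holds one boolean per character: True where the original
    character was uppercase.
    """
    lowered = text.lower()
    mask = [ch.isupper() for ch in text]
    return lowered, mask


def restore_case(lowered: str, mask):
    """Rebuild the original case-sensitive string from the two parts."""
    return "".join(ch.upper() if up else ch for ch, up in zip(lowered, mask))
```

The lowercase part is safe to put in a hostname, while the mask is the extra metadata that would have to travel with it.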
Couldn't you go with the splitting option, but instead of putting the remainder in a subdomain, you put it in the path?
Do:
This is an annoying UX, but it preserves as much subdomain isolation as is possible with 63 characters and doesn't result in TLS wildcard problems. |
Problem: DNS label limit of 63
The default CIDv1 Base32 with multihash of sha256 and RSA libp2p-key fits:
but if we use ED25519 libp2p-key then we are 2 characters over the limit:
--hash sha2-512 will be even longer: https://bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvyw764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg.ipfs.dweb.link
A label longer than 63 characters means the hostname can't resolve:
And links are not picked up by tools like Slack:
Note: I used ED25519 as an example, but the problem is not limited to that single type of CID. Even if we find a way to fit ED25519 in a single label, the problem remains for CIDs with a multihash created with longer hash functions.
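The label lengths discussed above can be checked with a short Python sketch. The byte layouts below are assumptions made for this example (CIDv1 version byte, codec byte, multihash header, and for ED25519 an identity multihash wrapping a 4-byte protobuf header plus a 32-byte key); the digests and key are zero-filled placeholders, since only the lengths matter here.

```python
import base64

MAX_LABEL = 63  # maximum DNS label length


def base32_multibase(data: bytes) -> str:
    """Lowercase, unpadded base32 with the 'b' multibase prefix."""
    return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")


def base36_multibase(data: bytes) -> str:
    """Lowercase base36 with the 'k' multibase prefix."""
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
    n = int.from_bytes(data, "big")
    digits = ""
    while n > 0:
        n, rem = divmod(n, 36)
        digits = alphabet[rem] + digits
    # Leading zero bytes are preserved as leading '0' characters.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "k" + "0" * leading_zeros + digits


# Placeholder CIDs (assumed layouts, zero-filled payloads).
sha256_cid = bytes([0x01, 0x55, 0x12, 0x20]) + bytes(32)   # raw + sha2-256
sha512_cid = bytes([0x01, 0x55, 0x13, 0x40]) + bytes(64)   # raw + sha2-512
ed25519_cid = bytes([0x01, 0x72, 0x00, 0x24, 0x08, 0x01, 0x12, 0x20]) + bytes(32)

for name, cid in [("sha2-256", sha256_cid), ("ed25519", ed25519_cid), ("sha2-512", sha512_cid)]:
    b32, b36 = base32_multibase(cid), base36_multibase(cid)
    print(name, len(b32), len(b32) <= MAX_LABEL, len(b36), len(b36) <= MAX_LABEL)
```

Under these assumptions the ED25519 key is over the limit in base32 but fits in base36, while a sha2-512 CID is too long for a single label in either encoding.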
Solved: IPNS-specific fix for ED25519 keys
In parallel to the generic fix, we could represent ED25519 keys in a way that fits under 63 characters, solving the UX issue for IPNS websites loaded from public gateways.
Done: #7441 – we support {cidv1base36}.ipns.dweb.link, which fits perfectly.
Open Problem: generic solution for long CIDs
I am happy to open a PR with a fix, but I am unsure whether I have the best fix in mind, so I would love to gather feedback first.
❓ (A) support split CIDs (but have broken TLS)
The first idea I have is to split the label when the max is reached.
To maximize entropy for Origin isolation, the remainder should be on the left side:
Pros:
sha2-512
Cons:
*.eth.link
(https://blog.almonit.eth.link vs https://almonit.eth.link.
in the middle etc).
anywhere inside of CID, but internally merge labels, and return a redirect to canonical version that splits at deterministic position (enforcing maximum label for Origin).
❓ (B) redirect long CIDs to an "insecure" subdomain
This would make it possible for content to load, but longer CIDs would not get Origin isolation per CID.
To make this a bit clearer and more idiomatic, we could present this as a "cross origin resource sharing" endpoint that allows CORS requests, supports loading everything from a single origin, and has paths locked down in browsers as noted in ipfs/in-web-browsers#157.
Think in terms of https://dweb.link/ipfs/superlongcid redirecting to https://cors.dweb.link/superlongcid
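A rough Python sketch of the redirect decision in option (B) could look like this. The cors.dweb.link host comes from the example above; the function name subdomain_redirect and the exact URL shapes are assumptions for illustration, not an agreed design.

```python
MAX_LABEL = 63  # maximum DNS label length


def subdomain_redirect(cid: str, gateway: str = "dweb.link") -> str:
    """Pick a redirect target for a path-gateway request to /ipfs/{cid}.

    CIDs that fit in one DNS label get their own Origin; longer ones fall
    back to the shared subdomain, which provides no per-CID isolation.
    """
    if len(cid) <= MAX_LABEL:
        return f"https://{cid}.ipfs.{gateway}/"
    return f"https://cors.{gateway}/{cid}"
```

The trade-off is visible in the second branch: every long CID lands on the same Origin.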
Pros:
Cons:
❓ (C) swap DAG root with CID that uses shorter hash function
Pros:
Cons:
❓ (D) leverage HTTP proxy mode (on localhost)
When the Gateway port is used as an HTTP proxy, the local client does not perform a DNS lookup; instead, the original URL is sent in the HTTP request to the proxy for processing.
Because the HTTP proxy IS the go-ipfs node in that scenario, it does not do a DNS lookup either: it extracts the original (long) CID and resolves it without any involvement of DNS.
As long as user agents are not overzealous in validating URLs, this would allow for long (>63) CIDs on subdomains.
This is important, because it enables the localhost gateway (used by Brave) to resolve long CIDs correctly without any additional hacks.
UX details tbd. This could be the solution for localhost gateway, but for public ones we still need something else.
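To illustrate why proxy mode sidesteps DNS, here is a small Python sketch of how a proxy could pull the CID out of the absolute URL it receives in the request line. This is not how go-ipfs implements it; cid_from_proxy_request and the ".ipfs.localhost" suffix are assumptions made for this example.

```python
from urllib.parse import urlsplit


def cid_from_proxy_request(request_line: str, suffix: str = ".ipfs.localhost"):
    """Extract the CID label from a proxy-style request line.

    In proxy mode the client sends the full URL, for example
    "GET http://{cid}.ipfs.localhost:8080/ HTTP/1.1", so the hostname can be
    read directly and is never resolved through DNS.
    """
    _method, target, _version = request_line.split(" ")
    host = urlsplit(target).hostname or ""
    if not host.endswith(suffix):
        return None
    return host[: -len(suffix)]
```

Nothing in this path enforces the 63-character label limit, which is why a long CID can survive as long as the user agent itself accepts the URL.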
Other ideas?
Would love to find a better way to work around this
cc @aschmahmann @Stebalien ipfs/in-web-browsers#89