Subdomain support for CIDs longer than 63 #7318

lidel · 2020-05-14T16:11:55Z

I had hoped to punt this until we need to switch away from sha256 in CIDs, but we may need to solve it sooner than expected, because ED25519 keys will soon become the new default (#6916).

Problem: DNS label limit of 63

RFC 1034: "each node has a label, which is zero to 63 octets in length"

The default CIDv1 in Base32 with a sha256 multihash and an RSA libp2p-key fit within the limit, but if we use an ED25519 libp2p-key then we are 2 characters over it:

ED25519 libp2p-key: https://bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipns.dweb.link
CID created with --hash sha2-512 will be even longer: https://bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvyw764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg.ipfs.dweb.link

A label longer than 63 characters means the hostname cannot be resolved:

$ ping bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipns.dweb.link
ping: bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipns.dweb.link: Name or service not known

And links are not picked up by tools like Slack:

Note: I used ED25519 as an example, but the problem is not limited to that single type of CID. Even if we find a way to fit ED25519 in a single label, the problem remains for CIDs with a multihash created with longer hash functions.
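The limit described above can be checked mechanically. The following is a minimal sketch in Python; `oversized_labels` is a hypothetical helper name, not part of any IPFS tooling, and it assumes ASCII-only hostnames (true for the Base32 CIDs shown here), so one character equals one octet.

```python
MAX_LABEL = 63  # RFC 1034: a label is zero to 63 octets in length

def oversized_labels(hostname: str) -> list:
    """Return every DNS label in hostname that exceeds the 63-octet limit."""
    return [label for label in hostname.split(".")
            if len(label.encode("ascii")) > MAX_LABEL]

host = ("bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk"
        ".ipns.dweb.link")
# The ED25519 libp2p-key label is 65 characters long: 2 over the limit.
print([len(label) for label in oversized_labels(host)])  # prints [65]
```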

Solved: IPNS-specific fix for ED25519 keys

In parallel to the generic fix, we could represent ED25519 keys in a way that fits under 63 characters, solving the UX issue for IPNS websites loaded from public gateways.

Done: #7441 – we support {cidv1base36}.ipns.dweb.link which perfectly fits

Open Problem: generic solution for long CIDs

I am happy to open a PR with a fix, but I am unsure whether I have the best fix in mind, and would love to gather feedback first.

❓ (A) support split CIDs (but have broken TLS)

The first idea I have is to split the label when the max is reached.
To maximize entropy for Origin isolation, the remainder should be on the left side:

https://ba.fzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipns.dweb.link

Pros:

👍 each long CID gets own Origin – we keep isolation
👍 path redirect provided by subdomain gateway can take care of splitting
👍 future-proof solution for longer hashes such as sha2-512
- the next limit is pretty far away: the maximum length of full domain name: 253 characters, including dots
- sha2-512 on dweb.link is 121 characters

Cons:

💢 decreased entropy in security guarantees provided by origin isolation
💢 wildcard TLS certificate does not pass validation for more than a single level of labels
- this will produce annoying UX on public gateways such as dweb.link: a TLS warning when opening an IPFS website on IPNS. We get the same problem as the ENS gateway at *.eth.link (https://blog.almonit.eth.link vs https://almonit.eth.link)
💢 copying & pasting CID as-is no longer works on public gateways (user needs to put . in the middle etc)
- Note: to make it easier UX-wise, we should allow . anywhere inside of CID, but internally merge labels, and return a redirect to canonical version that splits at deterministic position (enforcing maximum label for Origin).
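The splitting rule from option (A), including the note about accepting a dot anywhere and redirecting to a canonical form, could be sketched as below. This is an illustration only, not the gateway's implementation; `split_cid` is a hypothetical name.

```python
MAX_LABEL = 63  # maximum length of a single DNS label

def split_cid(cid: str) -> str:
    """Split cid into DNS labels of at most 63 characters.

    Dots already present are dropped first (labels are merged), then the
    string is cut from the right, so the short remainder ends up on the left
    and the label closest to the gateway domain is always full.
    """
    merged = cid.replace(".", "")
    labels = []
    while merged:
        labels.insert(0, merged[-MAX_LABEL:])
        merged = merged[:-MAX_LABEL]
    return ".".join(labels)

cid = "bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk"
print(split_cid(cid))
# prints ba.fzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk
```

Because the function merges labels before splitting, any user-supplied placement of the dot maps to the same canonical hostname, which is what a redirect would need.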

❓ (B) redirect long CIDs to an "insecure" subdomain

This would make it possible for content to load, but longer CIDs would not get Origin isolation per CID.

To make this a bit clearer and more idiomatic, we could present this as a "cross-origin resource sharing" endpoint that allows CORS requests, supports loading everything from a single origin, and has paths locked down in browsers as noted in ipfs/in-web-browsers#157.

Think in terms of

https://dweb.link/ipfs/superlongcid redirecting to https://cors.dweb.link/superlongcid
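As a rough sketch of the redirect decision in option (B): CIDs that fit in one label keep their own subdomain, and longer ones fall back to the shared host. The function name `redirect_target` is hypothetical, the `cors` label is taken from the example above, and only the /ipfs/ namespace is handled.

```python
MAX_LABEL = 63  # maximum length of a single DNS label

def redirect_target(cid: str, gateway: str = "dweb.link") -> str:
    """Pick where a path-gateway request for /ipfs/<cid> would be redirected."""
    if len(cid) <= MAX_LABEL:
        # Fits in one DNS label: the CID gets its own Origin.
        return f"https://{cid}.ipfs.{gateway}/"
    # Too long for a label: shared origin, no per-CID isolation.
    return f"https://cors.{gateway}/{cid}"

short = "bafkreievmw4c7yvuhvxt4qjcgqz4nsejxrw4wy4xkhtq54dc62ptceu6xq"
print(redirect_target(short))
print(redirect_target("superlongcid" * 6))  # 72 characters, over the limit
```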

Pros:

👍 does not break TLS wildcard certs (easy setup for gateway operators)
👍 useful outside this problem: provides idiomatic way for exposing path gateway on subdomain gateways (for use when origin isolation is not needed)

Cons:

💢 long CIDs don't get Origin isolation

❓ (C) swap DAG root with CID that uses shorter hash function

Pros:

👍 "just works"

Cons:

💢 decreased entropy
💢 newly created root blocks need to be persisted somehow: if I bookmark the page loaded via shortened CID and then the root block gets garbage-collected, the address is dead.
- potential fix: we could always create redundant sha256 root block for every DAG that uses longer hash function for interop

❓ (D) leverage HTTP proxy mode (on localhost)

When the Gateway port is used as an HTTP proxy, the local client does not perform a DNS lookup; instead, the original URL is sent in the HTTP request to the proxy for processing.

Because the HTTP proxy IS the go-ipfs node in that scenario, it does not do a DNS lookup either: it extracts the original (long) CID and resolves it without any involvement of DNS.

As long as user agents are not overzealous in validating URLs, this would allow for long (>63) CIDs on subdomains.

This is important, because it enables the localhost gateway (used by Brave) to resolve long CIDs correctly without any additional hacks.

UX details tbd. This could be the solution for localhost gateway, but for public ones we still need something else.
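To illustrate why DNS limits do not apply in proxy mode: the proxy only has to read the identifier out of the requested host string. The sketch below assumes hosts shaped like `<cid>.ipfs.localhost` or `<key>.ipns.localhost`; `parse_proxy_host` is a hypothetical name and this is not how go-ipfs implements it.

```python
from typing import Optional, Tuple

def parse_proxy_host(host: str) -> Optional[Tuple[str, str]]:
    """Extract (namespace, identifier) from the Host of a proxied request.

    No DNS lookup happens here, so the identifier may be longer than 63
    characters. IPv6 literals are not handled in this sketch.
    """
    hostname = host.split(":", 1)[0]  # drop an optional :port
    identifier, _, rest = hostname.partition(".")
    namespace = rest.split(".", 1)[0]
    if identifier and namespace in ("ipfs", "ipns"):
        return namespace, identifier
    return None

long_cid = ("bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvyw764srszz4u4rshq6z"
            "tos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg")
print(parse_proxy_host(long_cid + ".ipfs.localhost:8080"))
```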

Other ideas?

Would love to find a better way to work around this

cc @aschmahmann @Stebalien ipfs/in-web-browsers#89


ribasushi · 2020-05-14T16:51:53Z

It's a bit unfortunate that keys are so overly verbose: https://cid.ipfs.io/#bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk

It looks like we have an actual protobuf construct inside the raw bytes. Is this... something we need to do?

If we shave off 2 bytes, nothing extra needs to be done...

lidel · 2020-05-14T17:02:50Z

I am afraid even if we find a hacky workaround for libp2p-keys in ED25519, the problem remains for CID that use longer hash functions than sha256.

Stebalien · 2020-05-14T17:08:54Z

For context, we're trying to encode 40 bytes into 62 characters (with one character for the multibase prefix).

I believe base36 would work, if that's an option. That should give us exactly 63 characters.
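The arithmetic behind this can be sketched as follows. `max_encoded_len` is a hypothetical helper; it gives the exact length for unpadded Base32 and an upper bound for Base36 (a concrete key may encode one character shorter). The 68-byte figure for a sha2-512 CID is an assumption based on 1 version byte, 1 codec byte, 2 multihash header bytes, and a 64-byte digest.

```python
import math

def max_encoded_len(n_bytes: int, base: int) -> int:
    """Upper bound on the text length of n_bytes in the given base,
    including one character for the multibase prefix."""
    digits = math.ceil(n_bytes * 8 / math.log2(base))
    return digits + 1

print(max_encoded_len(40, 32))  # ED25519 libp2p-key in Base32: prints 65
print(max_encoded_len(40, 36))  # ED25519 libp2p-key in Base36: prints 63
print(max_encoded_len(68, 36))  # sha2-512 CID in Base36: still over 63
```

Base36 closes the 2-character gap for the 40-byte key, but it does not help with longer multihashes.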

We could change how we encode these peer IDs in text and use an ed25519 specific codec (<cidv1>-<ed25519>-<multihash>). That would still be a reasonable encoding of an ed25519 CID but I'd prefer to avoid it.

Stebalien · 2020-05-14T17:10:33Z

But I agree we should support longer keys regardless. But will this be a problem for TLS certs? Can we get a double-star cert?

lidel · 2020-05-14T17:44:10Z

I am not aware of any CA that provides double wildcard certs.
That is why ENS gateway still has the TLS warning (example: https://blog.almonit.eth.link).
Switching the default text representation of PeerID to Base36 would require work across the ecosystem to bubble up support (it is missing from multibase.csv atm), and it's not as popular as the RFC version of Base32. Not sure what's the lesser evil: that, or a new codec.

Stebalien · 2020-05-14T19:30:15Z

@aschmahmann and I discussed this and it is possible to shrink ed25519 pids, but it's painful and requires coordination with all libp2p implementations.

To shrink ed25519 keys, we need to:

1. Encode them as <cidv1>-<ed25519>-<multihash> in text. This will reduce the id size to 36 bytes (from 40).
2. Ideally, migrate to CIDs on the wire in libp2p. That would save us 10% on the wire for ed25519 keys and make it easier to interoperate with other p2p networks (because we could use their native key formats instead of wrapping them in protobufs before hashing).

Unfortunately, if we want to get 1 in the near future, we'd make it significantly harder to get 2. Basically, if we start using the new ed25519 pid encoding now, we'd have to convert back to the normal pid binary format (raw multihash) when decoding. However, if/when we decide to use CIDs as the binary pid format, we'd have trouble round-tripping.

That is, in the ideal world, if we encounter a text-based PID as a CID:

- If it uses the libp2p-key multicodec, it's a legacy peer ID. Encode it as a multihash on the wire.
- If it uses any other multicodec, it's a new peer ID. Encode it as a CID on the wire.
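The two rules above could look roughly like this in code. This is a sketch of the "ideal world" behaviour only, not an existing libp2p API: `wire_format` and `read_varint` are hypothetical names, and the input is assumed to be the binary form of a CIDv1 (version varint, multicodec varint, multihash).

```python
from typing import Tuple

LIBP2P_KEY = 0x72  # multicodec code for libp2p-key

def read_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode an unsigned varint at pos; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7

def wire_format(cid: bytes) -> bytes:
    """Choose the on-the-wire peer ID bytes for a text PID parsed as a CID."""
    version, pos = read_varint(cid)
    if version != 1:
        raise ValueError("expected a CIDv1")
    codec, pos = read_varint(cid, pos)
    if codec == LIBP2P_KEY:
        return cid[pos:]  # legacy peer ID: bare multihash on the wire
    return cid            # new peer ID: the whole CID on the wire
```

The round-tripping problem described below arises precisely because a legacy key encoded with a non-libp2p-key codec would take the second branch.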

However, if we implement 1 before 2, we'd have to encode legacy keys in this new CID format. When converting back, we'd end up with the wrong "on the wire" format.

MichaelMure · 2020-05-20T09:00:32Z

* I am not aware of any CA that provides double wildcard certs.

This seems to not be possible: https://serverfault.com/a/946120

MichaelMure · 2020-05-22T10:35:41Z

^ might have been closed a bit eagerly by github.

So am I correct to assume that multi-subdomain is no longer being considered? That would be nice, as it would be a pain to host with TLS due to the certificate limitation.

ribasushi · 2020-05-22T10:39:57Z

@MichaelMure yeah, github is too eager indeed. Yes, this is precisely why we went with b36 - to keep TLS possible for the time being.

lidel · 2020-05-22T12:14:21Z

We met yesterday and came up with next steps to always resolve CIDs over DNS and have no TLS errors when the current defaults or ED25519 keys are used:
(1) solve TLS problem for IPNS with ED25519 keys
(2) make it possible to load longer CIDs

Notes at: ipfs/team-mgmt#1159 – early feedback / questions appreciated!

MichaelMure · 2020-05-22T12:32:38Z

Could you explain what (2) is in more detail? This document mainly discusses IPNS.

lidel · 2020-05-22T12:40:27Z

@MichaelMure see ipfs/team-mgmt#1159 (comment)
Note: it won't be needed for defaults, but will make it possible to load custom CIDs if someone has to use longer hashes for some reason.

MichaelMure · 2020-05-22T12:53:41Z

Alright. Due to the TLS problem, Infura is unlikely to support that, but I suppose that's sort of OK, as it should be a very rare use case.

Stebalien · 2020-05-22T16:41:45Z

Well, the hope is that use of companion and/or native IPFS support is wide-spread before that ever becomes an issue...

bmwiedemann · 2020-05-28T17:12:14Z

I found an interesting Proposed Standard https://tools.ietf.org/html/rfc4343#section-2.2
that suggests that there may be 230=256-26 different usable byte values in DNS hostnames.
But I guess in practice, many servers and clients will not support these as part of FQDNs.

lidel · 2020-06-08T14:57:44Z

Leveraging RFC4343 is a no-go – no browser support afaik..

FYSA I've talked with @Stebalien last week, and we are re-evaluating.

None of us is happy with the ramifications of splitting into multiple DNS labels, originally proposed in #7358. It will cause us trouble with TLS in the future, and the ultimate goal of subdomain gateways is seamless UX in web browsers.

We decided to look into an alternative approach that prioritizes UX in user agents and removes the problem of TLS errors caused by more than one level of wildcards: #7441

Stebalien · 2021-04-05T15:51:26Z

@lidel can we close this?

lidel · 2021-06-07T20:38:51Z

No, we need to solve this in a way that enables people to load all CIDs, no matter what gateway type is used.

Right now, subdomains are limited to subset of CIDs: https://dweb.link/ipfs/bafkriqdv2ut4g2hs57uer3hwwbz2gz3hqaeal2po6kyyk7k7tbhqg3vw36er25pxfwnrkriyyhgvra2sq3i5vgry325d32mlljj6l3lyvbexm → CID incompatible with DNS label length limit of 63

Hot take: our options are limited here. It could be that longer CIDs end up on a separate subdomain with the same sandboxing / local storage / API limitations as the ones proposed for the path gateway (ipfs/in-web-browsers#157). Those would not work as website roots, but would be fine for loading other types of content.

We can't use dweb.link as the default until ipfs/kubo#7318 is open. Default gateway should be able to open all CIDs, and dweb.link is limited to 63char ones max.

Winterhuman · 2022-04-16T20:32:09Z

Just wanted to add an idea to this discussion: what if you used queries to hold the ID of the CID? E.g.

bafkreievmw4c7yvuhvxt4qjcgqz4nsejxrw4wy4xkhtq54dc62ptceu6xq becomes:

vmw4c7yvuhvxt4qjcgqz4nsejxrw4wy4xkhtq54dc62ptceu6xq.ipfs.dweb.link/?id="bafkreie" (or maybe keeping the multihash ID in the subdomain is better)

Only CIDs for the same content can share the multihash subdomain, so subdomain isolation should be maintained. (unless I'm missing something major, in which case correct me)

(Also, I think topic/ed25519 can be removed)

lidel · 2022-04-19T12:28:19Z

we did solve ED25519 in this issue (see first comment) – keeping the label for discoverability
query parameter does not provide Origin isolation, and we already have path gateways for cases where isolation is not necessary, so it adds no value
keeping only the multihash part in the DNS label does not help – multihash with sha512 digest won't fit in a single DNS label
- example: https://cid.ipfs.io/#bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvyw764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg
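The point about query parameters can be illustrated with the standard definition of a web Origin as the (scheme, host, port) triple. The sketch below uses Python's `urllib.parse`; the `origin` helper and the second `id` value are made up for illustration.

```python
from urllib.parse import urlsplit

def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that defines a web Origin."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port

host = "vmw4c7yvuhvxt4qjcgqz4nsejxrw4wy4xkhtq54dc62ptceu6xq.ipfs.dweb.link"
a = origin(f"https://{host}/?id=bafkreie")
b = origin(f"https://{host}/?id=someotherprefix")  # hypothetical second id
# Same host, different query: the browser treats both as one Origin.
print(a == b)  # prints True
```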

Winterhuman · 2022-10-21T19:24:39Z

As another option, using CIDv2 (ipfs/specs#305) may allow for "case-insensitive" CIDs which are actually case-sensitive when parsed.

The difference between foo and FOO can be expressed as 000 and 111, where 0 is lowercase and 1 is uppercase, so if CIDs had metadata to describe their casing, then you could do case-insensitive versions of case-sensitive encoded CIDs. e.g.

CIDv1 doesn't fit, but is case-insensitive: id...long-cid
CIDv1 fits, but is case-sensitive: ID...LoNg-CiD
CIDv2 fits, and is case-insensitive: id+metadata...long-cid (or wherever the metadata for CIDv2 will be placed)

The advantage is that the CID metadata changes the CID slightly, so each CID will still have Origin Isolation. If the metadata itself gets too long, extremely long CID strings will still be too big. However, encoding the case-binary efficiently, so that it takes minimal space, should make the limit pretty high in theory.

lidel · 2022-10-21T21:59:10Z

@Winterhuman how can you fit sha512 in the proposed CIDv2 and have no more than 63 characters?
Are you suggesting using a different (weaker) hash like sha256 to point at the stronger one sha512?
If so, I am afraid that is not a fix, just a workaround – you are decreasing security of use cases that need longer hashes.

Winterhuman · 2022-10-23T14:16:56Z

No, that's already described in option C. What I mean is: encode a SHA512 CID using a case-sensitive encoding, like base58btc. Then, you store the casing of the characters as metadata, e.g.

zYAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt

Could be z+metadata+yajkonbau5kiqmhpmsxycvn66da1vlmwbt, where the metadata bytes describe the casing to apply to the all-lowercase multicodec + multihash characters to make it the original case-sensitive encoding, and since the metadata changes the CID slightly each casing would be a unique CID. One complication is that you'd need the metadata to be encoded as case-insensitive inside the case-sensitive CID in order for it to be read
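A minimal sketch of the casing idea, separate from any actual CIDv2 design: it derives one bit per character (1 for uppercase, 0 otherwise) and shows the original string can be restored from the lowercase form plus those bits. The function names are hypothetical, and a real encoding would need to pack the bits more compactly and make them DNS-safe.

```python
from typing import Tuple

def split_case(text: str) -> Tuple[str, str]:
    """Return (lowercased text, case bits), one bit per character."""
    bits = "".join("1" if ch.isupper() else "0" for ch in text)
    return text.lower(), bits

def restore_case(lowered: str, bits: str) -> str:
    """Re-apply the recorded casing to a lowercased string."""
    return "".join(ch.upper() if bit == "1" else ch
                   for ch, bit in zip(lowered, bits))

original = "zYAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt"
lowered, bits = split_case(original)
print(lowered)  # prints zyajkonbau5kiqmhpmsxycvn66da1vlmwbt
print(restore_case(lowered, bits) == original)  # prints True
```

Digits always get a 0 bit here, so this naive bitmap wastes some space compared with recording bits for letters only.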

Winterhuman · 2022-10-23T14:27:06Z

Either that or you could nest a case-sensitive CIDv1 inside a multibase-esque multiformat so it's constructed like:

<multicasing code><multicasing bytes (variable)><multibase><multicodecs>...<multihash digest>

That'd get around having to encode the casing metadata inside the case-sensitive encoding itself, but it requires making a new multiformat or modifying multibase significantly.

MicahZoltu · 2024-08-16T13:39:48Z

Couldn't you go with the splitting option, but instead of putting the remainder in a subdomain, you put it in the path?
Instead of:

https://ba.fzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipfs.dweb.link/

Do:

https://fzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipfs.dweb.link/remainder/ba/

This is an annoying UX, but it preserves as much subdomain isolation as is possible with 63 characters and doesn't result in TLS wildcard problems.
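The path-based variant could be sketched as below. `remainder_url` is a hypothetical name, the `/remainder/` path segment is taken from the example above, and the sketch only covers CIDs of up to 126 characters (one full label plus a prefix that is itself not validated).

```python
MAX_LABEL = 63  # maximum length of a single DNS label

def remainder_url(cid: str, namespace: str = "ipfs",
                  gateway: str = "dweb.link") -> str:
    """Keep the last 63 characters in the DNS label, move the rest to the path."""
    label, prefix = cid[-MAX_LABEL:], cid[:-MAX_LABEL]
    base = f"https://{label}.{namespace}.{gateway}/"
    if not prefix:
        return base  # the CID already fits in one label
    return f"{base}remainder/{prefix}/"

cid = "bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk"
print(remainder_url(cid))
```

Only a single level of subdomain is used, so a regular wildcard certificate still matches, at the cost of an Origin shared by all CIDs with the same last 63 characters.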

lidel added the kind/bug, topic/gateway, need/triage, topic/cidv1b32, and topic/ed25519 labels May 14, 2020

This was referenced May 14, 2020

CID as a Subdomain ipfs/in-web-browsers#89

Open

Switch to ed25519 keys by default so that we can reasonably store signed content records #6916

Closed

Stebalien closed this as completed May 14, 2020

Stebalien reopened this May 14, 2020

lidel mentioned this issue May 19, 2020

feat: wildcard support for public gateways #7319

Merged

ribasushi mentioned this issue May 21, 2020

Base36 byte-encoding specification multiformats/multibase#65

Merged

Stebalien closed this as completed in multiformats/multibase#65 May 22, 2020

ribasushi reopened this May 22, 2020

lidel mentioned this issue May 22, 2020

Create 2020-05-21--design-discussion-subdomains-dns-label-limit.md ipfs/team-mgmt#1159

Merged

Stebalien assigned lidel May 22, 2020

Stebalien removed the need/triage label May 22, 2020

lidel mentioned this issue May 25, 2020

Support Base36 multiformats/cid-utils-website#23

Closed

aschmahmann mentioned this issue May 28, 2020

Switch to base36 by default in all text output (overriding ipfs/go-ipfs/issues/4143 ) #7357

Closed

Stebalien mentioned this issue Jun 4, 2020

bafy addresses ... #7410

Closed

lidel mentioned this issue Jun 8, 2020

feat: support ED25519 at subdomain gw with TLS #7441

Merged

This was referenced Jun 15, 2020

feat: support Base36 multiformats/js-cid#112

Merged

feat: base36 support at cid.ipfs.io multiformats/cid-utils-website#25

Merged

lidel mentioned this issue Jun 8, 2021

Sandbox resources loaded via a path gateway ipfs/in-web-browsers#157

Open

4 tasks

lidel added a commit to ipfs/ipfs-webui that referenced this issue Sep 6, 2021

fix: use ipfs.io by default

c0e6789

We can't use dweb.link as the default until ipfs/kubo#7318 is open. Default gateway should be able to open all CIDs, and dweb.link is limited to 63char ones max.

aschmahmann mentioned this issue Oct 7, 2021

use a random string as peer-name in mDNS libp2p/specs#368

Merged

galargh added this to IPFS Shipyard Team Mar 2, 2022

BigLep unassigned lidel Mar 3, 2022

BigLep moved this to 🥞 Todo in IPFS Shipyard Team Mar 3, 2022

BigLep added this to the TBD milestone Mar 3, 2022

aschmahmann mentioned this issue Mar 28, 2023

feature request: Support fetch over libp2p libp2p/js-libp2p#1648

Closed

lidel mentioned this issue Sep 18, 2023

[MV3 Beta Bugs] Single catch-all rule per subdomain gateway ipfs/ipfs-companion#1278

Open

lidel mentioned this issue Aug 16, 2024

Share Link incorrectly gives path routed instead of subdomain routed URL. ipfs/ipfs-webui#2244

Closed
