Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interpreting multiaddrs #140

Merged
merged 1 commit into from
Sep 30, 2022
Merged

Interpreting multiaddrs #140

merged 1 commit into from
Sep 30, 2022

Conversation

MarcoPolo
Copy link
Contributor

@MarcoPolo MarcoPolo commented Sep 8, 2022

Adds a short section to the readme on how to interpret multiaddrs. I think this is the unifying principle of how we use multiaddrs in practice and could guide how we use them in the future. I don't think anything here is controversial and it fits with the "encapsulation" ideas presented originally. This is really just a clarification of what I think is the original intent.

This is a PR so that it's easier to discuss the changes line by line.

All but one of our constructions fit this principle. The uncommon p2p-circuit with a defined relay path does not fit this principle as the p2p-circuit component is currently defined, but I think we can make some small changes to make it fit. e.g. /ip4/1.2.3.4/tcp/1234/p2p/qmR/p2p-circuit/ip4/4.3.2.1/tcp/54321/p2p/qmA doesn't cleanly fit this model. We could add a new via protocol that takes in a single parameter of data, parses it as a multiaddr, and passes it to the underlying protocol as a via parameter. This would look like /ip4/1.2.3.4/tcp/1234/p2p/qmR/p2p-circuit/via/bracketenc/[ip4/4.3.2.1/tcp/54321]/p2p/qmA. What's nice is the could also describe (though we don't currently support) multiple hops in the circuit: /ip4/1.2.3.4/tcp/1234/p2p/qmR/p2p-circuit/via/bracketenc/[ip4/4.3.2.1/tcp/54321/p2p/QmR2/p2p-circuit/via/bracketenc/[/ip4/2.2.2.2/tcp/1234]]/p2p/qmA.

Example

As an example, here's how you would parse the string and interpret /ip4/1.2.3.4/tcp/1234/p2p/qmR/p2p-circuit/via/bracketenc/[ip4/4.3.2.1/tcp/54321/p2p/QmR2/p2p-circuit/via/bracketenc/[/ip4/2.2.2.2/tcp/1234]]/p2p/qmA. Remember that parsing the binary representation is easy since we either know the size of the data for a protocol (fixed size) or the data is length prefixed.

Parsing the string into a multiaddr, left to right:

  1. Start with /ip4/... We see ip4 and call ip4's string to byte to consume the string up to /tcp/....
  2. We see tcp and call tcp's string to byte to consum up to /p2p/....
  3. continue to via, via doesn't take a value, so we continue to bracketenc/...
  4. bracketenc parses the rest of the string until it finds a matching closing bracket ]. This logic is defined by this encoding. The parsed string is kept as the value for bracketenc.
  5. get to p2p and parse the following peer id.

Interpreting the multiaddr, right to left:

  1. The p2p component takes its peer ID value and passes it to the next component as the parameter peer=QmA.
  2. The bracketenc component takes its value and passes it to the next component as parameter data=ip4/4.3.2.1/tcp/54321/p2p/QmR2/p2p-circuit/via/bracketenc/[/ip4/2.2.2.2/tcp/1234]. It also forwards any other parameter it was given (i.e. peer=QmA).
  3. The via component takes the data parameter and parses the multiaddr (similar to our parsing the string into a multiaddr example above). It passes this to the next component as parameter via=<parsedMultiaddr>. It also forwards any other parameter it was given (i.e. peer=QmA).
  4. The p2p-circuit component now has the parameters via=<parsedMultiaddr> and peer=QmA. The p2p-circuit transport can now dial the relay QmR defined by the rest of the multiaddr (/ip4/1.2.3.4/tcp/1234/p2p/qmR/), and can tell that relay to connect to peer QmA via the path via=<parsedMultiaddr>.
  5. Done!

Why now?

This came up as I was fixing WSS in go-libp2p and realized I needed to keep the hostname around for the SNI (e.g. I needed /ip4/1.2.3.4/tls/sni/example.com/ws).

We also need this to:

  • Define a path to a websocket endpoint
  • Passing SNI to a webtransport connection

Related threads

multiformats/multiformats#55. It's worth reviewing the proposals/multipath.md file first.

I think this is the minimal change that gets multiaddrs able to fulfill the goals of multipath. Key differences here are that we don't have a way to let the codec define the binary output directly (we always prefix it with a varint of byte length). We may want to change this and default it to varint prefix, but allow other ways to write and parse the binary. This would allow us to make the p2p-circuit fit the model in a backwards compatible way.

Questions

  • Is there anything wrong with this interpretation of multiaddrs?
  • Is there something that multipath could provide that we don't get with this change?
  • Can we keep this change as minimal as possible while still getting most of the benefits of multipath?

Changes this would require

  • This would involve deprecating the p2p-circuit + path multiaddr (which is relatively uncommon) in favor of the p2p-circuit/via or p2p-circuit/route option (as demonstrated above).
  • This would involve (at least) changing how go-multiaddr parses string multiaddrs. It shouldn't be splitting on / instead it should parse the multiaddr from the left to the right. This should be backwards compatible, but new multiaddr constructions will not work on older clients. e.g. /ip4/.../http/path/lenprefixencode/20_somepath/tosomething wouldn't work on older clients, but this also wouldn't work because of the missing codec.

What this enables/unlocks

If we agree on this we can unblock the following tricky issues:
This principle unblocks some previously tricky issues:

We unlock the ability to handle parameters in a simple way. And we unlock the ability to support nested paths (via an encoding like: /ip6/.../p2p/QmR1/p2p-circuit-path/lenprefixencode/33_/ip6/.../QmR2/p2p-circuit/p2p/QmA)


Tagging folks I know have opinions here, forgive me if I miss you:
@Stebalien
@marten-seemann
@mxinden
@elenaf9

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a high level, this looks good to me. rust-libp2p follows the convention of interpreting a multiaddress from right to left.

I don't think I am deep enough in the matter to have an opinion. I will let others comment first. In case others feel the same or don't have the time to comment, I will give this more thought. If so, please ping me @MarcoPolo.

@MarcoPolo
Copy link
Contributor Author

Had a sync meeting with @Stebalien and here's where we ended up:

  • Steven is in general agreement with this interpretation strategy.
  • Would suggest using bracket encoding with a percent escape mechanism. There is some set of allowed characters, and everything else would be percent encoded escaped. e.g. a path would look like: /dns4/example.com/http/path/bracketenc/[foo/bar/baz] or, if the path had a bracket in it, /dns4/example.com/http/path/bracketenc/[foo/bar%5D/baz] (%5D is the percent encoding of ]). This has the benefit of looking nice for many usecases. Ultimately the binary representation is the same, so it's not worth bikeshedding too much here.
  • Suggest starting with a single encoding strategy to minimize the amount of changes required by implementations.

@MarcoPolo
Copy link
Contributor Author

Moving this out of draft so we can further discuss before potentially merging.

Keep in mind this change is not meant to define what encoding we should choose, only that we should interpret multiaddrs right to left (and parse them left to right).

@MarcoPolo MarcoPolo changed the title Interpreting multiaddrs (for discussion) Interpreting multiaddrs Sep 16, 2022
@MarcoPolo MarcoPolo marked this pull request as ready for review September 16, 2022 18:35
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having given this more thought, this looks good to me. Unless someone objects, I am in favor of merging.

@MarcoPolo MarcoPolo merged commit c964d9a into master Sep 30, 2022
@MarcoPolo MarcoPolo deleted the marco/multiaddrs branch September 30, 2022 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants