Proposal: Add keyword arguments to protocols #87
@Stebalien @whyrusleeping @mkg20001 @eyedeekay @mwnx @lgierth: I want feedback! 🙂 |
@Alexander255 I for my part already created something a bit similar, called forward-addr. https://github.com/Teletunnel/specs/blob/master/SPECS.md and the code https://github.com/Teletunnel/forward-addr |
@mkg20001: Not bad!
Looking at your linked specs, it appears that a transliteration into MultiAddr's textual representation would be something like this (ignoring sub-protocols for now):
Comparing that to the previous recursive approach:
Compared to the proposed syntax: Equivalent in terms of expressive power and parsing properties, just a different syntax. (I like mine more, but that's subjective! Then again, this one has a more path-like structure which I don't really consider a plus for attributes – others might, though.)
Maybe I'm missing something, but isn't this part (while cool for the job you envisioned) pretty irrelevant here, since MultiAddr is about establishing connections, not filtering them? Even when using MultiAddr to bind to a port or path on a web server this wouldn't be useful?
The only example I could find about this was |
Yes, it is. It was created for a separate project (https://github.com/Teletunnel/Teletunnel-Core) which was a proposed improvement to https://telebit.cloud 's config format.
My reasoning was that the
...the message still includes the path. Which I needed to match WebSocket connections for specific paths. Otherwise I would have to write two modules for HTTP, one just for WebSocket upgrade message header parsing so that matching |
Oh and I noticed you may have misinterpreted the spec a bit: |
Many thanks for writing this up so eloquently, @Alexander255. Let me just rant a little bit and dump some rambling thoughts. Mapping your proposal to challenges we needed to solve:
One thing that I find unclean about the current multiaddr is that it mixes locators (IP addresses and ports), with network protocols (tcp, udp), with higher level protocols (onion, quic), with application protocols (http), with libp2p facilities (p2p-circuit), with identity assertions (peer ID). As a result, for a given multiaddr to work you need these components to be assembled in specific recipes which might not be obvious from the get-go (e.g. Actually, I'd argue that the "slurping" model of protocol handlers (eagerly parsing the tail) makes the entire meaning of the multiaddr dependent not only on code, but on the version of the code that a node is running. If we want multiaddrs to become a standard, we need a higher degree of formality. Each component might expose a schema, and compositionality may emerge from those schemas. So returning to the topic of semantics, as a community we ought to deeply reflect on all the possible things a multiaddr can represent: location, identity assertions, protocol layering, routing, tunneling, protocol selection, etc. Then we need to think about the combinatorics behind all of these elements. At the end of this exercise, we might come up with an entirely different structural model to reason about and encode multiaddrs. |
Thanks! 🙂
The proposal includes a strategy for escaping the 4 sensitive characters:
I've added an example of this to the proposal:
The proposal does not currently include anything about the possible mandatory argument: The question of how to embed paths for Unix Domain Sockets is not addressed by this proposal, but I'd imagine similar rules could be applied there. Of course this proposal still doesn't make it possible to decapsulate protocols from the end of the stack without parsing all their preceding values.
I agree with you on the last 3: Maybe MultiAddr should be limited to stream/dgram protocols only? Including HTTP is problematic for instance, because “establishing an HTTP connection” will not give me an “HTTP Connection”, but a TCP (or TLS) connection that I expect to be able to send HTTP messages over. Similar things apply to the libp2p facilities, which also are more assertions of the kind: After establishing a stream using the previous layers, expect to be able to send messages of type X. I guess it depends on what you want the addresses to represent. Does
If by “slurping” you’re referring to what I called “recursion” in the problem section, then you have my fullest agreement. Otherwise please elaborate!
A grand statement! How do we do that without all talk dwindling into nothing? 😉 |
The problem with that is that there actually isn't any
unless |
Updated the proposal to include a description of a compact representation of the protocol's attributes, as well as a description of the current binary format in general. |
Looking at the related issues before this comment, it appears that this proposal would solve real problems that people (not just me) have with the current state of affairs in Multiaddr. Any way to move this discussion forward? |
Awesome ideas @ntninja! I just wanted to give my thoughts and ideas on this as well.
And only using
For situations where one needs to clarify which parameter is which.
Advantages
A couple examples:
|
Some comments: While I like your proposal of adding protocol arguments in the way many programming languages do ( In my proposal I tried to be sure that it would be as backward-compatible as possible with existing addressing data. (But it's necessarily not forward-compatible, in that new addresses following the proposed scheme would not work on older implementations.) I, personally, wouldn't be against it and there could be some kind of conversion scheme implemented, but it's probably better to keep these things somewhat separate.
I think your example is somewhat off here. It would be more like
I do dislike the fact that parameters can be given as both positional and keyword arguments in your proposal. IMHO, it should be either one or the other. For instance:
This should make the string representation non-ambiguous, except for ordering of keyword arguments and spacing (if we allow that). In particular it means that the canonical string representation can always be constructed even if not all parts of the address are known. (Of course, none of this actually matters as Protocol Labs just ignores pretty much everything MultiFormats unless they need a change for themselves.) |
For the brave soul that picks this up at some point, food for thought:
|
Problem description
The current MultiAddr spec does not have any good way of dealing with optional protocol parameters that have well-defined defaults. Depending on the specific protocol in question, different workarounds have been proposed, the predominant theme being recursion:
/ip6/fe00::32/ip6zone/6/…
/tls/sni/example.com/…
This has the obvious problems that:
There also do not seem to be any obvious advantages to this scheme that would somehow make the above problems appear like reasonable trade-offs.
Another proposal suggested in some places (#63) was using plain greediness: After a given protocol item shows up in the path, all further items are swallowed up and used as single “path parameter”:
/http/example.com/api/v1
(here example.com is the hostname and /api/v1 the HTTP path base)
/wss/example.com/api/v1/tls/ws
/unix/path/to/socket.sock/tls/ws
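A minimal sketch of why a greedy path parameter becomes ambiguous once further protocols may follow. The helper name `candidate_splits` is hypothetical and not part of any MultiAddr library; it simply enumerates every syntactically plausible place where the path could end and the next protocol could begin.

```python
def candidate_splits(addr: str, proto: str):
    """Yield every (path, remaining_components) reading of a greedy path.

    Hypothetical helper for illustration only: nothing in the string itself
    says where the path of `proto` ends and the next protocol starts.
    """
    parts = addr.strip("/").split("/")
    if parts[0] != proto:
        raise ValueError(f"address does not start with /{proto}")
    rest = parts[1:]
    # Try the longest path first, then hand more and more components
    # over to "further protocols".
    for cut in range(len(rest), -1, -1):
        yield "/" + "/".join(rest[:cut]), rest[cut:]


readings = list(candidate_splits("/unix/path/to/socket.sock/tls/ws", "unix"))
for path, protocols in readings:
    print(path, protocols)
```

Every reading printed here is equally valid as far as the text is concerned; picking the intended one would require knowledge from outside the address, such as the state of the file system.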
While HTTP arguably is a terminator protocol (meaning that no other protocol may follow it anyways – this notion needs separate discussion!), Unix domain sockets and WebSockets definitely are not. Hence, it is unclear how a parser should figure out that /tls does not refer to a path component and whether this even is the case (the parser would have to proactively probe the file system for this, which is very much not in line with the vision of MultiAddr being a common description of paths to application endpoints; with WebSockets this is not even reliably possible to start with).
The example with WebSockets in particular demonstrates why this cannot work. A suggested alternative was to wrap the path parameter inside some kind of special set of delimiters (different kinds of braces were suggested):
/wss/(/example.com/api/v1)/tls/ws
While this works, it does not take into account the fact that there is nothing usually required about the given parameter: The hostname can usually be inferred from previous protocol levels (and left empty if unknown) and the path may always be empty.
Also potentially relevant data (such as HTTP basic auth) may be missing from the above. By combining the two approaches discussed above we arrive at something similar to the following:
/wss/(/example.com:4443/api/v1)/user/john/password/doh/cookie/bla=blab/tls/ws
Or the following when excluding all attributes:
/wss/()/tls/ws
Neither of these strikes the author as particularly intelligible.
This proposal will not attempt to resolve the issues with Unix domain sockets.
Proposed solution
Summary:
ip6zone
Text-representation syntax
Extending the current spec, each protocol name may now optionally be followed by an opening parenthesis character (“(”) indicating the start of the protocol parameter list. This is to be followed by an arbitrary number of key-value parameters, each delimited by the comma character (“,”) and terminated by a closing parenthesis character (“)”). After this closing character a forward slash (“/”) is expected. If the parameter list is skipped the protocol name should immediately be followed by a forward slash (as is currently the case); an empty parameter list (“()”) is allowed as well.
Each key-value pair consists of a name, made up only of ASCII lower-case characters, ASCII digits and the ASCII minus sign (“-”), followed by a single equals sign (“=”), followed by an arbitrary UTF-8 encoded value. The value may contain any character other than the NUL byte, but requires escaping of the following characters using a single backward slash (“\”) if they are to appear inside the value field: the opening (“(”) and closing (“)”) parentheses, the comma (“,”) and the backward slash (“\”) itself. Most importantly the forward slash (“/”) does not need to be escaped since it carries no special significance inside the protocol parameter list; this allows for easy embedding of paths, like in the following examples:
/http(host=example.com,base=/api/v1)
/http(base=/endpoint\(1:2\))
More examples:
/tls(sni=example.com)
/ip6(scope=6)/fe00::32/tcp/80/http
/wss(host=example.com:4443,base=/api/v1,user=john,password=doh,cookie=bla=blab)/tls/ws
host here refers to the HTTP Host header and has nothing to do with where the connection will actually be made.
/wss/tls/ws
Each protocol may still accept zero or one static parameters of known or unknown binary length after the final forward slash. It is expected that the use of optional parameters will be minimal in practice (HTTP-y stuff probably being the prominent exception here, not the rule).
(Precise syntax subject to change/bikeshedding!)
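The text syntax described above can be sketched as a small hand-written parser. This is only an illustration of the stated rules (names limited to lower-case ASCII, digits and “-”; the four escaped characters; “/” unescaped), not an implementation from any MultiAddr library. The function names `parse_params` and `escape_value` are hypothetical, and rejecting duplicate names is an assumption the proposal does not state.

```python
import re

_NAME = re.compile(r"[a-z0-9-]+")
_SPECIAL = "(),\\"  # characters that must be backslash-escaped inside a value


def escape_value(value: str) -> str:
    """Backslash-escape the four characters that are special inside a value."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in value)


def parse_params(text: str, start: int = 0):
    """Parse a parameter list whose '(' sits at text[start].

    Returns (params, index just past the closing ')').
    """
    if text[start:start + 1] != "(":
        raise ValueError("expected '('")
    params = {}
    i = start + 1
    if text[i:i + 1] == ")":  # empty list "()" is allowed
        return params, i + 1
    while True:
        m = _NAME.match(text, i)
        if not m or text[m.end():m.end() + 1] != "=":
            raise ValueError(f"expected 'name=' at offset {i}")
        name, i = m.group(), m.end() + 1
        value = []
        while i < len(text) and text[i] not in ",)":
            if text[i] in "(\x00":
                raise ValueError(f"unescaped '(' or NUL at offset {i}")
            if text[i] == "\\":  # an escape: take the next character literally
                i += 1
                if i >= len(text) or text[i] not in _SPECIAL:
                    raise ValueError(f"invalid escape at offset {i}")
            value.append(text[i])
            i += 1
        if i >= len(text):
            raise ValueError("unterminated parameter list")
        if name in params:  # assumption: duplicates are treated as an error
            raise ValueError(f"duplicate parameter {name!r}")
        params[name] = "".join(value)
        i += 1  # consume ',' or ')'
        if text[i - 1] == ")":
            return params, i


addr = "/wss(host=example.com:4443,base=/api/v1,cookie=bla=blab)/tls/ws"
params, end = parse_params(addr, addr.index("("))
print(params, addr[end:])
```

Only the first “=” after the name separates key from value, which is why a value such as bla=blab needs no escaping.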
Binary-representation syntax
The general format for the binary syntax is:
<ProtocolBinary> is the binary MultiAddr representation of the protocol itself and uses the following format:
The format used for the <ProtocolValue> part of the representation depends on the <ProtocolType>:
[NIL] (No value): Used by all protocols with zero static parameters; no value follows and attributes or further protocols may immediately follow.
<ProtocolValue>: Used by all protocols with one static parameter of known binary length; the value, of a length predefined for each protocol type, immediately follows.
<ProtocolLength><ProtocolValue>: Used by all protocols with one static parameter of variable binary length; the <ProtocolLength> is a UVarInt containing the length of the following protocol value.
text_value ≃ binary2text(text2binary(text_value))
binary_value ≃ text2binary(binary2text(binary_value))
≃ means “must be equal with regards to the constraints imposed by the protocol” – for instance, DNS names are case-insensitive, hence a loss of case may be acceptable as this is not considered relevant “information” in this protocol (XXX: find better wording for this).
<AttributeBinary> is the binary MultiAddr representation of a single protocol attribute and must follow either a protocol binary representation or another attribute. All attributes share a single format:
In this definition:
[ATTR_TOKEN] is a reserved UVarInt indicating the start of an attribute, whose value must never be used for a <ProtocolValue> (TODO: Decide on a value)
<AttributeKey> is a UVarInt from a table of known attribute names. Attributes in this table are not bound to any specific protocol; the table serves only as a look-up table for keeping the binary representation of attributes small.
<AttributeLength> is a UVarInt determining the length of the following <AttributeValue> in bytes.
<AttributeValue> is the UTF-8 encoded text of the attribute's value in the text representation.
TODO: Allow storing unknown attributes in binary, whose names are not in the table?
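A rough sketch of how a single attribute might be encoded, assuming the fields appear in the order they are defined above (token, key, length, value) and that UVarInt is the usual unsigned LEB128-style varint (7 payload bits per byte, high bit as continuation flag). The `ATTR_TOKEN` value and the entries of `ATTR_KEYS` are placeholders invented for this example; the proposal leaves both undecided.

```python
ATTR_TOKEN = 0x7FFF  # placeholder only: the proposal has a TODO for this value
ATTR_KEYS = {"host": 1, "base": 2}  # hypothetical attribute-name table
ATTR_NAMES = {code: name for name, code in ATTR_KEYS.items()}


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("uvarint cannot be negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes, pos: int = 0):
    """Decode a uvarint at buf[pos]; returns (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_attribute(name: str, value: str) -> bytes:
    """Assumed layout: token, key, length in bytes, UTF-8 value."""
    raw = value.encode("utf-8")
    return (encode_uvarint(ATTR_TOKEN) + encode_uvarint(ATTR_KEYS[name])
            + encode_uvarint(len(raw)) + raw)


def decode_attribute(buf: bytes, pos: int = 0):
    """Inverse of encode_attribute; returns (name, value, next position)."""
    token, pos = decode_uvarint(buf, pos)
    if token != ATTR_TOKEN:
        raise ValueError("not an attribute")
    key, pos = decode_uvarint(buf, pos)
    length, pos = decode_uvarint(buf, pos)
    if pos + length > len(buf):
        raise ValueError("truncated attribute value")
    return ATTR_NAMES[key], buf[pos:pos + length].decode("utf-8"), pos + length


blob = encode_attribute("base", "/api/v1")
print(blob.hex(), decode_attribute(blob))
```

Because the value carries an explicit length, a decoder can skip an attribute without interpreting it, which is what lets attributes sit between protocols in the binary form.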
Other requirements
Unexpected parameters should result in an error when trying to instantiate the given protocol and may result in an error during parsing of the given MultiAddr. For each expected parameter there must be a sensible default value, and parameters whose value corresponds to that default should be omitted from the textual and binary representations. All parameters must be optional; for mandatory parameters the current /protoname/param syntax should be used instead.
EDIT 1: Some language improvements + language change to always call it an “HTTP path base”, since the path only refers to the path bases used to multiplex different HTTP services of a single hostname and not to actual single files
EDIT 2: Added example for escaping
EDIT 3: Specify binary encoding (but specific to the proposal at hand and for what we already have)
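The rules from “Other requirements” (reject unexpected parameters, omit values equal to their default) could be sketched as follows. The default table `HTTP_DEFAULTS` and both function names are made up for illustration, and sorting parameters by name is an assumption: the proposal does not define an ordering.

```python
HTTP_DEFAULTS = {"host": "", "base": "/"}  # hypothetical defaults for illustration


def _escape(value: str) -> str:
    """Backslash-escape the four characters special inside a value."""
    return "".join("\\" + ch if ch in "(),\\" else ch for ch in value)


def instantiate(defaults: dict, given: dict) -> dict:
    """Fill in defaults; any parameter the protocol does not know is an error."""
    unexpected = sorted(set(given) - set(defaults))
    if unexpected:
        raise ValueError(f"unexpected parameters: {', '.join(unexpected)}")
    return {**defaults, **given}


def canonical_params(defaults: dict, values: dict) -> str:
    """Render only non-default parameters, sorted by name (an assumption)."""
    kept = [(k, values[k]) for k in sorted(values) if values[k] != defaults[k]]
    if not kept:
        return ""  # nothing to say: omit the parameter list entirely
    return "(" + ",".join(f"{k}={_escape(v)}" for k, v in kept) + ")"


values = instantiate(HTTP_DEFAULTS, {"base": "/api/v1"})
print("/http" + canonical_params(HTTP_DEFAULTS, values))
```

Dropping default values means two addresses that differ only in whether a default was spelled out end up with the same representation.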