Simple enhancements to the networking spec discovered after implementing it #1158

Closed

Conversation

@zah commented Jun 10, 2019

The Nimbus team has just completed the implementation of the currently published LibP2P networking spec. This gave us a chance to think through some of the details of the spec, and several practical issues were discovered. In this PR, I'm trying to address them with minimal modifications to the spec:

Problem 1: The messaging described in the spec follows the client-server paradigm too closely.

In the current spec, one of the peers sends requests that are identified by a numeric request id and the other peer replies with a response referencing the same id. The request and response structures are defined in a way that makes it impossible to discriminate requests from responses on the wire, and this forces the peers into a more traditional client-server relationship where one side sends only requests and the other sends only responses. This is not quite appropriate for a peer-to-peer network such as Ethereum, where both peers are expected to make requests.

A few possible ways to solve this problem were considered:

  • Assign even IDs to the requests and odd IDs to the responses.
    This seems to be the least invasive change and it's the one being implemented in this PR (see the sketch after this list).
    In our experience, it introduces only a very slight implementation complexity cost.

  • Open two streams per connection and define the client-server roles within each stream.
    This seems to carry a much higher implementation complexity cost.

  • Assign a separate method ID to the responses.
    This would have been easy to implement as well, but it would increase the size of the response structure, which would need to store the additional method ID.

  • Include a request/response bit in the enclosing message frames.
    Another easy option that was discarded purely on subjective grounds.
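
As a rough illustration of the first option, here is a minimal sketch in Python (hypothetical, not code from the Nimbus implementation; in particular, the convention that a response reuses its request's ID with the low bit set is an assumption layered on top of the even/odd rule):

    import itertools

    class IdAllocator:
        """Hands out even IDs for locally initiated requests."""
        def __init__(self):
            self._next = itertools.count(start=0, step=2)

        def next_request_id(self) -> int:
            return next(self._next)

    def response_id_for(request_id: int) -> int:
        # Assumption: a response reuses its request's ID with the low bit set.
        return request_id | 1

    def is_request(msg_id: int) -> bool:
        # Either peer can classify an incoming message from its ID alone,
        # which is what makes a single shared stream symmetric. IDs only
        # need to be unique per sender; the direction of the message
        # disambiguates the two peers' ID spaces.
        return msg_id % 2 == 0

    alloc = IdAllocator()
    rid = alloc.next_request_id()                 # 0 -> a request
    assert is_request(rid)
    assert not is_request(response_id_for(rid))   # 1 -> its response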

Problem 2: The high-level spec does not take full advantage of LibP2P.

One of the benefits we hoped to reap with LibP2P is the ability to stream very large responses asynchronously as the data required to serve them is gradually fetched from some slower I/O device. In LibP2P, this is possible due to the central concept of stream-multiplexing. The requesting peer can initiate a request by opening a stream and the serving peer can gradually write data to that stream as it becomes available (i.e. without the need to buffer large amounts of data in memory in order to send it in a single response message).

Before Multistream-2.0 is implemented, the use of multiple streams in LibP2P introduces some overhead, which informed the current design of the spec relying on a single stream. Nevertheless, we believe it's easy to emulate the presence of a stream-like device by introducing support for "Chunked responses". This gives us a valuable high-level protocol design tool and prepares us for a future where all requests and responses can be mapped cleanly to the Multistream-2.0 streams.
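
A minimal sketch of how chunked responses can emulate a stream-like device, in Python; the framing here (4-byte big-endian chunk length, zero-length chunk as terminator) is invented for illustration and is not prescribed by this PR:

    import io
    import struct

    def write_chunked(stream, chunks):
        # `chunks` may be a lazy generator, so each chunk can be written
        # as soon as it is fetched from a slower I/O device, without
        # buffering the whole response in memory.
        for chunk in chunks:
            stream.write(struct.pack(">I", len(chunk)))
            stream.write(chunk)
        stream.write(struct.pack(">I", 0))  # end-of-response marker

    def read_chunked(stream):
        while True:
            (size,) = struct.unpack(">I", stream.read(4))
            if size == 0:
                return
            yield stream.read(size)

    buf = io.BytesIO()
    write_chunked(buf, [b"block-1", b"block-2"])
    buf.seek(0)
    assert list(read_chunked(buf)) == [b"block-1", b"block-2"]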

Other minor changes:

  • A timeout has been specified in the handshake procedure.
    This should prevent a simple DoS attack in which a client tries to exhaust the number of open sockets.

  • A number of additional disconnection reasons were added.
    While we were implementing the spec, we noticed that many of the typical checks and safeguards present in our ETH1 devp2p stack are still relevant for ETH2. Adding explicit disconnection reasons for them serves as a reminder of what a typical client should implement.

@djrtwo (Contributor) commented Jun 12, 2019

cc: @AgeManning

@AgeManning (Contributor) commented Jun 13, 2019

Yep there has been discussion about this offline.

I believe the consensus is to use substreams per request.

In this scenario, a new substream is opened on each request and it awaits a response before closing the substream.

Lighthouse will be implementing this approach and we should have greater visibility on this in a week or two. I imagine the RPC IDs just need to be unique in this case.

We also ran into the above issues when implementing the original RPC (we opted for the prefix bit in the initial version). These are great potential solutions!

@zah (Author) commented Jun 18, 2019

We also agree that dedicating a separate LibP2P stream to each request is the approach that follows the architecture of LibP2P most closely. Since our own implementation is quite flexible, we've included support for both schemes. Here, I'll try to provide a precise specification for how requests can be mapped to streams:

libp2p protocol names

A "Protocol ID" in libp2p parlance refers to a human-readable identifier libp2p uses in order to identify sub-protocols and stream messages of different types over the same connection. Peers exchange supported protocol IDs via the Identify protocol upon connection. When opening a new stream, peers pin a particular protocol ID to it, and the stream remains contextualized thereafter. Since messages are sent inside a stream, they do not need to bear the protocol ID.

RPC-over-libp2p

Terminology:

  • Message name -- each RPC request is identified by a name consisting of English letters, digits and underscores (_).

  • Message schema -- the message schema defines the data types that MUST be transmitted with each request and response.

  • Schema version -- a semantic version consisting of one or more numbers separated by dots (.). Each schema is versioned to facilitate backward and forward compatibility when possible.

  • Encoding -- while the schema defines the data types in more abstract terms, the encoding describes a specific representation as bytes that will be transmitted over the wire.

  • Protocol Prefix -- the RPC messages are grouped into families identified by a shared LibP2P protocol name prefix. A conforming implementation is expected to support either all messages appearing within a family or none of them.

  • Request -- Requests are initiated by opening a LibP2P stream with the following protocol ID:

    <ProtocolPrefix>/<MessageName>/<SchemaVersion>/<Encoding>

    After opening the stream, the initiator MUST transmit within 3 seconds the serialized contents of the message, according to the specified schema version and the encoding being used.

  • Response code -- a single-byte prefix determining the contents of the response. It can have one of the following values:

    • 0: Success -- a normal response with contents matching the expected message schema and encoding specified in the request.

    • 1: EncodingError -- the receiver was not able to decode and deserialize the data transmitted in the request. The response content is empty.

    • 2: InvalidRequest -- The contents of the request are semantically invalid. The response content is a message with the InvalidRequestMsg schema.

    • 3: ServerError -- The receiver encountered an error while processing the request. The response content is a message with the ServerErrorMsg schema.

    Some requests MAY use response codes above 128 to indicate alternative request-specific responses.

  • Response -- After receiving a request, the responding party MUST reply on the same stream within 10 seconds with a response code, followed by the contents of the response.

    Certain RPCs MAY allow for multiple responses to be generated for each request (e.g. publish/subscribe schemes). The responder MUST transmit the Success response code before each serialized message written to the stream. The responder MUST close the stream when no more messages are expected. If an error is encountered after some messages have already been written, the responder MUST transmit the ServerError response code, followed by a serialized message using the ServerErrorMsg schema (see the sketch after this list).

    The request initiator MAY close the stream at any time.
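
A minimal sketch of the responder side of this framing, in Python (hypothetical; the error serialization is a stand-in for encoding the ServerErrorMsg schema defined below):

    import io

    SUCCESS, ENCODING_ERROR, INVALID_REQUEST, SERVER_ERROR = 0, 1, 2, 3

    def serialize_error_msg(text: str) -> bytes:
        # Stand-in for encoding an (errorMessage: string) record.
        return text.encode("utf-8")

    def respond(stream, messages):
        """Write each message behind a Success code, then close the stream."""
        try:
            for serialized in messages:
                stream.write(bytes([SUCCESS]))  # code precedes every message
                stream.write(serialized)
        except Exception as exc:
            # An error after some messages were already written becomes
            # ServerError followed by a ServerErrorMsg payload.
            stream.write(bytes([SERVER_ERROR]))
            stream.write(serialize_error_msg(str(exc)))
        finally:
            stream.close()  # "close the stream when no more messages are expected"

    class Capture(io.BytesIO):
        def close(self):  # keep the buffer readable for the assertion below
            pass

    out = Capture()
    respond(out, [b"msg-a", b"msg-b"])
    assert out.getvalue() == b"\x00msg-a\x00msg-b"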

Internal schemas:

InvalidRequestMsg

(errorMessage: string)

ServerErrorMsg

(errorMessage: string)

The protocol prefix for all currently defined RPC messages used for syncing the beacon chain is /ETH/BeaconChain.

The supported encodings are SSZ and SSZ+Snappy.

The names of the messages are the ones specified in the rpc-interface spec.

All schema versions start as 1.
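
Putting the pieces together, here is a hypothetical helper that derives the protocol ID for a request. The message name "beacon_blocks" and the lower-case encoding token "ssz" are invented for illustration; real names come from the rpc-interface spec, and the exact casing of the encoding token isn't pinned down above:

    def protocol_id(prefix: str, message_name: str,
                    schema_version: str, encoding: str) -> str:
        return f"{prefix}/{message_name}/{schema_version}/{encoding}"

    # "beacon_blocks" is a placeholder message name for illustration.
    assert (protocol_id("/ETH/BeaconChain", "beacon_blocks", "1", "ssz")
            == "/ETH/BeaconChain/beacon_blocks/1/ssz")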

@AgeManning (Contributor) commented:
I'm finally getting around to implementing this.

I'm not sure about Go, but in Rust, implementing a protocol-id per request is quite difficult. All other protocols that I've seen in libp2p that use RPC-like requests are grouped under a single protocol-id. With that being said, I can see the benefits of using multistream-select to do the RPC message-type segregation.

An alternative would be to encode the type in the data sent (although, apart from ease of implementation and not polluting the protocol-id namespace, I don't have strong arguments for doing this).

Hoping @raulk has some insights on this.

Finally, in your implementation, how does the sending node communicate it has completed the request and is closing the stream?

@raulk (Contributor) commented Jul 1, 2019

Generally, one protocol per message type (×2 if you model request and response separately, as I've seen in some places) introduces overhead for little gain in exchange. It also makes correlation and state keeping difficult. I'd recommend coarser protocols, one per functional area of the system.

Re: the new-stream-per-request pattern, with multistream 1.0.0 you'd be incurring the negotiation cost per stream; this would be death by a thousand cuts right now. Multiselect 2.0 will make this much cheaper (negligible). I recommend you stick to a single long-lived stream per protocol, and move to a single stream per request once we have multiselect 2.0.

@AgeManning (Contributor) commented:
Thanks for the input @raulk!

It seems in Rust we are doing a multistream-select on each substream regardless, so for me there will be minimal extra overhead in building it this way. However, another point was raised: typically a protocol-id will have a version associated with it. With a protocol-id per request, we may run into more complex versioning, where individual requests now have their own versions.

Although the multistream-select overhead is a non-issue for my particular implementation, it seems more logical to localize an entire protocol to a specific protocol-id. This way other implementations don't incur extra overheads, we can version all the requests under a single id, and the set of protocols that a node supports will be simpler.

Furthermore, I think we should use size-prefixed messages, which indicate the end of a stream, prevent DoS vectors, and allow slower implementations to buffer and send responses gradually over the stream.

What are your thoughts on this, @zah?

@zah (Author) commented Jul 2, 2019

The original PR here tries to provide the minimal modifications to the spec needed to make the approach suggested by @raulk workable (that is, using a single long-lived stream over which both peers can send requests freely).

The rest of the discussion is a bit more forward-looking. It tries to identify what would be the best approach in a future where Multiselect-2.0 is available. If this future is sufficiently close, I think it would be worth simplifying the client implementations now at the cost of some transient overhead brought by the current sub-stream negotiation.

Our own client currently implements both the minimally modified spec proposed here and this forward-looking alternative spec as an experiment. We've done some field testing with both approaches in our public testnet and it's clear that the implementation code is slightly simpler when more of the complexity is pushed over to libp2p.

@raulk, can you further clarify the trade-offs between finer and coarser sub-protocols, assuming Multiselect-2.0 is available? It's clear that the advertisement messages will balloon in size, but what do you mean when you say "correlation and state keeping"? Perhaps Multiselect-2.0 can be further extended to allow for a family of sub-stream types to be negotiated just by transmitting an identifier describing the whole family.

@zah (Author) commented Jul 2, 2019

@AgeManning, I forgot to mention that all SSZ messages do indeed have a size prefix. This is an inherent requirement due to the way SSZ deserialisation works, and our current implementation of the alternative spec uses a 4-byte length prefix. Clients are definitely allowed to close the stream after reading the expected response.

- `4`: Useless peer.
- `5`: Already Connected.
- `6`: Self-connection.
- `7`: Fault/error.
A Contributor commented on the diff above:

These are the RLPx error codes; do we need those? In particular, 5 implies you cannot have more than one connection with another peer.

@@ -281,3 +288,4 @@ Requests the `block_bodies` associated with the provided `block_roots` from the
Requests contain the hashes of Merkle tree nodes that when merkleized yield the block's `state_root`.

The response will contain the values that, when hashed, yield the hashes inside the request body.

Another review comment from a Contributor:

added blank line

@AgeManning (Contributor) commented Jul 10, 2019

I agree with the protocol segregation.

I've tried to consolidate everything discussed here in #1281. I've specified the protocol-ids for each message and added clarity in some sections.

I have used /eth/serenity/rpc as the protocol-prefix to be consistent with other libp2p protocols, and have used an unsigned varint to length-prefix the contents of the RPC requests/responses.

Please let me know if there are still discrepancies with your current implementation.
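
For reference, here is a minimal sketch of the unsigned-varint length prefix mentioned above (the LEB128-style encoding used by the multiformats unsigned-varint spec; illustrative rather than canonical):

    def encode_uvarint(n: int) -> bytes:
        out = bytearray()
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                out.append(byte | 0x80)  # high bit set: more bytes follow
            else:
                out.append(byte)
                return bytes(out)

    def decode_uvarint(data: bytes):
        """Return (value, number of bytes consumed)."""
        value = shift = 0
        for i, byte in enumerate(data):
            value |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return value, i + 1
            shift += 7
        raise ValueError("truncated varint")

    assert encode_uvarint(300) == b"\xac\x02"
    assert decode_uvarint(b"\xac\x02" + b"payload") == (300, 2)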

@djrtwo (Contributor) commented Jul 10, 2019

closing in favor of #1281

@djrtwo closed this Jul 10, 2019