Proposal: provide error codes when closing connections and resetting streams #479

marten-seemann opened this issue Nov 17, 2022 · 2 comments · May be fixed by #623

marten-seemann commented Nov 17, 2022

Related PRs

It would be really helpful to know why a peer closed a connection or reset a stream. Unfortunately, we currently don’t have access to that information.

Here’s a proposal for how to convey that piece of information.

Connection Termination

Current situation:

  • QUIC: uses a CONNECTION_CLOSE frame, which carries a 62 bit error code and a human-readable message (a string limited by the MTU).
  • WebTransport: uses a CLOSE_WEBTRANSPORT_SESSION capsule, which carries a 32 bit error code and a human-readable message (up to 8k)
  • yamux: has a GOAWAY frame that abuses the length field (32 bit) to carry an error code. Currently the spec only defines 3 distinct error codes
  • mplex: don’t care

It seems straightforward to use a 32 bit error code space for libp2p. If we decide that transmitting an error message is important, we might be able to find a backwards-compatible yamux hack, similar to the one described in the next section.
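As a rough illustration of the yamux case, the sketch below packs a 32 bit code into the length field of a GOAWAY header and reads it back. The 12-byte header layout and the GOAWAY type value 0x3 follow the yamux spec; the function names and the code value 0x1001 are made up for this example and are not taken from any libp2p implementation.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	yamuxVersion = 0   // yamux protocol version
	typeGoAway   = 0x3 // yamux frame type for GOAWAY
	headerSize   = 12  // version(1) + type(1) + flags(2) + stream ID(4) + length(4)
)

// encodeGoAway builds a yamux GOAWAY header. The 32 bit length field
// carries the error code; flags and stream ID stay zero.
func encodeGoAway(code uint32) [headerSize]byte {
	var hdr [headerSize]byte
	hdr[0] = yamuxVersion
	hdr[1] = typeGoAway
	binary.BigEndian.PutUint32(hdr[8:12], code)
	return hdr
}

// decodeGoAway extracts the error code from a GOAWAY header.
func decodeGoAway(hdr [headerSize]byte) (uint32, error) {
	if hdr[0] != yamuxVersion {
		return 0, errors.New("unsupported yamux version")
	}
	if hdr[1] != typeGoAway {
		return 0, errors.New("not a GOAWAY frame")
	}
	return binary.BigEndian.Uint32(hdr[8:12]), nil
}

func main() {
	const hypotheticalCode = 0x1001 // placeholder, not an assigned libp2p value
	hdr := encodeGoAway(hypotheticalCode)
	code, err := decodeGoAway(hdr)
	fmt.Printf("code=%#x err=%v\n", code, err)
}
```

Because the code travels in a field every yamux implementation already parses, no wire format change is needed for the connection-level case.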

We could have different error codes for: connections that are closed because they were dial-raced with other connections, disallowed by a connection gater, closed due to resource limitations, closed to make room for more valuable connections, closed for different kinds of protocol violations, etc.

Caveat: With TCP linger set to 0, the TCP connection is reset instead of properly closed. This also means that the error code might not be transmitted reliably.

Stream Termination

Current situation:

  • QUIC: the RESET_STREAM frame contains a 62 bit error code field (there are no human-readable messages for stream resets)
  • WebTransport: limits stream reset error codes to 8 bits
  • yamux: doesn’t allow transmitting any error code
  • mplex: still don’t care

It seems like we’re therefore limited to 256 error codes. We’d need to reserve a subset of these for libp2p itself (for example, we need to convey that multistream negotiation failed, or that we didn’t even start multistream negotiation because of resource limits, etc.). The rest of the error codes would be defined by the application.
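One way to picture the split of an 8 bit code space is sketched below. The proposal does not say how many codes libp2p would reserve, so the boundary of 16 and all names here are assumptions made purely for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// reservedCodes is a hypothetical boundary: codes below it belong to
// libp2p, codes at or above it belong to the application.
const reservedCodes = 16

// StreamErrorCode is an 8 bit stream reset code.
type StreamErrorCode uint8

// IsLibp2p reports whether the code falls into the range reserved for libp2p.
func (c StreamErrorCode) IsLibp2p() bool { return c < reservedCodes }

// AppStreamError maps a zero-based application code into the
// non-reserved part of the 8 bit space.
func AppStreamError(appCode uint8) (StreamErrorCode, error) {
	if int(appCode) > 255-reservedCodes {
		return 0, errors.New("application error code out of range")
	}
	return StreamErrorCode(reservedCodes + appCode), nil
}

func main() {
	code, err := AppStreamError(3)
	fmt.Println(code, code.IsLibp2p(), err)

	_, err = AppStreamError(250)
	fmt.Println(err)
}
```

With this particular split an application would be left with 240 codes, which shows how tight the 8 bit limit is.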

yamux hack

Depending on how current implementations handle this, we could either:

  • if implementations ignore data sent on frames that have the RESET flag set: attach the error code to that frame
  • if implementations ignore stream data received on a stream that was reset (I think go does): send the error code in a stream frame
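The first option could look roughly like the sketch below: a yamux Data frame with the RST flag set and a 4-byte payload holding the error code. The header layout and the RST flag value 0x8 follow the yamux spec; the 4-byte big-endian payload encoding and the function names are assumptions, since the proposal does not fix an encoding.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	typeData   = 0x0 // yamux Data frame
	flagRST    = 0x8 // yamux RST flag
	headerSize = 12  // version(1) + type(1) + flags(2) + stream ID(4) + length(4)
	codeSize   = 4   // assumed payload size for the error code
)

// encodeReset builds a Data frame with RST set and the error code as payload.
func encodeReset(streamID, code uint32) []byte {
	frame := make([]byte, headerSize+codeSize)
	frame[1] = typeData // frame[0] (version) stays 0
	binary.BigEndian.PutUint16(frame[2:4], flagRST)
	binary.BigEndian.PutUint32(frame[4:8], streamID)
	binary.BigEndian.PutUint32(frame[8:12], codeSize)
	binary.BigEndian.PutUint32(frame[12:16], code)
	return frame
}

// decodeReset parses a reset frame. hasCode is false for a reset
// without payload, which is what existing implementations send today.
func decodeReset(frame []byte) (streamID, code uint32, hasCode, ok bool) {
	if len(frame) < headerSize || frame[1] != typeData {
		return
	}
	if binary.BigEndian.Uint16(frame[2:4])&flagRST == 0 {
		return
	}
	streamID = binary.BigEndian.Uint32(frame[4:8])
	ok = true
	if binary.BigEndian.Uint32(frame[8:12]) == codeSize && len(frame) >= headerSize+codeSize {
		code = binary.BigEndian.Uint32(frame[12:16])
		hasCode = true
	}
	return
}

func main() {
	frame := encodeReset(7, 42)
	id, code, hasCode, ok := decodeReset(frame)
	fmt.Println(id, code, hasCode, ok)
}
```

Whether this is backwards compatible depends on the condition stated in the bullet: a receiver must discard the payload of a frame carrying RST rather than treat it as a protocol error.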

marten-seemann commented

We've been using a few error codes without having them defined beforehand. This probably means that we need to consider the error codes used there as "burnt". @elenaf9, can you compile a list of application error codes that QUIC in rust-libp2p has been using so far?
Fortunately, this only applies to connection-level error codes, and there are lots of numbers in the uint62 space, so this is nothing more than a mild annoyance.

Here's a proposal for error codes to add. Numerical values are still TBD; the purpose of this exercise is to agree on a list of possible errors (and their names) first.

Connection Error Codes

| Error Code | Description |
| --- | --- |
| REJECTED | Connection rejected because the node is temporarily overloaded. Most likely because some accept queue ran full. |
| GATED | Connection rejected because the connection was gated. Most likely the IP / node is blacklisted. |
| RESOURCE_ALLOCATION_FAILED | Connection rejected because we ran into a resource limit. |
| PROTOCOL_NEGOTIATION_FAILED | Connection rejected because we couldn't negotiate a protocol. Note that this error code cannot be sent reliably, as we don't have the option to send a custom error code during every part of the handshake. |
| DIAL_CANCELED | Multiple connections were raced in parallel. This connection is closed because another connection won the race. Note that this can also happen to newly established connections shortly after the handshake. |
| GARBAGE_COLLECTED | The connection was garbage collected. |
| SHUTDOWN | The node is going down. |
| CLOSE | The user closed the connection explicitly. |
| PROTOCOL_VIOLATION | The peer violated the protocol. |
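To show how such a list might surface in an API, here is a minimal Go sketch. The names mirror the list above, but the numerical values are placeholders (the proposal leaves them TBD), and the `ConnErrorCode` and `ConnClosedError` types are hypothetical, not part of go-libp2p.

```go
package main

import (
	"errors"
	"fmt"
)

// ConnErrorCode is a 32 bit connection error code.
type ConnErrorCode uint32

// Placeholder values only: the proposal has not assigned numbers yet.
const (
	ConnRejected ConnErrorCode = iota + 1
	ConnGated
	ConnResourceAllocationFailed
	ConnProtocolNegotiationFailed
	ConnDialCanceled
	ConnGarbageCollected
	ConnShutdown
	ConnClose
	ConnProtocolViolation
)

var connErrorNames = map[ConnErrorCode]string{
	ConnRejected:                  "REJECTED",
	ConnGated:                     "GATED",
	ConnResourceAllocationFailed:  "RESOURCE_ALLOCATION_FAILED",
	ConnProtocolNegotiationFailed: "PROTOCOL_NEGOTIATION_FAILED",
	ConnDialCanceled:              "DIAL_CANCELED",
	ConnGarbageCollected:          "GARBAGE_COLLECTED",
	ConnShutdown:                  "SHUTDOWN",
	ConnClose:                     "CLOSE",
	ConnProtocolViolation:         "PROTOCOL_VIOLATION",
}

// String returns the proposed name, or UNKNOWN(n) for unassigned codes.
func (c ConnErrorCode) String() string {
	if name, ok := connErrorNames[c]; ok {
		return name
	}
	return fmt.Sprintf("UNKNOWN(%d)", uint32(c))
}

// ConnClosedError is a hypothetical error type that tells the caller
// why a connection was closed and by which side.
type ConnClosedError struct {
	Code   ConnErrorCode
	Remote bool
}

func (e *ConnClosedError) Error() string {
	side := "local"
	if e.Remote {
		side = "remote"
	}
	return fmt.Sprintf("connection closed (%s): %s", side, e.Code)
}

func main() {
	var err error = &ConnClosedError{Code: ConnGated, Remote: true}
	var closed *ConnClosedError
	if errors.As(err, &closed) {
		fmt.Println("closed with code:", closed.Code, "remote:", closed.Remote)
	}
}
```

Returning unknown codes as `UNKNOWN(n)` rather than failing keeps a receiver usable when the peer runs a newer version with additional codes.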

Stream Error Codes

| Error Code | Description |
| --- | --- |
| REJECTED | Stream rejected because the node is temporarily overloaded. Most likely because some accept queue ran full. |
| RESOURCE_ALLOCATION_FAILED | Stream rejected because we ran into a resource limit. |
| PROTOCOL_NEGOTIATION_FAILED | Stream rejected because we couldn't negotiate a protocol. |

In addition to these libp2p error codes, every protocol will probably want to register their own error codes. For that, it would be really, really nice if we had more than 256 error code values at our disposal.

sukunrt commented Jul 4, 2024

> WebTransport: limits stream reset error codes to 8 bits

This is fixed in the latest drafts.
From: https://www.ietf.org/archive/id/draft-ietf-webtrans-http3-09.html#name-resetting-data-streams

> Since WebTransport shares the error code space with HTTP/3, WebTransport application errors for streams are limited to an unsigned 32-bit integer, assuming values between 0x00000000 and 0xffffffff.

We can ignore WebTransport for now and wait for browsers to upgrade to draft-09 or later.
