
Add client semantic conventions for socket connections #756

Closed


@lmolkova (Contributor) commented Feb 17, 2024

The first stab at client socket connection conventions #454

Changes

Defines connection spans and metrics


@lmolkova changed the title from "Add semantic conventions for socker connection client" to "Add client semantic conventions for socket connections" on Feb 17, 2024
@samsp-msft:

In the case of an HTTP connection, which is technically built on top of a socket connection, would you expect it to parent a socket connection span, or should it just be modelled as its own variation: an `http.connection` (or `connection.http`?) span?
I think they would share a lot of the same attributes.

@lmolkova (Contributor Author):


My proposal is to identify a common set of connection-related concepts and model HTTP, AMQP, DB, and other connections the same way; they'd effectively apply to socket-level APIs, which are the same everywhere.

Connection pools are more interesting; this is where we might need an HTTP connection pool, a DB connection pool, etc.

However, I don't believe there is a consensus on this in the community yet; see #703 for the DB discussion.


# Semantic Conventions for Connection Spans

This document defines semantic conventions to apply when instrumenting the client side of socket connections with spans.


With HTTP/3 over QUIC, the connection is virtual and may span multiple UDP packets. The client IP address or network may even change during the lifetime of the connection, for example when a mobile client moves out of range and switches between Wi-Fi and cellular.
Rather than tying this directly to a socket, the connection type could be tracked by an additional type property. The same concept could then be used for database, HTTP, and a range of other scenarios, with optional attributes depending on the scenario.

@lmolkova (Contributor Author), Apr 18, 2024:

HTTP/3 still operates on top of UDP sockets.
I'm not an expert, but I believe that, from the socket perspective, we still have different connections established when QUIC connection migration happens; the only thing migration saves is the TLS handshake, which doesn't happen again.

How to represent the logical QUIC connection is a good question, but given that it's so long-lived, I don't see why we can't have a span for it and spans for all the underlying socket connections it creates.


- `connect` span: describes the process of establishing a connection. It corresponds to the `connect` function ([Linux or other POSIX systems](https://man7.org/linux/man-pages/man2/connect.2.html) /
[Windows](https://docs.microsoft.com/windows/win32/api/winsock2/nf-winsock2-connect)).
- `connection` span: describes the connection lifetime. It starts right after the connection is successfully established and ends when the connection terminates.
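A minimal sketch of how the two spans could bracket a TCP client socket, using only the Python standard library. `SpanRecord`, `open_connection`, and `close_connection` are hypothetical helpers, not the OpenTelemetry API; the sketch also assumes the address is already resolved, so DNS stays outside the `connect` span.

```python
import socket
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpanRecord:
    """Hypothetical stand-in for a span; not the OpenTelemetry API."""
    name: str
    start: float
    end: Optional[float] = None
    attributes: dict = field(default_factory=dict)


def open_connection(ip: str, port: int, spans: List[SpanRecord]) -> socket.socket:
    """Connect to an already-resolved IPv4 address (DNS is out of scope)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    connect_span = SpanRecord("connect", time.monotonic())
    spans.append(connect_span)
    try:
        sock.connect((ip, port))  # the `connect` span covers only this call
    except OSError:
        sock.close()
        raise
    finally:
        connect_span.end = time.monotonic()
    # The `connection` span starts right after connect() succeeds...
    spans.append(SpanRecord("connection", time.monotonic()))
    return sock


def close_connection(sock: socket.socket, spans: List[SpanRecord]) -> None:
    sock.close()
    # ...and ends when the connection terminates.
    for span in reversed(spans):
        if span.name == "connection" and span.end is None:
            span.end = time.monotonic()
            return
```

A failed attempt leaves a finished `connect` span and no `connection` span, which is exactly the case the two-span split is meant to make visible.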


Not sure whether we need both `connect` and `connection`, or whether all the data can be represented in `connection`.
In the HTTP case, the equivalent of connect is a wire request, since the typical HTTP request span tracks a logical operation rather than what happens on the wire. If the HTTP library performs auto-redirection, for example, a single HTTP request may actually result in multiple wire requests: one to retrieve the redirect and a subsequent one to fetch the data.

@lmolkova (Contributor Author):

It's important to know how long it takes to establish a connection, and whether a connection was ever established and then terminated.

We could potentially have a single connection span and mark when the connection was established with an event, but I'd still argue that we need two separate metrics.
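To illustrate why two metrics answer different questions, here is a minimal sketch with an in-memory recorder standing in for two histogram instruments. The metric names `connect.duration` and `connection.duration` and the `Recorder` class are hypothetical, chosen only for this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical metric names, used only for this sketch.
CONNECT_DURATION = "connect.duration"
CONNECTION_DURATION = "connection.duration"


class Recorder:
    """Minimal in-memory stand-in for two histogram instruments."""

    def __init__(self) -> None:
        self.samples: Dict[str, List[Tuple[float, dict]]] = defaultdict(list)

    def record(self, name: str, value: float, attrs: dict) -> None:
        self.samples[name].append((value, attrs))


def on_connect_finished(rec: Recorder, seconds: float, error_type: Optional[str]) -> None:
    # Every attempt, successful or not, lands in the connect histogram.
    attrs = {"error.type": error_type} if error_type else {}
    rec.record(CONNECT_DURATION, seconds, attrs)


def on_connection_closed(rec: Recorder, lifetime: float, error_type: Optional[str]) -> None:
    # Only connections that were actually established reach this histogram.
    attrs = {"error.type": error_type} if error_type else {}
    rec.record(CONNECTION_DURATION, lifetime, attrs)
```

With a single connection metric, a refused attempt would either be invisible or blur connect latency with connection lifetime; here it shows up only in the connect histogram, tagged with its `error.type`.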

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| [`network.peer.port`](../attributes-registry/network.md) | int | Peer port number of the network connection. | `65123` | Conditionally Required: when applicable |
| [`network.transport`](../attributes-registry/network.md) | string | [OSI transport layer](https://osi-model.com/transport-layer/) or [inter-process communication method](https://wikipedia.org/wiki/Inter-process_communication). [3] | `tcp`; `udp` | Recommended |
| [`network.type`](../attributes-registry/network.md) | string | [OSI network layer](https://osi-model.com/network-layer/) or non-OSI equivalent. [4] | `ipv4`; `ipv6` | Recommended |
| [`server.address`](../attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [5] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Conditionally Required: if available without reverse DNS lookup |
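A minimal sketch of deriving the attributes in this table from a connected Python socket. `connection_attributes` is a hypothetical helper; it handles only IP sockets (Unix domain sockets are skipped), and it assumes the caller passes in the name it originally connected to, so `server.address` never comes from a reverse DNS lookup.

```python
import socket
from typing import Optional


def connection_attributes(sock: socket.socket, server_address: Optional[str] = None) -> dict:
    """Map a connected IP socket onto network.* and server.address attributes.

    server_address is the name the caller originally connected to; it is never
    obtained via reverse DNS lookup of the peer IP.
    """
    attrs: dict = {}
    if sock.family == socket.AF_INET:
        attrs["network.type"] = "ipv4"
    elif sock.family == socket.AF_INET6:
        attrs["network.type"] = "ipv6"
    else:
        return attrs  # this sketch only handles IP sockets
    if sock.type == socket.SOCK_STREAM:
        attrs["network.transport"] = "tcp"
    elif sock.type == socket.SOCK_DGRAM:
        attrs["network.transport"] = "udp"
    # getpeername() returns (host, port) for IPv4 and (host, port, flow, scope)
    # for IPv6; the port is at index 1 in both cases.
    attrs["network.peer.port"] = sock.getpeername()[1]
    if server_address:
        attrs["server.address"] = server_address
    return attrs
```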


Is this where we would add `network.protocol.name`, `network.protocol.version`, and `tls.version`?

@lmolkova (Contributor Author):

Do you think they'd be useful on connection spans/metrics?

`network.protocol.*` attributes describe the application-level protocol, not the transport level.
You can send AMQP or HTTP over the socket connection; the connection does not care and does not need to know.

For TLS and DNS, we'll need new spans that are not described in this PR.

| Attribute | Value |
|---|---|
| `network.type` | `"ipv4"` |
| `error.type` | `econnrefused` |
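The `econnrefused` value in this example looks like the lowercased errno symbol. A minimal sketch of producing such a value in Python, assuming that convention (lowercased errno name, falling back to the exception class name when there is no errno); `error_type_for` is a hypothetical helper.

```python
import errno


def error_type_for(exc: BaseException) -> str:
    """Derive a low-cardinality `error.type` value for a failed connect.

    Assumes the lowercased errno symbol is an acceptable value (matching the
    `econnrefused` example); otherwise falls back to the exception class name.
    """
    code = getattr(exc, "errno", None)
    if code is not None and code in errno.errorcode:
        return errno.errorcode[code].lower()
    return type(exc).__qualname__
```

Using the errno symbol rather than the OS error message keeps the value stable across locales and low in cardinality.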

### Relationship with application protocols such as HTTP


HTTP becomes interesting with connection pooling and the ability to send either sequential (HTTP/1.1) or parallel (HTTP/2, HTTP/3) requests over the same connection.
We would probably want some form of event for when wire requests are put onto a specific connection. Similarly, DNS lookup and the TLS handshake happen as part of connection establishment and are important to capture in some way for deeper diagnostics. Should they be events as defined by tracing, or log messages tagged with the same trace ID/span ID?

@lmolkova (Contributor Author), Apr 18, 2024:

Why events? We have links to correlate a request with its connection, and we can put attributes on them if necessary.

Do you want to capture the moment when the request is associated with the connection? We don't capture that on links yet, but we could start. Recording both a link and an event is overkill.

I wonder whether DNS and TLS should be spans or events. Since they involve network calls and have non-zero duration, spans would work better (though they are slightly less performant).
Given that connections live much longer than requests, the volume of such spans would be low and performance should not be a big concern.
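A minimal sketch of the link-only approach from this reply: the request span carries a single link to the connection span, and the association time rides along as a link attribute instead of a separate event. `Span`, `Link`, `assign_to_connection`, and the attribute name `connection.assigned_at` are all hypothetical, not a real tracing SDK.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Link:
    """Hypothetical link to another span, with optional attributes."""
    span_id: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Span:
    """Hypothetical minimal span model; not a real tracing SDK."""
    span_id: str
    name: str
    links: List[Link] = field(default_factory=list)


def assign_to_connection(request: Span, connection: Span, now: Optional[float] = None) -> None:
    # One link carries the correlation; the moment of association is stored
    # as a link attribute (made-up name) rather than as an extra event.
    ts = time.time() if now is None else now
    request.links.append(Link(connection.span_id, {"connection.assigned_at": ts}))
```

With pooling, several request spans simply link to the same connection span, so the long-lived connection never needs to know about each request.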


I think this needs to be a dial that ops can turn based on how much data they want to collect. Taking the scenario of an outgoing HttpClient call:

- You have HttpClient spans (implemented today); these really track a logical request rather than what is physically happening on the wire.
- The next level would be a chain of physical requests: with redirection there could be several before the final request, or a continuation could resume collecting data.
- The connections themselves live beyond a single request. They include DNS and TLS as part of their initialization.
- DNS lookup for a connection
- TLS negotiation

In most cases you probably don't want to collect all of these all the time. However, I can see ops turning them on when needed to collect more specific diagnostic data. Can we make this adaptive, and still be able to correlate the data when applicable?
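One way to read the "dial" idea is as cumulative detail levels, one per layer in the list above. The sketch below is a hypothetical illustration only; `Detail` and `enabled_layers` are invented names, and grouping DNS and TLS into a single level is an assumption.

```python
from enum import IntEnum
from typing import List


class Detail(IntEnum):
    """Hypothetical verbosity dial; each level includes everything below it."""
    LOGICAL_REQUEST = 1  # HttpClient spans (what exists today)
    WIRE_REQUESTS = 2    # physical requests: redirects, continuations
    CONNECTIONS = 3      # long-lived connection spans
    DNS_AND_TLS = 4      # DNS lookup and TLS negotiation per connection


def enabled_layers(dial: Detail) -> List[str]:
    """Return the telemetry layers an operator gets at a given dial setting."""
    return [level.name.lower() for level in Detail if level <= dial]
```

Because the levels are cumulative, turning the dial up never loses the coarser data, which keeps correlation possible from the logical request downward.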

I'm wondering whether an "event" + optional link approach would be best. When a request is put on the wire for a connection, you'd get an event, so you'd roughly know how long the request waited before it was processed. If connections are being tracked, that "event" would have a link to the connection span, so you could correlate the two.
Similarly, a connection would have an "event" for when DNS resolution has occurred and when TLS negotiation is complete. If either is being tracked, that event would link to its respective span, although those spans could probably be parented to the connection.

I put "event" in quotes because I'm told the future of events on spans is unclear; it could be done with a log message instead.
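A minimal sketch of the "event + optional link" proposal, modelled as a plain log-style record since the comment notes it could be a log message. The event name `request.on_wire`, the field names, and `wire_request_event` itself are hypothetical.

```python
import time
from typing import Dict, Optional


def wire_request_event(
    request_span_id: str,
    connection_span_id: Optional[str],
    queued_at: float,
    now: Optional[float] = None,
) -> Dict[str, object]:
    """Build a hypothetical 'request on the wire' event/log record.

    The queue delay is always recorded; the link to the connection span is
    added only when connections are being traced.
    """
    ts = time.time() if now is None else now
    event: Dict[str, object] = {
        "name": "request.on_wire",       # made-up event name
        "span_id": request_span_id,      # correlates with the request span
        "queue_delay_s": ts - queued_at,
    }
    if connection_span_id is not None:
        event["link.span_id"] = connection_span_id
    return event
```

The record is useful at the lowest dial setting (it still reports the delay) and becomes correlatable once connection tracing is switched on.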

### Successful connect, but connection terminates with an error

A successful connection attempt to `example.com` results in the following span:
> Note: DNS lookup is outside the scope of this semantic convention.


While I agree that we shouldn't try to include all the DNS information in the connection, an event or marker indicating when DNS resolution completed would be helpful for timing.
If DNS is tracked with its own spans, a convention for linking from the connection span to the DNS span would make sense.

@lmolkova (Contributor Author) commented Jul 9, 2024

This is partially addressed in #1192 (specifically for .NET).

I'm going to close this PR, with the intention of evolving connection-level observability via #1192 and follow-up PRs that generalize beyond .NET.

@lmolkova closed this Jul 9, 2024