SpanKind with layers #526

pauldraper · 2020-03-24T12:00:52Z

Network communication is inherently layered, whether with formal OSI layering or more ad-hoc composition.

For example, a typical client-server HTTP call probably will involve the following operations:

Client: GetUser
    Client HTTP: GET /users/123
        Client DNS: users.example.com
        Client TCP: 10.0.5.1:15642 -> 189.10.14.10:80
            Server TCP: 10.0.5.1:15642 -> 189.10.14.10:80
                Server HTTP: GET /users/123
                    Server: GetUser

These aren't always fixed; the outer RPC layer could decide to retry, or the HTTP layer could encounter redirects. Nor are the layers used necessarily known to each other. For example, HTTP/2 over TCP could be replaced with HTTP/3 over QUIC.

The same is true of other architectures like message queues, e.g. SQS/HTTP/DNS+TCP. And some libraries/frameworks build even further on top of those.

If I am reading the SpanKind correctly, the semantics have only to do with the parent-child relationships.

How is this intended to work when communication is inherently layered?

Should the RPC instrumenter use SpanKind.CLIENT/SpanKind.SERVER, even though the lower layers may be instrumented?

Should instrumentations have options of what type of SpanKinds to use, and then the user decides which layers are instrumented?

Should RPC instrumentation also use CLIENT/SERVER, and the user just needs to avoid instrumenting lower layers when they are being used by RPC?

If we really want this sort of feature, there should be an optional span attribute msg.kind, which is "sender" or "receiver", and msg.protocol, which is "tcp" or "http" or "grpc" or "mongodb", and the backend matches them up.
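To make the matching idea concrete, here is a rough sketch of how a backend might pair spans using those two attributes. The `Span` class and `pair_by_protocol` function are hypothetical names for illustration only, not part of any OpenTelemetry API, and pairing in arrival order is a simplifying assumption that would not survive retries or redirects.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical minimal span record, not an OpenTelemetry type.
    span_id: str
    trace_id: str
    name: str
    attributes: dict = field(default_factory=dict)


def pair_by_protocol(spans):
    """Pair "sender" and "receiver" spans that share a trace and a protocol."""
    buckets = defaultdict(lambda: {"sender": [], "receiver": []})
    for span in spans:
        kind = span.attributes.get("msg.kind")
        protocol = span.attributes.get("msg.protocol")
        # Spans without both attributes take no part in the matching.
        if kind in ("sender", "receiver") and protocol:
            buckets[(span.trace_id, protocol)][kind].append(span)

    pairs = []
    for (_trace_id, protocol), sides in buckets.items():
        # Assumption: the n-th sender corresponds to the n-th receiver.
        for sender, receiver in zip(sides["sender"], sides["receiver"]):
            pairs.append((protocol, sender.span_id, receiver.span_id))
    return sorted(pairs)


spans = [
    Span("a", "t1", "GetUser", {"msg.kind": "sender", "msg.protocol": "grpc"}),
    Span("b", "t1", "GET /users/123", {"msg.kind": "sender", "msg.protocol": "http"}),
    Span("c", "t1", "GET /users/123", {"msg.kind": "receiver", "msg.protocol": "http"}),
    Span("d", "t1", "GetUser", {"msg.kind": "receiver", "msg.protocol": "grpc"}),
]
print(pair_by_protocol(spans))
```

Each layer is matched only against its own protocol, so the RPC spans and the HTTP spans pair up independently of which other layers happen to be instrumented.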

Or, propagate the other span ids in the context inter-process so it can be linked.

Or propagate a flag in the context indicating the middle of a client/server span that prevents lower layers from adding spans.

(Though to be honest, I'm not quite seeing the utility of a two-sided span. It seems tricky to work with and seems to be heavily weighted towards at-most-once communication paradigms.)

blumamir · 2023-01-03T11:10:52Z

I am experiencing another problem related to span kind, and was happy to find this great issue.
In my case, the app produces network spans from 2 sources:

1. From an instrumented library (in my case, aws-sdk), which creates the chain aws-sdk -> HTTP -> tls -> net, dns. The tls/net/dns spans are of type "internal" according to the specification.
2. From an uninstrumented library. Since the logical parent is missing (it lacks instrumentation), the network spans appear as the first and only indication that an outbound operation happened.

By following the specification, there is no systematic way to tell that an external entity is involved in the trace, since the "internal" labeling is misleading. There could of course be other clues scattered around the trace which indicate this, but:

- they all have limitations
- there are tons of edge cases
- context spans could be missing entirely, whether not instrumented or lost in transit

For example:

- If a user receives a trace with a single "internal" network span, they cannot even tell whether it is incoming or outgoing.
- Collector processors that process spans one at a time cannot derive the operation direction from parents/children.

To me, it makes a lot of sense to record the kind based on the logical role of the operation. Operations initiated by the caller on some logical remote entity should be "client" or "producer" (DB drivers, ORMs, HTTP clients, network spans, RPC clients, SDKs), and operations that are initiated by a logical remote entity on me should be "server" or "consumer" (HTTP servers, HTTP frameworks, network server sockets, push-based SDKs).

This will serve to decrease the coupling between spans and simplify the backends' work by introducing simple, systematic algorithms to extract structure from arbitrary traces.
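As a rough illustration of what such an algorithm might look like, the sketch below derives the direction of an operation from a single span's kind, with no access to parents or children. The `SpanKind` enum here is a local stand-in that mirrors the five kinds in the specification, and `direction` is a hypothetical helper, not an existing collector API.

```python
from enum import Enum


class SpanKind(Enum):
    # Local stand-in mirroring the five span kinds in the specification.
    INTERNAL = "internal"
    CLIENT = "client"
    SERVER = "server"
    PRODUCER = "producer"
    CONSUMER = "consumer"


OUTBOUND = {SpanKind.CLIENT, SpanKind.PRODUCER}
INBOUND = {SpanKind.SERVER, SpanKind.CONSUMER}


def direction(kind):
    """Classify one span in isolation, as a one-at-a-time processor would."""
    if kind in OUTBOUND:
        return "outgoing"
    if kind in INBOUND:
        return "incoming"
    return "unknown"


# A net/tls/dns span labeled per the current specification:
print(direction(SpanKind.INTERNAL))
# The same span labeled by its logical role, as proposed above:
print(direction(SpanKind.CLIENT))
```

Under the current labeling the lone network span yields "unknown"; under role-based labeling the same span can be classified without any surrounding context.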

I wonder what the flaws are in this alternative, and whether there is interest in addressing the above issues and considering changes to the span kind specification.

pauldraper mentioned this issue Mar 28, 2020

Use Context to stop tracing #530

Open

pauldraper mentioned this issue May 21, 2020

Semantic convention for HTTP start / end times #591

Closed

bogdandrutu added the spec:trace Related to the specification/trace directory label Jun 12, 2020

carlosalberto added area:api Cross language API specification issue spec:protocol Related to the specification/protocol directory labels Jun 26, 2020

mtwo added the release:after-ga Not required before GA release, and not going to work on before GA label Jun 30, 2020

pauldraper mentioned this issue May 2, 2021

Add Suppress Tracing context key #1653

Closed

blumamir mentioned this issue Feb 4, 2023

Clarify semantic of SpanKind regarding parent/child relationships #3172

Closed

lmolkova mentioned this issue Aug 6, 2024

What's INTERNAL span kind and when it should be used #4179

Open
