SpanKind with layers #526

Open
pauldraper opened this issue Mar 24, 2020 · 1 comment
Labels
area:api: Cross language API specification issue
release:after-ga: Not required before GA release, and not going to work on before GA
spec:protocol: Related to the specification/protocol directory
spec:trace: Related to the specification/trace directory

Comments

@pauldraper

pauldraper commented Mar 24, 2020

Network communication is inherently layered, whether with formal OSI layering or more ad-hoc composition.

For example, a typical client-server HTTP call probably will involve the following operations:

Client: GetUser
    Client HTTP: GET /users/123
        Client DNS: users.example.com
        Client TCP: 10.0.5.1:15642 -> 189.10.14.10:80
            Server TCP: 10.0.5.1:15642 -> 189.10.14.10:80
                Server HTTP: GET /users/123
                    Server: GetUser

These aren't always fixed; the outer RPC layer could decide to retry, or the HTTP layer could encounter redirects. Nor are the layers necessarily known to each other. For example, HTTP/2 over TCP could be replaced with HTTP/3 over QUIC.
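To make the layering concrete, here is a minimal Python sketch that models the client side of the example as nested spans and shows how a retry at the RPC layer repeats the whole HTTP/DNS/TCP sub-tree. The `Span` class, the `layer` label, and `build_call` are hypothetical names made up for illustration; they are not part of the OpenTelemetry API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Toy span: a name, a free-form layer label, and child spans."""
    name: str
    layer: str  # hypothetical label, not an OpenTelemetry attribute
    children: List["Span"] = field(default_factory=list)

    def child(self, name: str, layer: str) -> "Span":
        span = Span(name, layer)
        self.children.append(span)
        return span


def render(span: Span, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one per span."""
    lines = ["    " * depth + f"{span.layer}: {span.name}"]
    for c in span.children:
        lines.extend(render(c, depth + 1))
    return lines


def build_call(http_attempts: int) -> Span:
    """Build the client-side tree; each RPC retry adds a full HTTP sub-tree."""
    root = Span("GetUser", "Client")
    for _ in range(http_attempts):
        http = root.child("GET /users/123", "Client HTTP")
        http.child("users.example.com", "Client DNS")
        http.child("10.0.5.1:15642 -> 189.10.14.10:80", "Client TCP")
    return root


if __name__ == "__main__":
    print("\n".join(render(build_call(http_attempts=2))))
```

With one attempt the tree has four spans; with two it has seven, which is why a fixed one-client-span-per-call assumption does not hold.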

The same is true of other architectures like message queues, e.g. SQS/HTTP/DNS+TCP. And some libraries/frameworks build even further on top of those.

If I am reading the SpanKind definition correctly, its semantics concern only parent-child relationships.

How is this intended to work when communication is inherently layered?

Should the RPC instrumenter use SpanKind.CLIENT/SpanKind.SERVER, even though the lower layers may be instrumented?

Should instrumentations have options of what type of SpanKinds to use, and then the user decides which layers are instrumented?

Should RPC instrumentation also use CLIENT/SERVER, and the user just needs to avoid instrumenting lower layers when they are being used by RPC?


If we really want this sort of feature, there should be an optional span attribute msg.kind, which is "sender" or "receiver", and msg.protocol, which is "tcp" or "http" or "grpc" or "mongodb", and the backend matches them up.
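As a rough sketch of what "the backend matches them up" could look like, the Python below pairs sender and receiver spans that share a trace and a `msg.protocol` value. The pairing key (trace ID plus protocol) is an assumption; the proposal does not say how matching would work, and a real backend would need something more precise when a protocol appears more than once in a trace. The `Span` class is a hypothetical stand-in, not an OpenTelemetry type.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Span:
    span_id: str
    trace_id: str
    attributes: Dict[str, str] = field(default_factory=dict)


def match_sides(spans: List[Span]) -> List[Tuple[str, str]]:
    """Pair sender and receiver span IDs by (trace_id, msg.protocol).

    Assumption: one sender/receiver pair per protocol per trace.
    """
    senders = defaultdict(list)
    receivers = defaultdict(list)
    for span in spans:
        kind = span.attributes.get("msg.kind")
        protocol = span.attributes.get("msg.protocol")
        if kind is None or protocol is None:
            continue  # span does not take part in matching
        key = (span.trace_id, protocol)
        if kind == "sender":
            senders[key].append(span.span_id)
        elif kind == "receiver":
            receivers[key].append(span.span_id)

    pairs = []
    for key, sender_ids in senders.items():
        for sender_id in sender_ids:
            for receiver_id in receivers.get(key, []):
                pairs.append((sender_id, receiver_id))
    return pairs


spans = [
    Span("http-c", "t1", {"msg.kind": "sender", "msg.protocol": "http"}),
    Span("tcp-c", "t1", {"msg.kind": "sender", "msg.protocol": "tcp"}),
    Span("tcp-s", "t1", {"msg.kind": "receiver", "msg.protocol": "tcp"}),
    Span("http-s", "t1", {"msg.kind": "receiver", "msg.protocol": "http"}),
    Span("work", "t1"),  # no msg.* attributes, ignored
]
pairs = match_sides(spans)
```

Here each layer is matched with its peer at the same layer regardless of which other layers happen to be instrumented.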

Or, propagate the other span ids in the context inter-process so it can be linked.
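One way to read this option, sketched in Python: the client sends the IDs of all its enclosing client-side spans, outermost first, and each server-side layer picks the ID at its own depth to link to. The header name `x-example-span-stack`, the comma-separated encoding, and the depth-based lookup are all assumptions made for illustration, not an existing propagation format.

```python
from typing import Dict, List, Optional

# Hypothetical header name; not part of any propagation standard.
SPAN_STACK_HEADER = "x-example-span-stack"


def inject_span_stack(carrier: Dict[str, str], client_span_ids: List[str]) -> None:
    """Write the enclosing client span IDs (outermost first) into the carrier."""
    carrier[SPAN_STACK_HEADER] = ",".join(client_span_ids)


def extract_span_stack(carrier: Dict[str, str]) -> List[str]:
    """Read the span ID stack back; returns an empty list if absent."""
    raw = carrier.get(SPAN_STACK_HEADER, "")
    return [span_id for span_id in raw.split(",") if span_id]


def peer_span_id(stack: List[str], layers_above_wire: int) -> Optional[str]:
    """Return the client span ID at the same layer as the server span.

    layers_above_wire is 0 for the layer closest to the network.
    """
    index = len(stack) - 1 - layers_above_wire
    return stack[index] if 0 <= index < len(stack) else None


carrier: Dict[str, str] = {}
inject_span_stack(carrier, ["rpc-1", "http-7", "tcp-3"])
stack = extract_span_stack(carrier)
```

In this sketch the server's TCP span would link to `tcp-3`, its HTTP span to `http-7`, and its RPC span to `rpc-1`. It only works if both sides agree on the number of layers, which is exactly the kind of coupling the issue is worried about.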

Or propagate a flag in the context indicating the middle of a client/server span that prevents lower layers from adding spans.

(Though to be honest, I'm not quite seeing the utility of a two-sided span. It seems tricky to work with and seems heavily weighted towards at-most-once communication paradigms.)

@bogdandrutu bogdandrutu added the spec:trace Related to the specification/trace directory label Jun 12, 2020
@carlosalberto carlosalberto added area:api Cross language API specification issue spec:protocol Related to the specification/protocol directory labels Jun 26, 2020
@mtwo mtwo added the release:after-ga Not required before GA release, and not going to work on before GA label Jun 30, 2020
@blumamir
Member

blumamir commented Jan 3, 2023

I am experiencing another problem related to span kind, and was happy to find this great issue.
In my case, the app produces network spans from 2 sources:

  1. From an instrumented library (in my case, aws-sdk), which creates the chain aws-sdk -> HTTP -> tls -> net, dns. The tls/net/dns spans are of type "internal" according to the specification.
  2. From an uninstrumented library. Since the logical parent is missing (it lacks instrumentation), the network spans appear as the first and only indication that an outbound operation happened.

By following the specification, there is no systematic way to tell that an external entity is involved in the trace, since the "internal" labeling is misleading. There could of course be other clues scattered around the trace that indicate this, but:

  • they all have limitations
  • there are tons of edge cases
  • context spans could be missing entirely, whether because they were not instrumented or because they were lost in transit.

For example:

  • if a user receives a trace with a single "internal" network span, they cannot even tell whether it is incoming or outgoing.
  • collector processors that process spans one at a time cannot derive the operation's direction from parents/children.
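The limitation can be shown with a small Python sketch of a processor that sees one span at a time and has only the span's kind to go on. The `SpanKind` enum below is a local stand-in that mirrors the five kinds in the specification; `direction` is a hypothetical helper.

```python
from enum import Enum
from typing import Optional


class SpanKind(Enum):
    """Local stand-in for the five span kinds in the specification."""
    INTERNAL = "internal"
    CLIENT = "client"
    SERVER = "server"
    PRODUCER = "producer"
    CONSUMER = "consumer"


def direction(kind: SpanKind) -> Optional[str]:
    """Infer direction from the kind alone, with no access to other spans."""
    if kind in (SpanKind.CLIENT, SpanKind.PRODUCER):
        return "outgoing"
    if kind in (SpanKind.SERVER, SpanKind.CONSUMER):
        return "incoming"
    return None  # INTERNAL carries no direction


# A lone tls/net/dns span labeled INTERNAL gives the processor nothing.
lone_network_span_direction = direction(SpanKind.INTERNAL)
```

For an "internal" network span the function can only return `None`, whereas the same span labeled `CLIENT` would resolve to "outgoing" without looking at any parent or child.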

To me, it makes a lot of sense to record the kind based on the logical role of the operation. Operations initiated by the caller on some logical remote entity should be "client" or "producer" (db drivers, ORMs, HTTP clients, network spans, rpc clients, SDKs) and operations that are initiated by a logical remote entity on me should be "server" or "consumer" (HTTP server, HTTP frameworks, network server sockets, push-based SDKs).

This will serve to decrease the coupling between spans and simplify the work of backends by introducing simple, systematic algorithms to extract structure from arbitrary traces.
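As one example of such an algorithm, assuming every layer that talks to a remote entity is labeled "client" or "producer" as proposed above, a backend could find the logical outbound operations by keeping only the outbound spans whose parent is not itself outbound (or is missing). This is a Python sketch; the `Span` class and `outbound_operations` are hypothetical and the span names are taken from the two scenarios described earlier.

```python
from dataclasses import dataclass
from typing import List, Optional

OUTBOUND_KINDS = {"client", "producer"}


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    kind: str


def outbound_operations(spans: List[Span]) -> List[str]:
    """Return names of outbound spans that are not nested in another outbound span."""
    by_id = {s.span_id: s for s in spans}
    result = []
    for s in spans:
        if s.kind not in OUTBOUND_KINDS:
            continue
        # A parent that was never received (uninstrumented or lost) is None.
        parent = by_id.get(s.parent_id) if s.parent_id else None
        if parent is None or parent.kind not in OUTBOUND_KINDS:
            result.append(s.name)
    return result


trace = [
    Span("1", None, "handle request", "server"),
    # Scenario 1: instrumented library, full chain present.
    Span("2", "1", "aws-sdk", "client"),
    Span("3", "2", "HTTP", "client"),
    Span("4", "3", "tls", "client"),
    Span("5", "4", "net", "client"),
    # Scenario 2: uninstrumented library, only the network span exists.
    Span("6", "missing", "net (orphan)", "client"),
]
operations = outbound_operations(trace)
```

The result is `["aws-sdk", "net (orphan)"]`: one entry per logical outbound operation in both scenarios, including the one where the network span is the only evidence.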

I wonder what the flaws are in this alternative, and whether there is interest in addressing the above issues and considering changes to the span kind specification.

5 participants