Better structure for span identification #531

Open
pauldraper opened this issue Mar 28, 2020 · 15 comments
Assignees
Labels
release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory

Comments

@pauldraper

pauldraper commented Mar 28, 2020

Consider these canonical examples:

Span Name                  Guidance
get_account                Good, and account_id=42 would make a nice Span attribute
get_account/{accountId}    Also good (using the "HTTP route")

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-tracing.md#span

These are, quite frankly, terrible identifications for a span. The headline information doesn't give me a clue whether I'm looking at an HTTP request/response, an RPC call, a database procedure/query, a cloud function, a cache lookup, an internal computation, etc.

The trace itself isn't the best particular example, but consider Datadog's tracing interface:

There is both:

  • A span "type" that is instrumentation-determined (Datadog vocab: "name" or "operation").

    • http.request
    • mongodb.query
    • lambda.invocation
    • grpc.call
    • java.function
  • A span "name" that is application-determined (Datadog vocab: "resource").

    • get_account
    • /users/{id}
    • SELECT * FROM pokedex
    • com.example.Thing.run.

Whether this is done as syntax in the span name (TYPE:NAME), or whether as attribute (type: TYPE, component: TYPE), there should be some standard method of assigning classification.
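The two options can be sketched as follows. This is only an illustration: the `SpanIdentity` type, both helper functions, the `:` separator, and the `type` attribute key are assumptions made for this example, not anything defined by the specification.

```python
from typing import Mapping, NamedTuple, Optional


class SpanIdentity(NamedTuple):
    span_type: Optional[str]  # instrumentation-determined, e.g. "http.request"
    name: str                 # application-determined, e.g. "get_account"


def parse_prefixed_name(span_name: str, separator: str = ":") -> SpanIdentity:
    """Option 1: split a hypothetical 'TYPE:NAME' span name at the first separator."""
    span_type, sep, name = span_name.partition(separator)
    if not sep or not span_type or not name:
        # No usable prefix: treat the whole string as the name.
        return SpanIdentity(None, span_name)
    return SpanIdentity(span_type, name)


def from_attributes(
    span_name: str, attributes: Mapping[str, str], type_key: str = "type"
) -> SpanIdentity:
    """Option 2: read the type from a (hypothetical) dedicated span attribute."""
    return SpanIdentity(attributes.get(type_key), span_name)
```

Either way, a consumer ends up with the same pair, for example `parse_prefixed_name("http.request:get_account")` and `from_attributes("get_account", {"type": "http.request"})` both yield the type `http.request` and the name `get_account`.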

Otherwise, I wind up with spans auto-named "get_account," all of wildly different flavors (HTTP, RPC, Message Queue, DB), and I'm left trying to tell them apart. Naturally, with enough inspection of the attributes that is possible, but there are a lot of attributes to look through (a high-level view of a trace usually doesn't show them because there are so many).

(Note I am not talking about tracer name, which refers to the instrumentation. I am talking about either the instrumented technology, or type of operation.)

I believe this overlaps with #271, though there is little recorded discussion, so I'm not entirely sure what happened.

@arminru
Member

arminru commented Mar 30, 2020

Backends should already be able to deduce the type of action using the attributes added to spans according to the semantic conventions defined in this spec. The easiest way would be an if-else cascade checking for the presence of mandatory attributes in a certain order (db.type, messaging.system, http.method, rpc.service, ...). Therefore, the problem you mentioned is merely one of the backend and not the data model itself.
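A minimal sketch of such a cascade, assuming span attributes arrive as a plain mapping. The attribute keys and their order are the ones listed above; the function name and the returned type labels are illustrative assumptions.

```python
from typing import Mapping, Optional

# Marker attributes in the order given above. The labels on the right are
# illustrative assumptions, not values defined by the specification.
MARKERS = (
    ("db.type", "db"),
    ("messaging.system", "messaging"),
    ("http.method", "http"),
    ("rpc.service", "rpc"),
)


def infer_span_type(attributes: Mapping[str, object]) -> Optional[str]:
    """Return the label of the first marker attribute present, or None."""
    for key, span_type in MARKERS:
        if key in attributes:
            return span_type
    return None
```

Because the check is ordered, a span carrying both `db.type` and `http.method` is classified as `db` here; a different order would give a different answer.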

PS: Does the color coding in your screenshot correspond to the types ("names") that you list below or how can you tell which one it is?

@dyladan
Member

dyladan commented Mar 30, 2020

@arminru dd trace color scheme is either by service or host depending on settings.

@pauldraper
Author

pauldraper commented May 7, 2020

Backends should already be able to deduce the type of action using the attributes added to spans according to the semantic conventions defined in this spec. The easiest way would be an if-else cascade checking for the presence of mandatory attributes in a certain order

And exactly what backend would you recommend for this?

I really don't get the hesitation to give spans a proper discriminated type.

@arminru
Member

arminru commented May 7, 2020

@pauldraper

And exactly what backend would you recommend for this?

Well I work on a backend where we do it this way, so my recommendation would most probably be strongly biased at least. 😄

I really don't get the hesitation to give spans a proper discriminated type.

The type you'd like to have added was actually there in the past: it was named component but was removed in #271. Unfortunately, the rationale for the decision was not properly documented, either on the issue or in the meeting notes. Based on the description provided by @yurishkuro, who opened the issue, I'd say it was removed for the motivation he stated: component is redundant, since the type/kind of span can be inferred by looking at (required) span attributes, as I also mentioned above (#531 (comment)).

Apart from that, at the time the issue was opened, component was not well-specified. For database spans, for example, component was not defined as a fixed string "db" or "database" but rather an unbounded, free text value as initially criticized in #245 (title was reworded after component was removed):
component: Database driver name or database name (when known) "JDBI", "jdbc", "odbc", "postgreSQL".
This definition would not have been of any help for the purposes you described but could've been fixed as well, of course, rather than removing component entirely.

@pauldraper
Author

pauldraper commented May 7, 2020

the fact that component is redundant

Not really. Without it, you need to add information (an algorithm for deducing type) that you wouldn't otherwise.

This definition would not have been of any help for the purposes you described but could've been fixed

It certainly would help. I don't need it to necessarily be standardized. I just need unique operation names.

The canonical examples of good span names are get_account and get_account/{accountId}.

I have no earthly idea which of the various flavors of "get_account" I have in my stack: database, HTTP, in-process function, cloud function, AMP message? I don't necessarily need a perfectly uniform component classification scheme, but I do need to tell the HTTP request get_account apart from the database query get_account when they show up in a report, list, etc. And tacking on 20 attributes of every possible kind to achieve that uniqueness isn't wieldy.

Now, perhaps the specification just has really, really bad examples of span names. Maybe the good span names would be HTTP:get_account, JDBI:get_account, etc. I don't care whether it's an attribute or span name prefix; I just want to tell my operations apart, and currently the spec seems to do a very bad job of that.

Programming a backend in order to do that basic thing...that seems unnecessarily complex and poorly supported.

@Oberon00
Member

Oberon00 commented May 7, 2020

Since you complain about span name, I think it currently has an unclear purpose, see related issue #557.

@bogdandrutu bogdandrutu added the spec:trace Related to the specification/trace directory label Jun 12, 2020
@carlosalberto carlosalberto added the release:required-for-ga Must be resolved before GA release, or nice to have before GA label Jul 2, 2020
@andrewhsu andrewhsu added the priority:p1 Highest priority level label Jul 17, 2020
@tedsuo
Contributor

tedsuo commented Jul 23, 2020

Hi @pauldraper, I've taken a shot at resolving some of the issues raised in this thread and others here (#730), by adding display hints. Please take a look.

@bogdandrutu bogdandrutu added priority:p3 Lowest priority level and removed priority:p1 Highest priority level labels Aug 10, 2020
@tigrannajaryan
Member

I suggest removing the release:required-for-ga label.

The "component" approach was already discussed and rejected in the past. The type of the Span can be deduced by the presence of required attributes. It may not be convenient but it is possible. It is also more powerful since it allows recording multiple types simultaneously, while a single "type" or "component" does not (what is the type of a Span representing an HTTP call to a database? Is it "http" or "db"?).
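To illustrate the multiple-types point, here is a hedged sketch that collects every matching type instead of stopping at the first. The marker keys are the ones mentioned earlier in this thread; the function name and the labels are assumptions made for this example.

```python
from typing import List, Mapping

# (marker attribute, illustrative type label); the labels are assumptions.
MARKERS = (
    ("db.type", "db"),
    ("messaging.system", "messaging"),
    ("http.method", "http"),
    ("rpc.service", "rpc"),
)


def infer_span_types(attributes: Mapping[str, object]) -> List[str]:
    """Return the label of every marker attribute present, in MARKERS order."""
    return [label for key, label in MARKERS if key in attributes]
```

For a span representing an HTTP call to a database, carrying both `http.method` and `db.type`, this returns both `db` and `http`, which a single "type" field could not express.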

It is likely too late for 1.0 to introduce a new way of specifying the Span type that is better than what we have. There are likely better ways, but I don't think we have time to introduce, discuss and agree on an approach quickly enough to make it part of the 1.0 release.

@carlosalberto
Contributor

+1 on making this release:after-ga .

@andrewhsu
Member

From the issue triage mtg today, i'm changing the label to release:after-ga since it looks like from the comments this can be punted.

@andrewhsu andrewhsu added release:after-ga Not required before GA release, and not going to work on before GA and removed priority:p3 Lowest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Sep 9, 2020
@pauldraper
Author

1. Does anyone use Datadog? Or am I the only user of the largest commercial monitoring platform?

Because I don't see how OTel is going to work with Datadog unless it can intelligently produce an operation and resource ("type" and "name").

2. Does anyone think these trace names are actually good?

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#span

Like what the heck are they....a file, a GRPC operation, a HTTP request, a DB query, something else?

Not at all obvious.

@Oberon00
Member

Oberon00 commented May 4, 2021

Like what the heck are they....a file, a GRPC operation, a HTTP request, a DB query, something else?

Not at all obvious.

The details are specified in the semantic conventions: https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/trace/semantic_conventions

@Oberon00
Member

Oberon00 commented May 4, 2021

Because I don't see how OTel is going to work with Datadog unless it can intelligently produce an operation and resource ("type" and "name").

All semantic conventions should have a "marker attribute" or at least a set thereof. E.g. a database operation can be identified by having a db.system span attribute, an HTTP span always has http.method, etc. (but see #653)
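As a rough sketch of how an exporter might use such marker attributes to derive an operation/resource pair, assuming attributes are a plain mapping: the operation labels and the `unknown` fallback below are hypothetical, and this is not how any actual Datadog exporter is known to work.

```python
from typing import Mapping, Tuple

# Marker attributes named above, mapped to hypothetical operation labels.
MARKER_TO_OPERATION = (
    ("db.system", "db.query"),
    ("http.method", "http.request"),
)


def to_operation_and_resource(
    span_name: str, attributes: Mapping[str, object]
) -> Tuple[str, str]:
    """Derive ("type", "name") from the marker attributes and the span name."""
    for marker, operation in MARKER_TO_OPERATION:
        if marker in attributes:
            return operation, span_name
    # No known marker attribute: fall back to a placeholder operation.
    return "unknown", span_name
```

With this mapping, two spans both named `get_account` come out as `("http.request", "get_account")` and `("db.query", "get_account")` respectively.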

@Sturgelose

Providing some feedback as a current Datadog user (the platform my company uses to manage traces) and Jaeger user (for local testing), who is trying to standardize across my company not only spans that have defined use cases (HTTP/gRPC requests, DB, SQS, Lambda, etc.), but also private custom conventions.

I agree and understand that db.system or http.method can be used to identify a span's "type of action", "component", "type", or whatever we want to call it. However, as Paul comments, just checking whether this tag exists is not enough.

We might add new tags to the spec or deprecate some. Technology evolves, and we will certainly need to add or remove metadata in spans. However, it is not feasible to operate on them if we do not know which version of the spec we are targeting.

This is where it makes sense to have a component, type, or any kind of field identifying not only the type of the span, but also the version of the schema we are mapping it to! This will simplify parsers, make it easier for users to identify which kind of data they have available, and ease upgrading queries to support new standardized semantic conventions or potential additions in the future.

Internally in my company I'm working to define span schemas for different, similar types of spans that map to business logic. These are totally independent from the semantic conventions defined in OTel, but we still have similar challenges. We are trying to adapt and implement the different tags whenever possible or relevant for that business logic span. However, we are iterating on them and versioning them, so we end up with payments-v2 or identity-v4 (as an example), and we know the expected structure and tags that each span will have.
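A small sketch of how a processor could consume such versioned identifiers. Only the `payments-v2` and `identity-v4` examples come from the comment above; the exact `<name>-v<number>` pattern, the `SpanSchema` type, and the function name are assumptions.

```python
import re
from typing import NamedTuple


class SpanSchema(NamedTuple):
    name: str
    version: int


# Assumed format: lowercase name, then "-v", then an integer version.
_SCHEMA_PATTERN = re.compile(r"^(?P<name>[a-z][a-z0-9_-]*)-v(?P<version>\d+)$")


def parse_span_schema(value: str) -> SpanSchema:
    """Split an identifier such as 'payments-v2' into its name and version."""
    match = _SCHEMA_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a versioned span schema: {value!r}")
    return SpanSchema(match.group("name"), int(match.group("version")))
```

A processor could then dispatch on `(name, version)` and reject spans whose identifier it does not recognize, instead of guessing from whichever tags happen to be present.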

Otherwise, there is no magic way to understand how spans will change in the future, and of course that makes it really hard for processors to identify them or understand which kind of span we are looking at. The only option, as Paul says, is to write a crazy algorithm that tries to identify the type of span, and that will surely have issues whenever the schema changes.

Maybe it is not a blocker for a GA release, but it is definitely a must to define how OTel will version changes to the semantic conventions (which maybe should be, or already are, defined as schemas?), so that after 1-2 years of convention changes we know which kind of http/db/queue, etc. metadata we are talking about.

@tigrannajaryan
Member

tigrannajaryan commented May 18, 2022

This is where it makes sense to have a component, type, or any kind of field identifying not only the type of the span, but also the version of the schema we are mapping it to! This will simplify parsers, make it easier for users to identify which kind of data they have available, and ease upgrading queries to support new standardized semantic conventions or potential additions in the future.

@Sturgelose It is already possible to include the version of the spec that the span conforms to: a SchemaURL can be included in the emitted telemetry. See schemas: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/schemas/overview.md
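As a hedged illustration of how a consumer might read the version back out, the sketch below assumes the schema URL ends with the version number as its last path segment, as in the example URL used in the usage note. `schema_version` is a hypothetical helper, not part of any OpenTelemetry SDK.

```python
from urllib.parse import urlparse


def schema_version(schema_url: str) -> str:
    """Return the last path segment of a schema URL, assumed to be the version."""
    path = urlparse(schema_url).path.rstrip("/")
    version = path.rsplit("/", 1)[-1]
    if not version:
        raise ValueError(f"no version segment in schema URL: {schema_url!r}")
    return version
```

For example, `schema_version("https://opentelemetry.io/schemas/1.9.0")` yields `1.9.0`, which a backend could compare against the convention versions it knows how to interpret.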

Schemas are also how the evolution of the conventions is supposed to be handled. See https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#semantic-conventions-stability

Does this address your concerns?


10 participants