-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Span name: Both low-cardinality (grouping key) and human-readable (display name) #557
Comments
|
|
|
I forgot to mention this in the SIG meeting today, but this topic was talked about there, so it seems this is an actual issue. |
Can you define what a grouping key is? I'm not familiar with this term. |
@jkwatson Thank you for that question, I think I accidentally coined that term here. I edited the description to hopefully be more clear. The "grouping key" requirement can be found in the spec as a requirement for low cardinality of the span name. I think the intention here was to allow grouping / sampling per span name. |
@jkwatson You can think of the proposed grouping key as what an SQL query will look at in a "GROUP BY" clause. All spans with the same grouping key will be grouped together on the backend when being analyzed or presented in the UI. |
Thanks for the answers, that helps a lot! |
@Oberon00 can you elaborate? I am not seeing an issue yet, especially since "grouping key" is a concept for machine processing, and why would machines care if they group by extra field or by readable span name (providing there's uniqueness). |
Exactly! That's the idea behind my suggestions in this issue.
You can see the issue in the PR comment linked in the description (#548 (comment); actually it was your comment 😄). The issue is that if we just want the span name to be a display name, we can drop the requirement for low cardinality and would potentially have a better human-readability (e.g. we could use GET domain.example.com as display name instead of just HTTP GET), or we could drop the requirement for (easy) human readability and would have a better grouping key. The span name needs to satisfy both requirements which obviously means there must be compromises made in some cases. |
Sorry, but I don't see "low cardinality" and "human readable" as conflicting requirements. In Jaeger UI, for example, you can select the Service from a dropdown, and the following dropdown is populated by span names - they have to be both human readable and low cardinality (high cardinality will explode the dropdown and make it unusable). |
"HTTP GET" is a very poor display name (it's also a poor grouping key). "GET domain.example.com" is a good display name, but it's an even worse grouping key. |
If you want to use your groups for display (as opposed to sampling), you'll need a display name for your group. In that case you can use any one display name from within that group and display something like "'GET domain.example.com' and 32323 similar spans" |
To summarize, you dislike the use of span name being both (1) the name for the individual span "instance" and (2) the (low-cardinally) name of the span "class". I agree. If I'm looking at a single trace, I'd prefer to see
instead of
The former is a more informative summary of the span. Of course the latter form still needed for grouping (e.g. Jaeger dropdown UI). Counterargument: Span attributes are the place to store instance-specific details. Counter-counterargument: There are more attributes than can be displayed for all spans in a trace, and there are more span attributes than every backend can be required to understand significant vs. insignificant. If you want something looks like the former trace, you need the instrumentation to name it. For compatibility I propose the opposite: the span "name" should continue to be the low-carnality span "class." And there should be a standardized attribute |
I think having an optional |
My same feeling. I like the idea of having an additional, optional attribute (such as |
Agreed. No reason to make it mandatory: there is an abundently obvious fallback in span name. To add a further example for this idea: Since HTTP clients don't have to pattern-match URLs, they frequently are left without a great low-cardinality names, and the instrumentation is forced to choose something poor like Bike shed: The thing I don't like about |
So back to the initial description: Will this display.nane allow me to merge #548 then? |
Why "GET domain.example.com" is a bad grouping key? |
My $0.02 was given verbally in the last SIG meeting. I'd like to see a span name with message-formatting syntax, like "GET {http.url}". This is both descriptive and low cardinality, and when it comes time to display, the system can format the message. |
If you care for one more opinion: span name is certainly a "grouping key" for me. A value that define a "class" of spans, as specification states. It is certainly has less information that possible display name of a single span. Every UI vendor can may his own decisions (probably even different ones for different users of that UI) what specific attribute of the span to use in their UI. Span name defines a unit of work, not a particular invocation of that work. Taking that into account I think the original proposal of #548 is a good one and we should go with it. |
Span name is definitely a grouping key.
The question is whether backends or instrumentations are in a better
position to display a good name/summary for an individual span.
I believe instrumentations should do choose attributes for a display name;
you believe that backends should understand attributes and generate a
display name.
…On Mon, May 11, 2020, 13:03 Nikita Salnikov-Tarnovski < ***@***.***> wrote:
If you care for one more opinion: span name is certainly a "grouping key"
for me. A value that define a "class" of spans, as specification states. It
is certainly has less information that possible display name of a single
span. Every UI vendor can may his own decisions (probably even different
ones for different users of that UI) what specific attribute of the span to
use in their UI. Span name defines a unit of work, not a particular
invocation of that work.
Taking that into account I think the original proposal of #548
<#548>
is a good one and we should go with it.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#557 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKWTB4XOD7AA3XEKTDDTYLRRBDYNANCNFSM4MER6NXQ>
.
|
Correct. Backends have to understand attributes anyway if they hope to present useful information to their users. "Useful" as in "relevant to users' use-cases". You don't make decisions about UI in sql queries. |
Because it can have a very high cardinality. See #270 (comment) EDIT: And https://github.com/open-telemetry/opentelemetry-specification/pull/416/files#r369591004 |
Ah, I thought you were talking about "GET ". Using full URL is certainly bad for grouping. Even using path portion is very bad, agreed. |
For the record, I have since changed my opinion and think that http route information, if available, makes for a better span name than handler name. My current opinion is based on the following example. Imagine we have a servlet container running Spring MVC application and having auto instrumentation agent attached. When we create SERVER span from servlet request we have full url, which is too high cardinality for span name, and servlet handler name, like In the situation above I think it is sensible to update SERVER span name to route-based name |
Hi all, I've made a display hint proposal to help resolve this, please have a look: #730 |
I want to address this argument. For server-side span, there is no excuse in using "HTTP GET" as a span name. A server always has some kind of mapping from URLs to handlers, that mapping most commonly called "route". The route is good both as display string and grouping key, satisfying the span name requirements. The situation where "HTTP GET" names become necessary are for the client-side spans emitted by generic instrumentation of HTTP frameworks, which are removed from business logic and only have the (potentially high-cardinality) URL to work with. It is often not that interesting to build aggregations of those spans, so while certainly a poor grouping key those names do not affect the analysis that much. Without aggregation requirement, these spans are only useful to be shown in the Gantt-chart style views, where we're dealing with a single span and the UI can replace/augment the span name with the full URL. |
I think that depends on how you choose to do it. IMO there isn't a fundamental difference between client tracing and server tracing:
What is instrumented by whom depends. If you are using a SOAP client, there an obvious automatic name for the operation. If you are using ApacheHttpClient you will have to choose and add the name yourself. If you are using a Play server, there is an obvious automatic name for the operation. If you are using com.sun.net.httpserver.HttpServer you will have to choose and add the name yourself. |
@open-telemetry/dotnet-approvers are you aware of this issue? It seems you already made a choice here that runs counter the current direction of this discussion when naming the Name DisplayName (see #730 (comment)) |
I suggest to move this issue for Future milestone. This may be added after GA, not a breaking change. |
The issue here is that the purpose of the existing span name is not clearly defined. Sure, you can add a |
See spec
the issue is that optimizing a span name for human-readability will sometimes yield different results from optimizing for "most general string", as demonstrated in #548. In that PR, I agree that when "naming" an HTTP span, the route name is better for human consumption, but I maintain that the handler name is a more general string and still identifies a statistically interesting class of spans. |
Outside of the spec, the span name is generally referred to as the 'grouping key' for operations and instrumenters are advised to avoid high cardinality span names. I agree with your (@Oberon00) overall point, but I would suggest that #548 could just as simply be resolved by turning the handler class/method name into a semantic attribute on a server span. +1 to Sergey's point about tabling this until post-GA. |
@Oberon00 this problem exists on many platforms already and every platform takes its route to resolve it. OpenTelemetry semantic convention made emphasize on (more) guaranteed grouping. Every app owner/vendor may adjust the span name with custom modules or custom code. Also our requirement is to have a support for features in some backend. This is not a simple attribute and additional semantic is forced on it. So it would be great to have Jaeger or other backend to demonstrate how to use it. |
I proposed #810 to clarify what I think we should resolve before GA. |
…etry#810) * Clarify span name criteria. * Clarify that human-readility is still something we want.
This came up when I suggested the handler name instead of the route name for the HTTP semantic conventions (#548). I think that the handler name is a better grouping key than the route, but the route is a better display name, as @yurishkuro pointed out. So I think we should relieve span name from this double duty. I'll post ideas as separate comments to allow reactions.
EDIT:
Cf. current spec of span name https://github.com/open-telemetry/opentelemetry-specification/blob/e7fe34ad/specification/trace/api.md#span:
I used the term "grouping key" here because I think using the span name as a key to group / sample similar spans is the reason why a Span name should have a low cardinality.
The text was updated successfully, but these errors were encountered: