[META 605] Add span links for message tracing #2489

felixbarny · 2022-03-02T12:39:42Z

Meta issue:

Add span links for message tracing apm#605

Spec issue:

[META 605] Spec: Add span links for message tracing apm#606

Summary of spec changes:

If your agent instruments any of Azure Service Bus, JMS, Kafka, RabbitMQ, SQS, or SNS, then instrumentation for receiving messages or batches of messages "SHOULD":
- Create a transaction (with trace-context from message metadata, if any) for the handling of each message, if possible.
- Otherwise, when creating the transaction or span for the message or batch of messages, look for trace-context in metadata for up to 1000 messages and create a span link for each.
If your agent supports AWS Lambda instrumentation, then in addition to the creation of span links, this spec change has a number of changes to the context captured for SQS and SNS-triggered invocations. (The special-casing of a message batch with a single message has been dropped.)

eyalkoren · 2022-05-01T11:07:32Z

Consider adding instrumentation for Spring Kafka batch listener (#2601) and use it as a test case for span links

eyalkoren · 2022-05-01T13:29:09Z

So far, of all supported messaging system clients, only the Kafka plugin had to deal with batch processing. In this plugin, we wrap the batch iteration, allowing us to create a transaction-per-record.
Hopefully, we can apply the same approach for SNS and SQS clients without the need for special-casing single-message-batches.
In addition, Lambda functions triggered by SNS/SQS rely on events that provide lists of records.

Therefore, as far as I can see, the scope of this spec implementation for the Java agent is as follows:

Polling spans: if the instrumented polling API returns a message with a tracecontext header - add a link to the sender span
Record-batch iteration (Kafka plugin): we already instrument org.apache.kafka.clients.consumer.KafkaConsumer#poll, which returns org.apache.kafka.clients.consumer.ConsumerRecords that we instrument as well to wrap the iteration. We will need to apply different iteration-wrapper strategy for each ConsumerRecords instance, based on the state at OnMetodExit of the KafkaConsumer#poll instrumentation - if there is an active span, the wrapper will only add a link per record with tracecontext to the active span, otherwise- we will use the same wrapper we use now to create transaction per record.
SNS/SQS-triggered Lambda functions- add span links to the single function transaction. We can either iterate over the list for that, or wrap the list with our own, so that we add the links while the application is iterating. The latter is a bit more efficient, but way more complex and would not work if we fail to activate the transaction on the iterating thread.

eyalkoren · 2022-05-02T12:47:57Z

Correction: we also have this one to support similarly to Kafka:

SpringAmqpBatchMessageListenerInstrumentation

trentm · 2022-05-02T22:13:10Z

We can either iterate over the list for that, or wrap the list with our own, so that we add the links while the application is iterating.

I'm not sure how strict this rule is, but the OTel spec on span links says that Links cannot be added after Span creation. My understanding of their reasoning for that is that all span links are passed into the OTel SDK's "Sampling" API so that the registered Sampling implementation can use those links in its processing. I'm not sure if any of this applies to your "we add the links while the application is iterating".

eyalkoren · 2022-05-03T04:06:01Z

Thanks for pointing out!
I am not even sure there is a meaningful advantage for us in doing that through iteration vs. at batch processing start.
Either way, we discussed it yesterday and decided there are two approaches we would apply:

create an encapsulation batch processing span/transaction, where we can typically add links at creation time, as in Lambda triggered by SNS/SQS, or if we instrument a batch processing API like org.springframework.amqp.core.MessageListener#onMessageBatch(List)
create a span-per-message if our instrumentation already supports batch-iteration instrumentation with a single span link (added at creation time), if there is an active transaction when the iteration starts. For example, in Add support of kafka batch consumer #2601, a user asked for inherent instrumentation for a Spring batch processing wrapper around Kafka's polling client API. If we add such instrumentation, we can do as described in 1. However, if we don't add such and the user adds a manual encapsulating batch-processing transaction, as described in the issue, then we will create a span per Kafka record with a link to the sending span.

So I think we are OK with this guidance.

eyalkoren · 2022-05-04T08:10:09Z

@trentm the said above is true for transactions. However, adding span links to message-polling spans cannot typically follow this guidance, as by nature, polling APIs return the received message only at exit time. Assuming that we want polling span to measure the polling duration, it means that we create and start the span on polling method entry and end it on exit, which is only when we can look for the sender's tracecontext.
I am doing exactly this now. If you think that is an issue, let me know and we'll see what we can do about that.

trentm · 2022-05-04T16:22:38Z

@eyalkoren https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/messaging.md#batch-receiving shows an example from OTel where there are separate spans for "message receive" and "message processing". In their example they add a span link on the "message processing" span. I don't know if that is relevant or useful for the case you describe. (I vaguely recall some discussion about how to model spans for messaging systems back when Russ was adding Azure support, but I wasn't following very closely at the time.)

If you think that is an issue

I suspect that it is probably fine to add span links late (after span creation) for our agents, if necessary. open-telemetry/opentelemetry-specification#454 is an OTel Spec discussion on potentially (re-)allowing this.

eyalkoren · 2022-05-08T12:25:53Z

This OTel example is about a poll action that takes place when there is no active context. The guidance is then to create an orphan span for the receive action and a child span per message with a single span link. I don't mind doing that once our UI properly renders distributed traces based on links. But as it is now, if I change our tracing to do that, it will be an unfortunate breaking change for our existing users, as it will break their currently properly-presented full traces, because there will be no parent-child relationships to represent the distributed trace.
In the Java agent we actually have a distinction between receive transactions and message-handling transaction. The former type is discarded by default.

What I referred to is a polling span, i.e.- a polling action traced when there is an active context (transaction/span). As opposed to the former scenario, in this one there is no incentive to do all kind of sophisticated and risky assumptions to trace message handling as it will be automatically traced as child actions of the active transaction/span. So in this scenario, I would say a polling span is enough to represent the involvement of a messaging system and with the ability to link to the sending span, that's even better covered.

felixbarny added the 8.3-candidate label Mar 2, 2022

github-actions bot added the agent-java label Mar 2, 2022

felixbarny mentioned this issue Mar 2, 2022

Add span links for message tracing elastic/apm#605

Open

10 tasks

AlexanderWert added this to the 8.3 milestone Mar 28, 2022

AlexanderWert added the size:medium Medium (M) tasks label Mar 28, 2022

AlexanderWert assigned eyalkoren Mar 28, 2022

AlexanderWert removed the 8.3-candidate label Mar 28, 2022

eyalkoren mentioned this issue May 1, 2022

Add support of kafka batch consumer #2601

Closed

eyalkoren mentioned this issue May 3, 2022

Add span links to messaging system tracing #2610

Merged

8 tasks

AlexanderWert modified the milestones: 8.3, 8.4 May 30, 2022

eyalkoren closed this as completed in #2610 Jun 6, 2022

pyohannes mentioned this issue Feb 7, 2023

Allow adding links after span creation open-telemetry/opentelemetry-specification#3186

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[META 605] Add span links for message tracing #2489

[META 605] Add span links for message tracing #2489

felixbarny commented Mar 2, 2022 •

edited by AlexanderWert

Loading

eyalkoren commented May 1, 2022

eyalkoren commented May 1, 2022 •

edited

Loading

eyalkoren commented May 2, 2022 •

edited

Loading

trentm commented May 2, 2022

eyalkoren commented May 3, 2022

eyalkoren commented May 4, 2022

trentm commented May 4, 2022

eyalkoren commented May 8, 2022 •

edited

Loading

[META 605] Add span links for message tracing #2489

[META 605] Add span links for message tracing #2489

Comments

felixbarny commented Mar 2, 2022 • edited by AlexanderWert Loading

eyalkoren commented May 1, 2022

eyalkoren commented May 1, 2022 • edited Loading

eyalkoren commented May 2, 2022 • edited Loading

trentm commented May 2, 2022

eyalkoren commented May 3, 2022

eyalkoren commented May 4, 2022

trentm commented May 4, 2022

eyalkoren commented May 8, 2022 • edited Loading

felixbarny commented Mar 2, 2022 •

edited by AlexanderWert

Loading

eyalkoren commented May 1, 2022 •

edited

Loading

eyalkoren commented May 2, 2022 •

edited

Loading

eyalkoren commented May 8, 2022 •

edited

Loading