Links from transactions and spans to multiple spans/transactions/traces #122
Comments
The main questions I think we need to discuss are:
|
SGTM in general. A few more questions:
|
I didn't plan to have it in there, but I'm open to arguments.
Not those specifically, unless we intend to do something with them. I do think we need to be more specific about the link types though (more at the end).
My initial thought was to use the first parent observed in the parent field, all others in the links, but that might be a bit too naive. Not too sure on the answer here, depends on whether we want to visualise multi-parent relationships.
Again, not sure, but I think we'll need to figure this out before we can proceed after all. I can't imagine how we would extend our existing visualisation to account for multiple parents which may cross traces. "Links" is too generic/vague a concept to be useful for visualisation in a tree anyway. At most we could use them for creating a list of links under transaction/span details. For some kind of DAG visualisation we would need to know the link type, specifically whether it's a parent or child (i.e. the arc direction). I think perhaps instead of adding support for generic links, we should change this proposal to focus on adding support for multiple parents. |
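To make the "first observed parent goes in the parent field, all others go in the links" idea concrete, here is a minimal sketch. It is only an illustration of that naive rule, not anything from an agent or the intake API: `Link`, `Span`, `start_span`, and `fallback_trace_id` are hypothetical names, and the sketch assumes the span joins the trace of its first observed parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    # Hypothetical reference to another span or transaction.
    trace_id: str
    span_id: str


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None
    links: List[Link] = field(default_factory=list)


def start_span(span_id: str, observed_parents: List[Link], fallback_trace_id: str) -> Span:
    """Apply the naive rule: first observed parent wins, the rest become links."""
    if not observed_parents:
        # No parent observed: this span is the root of a new trace.
        return Span(trace_id=fallback_trace_id, span_id=span_id)
    first, *rest = observed_parents
    # Assumption: the span joins the trace of its first observed parent.
    return Span(
        trace_id=first.trace_id,
        span_id=span_id,
        parent_id=first.span_id,
        links=list(rest),
    )
```

Note that the remaining parents lose their "parent" meaning once they are stored as plain links, which is exactly the arc-direction problem raised above for any DAG visualisation.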
Metrics are currently not exported; we'll wait for the data model changes to settle, so we can build the translation off the OTLP representation. Not all of the OpenTelemetry model is covered by Elastic APM. In particular, there's currently no support for links or events. We'll add support for events later, and most likely links too (see elastic/apm#122).
This PR introduces an exporter for [Elastic APM](https://www.elastic.co/apm). The exporter works by translating spans and metrics into the ND-JSON format expected by Elastic APM Server, and sending over HTTP. Currently only spans are supported. Code for translating metrics exists, but is not yet wired up to the exporter; we'll do that once the switch over to the new metrics model is done. Not all of the OpenTelemetry model is covered by Elastic APM. In particular, there's currently no support for links or span events. We'll add support for events later, and most likely links too (see elastic/apm#122). **Testing:** Unit tests added for translating resources, spans, and metrics to the Elastic APM model. This has been tested using a mock in-memory Elastic APM Server. Coverage is > 80%. Manually tested, sending to an [Elastic Cloud](https://cloud.elastic.co/) deployment. **Documentation:** Added a README, which describes the exporter's config. Metrics are currently not exported; we'll wait for the data model changes to settle, so we can build the translation off the OTLP representation.
Lack of this feature was a blocker for us using Elastic APM. We have microservices implementing a data processing pipeline with scatter & gather (fork & join) steps, so we need spans that can be part of multiple different traces. |
Question 1: Are there use cases where we expect agents to fill in |
I think message queue instrumentation is one case where we would do this. e.g. receiving a message within a transaction would link said transaction to the span that published the message to the queue. @eyalkoren may have more to say on this.
This question is unresolved, which is why this issue hasn't progressed yet. I would expect the links to show up in the UI, which is why I would expect them to have their own place in the data model. |
ECS uses |
Indeed, for example when using a scheduled task (for which we create a transaction) that reads a message (or a bulk of messages) from a queue; or a send-and-reply scenario where the reply-receiving span has a parent and may be linked to the reply sender span in addition. |
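As a rough illustration of the bulk-receive case, the sketch below collects one link per distinct publishing span from a batch of messages. It assumes each message carries a W3C `traceparent` header in its properties, which may not hold for every queue or agent; `links_from_batch` and the `(trace_id, span_id)` tuple shape are hypothetical names chosen for this example.

```python
import re
from typing import Dict, Iterable, List, Tuple

# W3C traceparent: version, 32-hex trace ID, 16-hex parent ID, 2-hex flags.
_TRACEPARENT = re.compile(r"^[0-9a-f]{2}-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}$")


def links_from_batch(messages: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return unique (trace_id, span_id) links, in first-seen order."""
    seen = set()
    links = []
    for headers in messages:
        match = _TRACEPARENT.match(headers.get("traceparent", ""))
        if match is None:
            # Message without usable trace context: nothing to link to.
            continue
        link = (match.group(1), match.group(2))
        if link not in seen:
            seen.add(link)
            links.append(link)
    return links
```

The receiving transaction keeps whatever parent it already has (for example, none for a scheduled task) and the returned pairs would be attached as its links.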
I didn't. I wanted to keep up with the OpenTelemetry specification, which uses links that are probably functionally similar. I don't want to depend on an Elastic APM-specific implementation. Using OpenTelemetry gives me the option to switch between multiple "backends" for tracing. |
Here is an example use case in Ruby with the background job processing library, |
I have a similar requirement where multiple inputs are batched and processed in one operation (multi-parent tracing). Is there any ETA for this? @mitoihs what backend did you finally use to support this use case? |
Yeah +1 to that. We have a similar batching requirement where we'd like to trace which batch events were indexed into. At the moment we have a preamble process that iterates over the events in the batch and begins and ends a transaction for each of them before we batch them, but as you can imagine there are a lot of things wrong with this approach. |
@nikhilbhaware007 when I wrote that comment, I was scanning through available solutions to choose something. We don't use anything yet, but will use OpenTelemetry as a... well, not exactly a backend but a "protocol"? We'll probably store it in Elasticsearch and use a custom frontend to display our traces. Currently, only Jaeger (among the few solutions I've checked) has limited support for displaying such multi-parented traces, and it's not good enough for us. |
This commit adds instrumentation for Azure Service Bus when an application is using Microsoft.Azure.ServiceBus 3.0.0+ or Azure.Messaging.ServiceBus 7.0.0+ NuGet packages. Two IDiagnosticListener implementations, one for Microsoft.Azure.ServiceBus and another for Azure.Messaging.ServiceBus, create transactions and spans for received and sent messages:
A new transaction is created when
- one or more messages are received from a queue or topic subscription.
- a message is receive deferred from a queue or topic subscription.
A new span is created when there is a current transaction, and when
- one or more messages are sent to a queue or topic.
- one or more messages are scheduled to a queue or a topic.
The diagnostic events do not expose details about sent or received messages. The trace ids of messages are exposed but are not currently captured in this implementation. Messages are often received in batches, and it is possible for each message to have its own trace id, but the APM implementation does not have a concept for capturing such data right now. See elastic/apm#122
A terraform template file is used to create a resource group, create an Azure Service Bus namespace resource in the resource group, and set RBAC rules to allow the Service Principal that issues the creation access to the resources. The Service Principal credentials are sourced from a .credentials.json file in the root of the repository for CI, and from an account authenticated with az for local development. A default location is set within the template, but all variables can be passed using standard Terraform input variable conventions.
Closes #1157
I have a use case for this in Fleet Server's APM instrumentation. We have a bulk process that will batch search and indexing requests from multiple incoming HTTP requests from Elastic Agents into a single |
@joshdover you (or whoever will implement that) may want to subscribe to elastic/apm-agent-go#1243. Support exists in APM Server and Kibana, we're just lacking an API to add links in the Go agent. |
Closing as duplicate of #594 |
Currently it is possible to define only one relationship between transactions/spans: a single parent. This covers the most common patterns (namely request/response), but it is not currently possible to trace others, such as:
Additionally, as described in the OpenTelemetry spec, there may be scenarios where a trace must be restarted (i.e. creating a new trace root), and in such cases the restarted trace could be linked to the originating trace.
Proposed changes
The first step is to extend the transaction and span model such that they can be linked to multiple other transactions or spans. Errors would continue to accommodate only a single parent transaction or span.
Intake API
I propose we add the following optional property to the intake schema for `span` and `transaction`:
`links` (I'm also partial to `refs` and `references`, maybe even `<your proposal>`): `array`, with items having the following type:
Note that the `trace_id` field is optional. If it is empty, then the span or transaction's own trace_id is assumed.
ES Mapping
We have two main options here: store as nested docs, or store as an array of objects.
Using nested means that for every link, there will be an additional document in ES, which could introduce performance issues. I don't think it's a good idea to go down this road.
Using an array of objects for the links means that we cannot search on both trace ID and span/transaction ID and match only links in which both fields match, because Elasticsearch flattens arrays of objects and loses the association between fields of the same object. We could deal with this in one of two ways:
"links": ["trace_id:span_id", "trace_id:span_id"]
The types of searches we're likely to do are "find all spans linked to span X in trace Y" within the configured time-frame. I expect it is highly unlikely that we would ever find a repeated span ID AND have the same trace ID involved. So either approach is probably fine, though structured data is generally easier to deal with.
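The trade-off between the two storage options can be sketched in a few lines. This is only an illustration under stated assumptions: the link item shape (`trace_id` plus an `id` field) is hypothetical, since the exact item type is not reproduced here, and `flattened_match` merely simulates how a non-nested array of objects behaves when both fields are queried independently.

```python
from typing import Dict, List


def normalize_link(link: Dict[str, str], own_trace_id: str) -> Dict[str, str]:
    # A missing or empty trace_id falls back to the span's own trace ID.
    return {"trace_id": link.get("trace_id") or own_trace_id, "id": link["id"]}


def encode_links(links: List[Dict[str, str]], own_trace_id: str) -> List[str]:
    # Composite-string form: ["trace_id:span_id", ...]
    normalized = (normalize_link(link, own_trace_id) for link in links)
    return [f"{n['trace_id']}:{n['id']}" for n in normalized]


def flattened_match(links, own_trace_id: str, trace_id: str, span_id: str) -> bool:
    # Simulates an array of objects: each field is matched on its own,
    # so values from different links can combine into a false positive.
    normalized = [normalize_link(link, own_trace_id) for link in links]
    trace_ids = {n["trace_id"] for n in normalized}
    span_ids = {n["id"] for n in normalized}
    return trace_id in trace_ids and span_id in span_ids


def composite_match(links, own_trace_id: str, trace_id: str, span_id: str) -> bool:
    # The composite string only matches when both parts belong to one link.
    return f"{trace_id}:{span_id}" in encode_links(links, own_trace_id)
```

With links to span `a` in trace `t1` and span `b` in the span's own trace `t2`, a search for span `b` in trace `t1` matches the flattened form but not the composite one. As noted above, such a collision requires a repeated span ID within the same trace ID, so it should be rare in practice.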
UI
Needs input from @elastic/apm-ui and design as to how we do it, but we should render the links in the UI, perhaps as a list in the transaction details and span details flyout. We can defer discussing the specifics, so long as we can come up with an ES mapping that is flexible enough.