processor/otel: index span events as logs #6122

Merged (1 commit) on Sep 6, 2021

Conversation

axw
Member

@axw axw commented Sep 4, 2021

Motivation/summary

Produce log events for non-exception OpenTelemetry span events, capturing the span event name as the log message, and all other attributes as labels. We only index logs when data streams are enabled; they are dropped when classic indices are in use.
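
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SpanEvent`, `LogEvent`, `logsFromSpanEvents` and the `dataStreamsEnabled` flag are hypothetical names, and the check on the event name `"exception"` assumes the OpenTelemetry semantic convention for exception events. Where APM Server actually applies the data streams check is not shown here.

```go
package main

import "fmt"

// SpanEvent is a hypothetical stand-in for an OpenTelemetry span event.
type SpanEvent struct {
	Name       string
	Attributes map[string]interface{}
}

// LogEvent is a hypothetical stand-in for the log document that gets indexed.
type LogEvent struct {
	Message string
	Labels  map[string]interface{}
}

// logsFromSpanEvents converts non-exception span events into log events.
// Assumption: exception events are identified by the name "exception"
// and are handled elsewhere (as errors), so they are skipped here.
func logsFromSpanEvents(events []SpanEvent, dataStreamsEnabled bool) []LogEvent {
	if !dataStreamsEnabled {
		// With classic indices there is no logs destination, so drop everything.
		return nil
	}
	var logs []LogEvent
	for _, e := range events {
		if e.Name == "exception" {
			continue
		}
		// Copy every attribute into the labels of the log event.
		labels := make(map[string]interface{}, len(e.Attributes))
		for k, v := range e.Attributes {
			labels[k] = v
		}
		// The span event name becomes the log message.
		logs = append(logs, LogEvent{Message: e.Name, Labels: labels})
	}
	return logs
}

func main() {
	events := []SpanEvent{
		{Name: "cache.miss", Attributes: map[string]interface{}{"key": "user:42"}},
		{Name: "exception", Attributes: map[string]interface{}{"exception.type": "IOError"}},
	}
	for _, l := range logsFromSpanEvents(events, true) {
		fmt.Println(l.Message, l.Labels)
	}
	// With data streams disabled, no log events are produced.
	fmt.Println(len(logsFromSpanEvents(events, false)))
}
```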

Checklist

How to test these changes

  1. Instrument an application with either an OpenTelemetry SDK or Jaeger, capturing a span and span logs/events.
  2. Run APM Server with data streams enabled and check that logs are indexed into logs-apm.app-<namespace> and show up in the "Logs" panel for a trace sample.
  3. Run APM Server with data streams disabled and check that logs are not indexed at all.

Related issues

Closes #4715
Closes #3338

@axw axw force-pushed the otel-span-logs branch 2 times, most recently from a70b1c0 to 2271a7c Compare September 4, 2021 02:22
@apmmachine
Contributor

apmmachine commented Sep 4, 2021

💚 Build Succeeded

Build stats

  • Start Time: 2021-09-06T04:38:33.462+0000

  • Duration: 42 min 42 sec

  • Commit: 2018f28

Test stats 🧪

Test Results
Failed 0
Passed 5900
Skipped 14
Total 5914

@axw axw force-pushed the otel-span-logs branch 4 times, most recently from e4c145d to 77b0554 Compare September 6, 2021 02:37
@axw axw marked this pull request as ready for review September 6, 2021 06:06
@axw axw requested a review from a team September 6, 2021 06:06
@axw
Member Author

axw commented Sep 6, 2021

@bmorelli25 @cyrille-leclerc FYI I've made a few minor edits to the Jaeger and OpenTelemetry docs, removing the caveats related to span events. There were also some outdated caveats listed in the docs related to metrics which I've removed as well.

Contributor

@simitt simitt left a comment

LGTM

@axw axw merged commit 6c9bfad into elastic:master Sep 6, 2021
@axw axw deleted the otel-span-logs branch September 6, 2021 12:33
mergify bot pushed a commit that referenced this pull request Sep 6, 2021
Produce log events for non-exception span events,
capturing the span event name as the log message,
and all other attributes as labels.

We only index logs when data streams are enabled;
they are dropped when classic indices are in use.

(cherry picked from commit 6c9bfad)

# Conflicts:
#	changelogs/head.asciidoc
@cyrille-leclerc
Contributor

cyrille-leclerc commented Sep 6, 2021

I love this step forward and I would like to review the naming conventions for the different datastreams.

For an application (service.namespace: ecommerce, service.name: frontend, deployment.environment: production) what are the data streams for transactions, spans, transaction events, span events, logs, metrics?

Your code (here) leads me to think you propose the data stream logs-apm.app-custom for span events.

I had a different idea in mind. My thoughts are:

  • We have to be consistent with what APM will do for transactions, spans, and metrics, and with application logs.
  • According to "Datastream naming scheme":
    • The namespace should be deployment.environment, like "production" or "staging", as our docs say "A user-configurable arbitrary grouping, such as an environment (dev, prod, or qa), a team, or a strategic business unit..." rather than the proposed custom.
    • The dataset should contain the service.name, prefixed by the service.namespace if defined, which is the equivalent of nginx in logs-nginx.access-production, as our docs say the dataset "describes the ingested data and its structure for each index". I see this as an alternative to the app keyword in the proposed logs-apm.app-custom.
    • I'm not sure that apm. is the best prefix to "describe the structure of the index", as "apm" is a category of tool more than a type of data. I was thinking of the generic type "traceevent" if we restrict this index to span events ("trace" being a common denominator between transaction events and span events), or more broadly "transaction" and "span" if we want to keep the transaction/span documents in the same data streams as their transaction.event / span.event documents.

@axw
Member Author

axw commented Sep 6, 2021

Should the namespace be deployment.environment, like "production" or "staging", as our docs say "A user-configurable arbitrary grouping, such as an environment (dev, prod, or qa), a team, or a strategic business unit..." rather than the proposed custom?

The "custom" you see is just a unit test. The namespace is user-defined when they create their agent policy in Fleet, and is the same for all data streams produced by the APM server corresponding to that policy.

The team has discussed in the past the idea of permitting templates in the namespace, e.g. set it to {{service.environment}} so it will end up being set to the same value as the service environment. This is not currently possible.

Should the dataset contain the service.name, maybe as an alternative to the proposed app of apm.app
"describes the ingested data and its structure for each index"

I don't think it's necessary. The index has the same structure regardless of the application. This is somewhat aspirational: OTel attributes are dynamically mapped as labels.*, which means there could be a mapping conflict if two applications map an attribute as different types. We plan to address this in two ways:

  • we will split numeric labels off into a new numeric_labels field
  • we will switch these both over to flattened, to avoid mapping explosions
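
As a rough illustration of the first point, splitting numeric values into their own field could look like the sketch below. This describes the planned direction only; `splitLabels` and the set of types treated as numeric are assumptions for the example, not the APM Server implementation.

```go
package main

import "fmt"

// splitLabels separates attributes into non-numeric labels and numeric labels.
// Hypothetical helper: a key that is a number in one application and a string
// in another then lands in two different fields instead of conflicting.
func splitLabels(attrs map[string]interface{}) (map[string]interface{}, map[string]float64) {
	labels := make(map[string]interface{})
	numeric := make(map[string]float64)
	for k, v := range attrs {
		switch n := v.(type) {
		case int:
			numeric[k] = float64(n)
		case int64:
			numeric[k] = float64(n)
		case float64:
			numeric[k] = n
		default:
			// Strings, booleans and anything else stay in labels.
			labels[k] = v
		}
	}
	return labels, numeric
}

func main() {
	labels, numeric := splitLabels(map[string]interface{}{
		"http.method": "GET",
		"retries":     3,
		"ratio":       0.5,
		"cached":      true,
	})
	fmt.Println("labels:", labels)
	fmt.Println("numeric_labels:", numeric)
}
```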

See also #3873 (comment)

be prefixed by apm. as you propose? apm. refers to a category of tools rather than to the structure of the ingested data

Perhaps we don't need the common prefix. One thing we would need to check is whether the UI has permissions to read all logs-* data streams.

Maybe I could rephrase saying, for an application (service.namespace: ecommerce, service.name: frontend, deployment.environment: production) what are the data streams for transactions, spans, transaction events, span events, logs, metrics?

  • traces-apm.app-<namespace> (transactions and spans)
  • logs-apm.error-<namespace> (errors)
  • logs-apm.app-<namespace> (span events/logs)
  • metrics-apm.app.<service.name>-<namespace> (application metrics)
  • metrics-apm.internal-<namespace> (APM-internal metrics)

The reason for having the service name in the application metrics data stream is that, in that case only, there are going to be dynamically mapped fields which

  • may lead to mapping collisions
  • will differ considerably between applications
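
To make the naming above concrete, here is a small sketch that assembles the names following the `<type>-<dataset>-<namespace>` scheme. The helpers `dataStreamName` and `appMetricsDataset` are hypothetical, and "default" is only an example namespace; the real value comes from the agent policy in Fleet. The sketch does not sanitize the service name, which a real implementation would probably need to do.

```go
package main

import "fmt"

// dataStreamName joins the three parts of a data stream name.
func dataStreamName(dsType, dataset, namespace string) string {
	return dsType + "-" + dataset + "-" + namespace
}

// appMetricsDataset builds the per-service dataset used for application
// metrics. Assumption: the service name is used as-is, with no sanitizing.
func appMetricsDataset(serviceName string) string {
	return "apm.app." + serviceName
}

func main() {
	namespace := "default" // example value; user-defined in the Fleet policy
	serviceName := "frontend"

	fmt.Println(dataStreamName("traces", "apm.app", namespace))                       // transactions and spans
	fmt.Println(dataStreamName("logs", "apm.error", namespace))                       // errors
	fmt.Println(dataStreamName("logs", "apm.app", namespace))                         // span events/logs
	fmt.Println(dataStreamName("metrics", appMetricsDataset(serviceName), namespace)) // application metrics
	fmt.Println(dataStreamName("metrics", "apm.internal", namespace))                 // APM-internal metrics
}
```

Only the application metrics name varies with the service; the other four are shared by every service reporting to the same APM Server policy.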

axw added a commit that referenced this pull request Sep 6, 2021
* processor/otel: index span events as logs (#6122)

Produce log events for non-exception span events,
capturing the span event name as the log message,
and all other attributes as labels.

We only index logs when data streams are enabled;
they are dropped when classic indices are in use.

(cherry picked from commit 6c9bfad)

# Conflicts:
#	changelogs/head.asciidoc

* Delete head.asciidoc

Co-authored-by: Andrew Wilkins <axw@elastic.co>
@marclop marclop added the backport-skip and test-plan labels Oct 25, 2021
@stuartnelson3
Contributor

Confirmed that logs are indexed with datastreams enabled, and not indexed with datastreams disabled. The logs didn't show up in the "logs" tab, though.

@stuartnelson3
Contributor

issue: #6544

@stuartnelson3
Contributor

Confirmed with BC3

@cyrille-leclerc
Contributor

cyrille-leclerc commented Nov 22, 2021

@axw could we please have a screenshot with a somewhat realistic example of a span event composed of a name and a bunch of attributes? cc @amena-siddiqi

If you can

  • confirm that span events appear in the trace/logs tab as lines of "logs" with a message that is composed of the event name and of its attributes
  • tell me what the formatting of the "Message" column is and how we format the attributes, then I'll be able to craft a screenshot

@axw
Member Author

axw commented Nov 23, 2021

confirm that span events appear in the trace/logs tab as lines of "logs" with a message that is composed of the event name and of its attributes

Span events will appear in the trace logs tab.

tell me what the formatting of the "Message" column is and how we format the attributes, then I'll be able to craft a screenshot

AFAIK, the message column is simply taken from the value of the message field. The message field is set to the span event name:

event.Message = spanEvent.Name()

@cyrille-leclerc
Contributor

Many thanks @axw.

  • Can you confirm that the OTel resource attributes like service.name or service.version are added as fields of the log message in the Elasticsearch storage?
  • Would it make sense to you that, when visualizing span events, we display the following data of the span event (see Span Event Specification):
    • Name of the event
    • Timestamp of the event
    • Span event attributes

My understanding is that this visualization of the span events that would be different from the visualization of raw log messages may require a different renderer for log messages and for span events.

Would that make sense?

@axw
Member Author

axw commented Nov 23, 2021

Can you confirm that the OTel resource attributes like service.name or service.version are added as fields of the log message in the Elasticsearch storage?

Those fields will be present on the log document, but not all fields will be. Labels will not be.

Would it make sense to you that, when visualizing span events, we display the following data of the span event

Message and timestamp 100% belong on the top level, and probably the service name. Attributes will likely get a bit busy. I think they belong in a flyout, which we don't yet have but is tracked in elastic/kibana#111325

My understanding is that this visualization of the span events that would be different from the visualization of raw log messages may require a different renderer for log messages and for span events.

I'm having trouble parsing this. Can you restate, please?

@cyrille-leclerc
Contributor

My understanding is that this visualization of the span events that would be different from the visualization of raw log messages may require a different renderer for log messages and for span events.

I'm having trouble parsing this. Can you restate, please?

If we want a special visualization in the logs tab, then we will need to evolve the "logs visualization tab" to have dedicated rendering for "span events" and for "raw log messages".

Since you propose relying on the flyout for the moment, a dedicated renderer is not needed.

Labels
backport-skip, test-plan, test-plan-ok, v7.16.0
Development

Successfully merging this pull request may close these issues.

[OpenTelemetry] Support OpenTelemetry Span Events
Process additional Jaeger Span logs
6 participants