Simple span processor perf #502
Conversation
Currently the simple span processor is _highly_ sensitive to any latency in its underlying exporter, as it blocks the current thread for the full export duration each time a span ends. This patch addresses this by moving span exporting to a separate thread and communicating via channels. Spans are still exported as soon as they end, and shutdown waits for all spans to be successfully exported, preserving the simple span processor semantics of immediate exports and never dropping data.
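A minimal sketch of the pattern this PR describes, using plain `std` channels and a blocking stand-in exporter (the `SpanData`/`Exporter` types here are illustrative placeholders, not the SDK's real types; the actual patch drives an async exporter with `block_on`):

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative stand-ins for the SDK's span and exporter types.
struct SpanData;
struct Exporter;

impl Exporter {
    fn export(&mut self, _batch: Vec<SpanData>) {
        // In the real SDK this is a (potentially slow) network call.
    }
}

// Spawn the dedicated export thread and hand back the sending half of the
// channel. `send` returns immediately, so ending a span no longer blocks
// for the duration of the export.
fn spawn_worker(
    mut exporter: Exporter,
) -> (mpsc::Sender<Option<SpanData>>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::Builder::new()
        .name("opentelemetry-exporter".to_string())
        .spawn(move || {
            // `None` is the shutdown signal; spans drain in FIFO order, so
            // nothing queued before shutdown is ever dropped.
            while let Ok(Some(span)) = rx.recv() {
                exporter.export(vec![span]);
            }
        })
        .expect("failed to spawn exporter thread");
    (tx, handle)
}

fn main() {
    let (tx, handle) = spawn_worker(Exporter);
    tx.send(Some(SpanData)).unwrap(); // on span end: non-blocking hand-off
    tx.send(None).unwrap();           // on shutdown: signal the worker...
    handle.join().unwrap();           // ...and wait for all exports to finish
}
```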
The implementation looks good. I wonder if this is a good idea, though.
That said, I'm really not sure either way. There are certainly benefits in exporting in a different thread. I wouldn't object to merging this.
Yeah, I'm open to alternatives as well. I think there are always going to be drawbacks to the simple strategy; the benefit of this approach is that it should be better for the cases where the simple strategy makes sense (sending one-by-one, never dropping spans). If you are producing spans faster than you can export them, you should probably be using the batch span processor to do load shedding. The main alternative seems to be to close this and keep the existing behavior, which avoids the OOM by simply blocking the progress of the current thread, but that seems to conflict with open-telemetry/opentelemetry-specification#1555, which adds a non-blocking requirement to ending spans.
I don't know of good alternatives. And you're right. The spec suggests that we shouldn't block, which the simple span processor currently does.
.name("opentelemetry-exporter".to_string()) | ||
.spawn(move || { | ||
while let Ok(Some(span)) = span_rx.recv() { | ||
if let Err(err) = executor::block_on(exporter.export(vec![span])) { |
Should we consider adding a timeout here to prevent the exporter from blocking for too long?
I was considering that, but the spec doesn't have any config for it, and if you are creating spans faster than you are exporting them to the point where this would be a big issue, you should probably just use the batch processor, as your app may OOM or do other bad things with the simple processor anyway.
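Not part of this PR, but for reference, a sketch of what such a timeout could look like by racing the export future against a timer (the `futures_timer::Delay` dependency and the 30-second value are assumptions for illustration):

```rust
use std::time::Duration;
use futures::{executor, future::{self, Either}};
use futures_timer::Delay; // assumed extra dependency, not part of this PR

// Stand-in for the exporter's async export call.
async fn export_batch() -> Result<(), String> {
    Ok(())
}

fn export_with_timeout() {
    // Race the export future against a timer so a stuck exporter cannot
    // stall the worker thread indefinitely; 30s is an illustrative value.
    let export = Box::pin(export_batch());
    let timeout = Delay::new(Duration::from_secs(30));
    match executor::block_on(future::select(export, timeout)) {
        Either::Left((Ok(()), _)) => {} // export finished in time
        Either::Left((Err(err), _)) => eprintln!("export failed: {}", err),
        Either::Right(((), _)) => eprintln!("export timed out"),
    }
}
```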
To that point, we should probably warn users that using the simple span processor may cause OOM.
Yeah, I think a follow-up docs PR on the difference between the two and when to choose one vs. the other would be a good idea 👍
Overall LGTM. I think we may also want to consider somehow supporting users without an async runtime in the batch span processor; that may be helpful, as the batch span processor generally has better performance.
ForceFlush seems to have been left behind in #502. With those changes, the processing is not really synchronous anymore, i.e. OnEnd now only sends the span down the pipe to be processed in the separate thread as soon as possible. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#forceflush-1 says:

> In particular, if any SpanProcessor has any associated exporter, it SHOULD try to call the exporter's Export with all spans for which this was not already done and then invoke ForceFlush on it.

As the comment states, all spans previously got exported synchronously right away, so that no such spans existed, but now they might be anywhere between the channel and (the end of) the export call. Doing nothing in ForceFlush even violates the specification, as...

> The built-in SpanProcessors MUST do so.

Awaiting all open tasks from the channel on ForceFlush fixes this. Previous discussions regarding parts of the specification that this does not tackle, in line with Shutdown:

> ForceFlush SHOULD provide a way to let the caller know whether it succeeded, failed or timed out.

#358 (comment)

> ForceFlush SHOULD complete or abort within some timeout.

https://github.com/open-telemetry/opentelemetry-rust/pull/502/files#r603722431

This brings the simple processor a step closer to the batch processor, with the obvious main difference of batches and the (not so obvious, also see #502 (comment)) difference that it works without a presumed async runtime.
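A minimal sketch of one way to implement that flush, assuming the worker channel carries an enum of messages (the `Message` type, `SpanData` stand-in, and ack-channel wiring here are illustrative, not necessarily the shape of the actual fix):

```rust
use std::sync::mpsc;

// Illustrative stand-in for the SDK's span type.
struct SpanData;

// Hypothetical channel message: either a span to export or a flush
// marker carrying a one-shot ack sender.
enum Message {
    ExportSpan(SpanData),
    Flush(mpsc::Sender<()>),
}

fn force_flush(tx: &mpsc::Sender<Message>) {
    // Enqueue a flush marker behind all previously sent spans. The worker
    // handles messages in FIFO order and acks the marker only after every
    // span queued before it has been exported, so receiving the ack means
    // the flush is complete.
    let (ack_tx, ack_rx) = mpsc::channel();
    if tx.send(Message::Flush(ack_tx)).is_ok() {
        let _ = ack_rx.recv();
    }
}
```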