Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing and improperly nested span submission with Jaeger exporter #888

Open
shivjm opened this issue Oct 3, 2022 · 8 comments
Open

Missing and improperly nested span submission with Jaeger exporter #888

shivjm opened this issue Oct 3, 2022 · 8 comments
Labels
A-trace Area: issues related to tracing bug Something isn't working release:required-for-stable Must be resolved before GA release, or nice to have before GA.

Comments

@shivjm
Copy link

shivjm commented Oct 3, 2022

As outlined on Matrix, I’ve got a short-lived (periodically executed) application that uses reqwest, tracing-opentelemetry, and opentelemetry-jaeger. I noticed it frequently drops the main span when it exits, leaving the other spans scattered and flattened. I built a test repository that reproduces and explains the issue.

@TommyCpp TommyCpp added bug Something isn't working A-trace Area: issues related to tracing labels Oct 7, 2022
@saschagrunert
Copy link
Contributor

It's not only related to the Jaeger exporter, but also to the OTLP (via GRPC) one.

@TommyCpp
Copy link
Contributor

cc @jtescher as you have more experience in tracing-opentelemetry. I checked the example and can reproduce the issue with the stdout exporter. I suspect we didn't have the right context for the span created by HTTP client but still need to dig more to see what could be the issue

@mkatychev
Copy link
Contributor

mkatychev commented Oct 16, 2022

@TommyCpp I did some digging on this problem and I think these issues are tied together to improper otel context propagation in tracing-opentelemetry. I forked the demo to show how to retain the parent span and log the span IDs.

  1. First issue of main not appearing can be solved by calling .with_context(span.context()) on run:
    image

    I noticed that I'd have to call both with_context and instrument to get tracing IDs to propagate:
    pub use opentelemetry_api::trace::WithContext #893

    This was the issue I initially stumbled upon because instrumentation does not handle on_enter and on_exit for tracing-opentelemetry, after some discussion on the tokio discord I took up a PR that is currently still in progress:
    tracing-opentelemetry: Add opentelemetry::Context propagation tokio-rs/tracing#2345

  2. Second issue is that current context span id is always one behind current span span id:

    Here's some things to notice on my fork of the example code:

    $ cargo run -- --log 'warn,rust_opentelemetry_jaeger_test=trace' --backend surf --jaeger-agent-endpoint 127.0.0.1:6831
    
     INFO Parsed arguments root=Root { url: "http://www.google.com/", jaeger_agent_endpoint: "127.0.0.1:6831", backend: "surf", json: false, log: "warn,rust_opentelemetry_jaeger_test=trace" }
     WARN pre_run span_span_id=9023d793d11614e5
     WARN pre_run cx_span_id=0000000000000000
     WARN main:run: span_span_id=5102f91ba0fbc9ca
     WARN main:run: cx_span_id=9023d793d11614e5
     WARN main:run:make_request_with_surf: make_request span_span_id=94739a4b3762018c
     WARN main:run:make_request_with_surf: make_request cx_span_id=5102f91ba0fbc9ca
     INFO main:run: Finished making request
     INFO Shut down tracer provider
    
  • Removing the instrument property on rust_opentelemetry_jaeger_test ::run produces aligned current span ids but only for one depth:

    - #[tracing::instrument(skip_all)]
    + // #[tracing::instrument(skip_all)]
    async fn run(root: &Root) {
     WARN pre_run span_span_id=fc91e806866aee3e
     WARN pre_run cx_span_id=0000000000000000
     WARN main: span_span_id=fc91e806866aee3e
     WARN main: cx_span_id=fc91e806866aee3e
     WARN main:make_request_with_surf: make_request span_span_id=cace6f002448669a
     WARN main:make_request_with_surf: make_request cx_span_id=fc91e806866aee3e
     INFO main: Finished making request
     INFO Shut down tracer provider
    

Ultimately the solution likely lies in toktio-telemetry with the proper handling of otel contexts for on_enter and on_exit event hooks.

saschagrunert added a commit to saschagrunert/conmon-rs that referenced this issue Oct 17, 2022
According to
open-telemetry/opentelemetry-rust#888 (comment),
this works around the current the cases where the root span is missing
from the traces.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
saschagrunert added a commit to saschagrunert/conmon-rs that referenced this issue Oct 17, 2022
According to
open-telemetry/opentelemetry-rust#888 (comment),
this works around the issue where the root span is missing from the
traces.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
saschagrunert added a commit to saschagrunert/conmon-rs that referenced this issue Oct 17, 2022
According to
open-telemetry/opentelemetry-rust#888 (comment),
this works around the issue where the root span is missing from the
traces.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
@saschagrunert
Copy link
Contributor

@mkatychev I implemented something like this in conmon-rs (containers/conmon-rs#807), but the parent spans still seems missing from time to time.

Ref: tokio-rs/tracing-opentelemetry#58

@hdost hdost added this to the Tracing API And SDK Stable milestone Mar 2, 2023
@hdost hdost added the release:required-for-stable Must be resolved before GA release, or nice to have before GA. label Mar 2, 2023
@mkatychev
Copy link
Contributor

@saschagrunert hope you made some progress. I haven't gotten around to dealing with the issue. Hopefully this stalled PR might provide some context:
tokio-rs/tracing#2345

@AlexandreCassagne
Copy link

Hello all, can you provide a status update for this issue? Currently, the tracing integration is effectively broken from my understanding.

I am using the opentelemetry-zipkin exporter, so this issue is not only occurring with the Jaeger exporter.

@cijothomas
Copy link
Member

@AlexandreCassagne Unfortunately no. The tokio-tracing and otel integration requires some re-think/re-designs before they can fully interop! (#1533 (comment))

If you use OTel Tracing API throughout, and then something is broken, it is something we can fix in this repo. But the reality is - a lot of scenarios are already instrumented with tokio-tracing, and then interoping with OTel don't work nicely.

also related is #1378

@AlexandreCassagne
Copy link

@cijothomas That's disappointing, I integrated our tokio-tracing with opentelemetry specifically for exports to Jaeger/Zipkin. I hope that this will eventually be implemented better as distributed tracing is a very useful tool for our team.

Does this mean for those who wish to use opentelemetry, that we should integrate our tracing directly with it? or is there a middle way, such as a macro that can be used to add parent span contexts, etc? Can the OTel tracing api be used in parallel to provide links between spans?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-trace Area: issues related to tracing bug Something isn't working release:required-for-stable Must be resolved before GA release, or nice to have before GA.
Projects
None yet
Development

No branches or pull requests

7 participants