Last span is not displayed by tracing-opentelemetry #58

Closed
izderadicka opened this issue Oct 2, 2022 · 11 comments

@izderadicka

Bug Report

Version

```text
├── tracing v0.1.36
│   ├── tracing-attributes v0.1.22 (proc-macro)
│   └── tracing-core v0.1.29
├── tracing-log v0.1.3
│   └── tracing-core v0.1.29 (*)
├── tracing-opentelemetry v0.18.0
│   ├── tracing v0.1.36 (*)
│   ├── tracing-core v0.1.29 (*)
│   ├── tracing-log v0.1.3 (*)
│   └── tracing-subscriber v0.3.15
│       ├── tracing-core v0.1.29 (*)
│       ├── tracing-log v0.1.3 (*)
│       └── tracing-serde v0.1.3
│           └── tracing-core v0.1.29 (*)
└── tracing-subscriber v0.3.15 (*)
```

rust: rustc 1.64.0 (a55dd71d5 2022-09-19)

Platform

Linux ivan-ThinkPad-T470 5.4.0-126-generic #142-Ubuntu SMP Fri Aug 26 12:12:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Crates

tracing-opentelemetry ?

Description

The topmost (outer) span is not exported/printed by tracing-opentelemetry:

```rust
use std::io::Write;

use opentelemetry::sdk::export::trace::stdout;
use tracing::{info, instrument, span, Level, Subscriber};
use tracing_subscriber::{layer::SubscriberExt, Registry, layer::Layer};

struct MyLayer;

impl <S:Subscriber> Layer<S> for MyLayer {
    fn on_enter(&self, id: &span::Id, _ctx: tracing_subscriber::layer::Context<'_, S>) {
        println!(">>> Entering span {:?}", id)
    }

    fn on_exit(&self, id: &span::Id, _ctx: tracing_subscriber::layer::Context<'_, S>) {
        println!("<<< Leaving span {:?}", id)
    }

}

mod old {
    use log::info;

    pub fn legacy(s: &str) {
        info!("I'm legacy logging {msg}", msg = s);
    }
}

#[instrument(name = "fn.answer")]
fn answer(n: i32) {
    info!("universal answer is {}", n);
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    tracing_log::LogTracer::init()?;
    let tracer = stdout::new_pipeline().install_simple();
    let telemetry = tracing_opentelemetry::layer().with_tracer(tracer);
    let log = tracing_subscriber::fmt::layer();

    let subscriber = Registry::default().with(telemetry)
    .with(log).with(MyLayer);

    tracing::subscriber::set_global_default(subscriber)?;
    let span = span!(Level::INFO, "root", "root span");
    span.in_scope(|| {
        let span = span!(Level::INFO, "inner", "inner span");
        span.in_scope(|| {
            println!("Hello, world!");
            info!(fun = "main", "First event");
            answer(42);
            old::legacy("abcd");
        });
    });
    // Shut down the tracer provider so pending spans get flushed/exported.
    opentelemetry::global::shutdown_tracer_provider();
    std::io::stdout().lock().flush()?;
    // Keep the main thread alive a bit longer in case spans are exported asynchronously.
    std::thread::sleep(std::time::Duration::from_millis(1000));
    Ok(())
}
```

The topmost "root" span is not printed. I've played with this a bit; the code above is the latest version, but the same problem occurs when there are just one or two spans. It seems like some concurrency issue/race condition, because there have been a very few runs where it worked as expected.

This is the output I get for the code above:

```text
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
     Running `target/debug/tele`
>>> Entering span Id(1)
>>> Entering span Id(2)
Hello, world!
2022-10-02T18:09:29.544798Z  INFO root{root span}:inner{inner span}: tele: First event fun="main"
>>> Entering span Id(3)
2022-10-02T18:09:29.545012Z  INFO root{root span}:inner{inner span}:fn.answer{n=42}: tele: universal answer is 42
<<< Leaving span Id(3)
2022-10-02T18:09:29.545274Z  INFO root{root span}:inner{inner span}: tele::old: I'm legacy logging abcd    
<<< Leaving span Id(2)
<<< Leaving span Id(1)
SpanData { span_context: SpanContext { trace_id: c236bdf5842702205bf7232984665c94, span_id: 8d94c0875733e58e, trace_flags: TraceFlags(1), is_remote: false, trace_state: TraceState(None) }, parent_span_id: d5dc478fd6cb6904, span_kind: Internal, name: "fn.answer", start_time: SystemTime { tv_sec: 1664734169, tv_nsec: 544920822 }, end_time: SystemTime { tv_sec: 1664734169, tv_nsec: 545106681 }, attributes: EvictedHashMap { map: {Static("code.namespace"): String(Static("tele")), Static("code.lineno"): I64(28), Static("thread.id"): I64(1), Static("thread.name"): String(Owned("main")), Static("n"): I64(42), Static("busy_ns"): I64(98848), Static("idle_ns"): I64(85685), Static("code.filepath"): String(Static("src/main.rs"))}, evict_list: [Static("idle_ns"), Static("busy_ns"), Static("n"), Static("thread.name"), Static("thread.id"), Static("code.lineno"), Static("code.namespace"), Static("code.filepath")], max_len: 128, dropped_count: 0 }, events: EvictedQueue { queue: Some([Event { name: "universal answer is 42", timestamp: SystemTime { tv_sec: 1664734169, tv_nsec: 544998386 }, attributes: [KeyValue { key: Static("level"), value: String(Static("INFO")) }, KeyValue { key: Static("target"), value: String(Static("tele")) }, KeyValue { key: Static("code.filepath"), value: String(Static("src/main.rs")) }, KeyValue { key: Static("code.namespace"), value: String(Static("tele")) }, KeyValue { key: Static("code.lineno"), value: I64(30) }], dropped_attributes_count: 0 }]), max_len: 128, dropped_count: 0 }, links: EvictedQueue { queue: None, max_len: 128, dropped_count: 0 }, status: Unset, resource: Resource { attrs: {Static("service.name"): String(Owned("unknown_service"))}, schema_url: None }, instrumentation_lib: InstrumentationLibrary { name: "opentelemetry", version: Some("0.18.0"), schema_url: None } }
SpanData { span_context: SpanContext { trace_id: c236bdf5842702205bf7232984665c94, span_id: d5dc478fd6cb6904, trace_flags: TraceFlags(1), is_remote: false, trace_state: TraceState(None) }, parent_span_id: 96c0ac6874126034, span_kind: Internal, name: "inner", start_time: SystemTime { tv_sec: 1664734169, tv_nsec: 544704724 }, end_time: SystemTime { tv_sec: 1664734169, tv_nsec: 545362417 }, attributes: EvictedHashMap { map: {Static("code.namespace"): String(Static("tele")), Static("busy_ns"): I64(584925), Static("code.filepath"): String(Static("src/main.rs")), Static("idle_ns"): I64(75826), Static("thread.name"): String(Owned("main")), Static("code.lineno"): I64(45), Static("thread.id"): I64(1), Static("message"): String(Owned("inner span"))}, evict_list: [Static("idle_ns"), Static("busy_ns"), Static("message"), Static("thread.name"), Static("thread.id"), Static("code.lineno"), Static("code.namespace"), Static("code.filepath")], max_len: 128, dropped_count: 0 }, events: EvictedQueue { queue: Some([Event { name: "First event", timestamp: SystemTime { tv_sec: 1664734169, tv_nsec: 544775422 }, attributes: [KeyValue { key: Static("level"), value: String(Static("INFO")) }, KeyValue { key: Static("target"), value: String(Static("tele")) }, KeyValue { key: Static("fun"), value: String(Owned("main")) }, KeyValue { key: Static("code.filepath"), value: String(Static("src/main.rs")) }, KeyValue { key: Static("code.namespace"), value: String(Static("tele")) }, KeyValue { key: Static("code.lineno"), value: I64(48) }], dropped_attributes_count: 0 }, Event { name: "I'm legacy logging abcd", timestamp: SystemTime { tv_sec: 1664734169, tv_nsec: 545253872 }, attributes: [KeyValue { key: Static("level"), value: String(Static("INFO")) }, KeyValue { key: Static("target"), value: String(Owned("tele::old")) }, KeyValue { key: Static("code.filepath"), value: String(Owned("src/main.rs")) }, KeyValue { key: Static("code.namespace"), value: String(Owned("tele::old")) }, KeyValue { key: Static("code.lineno"), value: I64(24) }], dropped_attributes_count: 0 }]), max_len: 128, dropped_count: 0 }, links: EvictedQueue { queue: None, max_len: 128, dropped_count: 0 }, status: Unset, resource: Resource { attrs: {Static("service.name"): String(Owned("unknown_service"))}, schema_url: None }, instrumentation_lib: InstrumentationLibrary { name: "opentelemetry", version: Some("0.18.0"), schema_url: None } }

```

As you can see, there are only two SpanData entries; the topmost "root" span is missing.
In cases where I had only one span, usually nothing was printed at all, but sometimes it did work. I thought it might be because the main thread finished before the last span was output, so I tried adding code at the end of main to keep the main thread alive for a while, but that does not seem to be the case.

@saschagrunert

saschagrunert commented Oct 10, 2022

I agree this seems to be an issue. For example, for every root span the Jaeger UI displays the warning "invalid parent span IDs=c877373268c0dfe2; skipping clock skew adjustment".

@saschagrunert

I think this may be related to open-telemetry/opentelemetry-rust#888

@hawkw
Member

hawkw commented Oct 10, 2022

Can this be reproduced using the opentelemetry crate without using tracing-opentelemetry? It would be good to determine whether this is a tracing-opentelemetry or opentelemetry issue.
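A minimal pure-opentelemetry check might look something like the sketch below (not part of the original report; it assumes the same opentelemetry 0.18 stdout pipeline the reproducer uses, with tracing left out entirely):

```rust
// Hypothetical reproducer that uses only the opentelemetry crate (0.18 API assumed),
// without tracing or tracing-opentelemetry.
use opentelemetry::sdk::export::trace::stdout;
use opentelemetry::trace::Tracer;

fn main() {
    let tracer = stdout::new_pipeline().install_simple();
    tracer.in_span("root", |_cx| {
        tracer.in_span("inner", |_cx| {
            println!("Hello, world!");
        });
    });
    // If both spans (including "root") show up here, the problem is more likely
    // on the tracing-opentelemetry side than in opentelemetry itself.
    opentelemetry::global::shutdown_tracer_provider();
}
```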

@saschagrunert

> Can this be reproduced using the opentelemetry crate without using tracing-opentelemetry? It would be good to determine whether this is a tracing-opentelemetry or opentelemetry issue.

Maybe by modifying the reproducer repo here: https://github.com/shivjm/rust-opentelemetry-jaeger-test

@saschagrunert

I was not able to reproduce it without tracing-opentelemetry yet.

@TommyCpp

I think opentelemetry#888 has something to do with tracing integration within HTTP clients. If we use an HTTP client without tracing integration the spans seem to be fine. I plan to take another look over the weekend.

@saschagrunert

@TommyCpp did you get any more insights on that?

@jaysonsantos
Contributor

Hi there @izderadicka, would you mind checking with this code?

```rust
use opentelemetry::sdk::trace::TracerProvider;
use opentelemetry::trace::TracerProvider as _;
use tracing::{info, instrument, Level, span, Subscriber};
use tracing_subscriber::{layer::Layer, layer::SubscriberExt, Registry};

struct MyLayer;

impl <S:Subscriber> Layer<S> for MyLayer {
    fn on_enter(&self, id: &span::Id, _ctx: tracing_subscriber::layer::Context<'_, S>) {
        println!(">>> Entering span {:?}", id)
    }

    fn on_exit(&self, id: &span::Id, _ctx: tracing_subscriber::layer::Context<'_, S>) {
        println!("<<< Leaving span {:?}", id)
    }

}

mod old {
    use log::info;

    pub fn legacy(s: &str) {
        info!("I'm legacy logging {msg}", msg = s);
    }
}

#[instrument(name = "fn.answer")]
fn answer(n: i32) {
    info!("universal answer is {}", n);
}

struct OtelGuard;
impl Drop for OtelGuard {
    fn drop(&mut self) {
        opentelemetry::global::shutdown_tracer_provider();
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let _guard = OtelGuard;

    tracing_log::LogTracer::init()?;
    let provider = TracerProvider::builder()
        .with_simple_exporter(opentelemetry_stdout::SpanExporter::default())
        .build();
    let tracer = provider.tracer("readme_example");
    let telemetry = tracing_opentelemetry::layer().with_tracer(tracer);
    let log = tracing_subscriber::fmt::layer();

    let subscriber = Registry::default().with(telemetry)
        .with(log).with(MyLayer);

    tracing::subscriber::set_global_default(subscriber)?;
    let span = span!(Level::INFO, "root", "root span");
    span.in_scope(|| {
        let span = span!(Level::INFO, "inner", "inner span");
        span.in_scope(|| {
            println!("Hello, world!");
            info!(fun = "main", "First event");
            answer(42);
            old::legacy("abcd");
        });
    });
    Ok(())
}
```

Cargo.toml

```toml
[package]
name = "delete"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
log = "0.4.20"
opentelemetry = "0.20.0"
opentelemetry-stdout = { version = "0.1.0", features = ["trace"] }
tracing = { version = "0.1.37", features = ["log"] }
tracing-log = "0.1.3"
tracing-opentelemetry = "0.21.0"
tracing-subscriber = "0.3.17"
```

The main difference is here:


```rust
struct OtelGuard;
impl Drop for OtelGuard {
    fn drop(&mut self) {
        opentelemetry::global::shutdown_tracer_provider();
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let _guard = OtelGuard;
...
}
```

Potentially the global shutdown could run before the root span is dropped, in which case I assume the span gets "stuck" somehow. The drop guard ensures that shutdown is executed after the other variables inside main's body have been dropped, because locals are dropped in reverse declaration order.
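For reference (not from the original comment), a tiny sketch of the drop-order rule the guard relies on: locals are dropped in reverse declaration order, so a guard declared at the top of main is dropped last, after the spans and layers created below it.

```rust
// Minimal illustration of Rust's drop order: `_guard`, declared first, is dropped last.
struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    let _guard = Noisy("guard");
    let _span = Noisy("span");
    // Prints:
    //   dropping span
    //   dropping guard
}
```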

@izderadicka
Author

izderadicka commented Sep 2, 2023

@jaysonsantos Just using the new versions of the crates solved the issue; neither the guard nor a final call to shutdown_tracer_provider was needed.

So it works almost as expected; however, there is still a small issue: output from my custom layer MyLayer is mixed with the output from opentelemetry_stdout::SpanExporter, as shown below (the interleaved MyLayer lines are the `<<< Leaving span ...` ones):

{"resourceSpans":[{"resource":{"<<< Leaving span Id(2)
attributes":[{"key":"service.name","value":{"stringValue":"unknown_service"}}]},"scopeSpans":[{"scope":{"name":"example"},"spans":[{"traceId":"169b8caa4609a00fe619936aa309e7d1","spanId":"8ec3d66a7460cd26","parentSpanId":"ce9dd2eeda464067","name":"fn.answer","kind":1,"startTimeUnixNano":1693643878095316553,"endTimeUnixNano":1693643878095388815,"attributes":[{"key":"thread.id","value":{"intValue":1}},{"key":"code.lineno","value":{"intValue":28}},{"key":"busy_ns","value":{"intValue":41085}},{"key":"code.filepath","value":{"stringValue":"src/main.rs"}},{"key":"code.namespace","value":{"stringValue":"tele"}},{"key":"thread.name","value":{"stringValue":"main"}},{"key":"n","value":{"intValue":42}},{"key":"idle_ns","value":{"intValue":25995}}],"droppedAttributesCount":0,"events":[{"name":"universal answer is 42","attributes":<<< Leaving span Id(1)
[{"key":"level","value":{"stringValue":"INFO"}},{"key":"target","value":{"stringValue":"tele"}},{"key":"code.filepath","value":{"stringValue":"src/main.rs"}},{"key":"code.namespace","value":{"stringValue":"tele"}},{"key":"code.lineno","value":{"intValue":30}}],"droppedAttributesCount":0}],"droppedEventsCount":0,"droppedLinksCount":0,"status":{}}]}]}]}

If I remember correctly, println! locks stdout, so the SpanExporter must be writing to raw stdout without synchronization. Not sure if this is intentional, but in scenarios like these (probably mostly used for debugging) it can cause issues as noted above.
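As an aside (a sketch, not from this thread): even though each individual write to stdout is locked, the lock is released between calls, so output written in several small pieces can be interleaved with output from another thread. Something like the following illustrates the effect:

```rust
// Per-call locking of stdout does not prevent interleaving between separate
// write calls; another thread's println! can land in the middle of the JSON.
use std::io::Write;
use std::thread;
use std::time::Duration;

fn main() {
    let exporter = thread::spawn(|| {
        let mut out = std::io::stdout();
        out.write_all(b"{\"spans\":[").unwrap(); // stdout lock released after this call
        thread::sleep(Duration::from_millis(10));
        out.write_all(b"]}\n").unwrap();
    });
    thread::sleep(Duration::from_millis(5));
    println!("<<< Leaving span Id(1)"); // may appear inside the JSON above
    exporter.join().unwrap();
}
```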

@hawkw hawkw transferred this issue from tokio-rs/tracing Sep 2, 2023
@izderadicka
Author

@jtescher
I quickly looked into the code. The problem of intermixed output comes down to serde_json: it writes to stdout token by token, so the stdout lock is released in between and other layers can write in the meantime, right into the middle of a span printout.

For me this helped: an encoder that first encodes the whole structure into a buffer:

```rust
// Import paths below are an assumption for opentelemetry 0.20 / opentelemetry-stdout 0.1.
use std::io::Write;

use opentelemetry::trace::{TraceError, TraceResult};
use opentelemetry_stdout::SpanData;

// Serialize the whole span payload into an in-memory buffer first, then write it
// to stdout in a single call so other layers cannot interleave their output.
fn bulk_encoder(out: &mut dyn Write, data: SpanData) -> TraceResult<()> {
    let mut out_buffer: Vec<u8> = Vec::new();
    serde_json::to_writer(&mut out_buffer, &data)
        .map_err(|err| TraceError::Other(Box::new(err)))?;
    out.write_all(&out_buffer)
        .map_err(|err| TraceError::Other(Box::new(err)))
}
```

and use it like this:

```rust
let provider = TracerProvider::builder()
        .with_simple_exporter(
            opentelemetry_stdout::SpanExporterBuilder::default()
                .with_encoder(bulk_encoder)
                .build(),
        )
        .build();
```

From my part, the issue can be closed, or I can provide a PR based on the above code.

@jtescher
Collaborator

jtescher commented Sep 5, 2023

Ok, let's close for now. If you want to open an issue about it on http://github.com/open-telemetry/opentelemetry-rust/ we can look at the current exporter behavior.

@jtescher jtescher closed this as completed Sep 5, 2023