-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Export Jaeger spans on behalf of services / service endpoints #547
Comments
I assigned this to myself to investigate it, and collect our previous findings on the topic. |
One answer to the problem could be that the Jaeger visualization is not the right tool for the job because it is intended for tracing physical activity whereas we are more interested in visualizing the logical activity. Maybe we could take the Jaeger data (traces and spans) and render it differently with our own visualization (e.g. the start and end span of an invocation could denote the length of a logical invocation). |
First and foremost, it is important to note that we already export all the available information we have in the runtime for the invocations, recording enough details to filter them out and to build an alternative presentation layer. From this point of view, no action needs to be taken IMO. If I understand correctly the issue, to act on behalf of the service, we would need to set the
This means that we would need somehow to implement/fork a tracing exporter that allows us to change "in-flight" the resource.name field, so when we export we can "behave like a service endpoint". In theory, this is possible, as we just need to export a different batch of spans per service endpoint. In practice, there are a number of challenges to implement an exporter, including understanding the internals of tokio's
Because this solution would be something "non standard", there might even be the (very small) risk that in future our exporter won't be compatible anymore with new jaeger versions. On top of that, I second the sentiment expressed by @tillrohrmann about generating "data that don't match the reality". If we emit spans on behalf of the endpoint, timestamps and events won't match, it won't be possible to know which pod actually processed the invocation, and it won't be possible to integrate tracing with external systems though W3C trace context (see restatedev/sdk-java#76). The last point is in particular very important for our story of integrating with already existing systems, if we stay in the "tracing standards" it would be a piece of cake for a user to see a full trace, from the web ui to a restate service to some old internal microservice. I'm personally in favor of working towards a custom presentation layer. If we plan to have a Restate GUI/CLI anyway in the long term, it is probably a better time investment to build a visualization tool part of this Restate GUI/CLI that takes data from Jaeger and visualizes as we want. This visualization might even just be a fork of the Jaeger UI, and not necessarily something we start from scratch. |
Can we just have multiple tracingproviders, each set up for a given resource.name? Especially if we are using UDP delivery, this will not have too much cost, right? EDIT: seems like in Java land there are people doing this open-telemetry/opentelemetry-java#1109 and you can still share exporters so there is only 1 conn to the backend despite multiple resouces even if we are uncomfortable with the memory implications here, I think we need to assess whether this is actually a difficult fork to support 'inflight' changes as you say. I don't think we should presume that its going to be hard to keep having a flexible tracing library despite this change. And I also don't think we should introduce complexity now because we don't know if we might want to support other trace backends in future - we need a simple tracing solution that fits users intuitions, and this might be a way to get one very rapidly |
I'm still not sure I understand this. We can just report the runtimes timestamps - we report when we became aware of the service making a call, reading some state, sleeping, etc. The trace will match the journal. What reality does it not match? Only things the service did that weren't journaled - which is ok, right? |
re presentation layer - do we agree that it is possible for jaeger to have the data so that it shows exactly what we want? So isnt the question simply, how do we get that data to jaeger. Perhaps transformation steps are needed before it gets there - whether in our code, or somewhere else. But I am not sure how this is a UI problem, when Jaeger UI fundamentally shows whatever the data says |
looking into the opentelemetry library, the Resource data is just a COW reference that gets read out the TracerProvider config then put into every SpanData. I don't see any reason why every SpanData couldn't have a different Resource - or why the batches need to be grouped by Resource. So, I think its a very minor change to allow the Span itself to override some Resource fields. I'll test this out - maybe that's all we need |
One additional aspect I didn't included in my initial comment which makes me lean toward the "custom presentation layer" is the "end span problem". As shown here #211 we have for each invocation a
The reason for which we're doing the latter is because we want to show users the invocation progress. In particular for long workflows, it is important to show the progress of the invocation before it ends. |
With #552 we will switch from using UDP to gRPC for reporting OTLP trace data.
Assume for example that service A invokes service B and the runtime sees the invoke journal entry. If it creates a span for invoking service B when it sends the invocation (from within the invoker) then it records how it looks from the runtime perspective (e.g. when the call went out). The runtime does not necessarily know the exact machine on which the call will run nor when it will exactly start. If the service endpoint would create the span instead, then one could see how long it took from sending the message from the runtime to actually executing code. Note that we are already creating spans for all kinds of journal actions/entries as they arrive at the runtime (just under the runtime process identity and not the respective service identity). I guess one question would be how important tracing the exact physical activity is vs. how sufficient is it to trace what the runtime thinks currently happens. Do people use trace information to diagnose infrastructure problems like congestion or so by looking at the queuing times, for example? Do people want to have information on which pod exactly a service was executed from their traces? One can of course also opt-in to a more accurate tracing if one enables it on the SDK (assuming that the SDK has support for it at some point). |
I personally think that the superpower here is that you can get a really good view into what your application is doing by only looking at Restate. If you want more than that - add tracing to your application! Personally I use tracing to get an idea of the flow of what happened, not to get super accurate durations. I think the opt in thing is critical here - some stuff eg what pod it ran on we can never provide without them setting up tracing on their endpoint. But, having a really amazing observability story with no such setup is a huge win for us! |
In any case, the exporter and the traceprovider are separate; there will still be one grpc client, even if we had many trace providers But a fork might be simpler :) |
That's nice of OTLP :-)
I think the journal actions (get, set, clear state, invoke another service, etc.) are logged as events that get associated to their respective span. You should see them in the logs section if I am not mistaken (assuming that you enabled DEBUG logging for tracing (default is info, I believe)).
That would indeed be nice. At the same time it would be great if this information wouldn't be logged on info level to the log file because a per invocation logging can become quite verbose. Since logging and tracing use both the same infrastructure, we currently log the detailed invocation actions (get, set, invoke, etc.) as debug so that they don't appear by default in the logs. |
Yeah, im trying to find them but not succeeding at the moment. I think it would be cool if we could do spans for these things
How do we want suspending to work here? I would have expected that an invocation ends on suspension, so we will get an end span pretty regularly and we don't have to worry about in progress spans. Would that solve the end invocation problem? Then on replay, you see a new span showing just the new journal entries, showing that the service 'picked up where it left off' |
You need to set the jaeger filter to debug, I believe. By default it is set to |
I think we still need a parent span that shows the whole duration of an invocation. Suspension is a Restate detail. From the service's perspective it is still time that the service needs to complete its invocation. |
fair enough! i feel that we are once again talking about the edge case of very long running invocations... i think users intuition will be that tracing shows only completed requests, and 95% of the time this will be within a few minutes of the request invoking. I understand there might be a demand for showing in progress invocations, but this feels like a different observability tool really - lets remember that in production, you normally have a trace sampling prob of 0.01 or lower (but often higher for failed requests) - so you really can't bank on any given request being traced unless it failed (in which case, it has completed). you often use tracing to get an idea of the categories of requests that are being made, and how they flow - so its fine to only see completed requests, imo |
I am not arguing in favor of the end span here. I think it can also be fine to only see the spans once the invocation is done. My point is that it would be nice if there is a single span or two markers that describe the duration of an invocation. If we want then one can still show periods when it ran on the service endpoint until it hit a suspension as a child span. |
Yes, I understand - i agree btw - it looks like temporal is hitting into similar complexities with jaeger spans as us temporalio/sdk-typescript#278 |
turns out its possible without forking otel libs :). we can wrap the otel processor with our own which modifies spans right before they get sent. I think that alone is sufficient |
Nice, that be really great :-) |
It could be a nice UX improvement to let the runtime export Jaeger spans on behalf of the service endpoints because then he wouldn't have to configure his service endpoints to do so. Moreover, the SDK wouldn't have to support exporting OTEL spans.
It is, however, currently not clear whether this is easily possible with the existing tracing and OTEL libraries.
Another aspect to consider is whether the emitted spans are correct if the runtime emits them on behalf of the service endpoint w/o exactly knowing what is happening on the service endpoint (e.g. the starting timestamps of active services would appear earlier then what they really are because the transmission and queuing latency is ignored).
The text was updated successfully, but these errors were encountered: