-
Notifications
You must be signed in to change notification settings - Fork 714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to determine the parent context of the second of two logically sequential requests #892
Comments
copying some folks: @openzipkin/core @openzipkin/armeria @yschimke @bsideup @marcingrzejszczak so this topic "async callback tracing" has been debated a lot which is why I asked to raise this as an issue.. firstly to this repo as we have a test case we can proceed with. The question is that when you are an RPC async response or error callback, what should the parent be? I think when I asked the same in javascript once, @eirslett mentioned that result callbacks on a client should infer the parent from the caller of the client. This would cause less confusion with any child spans that are incidental to the network request (such as DNS). It also reads more like sequential code. For example, let's say you have an RPC by itself, and it makes a DNS or related activity. This is clearly a causally linked child. A follow-up RPC request, such as a http redirect, is interestingly not the child of the first RPC request, as it would confuse this and mess up aggregations such as counts (as clients are not expected to be children of clients). For this reason, when we have a chance of a retry we create an "application span" which is the parent of the network requests. It would follow to me that business code should not use the client call they have a data dependency on as the parent context. Not only could that result in a tree depth of hundreds or thousands, and confuse the structure of the trace, but it would create different tree pattern just on accident of using async style. Let's take a look at sequential vs async wrt data dependencies. ex the following async call:
is the async variant of the following top-to-bottom code
I'd like to use this as evidence that the response/error callback should not need to look to see if a value is read in order to decide what the parent is. A data dependency is hard to track down anyway, we try to do this in messaging (as the message itself is the context), but in RPC, we don't and if we started to now, I do think our traces would end up very deep when people are making a lot of RPC calls and this would be unintuitive and break analysis. fight me :D |
Chiming in with a small idea, this seems like the correct use case for the OT follows-from relationship type. That way parent-child is maintained, but the relationship is also captured, in that you can account for why the second span is paused |
@devinsba let's say we made the parent the ID of the client on the response chain, adding a tag about the relationship, wouldn't you still have an ancesteral issue regarding RPC analysis? A client would be calling a client possibly 100 in a straight line (staircase of a very deep hierarchy trace).. even if we made some changes here, I think it would break amazon X-Ray and others. Those who had the whole trace could correct it back to the real parent, but those who didn't or didn't update to that server-side code will look like broken instrumentation. If instead, we were thinking about the follows-from being a separate span ID (like tag="http.response_span=xxxx" in the callback), that's something we could add in instrumentation (leaving the real parent alone). This probably means co-propagating the real parent, vs the response callback depending on how we do this. we could try to be clever about it. However we think about this, we should be careful to not misinterpret the data or use some OT concept.. it is often discussed before practiced. I'm not sure which backends (certainly not any clouds) support this. I'm pretty sure "follows-from" was about layout concerns with messaging processing and discussed back when appdash was a thing. Here's an issue about that here openzipkin/zipkin#1701 cc @tacigar I do think the message processing case with or without a follows-from ref could be done more elegantly. Back to RPC I think the closest match to the problem space is the async sequence ID which is something pinpoint uses to clarify scheduling order. Again, that's a separate thing. One reason I raised this issue in a concrete place, like instrumentation, was to show not tell. We have a great chance to avoid mistakes some "standards" things make, but showing exactly what happens when modeling is changed without a way to revert it. Remember, we have access to the in memory storage with has a dependency linker. It shouldn't be too hard to show exactly the concerns I've mentioned which gives folks a chance to mitigate them in worst case a FinishedSpanHandler, which rewrites the IDs back :P To get towards concrete.. to show not tell could be to make a test case, which insulates us from the breaking, but allow folks to play what'if games. If we couple what ifs with a trace image and some tests that still pass on dependency linking etc or a change list to fix them, we could move carefully if this way is forward. |
I actually meant exactly what you said, I think it should be a tag that potentially we assign special meaning to in the UI, but leaves the data model alone. So in the example given:
The spans for clients call1 and call2 share the same parent, but the span for call2 has a tag |
@devinsba ack. I still would hesitate using "followsFrom" word, for this unless it is well defined and also well used in practice.. we could figure that out by looking at all the source code made by OT servers (there are few of them, might be as easy as looking at jaeger ui as I know no other that would have likely used this besides the defunct appdash). Remember people use brave to send to jaeger, and we don't want to create junk there.. I doubt they register multiple parents in any way. The linked span thing in OC may also have completely different semantics.. avoiding naming this tag something used for other things could be a good strategy as it is easy enough to rename a tag to something else. To make a concrete experiment, I think we'd need something more tough than 2 calls as the edge cases come out in parallel or nested N calls hung off the response thread. I suspect to get the immediate parent ID, we may need some trickery as our "current span" will still the real parent and we have no idea if a callback will ever have another client span invoked. It might smell like co-propagation, just in case a client wants this data. This is the instrumentation tricky part.. maybe a special "extra" could be made to stuff the "last client" for this reason.. wdyt? |
I don’t have much skin in this game but strongly agree on using with using FollowsFrom from reading the spec. Without the spec, whatever you do is just another divergent practice. Following it you can achieve what you want in practice but when implementations don’t align at least you can raise a bug. Sent with GitHawk |
who's spec? who decided what? or it doesn't matter as long as it is called
a spec.
…On Fri, Apr 12, 2019 at 2:08 PM Yuri Schimke ***@***.***> wrote:
I don’t have much skin in this game but strongly agree on using with using
FollowsFrom from reading the spec.
Without the spec, whatever you do is just another divergent practice.
Following it you can achieve what you want in practice but when
implementations don’t align at least you can raise a bug.
Sent with GitHawk <http://githawk.com>
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#892 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAD61801QMHDquXjZJVCKH5Xm6rvYmkLks5vgBTHgaJpZM4chAyu>
.
|
here are some examples of follows-from that aren't test cases (there aren't many!) instana's tracer throwing it away
jaeger.. only a month ago handling it.. and treating exactly the same as child of
The point is that we can't blindly look at something mentioned as spec as actually having practice or being meaningfully evaluated. |
Google Search and the first spec that comes up :) . Please don't take this as any strong stance on my part. I got asked a question, with a suggested solution, I googled the solution and it appears to be defined in a relevant spec. https://github.com/opentracing/specification/blob/master/specification.md
|
> Google Search and the first spec that comes up :) . Please don't take this as any strong stance on my part. I got asked a question, with a suggested solution, I googled the solution and it appears to be defined in a relevant spec.
yep OT creates a lot of confusion by defining things they don't use
|
On reading it, it also made sense Sent with GitHawk |
anyone want to run a test with the OT approach then? I think there are folks who want to. |
Actually based on the fact they have actually defined the meaning of that reference type, it was a TODO last I saw, this is not a case that matches that semantic anyway. As the parent does depend on the result. This would be a different reference type, one where a spans duration depends on sibling. (The normal sync model being the timestamp depending on a sibling) |
Jumping late into this discussion but here are my 3p:
Also cc @ghermeto as he might have some valuable input about this. |
When you have two promises called one after the other, just means they are sequential, not that one is the parent of the other. You can visualize the parent. So, this: function parent() {
request1().then(result => request2() );
} can be written as: async function parent() {
await request1();
request2();
} which shows that just because they are sequential doesn’t imply hierarchy… I agree with Adrian statement that client calls are not expected to be children of client. IMO
|
This is an issue for discussing the proposal mentioned in gitter to consider using the context of the request(s) logically preceding a request as its parent context.
Given the following pseudo-code, it is – intuitively – obvious that the parent span of both
call1
andcall2
should be the same as there is no dependency.Now imagine the following pseudo-code where the second call (
call2
) depends on the outcome ofcall1
. Most clients/instrumentations will still use the same parent span for both calls. This makes observing behaviour correctly hard because it requires insight into the code to see what was actually written.As one of the use-cases of using tracing is to determine why and how a certain response came to be, my proposal is to use to the context of
call1
as the parent ofcall2
whencall2
depends on the response ofcall1
.As I know of no instrumentations that actually do this, I'm wondering why. Is it incredibly stupid to do this, or just hard?
We can extend the discussion on what the parent contexts should be for the following case:
In this case I would argue that
call3
should have both the contexts ofcall1
andcall2
as parent. Multiple parents is something we know from events already.The text was updated successfully, but these errors were encountered: