Kotlin coroutines instrumentation #4405
Thanks for raising this 💯 |
Hey @mcculls @bantonsson, sorry for pinging, but would you mind having a look here? I believe having proper instrumentation for coroutines will make lots of DD customers (including myself) happy. In case you think the changes here are a bit too intrusive, there's also an option to do it like this: c7b4468, with no changes to the API.
Thanks @monosoul. I'll try to review this tomorrow. |
@monosoul I need to dig a bit deeper into this. The |
@bantonsson yeah, the change here basically makes sure that each coroutine has its own scope that doesn't interfere with the thread's scope and other coroutines. I.e. while the thread runs the coroutine, the scope manager will use the scope stored in the |
Hey @monosoul. I've been reading up on this a bit. This whole solution would need to use continuations and activate/deactivate scopes in a different way to handle separating the creation of coroutines and the running of them. I have a changed test that splits the reported spans into two traces which is not correct and I'll open a PR against your branch with it. Would love to have a bigger discussion about how one would expect the UI to show these coroutines, since I think there is where we need to start to get this right. |
@bantonsson Why would we need a change in the UI? The main problem is that traces are getting split with new trace_ids because the scope/span is not propagated correctly with coroutines. If we fix that, the flow would look exactly the same as it does now in any other app.
Hey @bantonsson , thanks for looking into it!
I'm not sure if I got that part, but I guess it will be easier to discuss when you open the PR. Here are my expectations from coroutines instrumentation:
As for how it looks in the UI, I'd also expect the span hierarchy to be respected there. If I got it right, you want to use continuations so that the spans will have breaks in them when a coroutine gets suspended, right?
Happy Holidays @monosoul. I just wanted to say that I've been looking over this code and working on some more tests as well as digging into the Kotlin Coroutine machinery. I'll get back with an update next week. |
@monosoul I've opened a new PR against your branch monosoul#2 that has all my changes, both tests and implementation. I've also tested it against CI here #4499 |
@bantonsson thanks! I'll check it tomorrow |
@bantonsson I've left a few comments there. Tbh, now I think it's probably impossible/hard to achieve the behavior for lazily started coroutines I mentioned here. First of all because there are no guarantees a lazily started coroutine will ever be started or cancelled after it was instantiated. |
Hello folks. Is there any update on this PR? Can we keep lazy coroutines out of scope and get the base fix merged? We can mark lazy coroutines as a known bug, but at least we would be able to solve a lot of the other cases (everything else, basically).
@vsingal-p I've reviewed the fixes that @monosoul has done in his repo, and I think that with those fixes done, we could move this PR forward if we have the instrumentation as an opt-in until it's been tested and we know what the performance implications are.
Sounds like a plan. @monosoul Let's get all the changes in and get this fix launched 🤞 And huge thanks for all your efforts 🙇 |
Sorry for the delay on my end this time, been a bit busy lately. I'll try to implement the changes suggested by @bantonsson by the end of this week. |
@bantonsson I've updated the PR |
Thanks @monosoul for a great contribution.
@monosoul Would you mind rewording your PR description to what the state of the instrumentation is now, and mention the flags for turning on the instrumentation? Also, could you switch out the use of |
@bantonsson Done! |
is it working correctly? |
@kwazii1231 you need to also enable the feature, since it's behind a flag atm |
Thanks a lot for the very quick reply.

    FROM adoptopenjdk:11.0.11_9-jdk-hotspot as app
    ARG APP_PROFILE=local
    WORKDIR /app
    WORKDIR /app
    ENTRYPOINT java

Is there anything else I need to do?
@kwazii1231 you need to add the agent to your JVM command line. See this documentation: https://docs.datadoghq.com/tracing/trace_collection/dd_libraries/java/?tab=springboot
I also did that in the common template of my Helm chart, like this:
Only the trace for the async gRPC API call is not collected correctly.
Hi @monosoul |
What Does This Do
Provides a proper instrumentation for Kotlin coroutines. Solves #931 , Solves #1123 .
Introduces:
- `ScopeState` (probably might have a better name) <-- an abstraction for holding and managing current scope state.
- `ContinuableManagedScope` <-- implementation that holds a `ScopeStack`. On `activate` it sets the scope stack it holds into the thread local scope stack of `ContinuableScopeManager`. On `fetchFromActive` it fetches the value of the thread local scope stack into its local variable.
- `ScopeStateAware` <-- provides a method to get a new scope state. Extended by `AgentScopeManager` and `TracerAPI`.
- `ScopeStateCoroutineContext` <-- coroutine context element that stores/restores the scope state on suspension/continuation.
- `AbstractCoroutineInstrumentation` <-- instrumentation for `AbstractCoroutine`, adds 3 advices:
  - `AbstractCoroutineConstructorAdvice` <-- advice around the `AbstractCoroutine` constructor that creates a new instance of `ScopeStateCoroutineContext` and puts it into the `CoroutineContext`, ensuring each coroutine has its own instance of the context element and it is never inherited. If the coroutine is not lazily started, it also captures the current active scope.
  - `AbstractCoroutineOnStartAdvice` <-- advice around `AbstractCoroutine#onStart` to capture the active scope if/when a coroutine starts. Does nothing for eagerly started coroutines (the scope already got captured on construction); only applicable to lazily started coroutines.
  - `JobSupportAfterCompletionInternalAdvice` <-- advice around `AbstractCoroutine#onCompletionInternal` to guarantee the scope captured by the coroutine is closed.
- `CoroutineContextHelper` <-- a set of helper functions to access elements of `CoroutineContext`.

Basically, what it does is make each coroutine have its own scope stack.
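The activate/fetch idea described above can be sketched in plain Java, without the tracer or the coroutine library. This is only an illustration under assumptions: `ScopeStateSketch`, `THREAD_STACK` and `suspendAndResume` are hypothetical names, strings stand in for scopes, and two single-thread executors stand in for the threads a coroutine dispatcher might use. It is not the code from the PR.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical, simplified stand-in; the classes in the PR are more involved.
final class ScopeStateSketch {
  // Stand-in for the thread-local scope stack of the scope manager.
  static final ThreadLocal<Deque<String>> THREAD_STACK =
      ThreadLocal.withInitial(ArrayDeque::new);

  // The scope stack owned by one coroutine.
  private Deque<String> stack = new ArrayDeque<>();

  // Install the coroutine's stack as the current thread's stack.
  void activate() {
    THREAD_STACK.set(stack);
  }

  // Remember whichever stack the current thread is using right now.
  void fetchFromActive() {
    stack = THREAD_STACK.get();
  }

  // Simulates one coroutine that suspends on one thread and resumes on another.
  static String suspendAndResume() {
    ExecutorService thread1 = Executors.newSingleThreadExecutor();
    ExecutorService thread2 = Executors.newSingleThreadExecutor();
    ScopeStateSketch state = new ScopeStateSketch();
    try {
      thread1.submit(() -> {
        Deque<String> threadOwned = THREAD_STACK.get();
        state.activate();                          // coroutine starts running
        THREAD_STACK.get().push("coroutine-span"); // scope opened inside the coroutine
        state.fetchFromActive();                   // coroutine suspends
        THREAD_STACK.set(threadOwned);             // the thread gets its own stack back
      }).get();
      // Resuming on a different thread still sees the coroutine's own stack.
      return thread2.submit(() -> {
        state.activate();
        return THREAD_STACK.get().peek();
      }).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      thread1.shutdown();
      thread2.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(suspendAndResume()); // prints "coroutine-span"
  }
}
```

Because the stack travels with the state object rather than with the thread, the second thread finds the scope that was opened on the first one.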
Motivation
At the moment the Datadog agent doesn't work properly with Kotlin coroutines, especially when a coroutine gets suspended first and then gets continued. This is because `ContinuableScopeManager` stores the scope stack in a thread local variable.

Imagine this situation:
1. A coroutine starts on thread `#1` and then it gets suspended.
2. Thread `#1` is busy with another coroutine, so our coroutine gets scheduled to thread `#2`.
3. Since `ContinuableScopeManager` uses a thread local scope stack, all the scopes that we have activated in our coroutine got left on thread `#1`. Moreover, thread `#2` might also have some leftovers in the scope stack from another coroutine, leading to missing spans and broken span hierarchy.

Having this instrumentation will make DD customers using Kotlin coroutines way happier with the product.
Additional Notes
The test case to demonstrate the issue can be found in this commit: df6d722 (for easier review).
You can find more information about coroutines in Kotlin here:
Feature flags
The feature is disabled by default. Set either of the following to `true` to enable it:
- `dd.integration.kotlin_coroutine.experimental.enabled` (system property)
- `DD_INTEGRATION_KOTLIN_COROUTINE_EXPERIMENTAL_ENABLED` (environment variable)
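The "either of the 2" rule can be written down as a small Java check. This is only a sketch of the rule as stated here, assuming the usual `true`/`false` string values; `CoroutineFlagSketch` and `isEnabled` are hypothetical names, and the agent's real configuration code may resolve these settings differently.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper; it only mirrors the rule described in this PR.
final class CoroutineFlagSketch {
  static final String PROPERTY = "dd.integration.kotlin_coroutine.experimental.enabled";
  static final String ENV_VAR = "DD_INTEGRATION_KOTLIN_COROUTINE_EXPERIMENTAL_ENABLED";

  // Enabled when either the system property or the environment variable is "true".
  static boolean isEnabled(Properties props, Map<String, String> env) {
    // Boolean.parseBoolean returns false for null, so a missing setting means disabled.
    return Boolean.parseBoolean(props.getProperty(PROPERTY))
        || Boolean.parseBoolean(env.get(ENV_VAR));
  }

  public static void main(String[] args) {
    System.out.println(isEnabled(System.getProperties(), System.getenv()));
  }
}
```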