Kotlin coroutines instrumentation #4405

monosoul · 2022-12-08T21:14:28Z

What Does This Do

Provides proper instrumentation for Kotlin coroutines. Solves #931, solves #1123.

Introduces:

ScopeState (there may be a better name) <-- an abstraction for holding and managing the current scope state.
ContinuableManagedScope <-- implementation that holds a ScopeStack. On activate it sets the scope stack it holds as the thread-local scope stack of ContinuableScopeManager. On fetchFromActive it stores the current thread-local scope stack into its local variable.
ScopeStateAware <-- provides a method to get a new scope state. Extended by AgentScopeManager and TracerAPI.
ScopeStateCoroutineContext <-- coroutine context element that stores/restores the scope state on suspension/continuation.
AbstractCoroutineInstrumentation <-- instrumentation for AbstractCoroutine, adds 3 advices:
- AbstractCoroutineConstructorAdvice <-- advice around the AbstractCoroutine constructor that creates a new instance of ScopeStateCoroutineContext and puts it into the CoroutineContext, ensuring each coroutine has its own instance of the context element and never inherits one. If the coroutine is not lazily started, it also captures the current active scope.
- AbstractCoroutineOnStartAdvice <-- advice around AbstractCoroutine#onStart that captures the active scope if and when a coroutine starts. It does nothing for eagerly started coroutines (their scope was already captured on construction) and only applies to lazily started coroutines.
- JobSupportAfterCompletionInternalAdvice <-- advice around AbstractCoroutine#onCompletionInternal that guarantees the scope captured by the coroutine is closed.
CoroutineContextHelper <-- a set of helper functions to access elements of CoroutineContext.

In short, it gives each coroutine its own scope stack.
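To illustrate the idea of a per-coroutine scope stack, here is a minimal plain-Java sketch. It is not the code from this PR: `ScopeStateSketch`, `THREAD_STACK`, `runSegment` and the use of strings as stand-ins for scopes are all assumptions made for the example. Only the `activate` and `fetchFromActive` method names are taken from the description above, and two single-thread executors stand in for dispatcher threads #1 and #2.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch: names echo the PR description, the code is not from dd-trace-java.
class ScopeStateSketch {
  // Stand-in for the scope manager's thread-local scope stack.
  static final ThreadLocal<Deque<String>> THREAD_STACK =
      ThreadLocal.withInitial(ArrayDeque::new);

  /** Holds one coroutine's scope stack while the coroutine is not running. */
  static final class ScopeState {
    private Deque<String> stack = new ArrayDeque<>();

    /** Installs the held stack as the current thread's stack. */
    void activate() {
      THREAD_STACK.set(stack);
    }

    /** Stores the current thread's stack back into this state. */
    void fetchFromActive() {
      stack = THREAD_STACK.get();
    }
  }

  /** Runs one segment of a coroutine between two suspension points. */
  static <T> T runSegment(ScopeState state, Supplier<T> body) {
    Deque<String> threadOwnStack = THREAD_STACK.get();
    state.activate();
    try {
      return body.get();
    } finally {
      state.fetchFromActive();
      THREAD_STACK.set(threadOwnStack); // give the thread its own stack back
    }
  }

  static String demo() {
    ExecutorService thread1 = Executors.newSingleThreadExecutor();
    ExecutorService thread2 = Executors.newSingleThreadExecutor();
    ScopeState coroutineState = new ScopeState();
    try {
      // Thread #2 carries a leftover scope from unrelated work.
      Runnable leftover = () -> THREAD_STACK.get().push("stale span");
      thread2.submit(leftover).get();

      // Segment 1 runs on thread #1 and activates "span A" before suspending.
      Callable<String> first = () -> runSegment(coroutineState, () -> {
        THREAD_STACK.get().push("span A");
        return "suspended";
      });
      thread1.submit(first).get();

      // Segment 2 resumes on thread #2 and still sees "span A".
      Callable<String> second =
          () -> runSegment(coroutineState, () -> THREAD_STACK.get().peek());
      String seenByCoroutine = thread2.submit(second).get();

      // Thread #2's own stack is untouched once the segment is done.
      Callable<String> own = () -> THREAD_STACK.get().peek();
      String seenByThread = thread2.submit(own).get();

      return seenByCoroutine + " | " + seenByThread;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      thread1.shutdown();
      thread2.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "span A | stale span"
  }
}
```

In the actual instrumentation the swap is driven by the coroutine context element on suspension/continuation rather than by an explicit wrapper like `runSegment`.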

Motivation

At the moment the Datadog agent doesn't work properly with Kotlin coroutines, especially when a coroutine is suspended and later resumed. This is because ContinuableScopeManager stores the scope stack in a thread-local variable.
Imagine this situation:

We have a coroutine dispatcher backed by an executor pool of 2 threads.
We start a coroutine on thread #1, and then it gets suspended.
When it is resumed, thread #1 is busy with another coroutine, so our coroutine gets scheduled on thread #2.
Because ContinuableScopeManager uses a thread-local scope stack, all the scopes we activated in our coroutine are left behind on thread #1. Moreover, thread #2 might have leftovers in its scope stack from another coroutine, leading to missing spans and a broken span hierarchy.
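The scenario above can be reproduced without coroutines at all. The following is a minimal plain-Java sketch under stated assumptions: `ThreadLocalScopeLeakDemo` and `SCOPE_STACK` are hypothetical names, strings stand in for scopes, and two single-thread executors model threads #1 and #2 so that the "resume on another thread" step is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; not part of dd-trace-java.
class ThreadLocalScopeLeakDemo {
  // Stand-in for a thread-local scope stack: every thread gets its own stack.
  static final ThreadLocal<Deque<String>> SCOPE_STACK =
      ThreadLocal.withInitial(ArrayDeque::new);

  /** Returns the scope the resumed half of the "coroutine" sees as active. */
  static String activeScopeAfterResume() {
    ExecutorService thread1 = Executors.newSingleThreadExecutor();
    ExecutorService thread2 = Executors.newSingleThreadExecutor();
    try {
      // First half: activate a scope on thread #1, then "suspend".
      Runnable firstHalf = () -> SCOPE_STACK.get().push("span A");
      thread1.submit(firstHalf).get();

      // Second half: resumed on thread #2, whose stack never saw "span A".
      Callable<String> secondHalf = () -> String.valueOf(SCOPE_STACK.get().peek());
      return thread2.submit(secondHalf).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      thread1.shutdown();
      thread2.shutdown();
    }
  }

  public static void main(String[] args) {
    // prints "active scope after resume: null"
    System.out.println("active scope after resume: " + activeScopeAfterResume());
  }
}
```

The resumed half finds no active scope, so any span it creates has no parent to attach to, which is the "missing spans and broken hierarchy" symptom.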

Having this instrumentation will make DD customers using Kotlin coroutines way happier with the product.

Additional Notes

The test case to demonstrate the issue can be found in this commit: df6d722 (for easier review).

You can find more information about coroutines in Kotlin here:

Feature flags

The feature is disabled by default.

System property dd.integration.kotlin_coroutine.experimental.enabled
Environment variable DD_INTEGRATION_KOTLIN_COROUTINE_EXPERIMENTAL_ENABLED

Set either of the two to true to enable the feature.
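As a rough illustration of the "either of the two" rule, the check could look like the sketch below. `CoroutineFlagSketch` and `isEnabled` are hypothetical; the agent's real configuration lookup is not shown in this PR, and treating conflicting values as a logical OR is an assumption of the sketch. Only the property and environment variable names come from the description.

```java
// Hypothetical helper; not the agent's actual configuration code.
class CoroutineFlagSketch {
  static final String PROPERTY = "dd.integration.kotlin_coroutine.experimental.enabled";
  static final String ENV_VAR = "DD_INTEGRATION_KOTLIN_COROUTINE_EXPERIMENTAL_ENABLED";

  /** Enabled when either source is set to "true"; unset (null) values count as false. */
  static boolean isEnabled(String propertyValue, String envValue) {
    return Boolean.parseBoolean(propertyValue) || Boolean.parseBoolean(envValue);
  }

  public static void main(String[] args) {
    boolean enabled = isEnabled(System.getProperty(PROPERTY), System.getenv(ENV_VAR));
    System.out.println("kotlin coroutine instrumentation enabled: " + enabled);
  }
}
```

On the command line the system property form is passed to the JVM as `-Ddd.integration.kotlin_coroutine.experimental.enabled=true`.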

vsingal-p · 2022-12-10T22:02:48Z

Thanks for raising this 💯

monosoul · 2022-12-13T08:47:04Z

Hey @mcculls @bantonsson, sorry for pinging, but would you mind having a look here? I believe proper instrumentation for coroutines will make lots of DD customers (including myself) happy. If you think the changes here are a bit too intrusive, there's also an option to do it like this: c7b4468, with no changes to the API.
Thanks!

bantonsson · 2022-12-14T16:19:16Z

Thanks @monosoul. I'll try to review this tomorrow.

bantonsson · 2022-12-15T14:56:36Z

@monosoul I need to dig a bit deeper into this. The Scope instances are not really intended to be shared across threads and I want to make sure that I understand the underlying mechanisms in Kotlin.

monosoul · 2022-12-15T15:33:24Z

@bantonsson yeah, the change here basically makes sure that each coroutine has its own scope that doesn't interfere with the thread's scope or with other coroutines. I.e. while a thread runs the coroutine, the scope manager uses the scope stored in the ScopeStateCoroutineContext instance, which is unique to each coroutine. Once the coroutine is suspended or finished, the thread's original scope is restored. Without this change multiple coroutines might share the same scope, which leads to lost spans and span inheritance issues. You can think of ThreadContextElements as the ThreadLocals of the coroutine world. Not sure if that explanation makes it any clearer 😅

bantonsson · 2022-12-16T14:01:34Z

Hey @monosoul. I've been reading up on this a bit. This whole solution would need to use continuations and activate/deactivate scopes in a different way to handle separating the creation of coroutines and the running of them.

I have a changed test that splits the reported spans into two traces which is not correct and I'll open a PR against your branch with it.

Would love to have a bigger discussion about how one would expect the UI to show these coroutines, since I think that is where we need to start to get this right.

vsingal-p · 2022-12-16T14:48:11Z

@bantonsson Why would we need a change in the UI? The main problem is that traces get split with new trace_ids because scopes/spans are not propagated correctly with coroutines. If we fix that, the flow would look exactly the same as it does now (in a normal/other app).

monosoul · 2022-12-16T15:00:16Z

Hey @bantonsson , thanks for looking into it!

> I have a changed test that splits the reported spans into two traces which is not correct and I'll open a PR against your branch with it.

I'm not sure if I got that part, but I guess it will be easier to discuss when you open the PR.

Here are my expectations from coroutines instrumentation:

All calls to instrumented methods from within coroutines should work the same way as they do outside of coroutines. I.e. if I start a span (let's call it span A) before calling a coroutine, and inside the coroutine I call a method that is instrumented automatically (for example, fetching messages with a Kafka client), then each span created by the Kafka client call should have span A as its parent, regardless of the thread the coroutine got scheduled to. If there are multiple parallel calls, they should all have span A as their parent.
The instrumentation should just work without any changes to coroutine invocations.
And of course running coroutines should not mess up other spans.

As for how it looks in the UI, I'd also expect the span hierarchy to be respected there. If I got it right, you want to use continuations so that the spans will have breaks in them when a coroutine gets suspended, right?

bantonsson · 2022-12-29T15:49:44Z

Happy Holidays @monosoul. I just wanted to say that I've been looking over this code and working on some more tests as well as digging into the Kotlin Coroutine machinery. I'll get back with an update next week.

bantonsson · 2023-01-05T16:30:15Z

@monosoul I've opened a new PR against your branch monosoul#2 that has all my changes, both tests and implementation. I've also tested it against CI here #4499

monosoul · 2023-01-05T16:39:12Z

@bantonsson thanks! I'll check it tomorrow

monosoul · 2023-01-06T12:37:24Z

@bantonsson I've left a few comments there. Tbh, now I think it's probably impossible/hard to achieve the behavior for lazily started coroutines I mentioned here. First of all because there are no guarantees a lazily started coroutine will ever be started or cancelled after it was instantiated.

vsingal-p · 2023-01-19T12:35:21Z

Hello folks. Is there any update on this PR?
I have gone through the comments and it looks like lazy coroutines are somewhat tricky to handle.

Can we keep them out of scope and get the base fix merged? We can mark lazy coroutines as a known bug, but at least we would be able to solve a lot of the other cases (everything else, basically).

bantonsson · 2023-01-24T10:14:59Z

@vsingal-p I've reviewed the fixes that @monosoul has done in his repo, and I think that with those fixes done, we could move this PR forward if we have the instrumentation as an opt-in until it's been tested and we know what the performance implications are.

vsingal-p · 2023-01-25T03:56:51Z

> @vsingal-p I've reviewed the fixes that @monosoul has done in his repo, and I think that with those fixes done, we could move this PR forward if we have the instrumentation as an opt-in until it's been tested and we know what the performance implications are.

Sounds like a plan. @monosoul Let's get all the changes in and get this fix launched 🤞 And huge thanks for all your efforts 🙇

monosoul · 2023-01-25T17:46:32Z

Sorry for the delay on my end this time, been a bit busy lately. I'll try to implement the changes suggested by @bantonsson by the end of this week.

monosoul · 2023-01-26T22:58:44Z

@bantonsson I've updated the PR

bantonsson

Thanks @monosoul for a great contribution.

bantonsson · 2023-01-27T16:44:20Z

@monosoul Would you mind rewording your PR description to reflect the current state of the instrumentation, and mention the flags for turning it on? Also, could you switch from fixes #... to solves #..., since GitHub auto-close will close the issues on merge and not on release.

monosoul · 2023-01-27T17:02:30Z

@bantonsson Done!

kwazii1231 · 2023-11-09T15:36:15Z

Is it working correctly?
I'm having a hard time figuring out what I'm missing.
I use RUN curl -Lso dd-java-agent.jar https://dtdg.co/latest-java-tracer
to get the latest dd-trace-java as the Datadog agent for my application,
and I use implementation("com.datadoghq:dd-trace-api:1.22.0") as a Gradle dependency.
I don't know the root cause of why my application cannot correctly trace code that uses coroutine async.
Everything else is the same as what you describe.
Do you have any clue how to solve my situation?

monosoul · 2023-11-09T16:02:10Z

@kwazii1231 you need to also enable the feature, since it's behind a flag atm

kwazii1231 · 2023-11-09T16:07:28Z

> @kwazii1231 you need to also enable the feature, since it's behind a flag atm

@monosoul

Thanks a lot for the quick reply.
I set DD_INTEGRATION_KOTLIN_COROUTINE_EXPERIMENTAL_ENABLED=true
as an environment variable in my container,
and my Dockerfile looks like this:

FROM adoptopenjdk:11.0.11_9-jdk-hotspot as app

ARG APP_PROFILE=local
ENV APP_PROFILE=${APP_PROFILE} \
    TZ=Asia/Seoul

WORKDIR /app
RUN curl -Lso dd-java-agent.jar https://dtdg.co/latest-java-tracer

WORKDIR /app
COPY build/libs/* /app/

ENTRYPOINT java \
    -XX:+UseContainerSupport \
    -XX:InitialRAMPercentage=75 \
    -XX:MaxRAMPercentage=75 \
    -XshowSettings:vm \
    -Dfile.encoding=UTF-8 \
    -Djava.net.preferIPv4Stack=true \
    -jar /app/api.jar \
    --spring.profiles.active=${APP_PROFILE}

Is there anything else I need to do?
Some traces seem to be collected correctly, but most are not.

monosoul · 2023-11-09T16:10:50Z

@kwazii1231 you need to add the agent to your jvm cmd line: -javaagent:/path/to/dd-java-agent.jar

See this documentation: https://docs.datadoghq.com/tracing/trace_collection/dd_libraries/java/?tab=springboot

kwazii1231 · 2023-11-09T16:12:50Z

> @kwazii1231 you need to also enable the feature, since it's behind a flag atm

I also set this in the common template of my Helm chart:

name: JAVA_TOOL_OPTIONS
value: -javaagent:/app/dd-java-agent.jar

Only the trace for the async gRPC API call is not collected correctly.

kwazii1231 · 2023-11-10T07:34:09Z

> @kwazii1231 you need to add the agent to your jvm cmd line: -javaagent:/path/to/dd-java-agent.jar
>
> See this documentation: https://docs.datadoghq.com/tracing/trace_collection/dd_libraries/java/?tab=springboot

Hi @monosoul
I finally found the reason, and the root cause is not your implementation
but our custom span-appending code, which was written when this agent didn't support Kotlin coroutine tracing.
So I used activeSpan from globalTracer to append the custom client span.
It finally works!
Thanks a lot for the quick replies, which pushed me to look for the cause in our own code.
Have a nice day!

monosoul added 3 commits December 8, 2022 21:56

Configure kotlin-coroutines module to be compilable

45d2c12

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

Add a test case for kotlin coroutines tracing with suspension

df6d722

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

Add a naive Kotlin coroutines instrumentation

c7b4468

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

monosoul force-pushed the feature/kotlin-coroutines-instrumentation branch 2 times, most recently from 35df53a to b06ac78 Compare December 9, 2022 13:36

Rewrite using ManagedScope abstraction for scope stack management

a3512c8

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

monosoul force-pushed the feature/kotlin-coroutines-instrumentation branch from b06ac78 to a3512c8 Compare December 9, 2022 13:38

monosoul added 3 commits December 9, 2022 15:24

Configure muzzle

9553d24

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

Merge remote-tracking branch 'origin/master' into feature/kotlin-coro…

b001da9

…utines-instrumentation

Fix a typo

d0cbeac

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

monosoul force-pushed the feature/kotlin-coroutines-instrumentation branch 2 times, most recently from 5088230 to 8530f70 Compare December 9, 2022 15:22

ManagedScope -> ScopeState; delegateManagedScope -> newScopeState

079ab2c

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

monosoul force-pushed the feature/kotlin-coroutines-instrumentation branch from 8530f70 to 079ab2c Compare December 9, 2022 15:37

monosoul added 3 commits December 9, 2022 17:14

Improve createKotlinDirs task declaration to make sure it always works

e719abc

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

Add unit tests for ContinuableScopeState

66072b6

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

Exclude CustomScopeState

1a342eb

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

monosoul marked this pull request as ready for review December 9, 2022 19:20

monosoul requested a review from a team as a code owner December 9, 2022 19:20

Merge remote-tracking branch 'origin/master' into feature/kotlin-coro…

6aca8d1

…utines-instrumentation

Merge remote-tracking branch 'origin/master' into feature/kotlin-coro…

ec1ff19

…utines-instrumentation

Merge remote-tracking branch 'origin/master' into feature/kotlin-coro…

3a9b4f1

…utines-instrumentation Signed-off-by: monosoul <Kloz.Klaud@gmail.com> # Conflicts: # dd-trace-core/src/main/java/datadog/trace/core/scopemanager/ContinuableScopeManager.java

monosoul added 2 commits December 23, 2022 02:46

Extract ContinuationHandler into its own file

f5ba3d0

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

Extract closeScopeAndCancelContinuation method

fcb59ae

Signed-off-by: monosoul <Kloz.Klaud@gmail.com>

monosoul and others added 2 commits January 26, 2023 23:10

Handle lazily/eagerly started coroutines differently (#3)

2f062e6

Merge remote-tracking branch 'origin/master' into feature/kotlin-coro…

3923d69

…utines-instrumentation

monosoul requested a review from bantonsson January 27, 2023 12:25

bantonsson approved these changes Jan 27, 2023

bantonsson merged commit c1ad5f8 into DataDog:master Jan 27, 2023

bantonsson added this to the 1.7.0 milestone Jan 27, 2023

monosoul mentioned this pull request Jan 31, 2023

Instrumentation for Kotlin coroutines 1.5.0+ #4624

Merged

smola added the inst: others All other instrumentations label Feb 3, 2023

monosoul mentioned this pull request Feb 11, 2023

Fix IllegalStateException caused by Flows instrumentation in Kotlin coroutines #4719

Merged

joedj mentioned this pull request Sep 20, 2023

@Trace annotation does not work with kotlin suspending functions #5917

Open

monosoul deleted the feature/kotlin-coroutines-instrumentation branch December 18, 2023 12:41
