Add capability for invokedynamic InstrumentationModules to inject proxies #9565

JonasKunz · 2023-09-27T12:10:51Z

Follow up of #9502, part of #8999 .

This PR only allows the injection of the generated class, injection of it's bytecode as a resource (required for spring-boot autoconfigurations) is not yet support.

I initially planned on migrating the AWS instrumentation to indy to prove that this feature works as intended.
It works, but it currently looks very hacky, see this commit. The reasons are:

Muzzle parses the execution.interceptors and uses the referenced class as root: If you refere to a runtime-generated proxy with a new name there, the plugin "crashes"
- For invokedynamic modules those files should not be parsed by Muzzle. Instead the newly added InstrumentationModule.injectClasses() method should be used
The helper classes of the instrumentation module are shaded, which is a bit confusing when using the newly added injectClasses() method because you have to refer to the shaded name
- Where does this shading happen? I assume in the otel.javaagent-instrumentation gradle plugin? Where can I find it's source?
- I think we should already be able to get rid of the shading of instrumentation modules for invokedynamic modules. Any pointers on how to realize this or maybe even a contribution from someone else would be very appreciated!

In the linked commit I've worked around the issue by

using the same name for proxy and proxied class, which works but looks very confusing
refering to the shaded class name

That's why I think it is "ugly" and not worth including yet, but I don't have a strong opinion here.

laurit · 2023-09-29T09:20:16Z

Muzzle parses the execution.interceptors and uses the referenced class as root: If you refere to a runtime-generated proxy with a new name there, the plugin "crashes"

what do you mean by crashes? in my vocabulary crashes means that there is a hs_err_pid file

For invokedynamic modules those files should not be parsed by Muzzle. Instead the newly added InstrumentationModule.injectClasses() method should be used

Muzzle just collects which helper classes are used in the given instrumentation. There should not be a big difference whether you are using our current or the new indy module, you still need to know which classes are used an inject/load them. If you need to account for behavioral differences between the current and the new indy module just add apis that allow you to say, hey when running with indy skip this.

The helper classes of the instrumentation module are shaded, which is a bit confusing when using the newly added injectClasses() method because you have to refer to the shaded name

Generally the shading plugins are quite smart and will rename class names that you have has text. I think at this stage we are still just prototyping and only need to prove that this can be done. We can take care of the details later.

Where does this shading happen? I assume in the otel.javaagent-instrumentation gradle plugin? Where can I find it's source?

What you are probably looking for is this line

opentelemetry-java-instrumentation/conventions/src/main/kotlin/io.opentelemetry.instrumentation.javaagent-shadowing.gradle.kts

Line 22 in ff5e0a4

    
           relocate("io.opentelemetry.instrumentation", "io.opentelemetry.javaagent.shaded.instrumentation") {

We shade library instrumentation classes that can also be present in the application, we don't shade instrumentation classes that are only in the agent.

I think we should already be able to get rid of the shading of instrumentation modules for invokedynamic modules. Any pointers on how to realize this or maybe even a contribution from someone else would be very appreciated!

I think that shading is really the last of the concerns. Ideally you want to support both approaches as long as possible with feature flags to avoid having to work on a branch that you constantly need to keep up to date with changes from main. I'd suggest you to leave the shading as it is for now, if you want to see how it works without shading just use the disableShadowRelocate flag (afaik this flag isn't well tested, there is a possibility that stuff will break). Also note that currently the indy module class loader does a poor job at isolating application and agent classes. Instrumentations like okhttp, kotlin where agent contains the same classes as application get LinkageErrors.

using the same name for proxy and proxied class, which works but looks very confusing
I think it is fine. To use a different name I guess we'd need to also instrument whatever is looking for the class and if we already have to do that we probably wouldn't need to use the proxy at all.

...rc/main/java/io/opentelemetry/javaagent/extension/instrumentation/InstrumentationModule.java

.../main/java/io/opentelemetry/javaagent/extension/instrumentation/injection/ClassInjector.java

laurit · 2023-09-29T14:18:03Z

.../main/java/io/opentelemetry/javaagent/extension/instrumentation/injection/InjectionMode.java

+
+public enum InjectionMode {
+  CLASS_ONLY
+  // TODO: implement the modes RESOURCE_ONLY and CLASS_AND_RESOURCE


For proxy classes I think we could expose the original class bytes and we already have the code to do that. Implementing a custom URLStreamHandler is also fine if we want to expose the actual proxy bytes.

I'd prefer to provide the actual bytecode for the proxies. Applications parsing the bytecode might otherwise trip over references to non-existing types from the method bodies. In addition there might be some non-public methods and fields present in the bytecode but not in the proxy-class, which could also cause hard to debug issues.

I've already tried the approach with a custom URLStreamHandler on a separate branch, it is not much code and should be easy to maintain.

laurit · 2023-09-29T14:30:09Z

muzzle/src/main/java/io/opentelemetry/javaagent/tooling/HelperInjector.java

  }

  public static void setHelperInjectorListener(HelperInjectorListener listener) {
    helperInjectorListener = listener;
  }

-  private Map<String, Supplier<byte[]>> getHelperMap() {
+  private Map<String, Supplier<byte[]>> getHelperMap(ClassLoader targetClassloader) {


Passing the class loader here feels unnecessary. As far as I understand the class loader you need here is either agent class loader or extension class loader, it is ok to keep reference to these, they won't go away anyway. If you are currently using instrumentation module class loader here then consider whether your really need to. If you need instrumentation class loader because the injected class may reference some type that is only present in the application and byte-buddy blows up when it is not available then perhaps it would make more sense to generate the proxy with asm.

If you need instrumentation class loader because the injected class may reference some type that is only present in the application and byte-buddy blows up when it is not available then perhaps it would make more sense to generate the proxy with asm.

That is precisely the case, I've even ensured that this is covered by the tests: The MyProxy class extends MyProxySuperclass which is only available in the instrumented application classloader.

We could remove the need for having the classloader at hand when generating the proxies, but I went for the other route for the following reasons:

I would assume that the ByteBuddy-based proxy generator is a lot easier to read and maintain than an ASM one. I'd expect the ASM proxy generator to be more brittle.

This would not allow us to override & delegate inherited methods by the proxy. We don't do this currently, but I would suspect it could become relevant in the future. See also this test-case for reference.

What problems do you see with providing the instrumented app classloader when injecting the classes?

trask · 2023-09-29T18:22:41Z

I'd suggest you to leave the shading as it is for now

👍 I support not spending time (or adding complexity) in order to remove all shading at this point. this can always be explored / tackled later on.

JonasKunz · 2023-10-02T08:31:19Z

Muzzle parses the execution.interceptors and uses the referenced class as root: If you refere to a runtime-generated proxy with a new name there, the plugin "crashes"

what do you mean by crashes? in my vocabulary crashes means that there is a hs_err_pid file

Sorry, I meant that the gradle build fails with an exception:

./gradlew clean assemble --stacktrace
> Task :instrumentation:aws-sdk:aws-sdk-2.2:javaagent:byteBuddyJava FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':instrumentation:aws-sdk:aws-sdk-2.2:javaagent:byteBuddyJava'.
> Failed to transform class files in /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/aws-sdk/aws-sdk-2.2/javaagent/build/classes/java/mainraw

* Try:
> Run with --info or --debug option to get more log output.
> Get more help at https://help.gradle.org.

* Exception is:
org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':instrumentation:aws-sdk:aws-sdk-2.2:javaagent:byteBuddyJava'.
        at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.lambda$executeIfValid$1(ExecuteActionsTaskExecuter.java:149)
        ...
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
Caused by: java.lang.IllegalStateException: Failed to transform class files in /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/aws-sdk/aws-sdk-2.2/javaagent/build/classes/java/mainraw
        at net.bytebuddy.build.gradle.AbstractByteBuddyTask.doApply(AbstractByteBuddyTask.java:463)
        ...
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
Caused by: java.lang.UnsupportedOperationException: onError
        at net.bytebuddy.build.Plugin$Engine$ErrorHandler$Failing$1.onError(Plugin.java:1186)
        ...
        ... 157 more

For invokedynamic modules those files should not be parsed by Muzzle. Instead the newly added InstrumentationModule.injectClasses() method should be used

Muzzle just collects which helper classes are used in the given instrumentation. There should not be a big difference whether you are using our current or the new indy module, you still need to know which classes are used an inject/load them. If you need to account for behavioral differences between the current and the new indy module just add apis that allow you to say, hey when running with indy skip this.

I think for now we can stick to using the same name for the injected proxy like you suggested.

The helper classes of the instrumentation module are shaded, which is a bit confusing when using the newly added injectClasses() method because you have to refer to the shaded name

Generally the shading plugins are quite smart and will rename class names that you have has text. I think at this stage we are still just prototyping and only need to prove that this can be done. We can take care of the details later.

Looks like I had something messed up locally first. Things work just like you described here. I'm able to refer to the unshaded class names for the proxyBuilder() call and those are properly shaded.

I think we should already be able to get rid of the shading of instrumentation modules for invokedynamic modules. Any pointers on how to realize this or maybe even a contribution from someone else would be very appreciated!

I think that shading is really the last of the concerns. Ideally you want to support both approaches as long as possible with feature flags to avoid having to work on a branch that you constantly need to keep up to date with changes from main. I'd suggest you to leave the shading as it is for now, if you want to see how it works without shading just use the disableShadowRelocate flag (afaik this flag isn't well tested, there is a possibility that stuff will break).

Agree! Shading does not interfer with the proxying as I initially thought, so we are good here.

Also note that currently the indy module class loader does a poor job at isolating application and agent classes. Instrumentations like okhttp, kotlin where agent contains the same classes as application get LinkageErrors.

Yes that's something I want to implement soon™ . InstrumentationModules should have the capability to force linkage with selected classes/packages from the instrumented classloader for cases where they are present in the agent-classloader too. This kind of is a poor man's solution in comparison to using the "ideal" classloader hierarchy I described here, but much easier to implement.

using the same name for proxy and proxied class, which works but looks very confusing
I think it is fine. To use a different name I guess we'd need to also instrument whatever is looking for the class and if we already have to do that we probably wouldn't need to use the proxy at all.

Agree that it is fine for now. For the longterm this might however lead to confusing stacktraces where the same class appears twice, but it is actually two differen classes (proxy and the delegate).

...opentelemetry/javaagent/instrumentation/awssdk/v2_2/AbstractAwsSdkInstrumentationModule.java

...ry/javaagent/extension/instrumentation/internal/injection/ExtendedInstrumentationModule.java

...a/io/opentelemetry/javaagent/extension/instrumentation/internal/injection/ClassInjector.java

...rc/main/java/io/opentelemetry/javaagent/tooling/instrumentation/indy/IndyModuleTypePool.java

laurit · 2023-10-03T08:50:49Z

...java/io/opentelemetry/javaagent/instrumentation/awssdk/v2_2/AwsSdkInstrumentationModule.java

+  @SuppressWarnings("unchecked")
+  public List<String> getAdditionalHelperClassNames() {
+    if (isIndyModule()) {
+      // With the invokedynamic approach, the SnsInstrumentationModule and SqsInstrumentationModule


I don't think this is entirely correct. The reason why there are multiple modules is that muzzle check is done per module. Having sqs and sns instrumentation allows running muzzle checks for these separately so that the main instrumentation module can still apply even if sqs and sns dependencies are missing. Why this works for you when you dump these into one module is because you have disabled the muzzle checks for the indy instrumentation modules, if you'd add back the muzzle check code then I suspect doing this would cause muzzle failures (or if it doesn't then runtime failures when previously there would have been a muzzle failure). I think that using aws instrumentation to test the proxy code was an unfortunate complication because the multimodule thing going on there complicates things. These 3 modules are currently interacting so that if the muzzle checks for the optional modules pass they'll inject the classes that the main module will dynamically discover them. With indy these 3 module are isolated from each other so the trick of dynamically discovering classes injected by another module won't work. The easiest way around this is probably to have one instrumentation class loader per application class loader.

Or you could choose some other instrumentation besides aws for testing the proxy.

These 3 modules are currently interacting so that if the muzzle checks for the optional modules pass they'll inject the classes that the main module will dynamically discover them. With indy these 3 module are isolated from each other so the trick of dynamically discovering classes injected by another module won't work.

I see, thanks for explaining this!
My prior understanding was that the AWS instrumentation is simply about injecting the library instrumentation as is and using the SQS / SNS discovery built into that library instrumentation. However, for the agent instrumentation we also want to protect agains the case of an SQS / SNS version mismatch, correct?

The easiest way around this is probably to have one instrumentation class loader per application class loader.

Sounds good, but I'd propose that the default should be that each InstrumentationModule gets its own classloader. I would expect that in most cases, especially with external extensions, this isolation is a feature, not a bug. This allows extensions to freely use utility libraries (e.g. apache commons, guava) without having to fear collisions.

I'd propose for InstrumentationModules to optionally declare a "module group name". Modules with the same group name instrumenting the same app-classloader would get put into the same InstrumentationModuleClassLoader. What do you think about this approach?

If you agree, I'd shortly outline this as a TODO comment in the AWS module so that we don't forget about.

Or you could choose some other instrumentation besides aws for testing the proxy.

Agree, I'll revert the AWS changes and choose a different instrumentation to include in this PR.

Agree, I'll revert the AWS changes and choose a different instrumentation to include in this PR.

So I had a look at both the apache-dubbo and azur instrumentation and noticed that their tests don't fail currently with testIndy even though they should? I tried messing with the instrumentations (e.g. changing the resource paths) and the tests still were green. I'm thinking that something must be wrong with the tests there?

I decided to use the log4j instrumentation in this PR, because that one actually fails without the proxying with indy enabled.

at least with dubbo the issue is that the javaagent instrumentation has implementation(project(":instrumentation:apache-dubbo-2.7:library-autoconfigure")) which leaks over into tests so the tests will pass even without the javaagent instrumentation because they just pick up the manual instrumentation library that gets automatically configured

I created a PR to fix dubbo tests, azure tests (ok I ran only azure-core-1.14) fail for me, perhaps you didn't notice that they have

@Override public boolean isIndyModule() { return false; }

ok I ran only azure-core-1.14

How did you run those, I feel like I'm missing something.

What I did was:

Remove the isIndyModule() override from the 1.14 AzureSdkInstrumentationModule

Run the 1.14 AzureSdkTest from the IDE with -PtestIndy=true and verified that the module is actually loaded as IndyModule -> the test still passes

Btw what is the correct way of running selected tests from the command line?
I tried ./gradlew :instrumentation:azure-core:azure-core-1.14:javaagent:test -PtestIndy=true -i but that seems to just pick up recent test result from some kind of cache, even if I execute a clean beforehand:

> Task :instrumentation:azure-core:azure-core-1.14:javaagent:test FROM-CACHE Custom actions are attached to task ':instrumentation:azure-core:azure-core-1.14:javaagent:test'. Build cache key for task ':instrumentation:azure-core:azure-core-1.14:javaagent:test' is fd9f198ac1e6354f6d8e4f302611bd17 Task ':instrumentation:azure-core:azure-core-1.14:javaagent:test' is not up-to-date because: Output property 'binaryResultsDirectory' file /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/azure-core/azure-core-1.14/javaagent/build/test-results/test/binary has been removed. Output property 'binaryResultsDirectory' file /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/azure-core/azure-core-1.14/javaagent/build/test-results/test/binary/output.bin has been removed. Output property 'binaryResultsDirectory' file /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/azure-core/azure-core-1.14/javaagent/build/test-results/test/binary/output.bin.idx has been removed. Output property 'binaryResultsDirectory' file /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/azure-core/azure-core-1.14/javaagent/build/test-results/test/binary/results.bin has been removed. Output property 'reports.enabledReports.html.outputLocation' file /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/azure-core/azure-core-1.14/javaagent/build/reports/tests/test has been removed. Loaded cache entry for task ':instrumentation:azure-core:azure-core-1.14:javaagent:test' with cache key fd9f198ac1e6354f6d8e4f302611bd17 Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD] Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD] Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD] Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD] Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD] Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD]

From command line I think you have to at least do :instrumentation:azure-core:azure-core-1.14:javaagent:cleanTest before and then run :instrumentation:azure-core:azure-core-1.14:javaagent:test --no-build-cache
I retested and it still fails for me as expected when running from ide. I also removed the isIndyModule method from AzureSdkInstrumentationModule and added -PtestIndy=true to the run options in idea. I ran the test on main not this branch if it matters.

I ran the test on main not this branch if it matters.

Looks like we have a winner there, that was the difference to make them fail for me too.

I think this can be explained by InstrumentationModule.registerHelperResources being non-functional for indy-modules on main. In this PR that functionally has been added back.

So for this PR the azure tests should fail because the classes referenced by the injected resources are not supposed to be present in the instrumented classloader, but apparently they are.
I suspect that this then is a testing classpath issue similar to dubbo.

This shouldn't block us on this PR. I'll try to verify my hypothesis and try to open a PR doing the same thing for azure what you did for dubbo.

# Conflicts: # testing-common/integration-tests/src/test/java/indy/IndyInstrumentationTest.java

...rc/main/java/io/opentelemetry/javaagent/tooling/instrumentation/indy/IndyModuleTypePool.java

trask · 2023-10-18T19:13:01Z

@JonasKunz can you give a walkthrough of this PR in the SIG meeting tomorrow? thx

trask · 2023-10-19T17:11:27Z

thanks @JonasKunz and @laurit ❤️

JonasKunz added 5 commits September 26, 2023 10:45

Implemented invokedynamic proxy injection for InstrumentationModules

25a8763

Merge remote-tracking branch 'otel/main' into instrumentation-proxies

c3a6a13

Comments and spotless

7a03962

Remove unnecessary subpackage

9209cc6

Added missing javadoc

a54e730

JonasKunz requested a review from a team September 27, 2023 12:10

spotless fixes

57da000

jackshirazi mentioned this pull request Sep 28, 2023

Proposal to contribute invokedynamic based instrumentation mechanism #8999

Open

laurit reviewed Sep 29, 2023

View reviewed changes

Enable AWS SDK instrumentation for indy via proxy mechanism

1aa563e

JonasKunz added 3 commits October 2, 2023 11:33

Moved injection to new ExtendedInstrumentationModule

d48563c

Javadoc review fixes

43fc618

Merge remote-tracking branch 'otel/main' into instrumentation-proxies

80dee2f

JonasKunz commented Oct 2, 2023

View reviewed changes

...opentelemetry/javaagent/instrumentation/awssdk/v2_2/AbstractAwsSdkInstrumentationModule.java Outdated Show resolved Hide resolved

laurit added the test indy label Oct 2, 2023

Fix AWS tests for indy, enable for testing with indy flag

5adf59e

JonasKunz force-pushed the instrumentation-proxies branch from 35e2b60 to 5adf59e Compare October 2, 2023 12:33