Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add capability for invokedynamic InstrumentationModules to inject proxies #9565

Merged
merged 21 commits into from
Oct 19, 2023

Conversation

JonasKunz
Copy link
Contributor

@JonasKunz JonasKunz commented Sep 27, 2023

Follow up of #9502, part of #8999 .

This PR only allows the injection of the generated class, injection of it's bytecode as a resource (required for spring-boot autoconfigurations) is not yet support.

I initially planned on migrating the AWS instrumentation to indy to prove that this feature works as intended.
It works, but it currently looks very hacky, see this commit. The reasons are:

  • Muzzle parses the execution.interceptors and uses the referenced class as root: If you refere to a runtime-generated proxy with a new name there, the plugin "crashes"
    • For invokedynamic modules those files should not be parsed by Muzzle. Instead the newly added InstrumentationModule.injectClasses() method should be used
  • The helper classes of the instrumentation module are shaded, which is a bit confusing when using the newly added injectClasses() method because you have to refer to the shaded name
    • Where does this shading happen? I assume in the otel.javaagent-instrumentation gradle plugin? Where can I find it's source?
    • I think we should already be able to get rid of the shading of instrumentation modules for invokedynamic modules. Any pointers on how to realize this or maybe even a contribution from someone else would be very appreciated!

In the linked commit I've worked around the issue by

  • using the same name for proxy and proxied class, which works but looks very confusing
  • refering to the shaded class name

That's why I think it is "ugly" and not worth including yet, but I don't have a strong opinion here.

@JonasKunz JonasKunz requested a review from a team September 27, 2023 12:10
@laurit
Copy link
Contributor

laurit commented Sep 29, 2023

  • Muzzle parses the execution.interceptors and uses the referenced class as root: If you refere to a runtime-generated proxy with a new name there, the plugin "crashes"

what do you mean by crashes? in my vocabulary crashes means that there is a hs_err_pid file

  • For invokedynamic modules those files should not be parsed by Muzzle. Instead the newly added InstrumentationModule.injectClasses() method should be used

Muzzle just collects which helper classes are used in the given instrumentation. There should not be a big difference whether you are using our current or the new indy module, you still need to know which classes are used an inject/load them. If you need to account for behavioral differences between the current and the new indy module just add apis that allow you to say, hey when running with indy skip this.

  • The helper classes of the instrumentation module are shaded, which is a bit confusing when using the newly added injectClasses() method because you have to refer to the shaded name

Generally the shading plugins are quite smart and will rename class names that you have has text. I think at this stage we are still just prototyping and only need to prove that this can be done. We can take care of the details later.

  • Where does this shading happen? I assume in the otel.javaagent-instrumentation gradle plugin? Where can I find it's source?

What you are probably looking for is this line

relocate("io.opentelemetry.instrumentation", "io.opentelemetry.javaagent.shaded.instrumentation") {
We shade library instrumentation classes that can also be present in the application, we don't shade instrumentation classes that are only in the agent.

  • I think we should already be able to get rid of the shading of instrumentation modules for invokedynamic modules. Any pointers on how to realize this or maybe even a contribution from someone else would be very appreciated!

I think that shading is really the last of the concerns. Ideally you want to support both approaches as long as possible with feature flags to avoid having to work on a branch that you constantly need to keep up to date with changes from main. I'd suggest you to leave the shading as it is for now, if you want to see how it works without shading just use the disableShadowRelocate flag (afaik this flag isn't well tested, there is a possibility that stuff will break). Also note that currently the indy module class loader does a poor job at isolating application and agent classes. Instrumentations like okhttp, kotlin where agent contains the same classes as application get LinkageErrors.

using the same name for proxy and proxied class, which works but looks very confusing
I think it is fine. To use a different name I guess we'd need to also instrument whatever is looking for the class and if we already have to do that we probably wouldn't need to use the proxy at all.


public enum InjectionMode {
CLASS_ONLY
// TODO: implement the modes RESOURCE_ONLY and CLASS_AND_RESOURCE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For proxy classes I think we could expose the original class bytes and we already have the code to do that. Implementing a custom URLStreamHandler is also fine if we want to expose the actual proxy bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to provide the actual bytecode for the proxies. Applications parsing the bytecode might otherwise trip over references to non-existing types from the method bodies. In addition there might be some non-public methods and fields present in the bytecode but not in the proxy-class, which could also cause hard to debug issues.

I've already tried the approach with a custom URLStreamHandler on a separate branch, it is not much code and should be easy to maintain.

}

public static void setHelperInjectorListener(HelperInjectorListener listener) {
helperInjectorListener = listener;
}

private Map<String, Supplier<byte[]>> getHelperMap() {
private Map<String, Supplier<byte[]>> getHelperMap(ClassLoader targetClassloader) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the class loader here feels unnecessary. As far as I understand the class loader you need here is either agent class loader or extension class loader, it is ok to keep reference to these, they won't go away anyway. If you are currently using instrumentation module class loader here then consider whether your really need to. If you need instrumentation class loader because the injected class may reference some type that is only present in the application and byte-buddy blows up when it is not available then perhaps it would make more sense to generate the proxy with asm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need instrumentation class loader because the injected class may reference some type that is only present in the application and byte-buddy blows up when it is not available then perhaps it would make more sense to generate the proxy with asm.

That is precisely the case, I've even ensured that this is covered by the tests: The MyProxy class extends MyProxySuperclass which is only available in the instrumented application classloader.

We could remove the need for having the classloader at hand when generating the proxies, but I went for the other route for the following reasons:

  • I would assume that the ByteBuddy-based proxy generator is a lot easier to read and maintain than an ASM one. I'd expect the ASM proxy generator to be more brittle.
  • This would not allow us to override & delegate inherited methods by the proxy. We don't do this currently, but I would suspect it could become relevant in the future. See also this test-case for reference.

What problems do you see with providing the instrumented app classloader when injecting the classes?

@trask
Copy link
Member

trask commented Sep 29, 2023

I'd suggest you to leave the shading as it is for now

👍 I support not spending time (or adding complexity) in order to remove all shading at this point. this can always be explored / tackled later on.

@JonasKunz
Copy link
Contributor Author

JonasKunz commented Oct 2, 2023

  • Muzzle parses the execution.interceptors and uses the referenced class as root: If you refere to a runtime-generated proxy with a new name there, the plugin "crashes"

what do you mean by crashes? in my vocabulary crashes means that there is a hs_err_pid file

Sorry, I meant that the gradle build fails with an exception:

./gradlew clean assemble --stacktrace
> Task :instrumentation:aws-sdk:aws-sdk-2.2:javaagent:byteBuddyJava FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':instrumentation:aws-sdk:aws-sdk-2.2:javaagent:byteBuddyJava'.
> Failed to transform class files in /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/aws-sdk/aws-sdk-2.2/javaagent/build/classes/java/mainraw

* Try:
> Run with --info or --debug option to get more log output.
> Get more help at https://help.gradle.org.

* Exception is:
org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':instrumentation:aws-sdk:aws-sdk-2.2:javaagent:byteBuddyJava'.
        at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.lambda$executeIfValid$1(ExecuteActionsTaskExecuter.java:149)
        ...
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
Caused by: java.lang.IllegalStateException: Failed to transform class files in /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/aws-sdk/aws-sdk-2.2/javaagent/build/classes/java/mainraw
        at net.bytebuddy.build.gradle.AbstractByteBuddyTask.doApply(AbstractByteBuddyTask.java:463)
        ...
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
Caused by: java.lang.UnsupportedOperationException: onError
        at net.bytebuddy.build.Plugin$Engine$ErrorHandler$Failing$1.onError(Plugin.java:1186)
        ...
        ... 157 more
  • For invokedynamic modules those files should not be parsed by Muzzle. Instead the newly added InstrumentationModule.injectClasses() method should be used

Muzzle just collects which helper classes are used in the given instrumentation. There should not be a big difference whether you are using our current or the new indy module, you still need to know which classes are used an inject/load them. If you need to account for behavioral differences between the current and the new indy module just add apis that allow you to say, hey when running with indy skip this.

I think for now we can stick to using the same name for the injected proxy like you suggested.

  • The helper classes of the instrumentation module are shaded, which is a bit confusing when using the newly added injectClasses() method because you have to refer to the shaded name

Generally the shading plugins are quite smart and will rename class names that you have has text. I think at this stage we are still just prototyping and only need to prove that this can be done. We can take care of the details later.

Looks like I had something messed up locally first. Things work just like you described here. I'm able to refer to the unshaded class names for the proxyBuilder() call and those are properly shaded.

  • I think we should already be able to get rid of the shading of instrumentation modules for invokedynamic modules. Any pointers on how to realize this or maybe even a contribution from someone else would be very appreciated!

I think that shading is really the last of the concerns. Ideally you want to support both approaches as long as possible with feature flags to avoid having to work on a branch that you constantly need to keep up to date with changes from main. I'd suggest you to leave the shading as it is for now, if you want to see how it works without shading just use the disableShadowRelocate flag (afaik this flag isn't well tested, there is a possibility that stuff will break).

Agree! Shading does not interfer with the proxying as I initially thought, so we are good here.

Also note that currently the indy module class loader does a poor job at isolating application and agent classes. Instrumentations like okhttp, kotlin where agent contains the same classes as application get LinkageErrors.

Yes that's something I want to implement soon™ . InstrumentationModules should have the capability to force linkage with selected classes/packages from the instrumented classloader for cases where they are present in the agent-classloader too. This kind of is a poor man's solution in comparison to using the "ideal" classloader hierarchy I described here, but much easier to implement.

using the same name for proxy and proxied class, which works but looks very confusing
I think it is fine. To use a different name I guess we'd need to also instrument whatever is looking for the class and if we already have to do that we probably wouldn't need to use the proxy at all.

Agree that it is fine for now. For the longterm this might however lead to confusing stacktraces where the same class appears twice, but it is actually two differen classes (proxy and the delegate).

@JonasKunz JonasKunz force-pushed the instrumentation-proxies branch from 35e2b60 to 5adf59e Compare October 2, 2023 12:33
@SuppressWarnings("unchecked")
public List<String> getAdditionalHelperClassNames() {
if (isIndyModule()) {
// With the invokedynamic approach, the SnsInstrumentationModule and SqsInstrumentationModule
Copy link
Contributor

@laurit laurit Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is entirely correct. The reason why there are multiple modules is that muzzle check is done per module. Having sqs and sns instrumentation allows running muzzle checks for these separately so that the main instrumentation module can still apply even if sqs and sns dependencies are missing. Why this works for you when you dump these into one module is because you have disabled the muzzle checks for the indy instrumentation modules, if you'd add back the muzzle check code then I suspect doing this would cause muzzle failures (or if it doesn't then runtime failures when previously there would have been a muzzle failure). I think that using aws instrumentation to test the proxy code was an unfortunate complication because the multimodule thing going on there complicates things. These 3 modules are currently interacting so that if the muzzle checks for the optional modules pass they'll inject the classes that the main module will dynamically discover them. With indy these 3 module are isolated from each other so the trick of dynamically discovering classes injected by another module won't work. The easiest way around this is probably to have one instrumentation class loader per application class loader.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you could choose some other instrumentation besides aws for testing the proxy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 3 modules are currently interacting so that if the muzzle checks for the optional modules pass they'll inject the classes that the main module will dynamically discover them. With indy these 3 module are isolated from each other so the trick of dynamically discovering classes injected by another module won't work.

I see, thanks for explaining this!
My prior understanding was that the AWS instrumentation is simply about injecting the library instrumentation as is and using the SQS / SNS discovery built into that library instrumentation. However, for the agent instrumentation we also want to protect agains the case of an SQS / SNS version mismatch, correct?

The easiest way around this is probably to have one instrumentation class loader per application class loader.

Sounds good, but I'd propose that the default should be that each InstrumentationModule gets its own classloader. I would expect that in most cases, especially with external extensions, this isolation is a feature, not a bug. This allows extensions to freely use utility libraries (e.g. apache commons, guava) without having to fear collisions.

I'd propose for InstrumentationModules to optionally declare a "module group name". Modules with the same group name instrumenting the same app-classloader would get put into the same InstrumentationModuleClassLoader. What do you think about this approach?

If you agree, I'd shortly outline this as a TODO comment in the AWS module so that we don't forget about.

Or you could choose some other instrumentation besides aws for testing the proxy.

Agree, I'll revert the AWS changes and choose a different instrumentation to include in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I'll revert the AWS changes and choose a different instrumentation to include in this PR.

So I had a look at both the apache-dubbo and azur instrumentation and noticed that their tests don't fail currently with testIndy even though they should? I tried messing with the instrumentations (e.g. changing the resource paths) and the tests still were green. I'm thinking that something must be wrong with the tests there?

I decided to use the log4j instrumentation in this PR, because that one actually fails without the proxying with indy enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least with dubbo the issue is that the javaagent instrumentation has implementation(project(":instrumentation:apache-dubbo-2.7:library-autoconfigure")) which leaks over into tests so the tests will pass even without the javaagent instrumentation because they just pick up the manual instrumentation library that gets automatically configured

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a PR to fix dubbo tests, azure tests (ok I ran only azure-core-1.14) fail for me, perhaps you didn't notice that they have

  @Override
  public boolean isIndyModule() {
    return false;
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I ran only azure-core-1.14

How did you run those, I feel like I'm missing something.

What I did was:

  • Remove the isIndyModule() override from the 1.14 AzureSdkInstrumentationModule
  • Run the 1.14 AzureSdkTest from the IDE with -PtestIndy=true and verified that the module is actually loaded as IndyModule -> the test still passes

Btw what is the correct way of running selected tests from the command line?
I tried ./gradlew :instrumentation:azure-core:azure-core-1.14:javaagent:test -PtestIndy=true -i but that seems to just pick up recent test result from some kind of cache, even if I execute a clean beforehand:

> Task :instrumentation:azure-core:azure-core-1.14:javaagent:test FROM-CACHE
Custom actions are attached to task ':instrumentation:azure-core:azure-core-1.14:javaagent:test'.
Build cache key for task ':instrumentation:azure-core:azure-core-1.14:javaagent:test' is fd9f198ac1e6354f6d8e4f302611bd17
Task ':instrumentation:azure-core:azure-core-1.14:javaagent:test' is not up-to-date because:
  Output property 'binaryResultsDirectory' file /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/azure-core/azure-core-1.14/javaagent/build/test-results/test/binary has been removed.
  Output property 'binaryResultsDirectory' file /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/azure-core/azure-core-1.14/javaagent/build/test-results/test/binary/output.bin has been removed.
  Output property 'binaryResultsDirectory' file /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/azure-core/azure-core-1.14/javaagent/build/test-results/test/binary/output.bin.idx has been removed.
  Output property 'binaryResultsDirectory' file /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/azure-core/azure-core-1.14/javaagent/build/test-results/test/binary/results.bin has been removed.
  Output property 'reports.enabledReports.html.outputLocation' file /Users/jonas/git/otel/opentelemetry-java-instrumentation/instrumentation/azure-core/azure-core-1.14/javaagent/build/reports/tests/test has been removed.
Loaded cache entry for task ':instrumentation:azure-core:azure-core-1.14:javaagent:test' with cache key fd9f198ac1e6354f6d8e4f302611bd17
Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD]
Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD]
Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD]
Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD]
Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD]
Could not execute [report metric STATISTICS_COLLECT_METRICS_OVERHEAD]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From command line I think you have to at least do :instrumentation:azure-core:azure-core-1.14:javaagent:cleanTest before and then run :instrumentation:azure-core:azure-core-1.14:javaagent:test --no-build-cache
I retested and it still fails for me as expected when running from ide. I also removed the isIndyModule method from AzureSdkInstrumentationModule and added -PtestIndy=true to the run options in idea. I ran the test on main not this branch if it matters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the test on main not this branch if it matters.

Looks like we have a winner there, that was the difference to make them fail for me too.

I think this can be explained by InstrumentationModule.registerHelperResources being non-functional for indy-modules on main. In this PR that functionally has been added back.

So for this PR the azure tests should fail because the classes referenced by the injected resources are not supposed to be present in the instrumented classloader, but apparently they are.
I suspect that this then is a testing classpath issue similar to dubbo.

This shouldn't block us on this PR. I'll try to verify my hypothesis and try to open a PR doing the same thing for azure what you did for dubbo.

@JonasKunz JonasKunz changed the title Add capabiltiy for invokedynamic InstrumentationModules to inject proxies Add capability for invokedynamic InstrumentationModules to inject proxies Oct 9, 2023
# Conflicts:
#	testing-common/integration-tests/src/test/java/indy/IndyInstrumentationTest.java
@trask
Copy link
Member

trask commented Oct 18, 2023

@JonasKunz can you give a walkthrough of this PR in the SIG meeting tomorrow? thx

@trask trask merged commit 2d4d010 into open-telemetry:main Oct 19, 2023
49 checks passed
@trask
Copy link
Member

trask commented Oct 19, 2023

thanks @JonasKunz and @laurit ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants