Is your feature request related to a problem? Please describe.
Loading shims from the Parallel Worlds jar in 21.10 took several iterations to get right because we manipulate classloaders relatively late in the Spark JVM lifecycle. The most robust approach is turning out to be manipulating the caller's classloader (mostly AppClassLoader) by calling a protected method via reflection.
We also use a lazy proxy approach to accommodate classes such as ShuffleManager that are loaded before spark.plugins are loaded.
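For readers unfamiliar with the lazy proxy idea, here is a minimal sketch of the pattern, not the plugin's actual code. `ShuffleOps`, `ShimShuffleOps`, and `LazyShuffleProxy` are hypothetical names invented for illustration; the real ShuffleManager interface is larger. The proxy can be instantiated early under a fixed class name, and it only creates the version-specific implementation on first use, by which time the shim classes are expected to be loadable.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the small surface of a shuffle manager; not Spark's API.
interface ShuffleOps {
    String describe();
}

// Hypothetical version-specific implementation that would live in a shim.
final class ShimShuffleOps implements ShuffleOps {
    static int instances = 0; // counts constructions, for demonstration only

    ShimShuffleOps() {
        instances++;
    }

    @Override
    public String describe() {
        return "shim-backed shuffle";
    }
}

// Proxy that is safe to construct early: the real implementation is
// created only when the first method call arrives.
final class LazyShuffleProxy implements ShuffleOps {
    private final Supplier<ShuffleOps> factory;
    private ShuffleOps delegate; // guarded by "this"

    LazyShuffleProxy(Supplier<ShuffleOps> factory) {
        this.factory = factory;
    }

    private synchronized ShuffleOps delegate() {
        if (delegate == null) {
            delegate = factory.get();
        }
        return delegate;
    }

    // Every interface method needs a one-line forwarder like this one.
    @Override
    public String describe() {
        return delegate().describe();
    }
}
```

The cost of this pattern is one hand-written forwarding method per interface method, which is the delegation boilerplate this request would like to reduce.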
Describe the solution you'd like
The JVM provides the Java agent API, which gives the agent access to an instance implementing Instrumentation.
After determining the Spark version, the agent portion of the plugin should be able to add the right parallel world jar URL to the system classpath using a public API, without reflection.
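A rough sketch of what such an agent could look like, under stated assumptions. `Instrumentation.appendToSystemClassLoaderSearch(JarFile)` is the public JDK method for extending the system class loader search path. Everything else is hypothetical: the class name `ParallelWorldAgent`, the `spark312`-style naming, one jar per Spark version in a directory, and passing the Spark version through the agent arguments (how the agent would really determine the version is left open here).

```java
import java.io.IOException;
import java.lang.instrument.Instrumentation;
import java.util.jar.JarFile;

// Hypothetical agent class; kept package-private so the sketch compiles in one file.
final class ParallelWorldAgent {

    // Hypothetical naming scheme: "3.1.2" or "3.1.2-SNAPSHOT" maps to "spark312".
    static String shimId(String sparkVersion) {
        String[] parts = sparkVersion.split("[.-]");
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected major.minor.patch, got: " + sparkVersion);
        }
        return "spark" + parts[0] + parts[1] + parts[2];
    }

    // Called by the JVM before the application's main method when the JVM
    // is started with -javaagent:<agent jar>=<agentArgs>.
    public static void premain(String agentArgs, Instrumentation inst) throws IOException {
        // Assumption: agentArgs has the form "<sparkVersion>,<dir holding per-version jars>".
        if (agentArgs == null || !agentArgs.contains(",")) {
            throw new IllegalArgumentException("expected <sparkVersion>,<jarDir>");
        }
        String[] args = agentArgs.split(",", 2);
        String jarPath = args[1] + "/" + shimId(args[0]) + ".jar";
        // Public API: makes the jar's classes visible to the system class loader.
        inst.appendToSystemClassLoaderSearch(new JarFile(jarPath));
    }
}
```

For the JVM to call `premain`, the agent jar's manifest needs a `Premain-Class` attribute naming the agent class, and the JVM has to be started with `-javaagent`.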
This will also solve the chicken-and-egg problem with the shuffle manager:
a) if we want, we will be able to use a single class name for the Rapids shuffle manager
b) we can add a boolean config for using RapidsShuffleManager; the class name of the shuffle manager then does not really matter because it will no longer be exposed to the user.
We can also remove some of the boilerplate code that just delegates calls to wrapped objects by [generating it at load time](https://www.baeldung.com/java-instrumentation).
Describe alternatives you've considered
Keeping the 21.10 approach: reflection-based classloader manipulation plus lazy proxies.
Additional context