Implement per-shim parallel world jar classloader #3381

Merged (21 commits) on Sep 10, 2021

Conversation

@gerashegalov (Collaborator) commented Sep 3, 2021

Signed-off-by: Gera Shegalov <gera@apache.org>

Contributes to #3232. Uses MutableURLClassLoader in conjunction with JarURLConnection JAR URLs to create "parallel worlds" for each shim within a single jar file.

Assumes a package layout consisting of three types of areas:

  • a few publicly documented classes in the conventional layout
  • a large fraction of classes whose bytecode is identical under all supported Spark versions
  • a smaller fraction of classes that differ under one of the supported Spark versions, aka "parallel worlds" in the JDK's com.sun.istack.internal.tools.ParallelWorldClassLoader terminology
$ jar tvf rapids-4-spark_2.12.jar
com/nvidia/spark/SQLPlugin.class
spark3xx-common/com/nvidia/spark/rapids/CastExprMeta.class
spark301/org/apache/spark/sql/rapids/GpuUnaryMinus.class    
spark311/org/apache/spark/sql/rapids/GpuUnaryMinus.class
spark320/org/apache/spark/sql/rapids/GpuUnaryMinus.class
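
For illustration, a minimal sketch (not code from this PR) of how a per-shim classloader could be assembled from such a layout, assuming Spark's org.apache.spark.util.MutableURLClassLoader and JarURLConnection-style jar: URLs pointing at the "parallel world" directories inside the dist jar; the object and method names below are hypothetical.

import java.net.URL

import org.apache.spark.util.MutableURLClassLoader

// Hypothetical sketch: resolve classes first from the detected shim's parallel
// world (e.g. spark301/), then from the shared spark3xx-common/ area, and fall
// back to the parent loader for the conventional layout (e.g. com/nvidia/spark/SQLPlugin.class).
object ShimWorldSketch {
  def shimClassLoader(pluginJarUrl: URL, shimId: String, parent: ClassLoader): ClassLoader = {
    val shimWorld = new URL(s"jar:$pluginJarUrl!/$shimId/")
    val commonWorld = new URL(s"jar:$pluginJarUrl!/spark3xx-common/")
    // MutableURLClassLoader also allows appending more worlds later via addURL
    new MutableURLClassLoader(Array(shimWorld, commonWorld), parent)
  }
}

With a loader like this, identically named classes such as org.apache.spark.sql.rapids.GpuUnaryMinus are disambiguated by which parallel-world URL gets registered, while the publicly documented classes in the conventional layout remain visible through the parent.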

@gerashegalov self-assigned this Sep 3, 2021
@gerashegalov added this to the Aug 30 - Sept 10 milestone Sep 3, 2021
@gerashegalov (Collaborator, Author):

build

@sameerz added the task label (Work required that improves the product but is not user facing) Sep 3, 2021
@gerashegalov (Collaborator, Author):

build

@gerashegalov (Collaborator, Author):

build

@gerashegalov (Collaborator, Author):

build

Review thread on sql-plugin/src/main/scala/com/nvidia/spark/SQLPlugin.scala (outdated, resolved):
val sparkConf = pluginContext.conf
RapidsPluginUtils.fixupConfigs(sparkConf)
val conf = new RapidsConf(sparkConf)
-if (conf.shimsProviderOverride.isDefined) {
+if (conf.shimsProviderOverride.isDefined) { // TODO test it, probably not working yet
Collaborator:

Have you tested this yet?

@gerashegalov (Collaborator, Author):

not yet

@gerashegalov (Collaborator, Author):

actually hard to test before I can pull @tgravescs's change

@tgravescs (Collaborator):

I'm assuming you haven't tested on Databricks, unless I'm out of date. I pulled this branch into my branch with the dist packaging changes and it fails. I have fixed it on my branch if we want to wait, but we should probably put this change and that one in close together.

@gerashegalov (Collaborator, Author):

build

Review thread on pom.xml (resolved).
@gerashegalov (Collaborator, Author):

build

@tgravescs previously approved these changes Sep 9, 2021
@gerashegalov (Collaborator, Author):

build

Labels: Spark 3.2+, task (Work required that improves the product but is not user facing)
Projects: None yet
Development

Successfully merging this pull request may close these issues.

[FEA] create the plugin package capable of storing conflicting multiple versions of same named classes
5 participants