Implement per-shim parallel world jar classloader (#3381)
Signed-off-by: Gera Shegalov <gera@apache.org>

Contributes to #3232. Use MutableURLClassLoader in conjunction with JarURLConnection JAR URLs to create "parallel worlds" for each shim in a single jar file.

Assumes a package layout consisting of three types of areas:

- a few publicly documented classes in the conventional layout
- a large fraction of classes whose bytecode is identical under all supported Spark versions
- a smaller fraction of classes whose bytecode differs across the supported Spark versions, aka "parallel worlds" in the terminology of the JDK's com.sun.istack.internal.tools.ParallelWorldClassLoader

```
$ jar tvf rapids-4-spark_2.12.jar
com/nvidia/spark/SQLPlugin.class
spark3xx-common/com/nvidia/spark/rapids/CastExprMeta.class
spark301/org/apache/spark/sql/rapids/GpuUnaryMinus.class
spark311/org/apache/spark/sql/rapids/GpuUnaryMinus.class
spark320/org/apache/spark/sql/rapids/GpuUnaryMinus.class
```
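The lookup scheme implied by this layout can be sketched as follows. This is a simplified, hypothetical illustration (the names `ShimResourcePath` and `ParallelWorldLoader` are not the plugin's actual API), assuming the shim prefix, e.g. `spark301`, has already been detected from the running Spark version:

```scala
// Simplified sketch of "parallel world" class loading: a class is
// resolved by retrying the resource lookup under a shim-specific
// prefix inside the same jar. Names here are hypothetical.
object ShimResourcePath {
  // Map a class name to its resource path inside a parallel world,
  // e.g. ("spark301", "a.B") -> "spark301/a/B.class"
  def apply(shimPrefix: String, className: String): String =
    s"$shimPrefix/${className.replace('.', '/')}.class"
}

class ParallelWorldLoader(shimPrefix: String, parent: ClassLoader)
    extends ClassLoader(parent) {
  override def findClass(name: String): Class[_] = {
    // Ask the parent for the prefixed resource from the shared jar
    val stream = getParent.getResourceAsStream(ShimResourcePath(shimPrefix, name))
    if (stream == null) throw new ClassNotFoundException(name)
    val bytes = stream.readAllBytes() // JDK 9+; buffers the class file
    defineClass(name, bytes, 0, bytes.length)
  }
}
```

The actual implementation additionally consults the `spark3xx-common` area and the conventional layout, so only genuinely version-specific classes live in a parallel world.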
gerashegalov authored Sep 10, 2021
1 parent 9e533ca commit 1172350
Showing 67 changed files with 1,073 additions and 252 deletions.
21 changes: 21 additions & 0 deletions CONTRIBUTING.md
@@ -55,6 +55,27 @@ You can build against different versions of the CUDA Toolkit by using one of the

## Code contributions

### Source code layout

Conventional code locations in Maven modules are found under `src/main/<language>`. In addition,
to support multiple versions of Apache Spark with a minimum amount of source code, we maintain
Spark-version-specific locations within non-shim modules where necessary. This allows us to
switch between incompatible parent classes without copying the shared code into dedicated shim
modules.

Thus, the conventional source code root directories `src/main/<language>` contain the files that
are source-compatible with all supported Spark releases, both upstream and vendor-specific.

The version-specific directory names have one of the following forms / use cases:
- `src/main/312/scala` contains Scala source code for a single Spark version, 3.1.2 in this case
- `src/main/312+-apache/scala` contains Scala source code for *upstream* **Apache** Spark builds
  only, beginning with version 3.1.2; the `+` signifies that there is no upper version boundary
  among the supported versions
- `src/main/302until312-all` contains code that applies to all shims from 3.0.2 *inclusive* up to
  3.1.2 *exclusive*
- `src/main/302to312-cdh` contains code that applies to Cloudera CDH shims from 3.0.2 *inclusive*
  up to 3.1.2 *inclusive*
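As a quick illustration of the naming convention, a hypothetical helper (not part of the build;
source selection is actually done by Maven profiles) that decides whether a given `buildver`
falls under a version-range directory might look like:

```scala
// Hypothetical helper illustrating the directory naming convention.
// The real selection is done by Maven profiles, not by code like this.
object ShimDirs {
  private val Until = raw"(\d+)until(\d+)-.*".r // lower inclusive, upper exclusive
  private val To    = raw"(\d+)to(\d+)-.*".r    // both bounds inclusive
  private val Plus  = raw"(\d+)\+.*".r          // no upper version boundary

  def applies(dir: String, buildver: Int): Boolean = dir match {
    case Until(lo, hi) => buildver >= lo.toInt && buildver < hi.toInt
    case To(lo, hi)    => buildver >= lo.toInt && buildver <= hi.toInt
    case Plus(lo)      => buildver >= lo.toInt
    case exact         => exact == buildver.toString // e.g. "312"
  }
}
```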

### Your first issue

1. Read the [Developer Overview](docs/dev/README.md) to understand how the RAPIDS Accelerator
54 changes: 25 additions & 29 deletions pom.xml
@@ -80,7 +80,6 @@
<module>udf-compiler</module>
<module>udf-examples</module>
</modules>

<profiles>
<profile>
<id>default</id>
@@ -91,8 +90,8 @@
<module>api_validation</module>
<module>tools</module>
</modules>
</profile>
<profile>
</profile>
<profile>
<id>no-buildver-default</id>
<activation>
<property>
@@ -114,18 +113,16 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark30+all/scala</source>
<source>${project.basedir}/src/main/spark30+all/java</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
</sources>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>

<profile>
</profile>
<profile>
<id>buildver-default</id>
<activation>
<property>
@@ -144,17 +141,16 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark${buildver}/scala</source>
<source>${project.basedir}/src/main/spark${buildver}/java</source>
<source>${project.basedir}/src/main/${buildver}/scala</source>
</sources>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
<profile>
</profile>
<profile>
<id>release301</id>
<activation>
<property>
@@ -178,7 +174,7 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark30+all/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
</sources>
</configuration>
</execution>
@@ -214,7 +210,7 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark30+all/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
</sources>
</configuration>
</execution>
@@ -243,7 +239,7 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark30+all/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
</sources>
</configuration>
</execution>
@@ -283,7 +279,7 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark30+all/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
</sources>
</configuration>
</execution>
@@ -319,7 +315,7 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark30+all/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
</sources>
</configuration>
</execution>
@@ -355,9 +351,9 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark30+all/scala</source>
<source>${project.basedir}/src/main/spark31+all/scala</source>
<source>${project.basedir}/src/main/spark31+apache/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-apache/scala</source>
</sources>
</configuration>
</execution>
@@ -448,9 +444,9 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark30+all/scala</source>
<source>${project.basedir}/src/main/spark31+all/scala</source>
<source>${project.basedir}/src/main/spark31+apache/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-apache/scala</source>
</sources>
</configuration>
</execution>
@@ -486,9 +482,9 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark30+all/scala</source>
<source>${project.basedir}/src/main/spark31+all/scala</source>
<source>${project.basedir}/src/main/spark31+apache/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-apache/scala</source>
</sources>
</configuration>
</execution>
@@ -524,7 +520,7 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark31+all/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
</sources>
</configuration>
</execution>
@@ -557,8 +553,8 @@
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${project.basedir}/src/main/spark30+all/scala</source>
<source>${project.basedir}/src/main/spark31+all/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
</sources>
</configuration>
</execution>
37 changes: 37 additions & 0 deletions shims/spark301/pom.xml
@@ -82,4 +82,41 @@
<scope>provided</scope>
</dependency>
</dependencies>
<profiles>
<profile>
<!--
enables Shim class consolidation without breaking the existing build,
to be removed when the build architecture swap is complete
-->
<id>no-buildver-default</id>
<activation>
<property>
<name>!buildver</name>
</property>
</activation>
<properties>
<spark-rapids.sql-plugin.root>${project.basedir}/../../sql-plugin</spark-rapids.sql-plugin.root>
</properties>
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
<executions>
<execution>
<id>add-profile-src-default</id>
<goals><goal>add-source</goal></goals>
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${spark-rapids.sql-plugin.root}/src/main/301until320-all/scala</source>
</sources>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2020, NVIDIA CORPORATION.
* Copyright (c) 2019-2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -18,7 +18,7 @@ package org.apache.spark.sql.rapids.shims.spark301

import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark.shuffle._
import org.apache.spark.sql.rapids.RapidsShuffleInternalManagerBase
import org.apache.spark.sql.rapids.{ProxyRapidsShuffleInternalManagerBase, RapidsShuffleInternalManagerBase}

/**
* A shuffle manager optimized for the RAPIDS Plugin For Apache Spark.
@@ -50,3 +50,30 @@ class RapidsShuffleInternalManager(conf: SparkConf, isDriver: Boolean)
}

}

class ProxyRapidsShuffleInternalManager(conf: SparkConf, isDriver: Boolean)
extends ProxyRapidsShuffleInternalManagerBase(conf, isDriver) {

override def getReader[K, C](
handle: ShuffleHandle,
startPartition: Int,
endPartition: Int,
context: TaskContext,
metrics: ShuffleReadMetricsReporter
): org.apache.spark.shuffle.ShuffleReader[K,C] = {
self.getReader(handle, startPartition, endPartition, context, metrics)
}

override def getReaderForRange[K, C](
handle: ShuffleHandle,
startMapIndex: Int,
endMapIndex: Int,
startPartition: Int,
endPartition: Int,
context: TaskContext,
metrics: ShuffleReadMetricsReporter
): ShuffleReader[K,C] = {
self.getReaderForRange(handle, startMapIndex, endMapIndex, startPartition, endPartition,
context, metrics)
}
}
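The `ProxyRapidsShuffleInternalManager` above follows a delegation pattern: the proxy can be
instantiated by Spark's regular classloader, while the real manager (`self`) is created behind
the scenes and every `ShuffleManager` call is forwarded to it. A stripped-down, hypothetical
sketch of the idea (not the plugin's actual classes):

```scala
// Hypothetical, stripped-down sketch of the proxy/delegation pattern:
// the proxy is cheap to construct; the real implementation is created
// lazily (in the plugin, through the shim classloader) and all calls
// are forwarded to it.
trait ShuffleLike { def getReader(partition: Int): String }

class RealManager extends ShuffleLike {
  def getReader(partition: Int): String = s"reader-$partition"
}

class ProxyManager(makeReal: () => ShuffleLike) extends ShuffleLike {
  lazy val self: ShuffleLike = makeReal() // deferred until first use
  def getReader(partition: Int): String = self.getReader(partition)
}
```

Deferring construction this way keeps the class visible to Spark's configuration-driven
instantiation while the version-specific implementation is resolved later.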
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -17,10 +17,10 @@
package com.nvidia.spark.rapids.spark301

import org.apache.spark.SparkConf
import org.apache.spark.sql.rapids.shims.spark301.RapidsShuffleInternalManager
import org.apache.spark.sql.rapids.shims.spark301.ProxyRapidsShuffleInternalManager

/** A shuffle manager optimized for the RAPIDS Plugin for Apache Spark. */
sealed class RapidsShuffleManager(
conf: SparkConf,
isDriver: Boolean) extends RapidsShuffleInternalManager(conf, isDriver) {
isDriver: Boolean) extends ProxyRapidsShuffleInternalManager(conf, isDriver) {
}
@@ -17,10 +17,10 @@
package com.nvidia.spark.rapids.spark301db

import org.apache.spark.SparkConf
import org.apache.spark.sql.rapids.shims.spark301db.RapidsShuffleInternalManager
import org.apache.spark.rapids.shims.v2.ProxyRapidsShuffleInternalManager

/** A shuffle manager optimized for the RAPIDS Plugin for Apache Spark. */
sealed class RapidsShuffleManager(
conf: SparkConf,
isDriver: Boolean) extends RapidsShuffleInternalManager(conf, isDriver) {
isDriver: Boolean) extends ProxyRapidsShuffleInternalManager(conf, isDriver) {
}
38 changes: 38 additions & 0 deletions shims/spark301emr/pom.xml
@@ -82,4 +82,42 @@
<scope>provided</scope>
</dependency>
</dependencies>

<profiles>
<profile>
<!--
enables Shim class consolidation without breaking the existing build,
to be removed when the build architecture swap is complete
-->
<id>no-buildver-default</id>
<activation>
<property>
<name>!buildver</name>
</property>
</activation>
<properties>
<spark-rapids.sql-plugin.root>${project.basedir}/../../sql-plugin</spark-rapids.sql-plugin.root>
</properties>
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
<executions>
<execution>
<id>add-profile-src-default</id>
<goals><goal>add-source</goal></goals>
<phase>generate-sources</phase>
<configuration>
<sources>
<source>${spark-rapids.sql-plugin.root}/src/main/301until320-all/scala</source>
</sources>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>