Use Spark's Utils.getContextOrSparkClassLoader to load Shims [databricks] #5646

Conversation

@gerashegalov gerashegalov commented May 26, 2022

Fixes #3851

Spark loads external datasources using Utils.getContextOrSparkClassLoader

This change trampolines to Utils.getContextOrSparkClassLoader so that our current code works with external sources, and to unblock JDK9+.
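For reference, the "context classloader or fall back to Spark's own" pattern this PR trampolines to can be sketched as follows. This is a minimal illustration with a hypothetical helper class, not Spark's actual `Utils` source:

```java
// Minimal sketch of the context-or-default classloader fallback pattern;
// ClassLoaderUtil is a hypothetical helper, not Spark's Utils class.
public final class ClassLoaderUtil {
    private ClassLoaderUtil() {}

    public static ClassLoader getContextOrDefaultClassLoader() {
        // Prefer the thread context classloader (set for user/datasource code);
        // otherwise fall back to the loader that loaded this class itself.
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        return ctx != null ? ctx : ClassLoaderUtil.class.getClassLoader();
    }

    public static void main(String[] args) throws Exception {
        ClassLoader cl = getContextOrDefaultClassLoader();
        // Resolve a class through the chosen loader, the way an external
        // datasource implementation would be looked up.
        Class<?> clazz = Class.forName("java.util.ArrayList", true, cl);
        System.out.println(clazz.getName());
    }
}
```

Resolving classes through this loader, rather than the caller's own classloader, is what lets shim classes remain visible when Spark invokes the plugin from a context where the caller's loader differs.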

Signed-off-by: Gera Shegalov <gera@apache.org>

Signed-off-by: Gera Shegalov <gera@apache.org>
@gerashegalov gerashegalov changed the title from "Use Utils.getContextOrSparkClassLoader to load Shims" to "[WIP] Use Utils.getContextOrSparkClassLoader to load Shims" May 26, 2022
@gerashegalov gerashegalov changed the title from "[WIP] Use Utils.getContextOrSparkClassLoader to load Shims" to "[WIP] Use Utils.getContextOrSparkClassLoader to load Shims [databricks]" May 26, 2022
@gerashegalov
Collaborator Author

build

@sameerz sameerz added the build Related to CI / CD or cleanly building label May 26, 2022
@gerashegalov gerashegalov marked this pull request as draft May 26, 2022 23:48
@gerashegalov
Collaborator Author

current premerge status:

2 failed, 562 passed, 23 skipped, 4 xfailed in 619.26s (0:10:19) 

An error occurred while calling z:com.nvidia.spark.rapids.ExecutionPlanCaptureCallback.assertDidFallBack. : java.lang.NoClassDefFoundError: com/nvidia/spark/rapids/shims/SparkShimImpl$ 

@gerashegalov gerashegalov added the task Work required that improves the product but is not user facing label May 27, 2022
@gerashegalov gerashegalov marked this pull request as ready for review May 27, 2022 18:38
@gerashegalov gerashegalov requested a review from zhanga5 as a code owner May 27, 2022 18:38
@gerashegalov
Collaborator Author

build

@gerashegalov
Collaborator Author

Many more tests passed.
Next issue, reproducible via `-k 'test_ts_read_fails_datetime_legacy[-reader_confs0-LEGACY-INT96-Timestamp]'`:

13:11:39  E                   Caused by: java.lang.UnsatisfiedLinkError: com.nvidia.spark.rapids.jni.ParquetFooter.readAndFilter(JJJJ[Ljava/lang/String;[IIZ)J
13:11:39  E                   	at com.nvidia.spark.rapids.jni.ParquetFooter.readAndFilter(Native Method)
13:11:39  E                   	at com.nvidia.spark.rapids.jni.ParquetFooter.readAndFilter(ParquetFooter.java:90)
13:11:39  E                   	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndFilterFooter$5(GpuParquetScan.scala:594)
13:11:39  E                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
13:11:39  E                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
13:11:39  E                   	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.withResource(GpuParquetScan.scala:460)
13:11:39  E                   	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndFilterFooter$4(GpuParquetScan.scala:586)
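The `UnsatisfiedLinkError` above is the typical symptom of a JNI class whose native methods were never bound in the classloader that reached it: the class loads fine, but the native call fails at invocation time. A minimal sketch of that failure mode with a hypothetical class (no real native library involved):

```java
// Hypothetical class illustrating the UnsatisfiedLinkError failure mode:
// a native method whose library was never loaded (or was loaded by a
// different classloader) fails at call time, not at class-load time.
public class NativeDemo {
    // Declared native, but no System.loadLibrary call ever binds it.
    public static native long readAndFilter(long address);

    public static void main(String[] args) {
        try {
            readAndFilter(0L);
        } catch (UnsatisfiedLinkError e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```

Because the JVM ties a loaded native library to exactly one classloader, reaching a JNI wrapper class through the "wrong" loader can surface exactly this error even when the library was loaded elsewhere in the process.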

@gerashegalov gerashegalov marked this pull request as draft May 27, 2022 22:08
@gerashegalov gerashegalov changed the base branch from branch-22.06 to branch-22.08 May 28, 2022 06:59
Signed-off-by: Gera Shegalov <gera@apache.org>
@gerashegalov gerashegalov marked this pull request as ready for review May 28, 2022 07:05
@gerashegalov
Collaborator Author

build

Signed-off-by: Gera Shegalov <gera@apache.org>
@gerashegalov
Collaborator Author

build

@gerashegalov gerashegalov changed the base branch from branch-22.10 to branch-22.08 August 5, 2022 00:04
@gerashegalov gerashegalov changed the base branch from branch-22.08 to branch-22.10 August 5, 2022 00:06
@gerashegalov
Collaborator Author

retrigger workflows

@gerashegalov gerashegalov reopened this Aug 5, 2022
jlowe previously approved these changes Aug 5, 2022
Member
@jlowe jlowe left a comment


Loving this.

revans2 previously approved these changes Aug 8, 2022
@@ -1402,14 +1402,6 @@ object RapidsConf {
.booleanConf
.createWithDefault(true)

val FORCE_SHIMCALLER_CLASSLOADER = conf("spark.rapids.force.caller.classloader")
Collaborator

Does anything else in spark2 use this?

Collaborator Author

This is merely due to spark2 cloning the RapidsConf source. We verified there is no actual use of spark.rapids.force.caller.classloader in the spark2 explain support.

@gerashegalov gerashegalov dismissed stale reviews from revans2 and jlowe via 908b780 August 16, 2022 00:21
@gerashegalov
Collaborator Author

Upmerged and removed newer references to force.caller.classloader.

Ran iceberg tests locally without issues using the local and standalone modes.

@gerashegalov
Collaborator Author

build

@gerashegalov gerashegalov requested review from revans2 and jlowe August 16, 2022 01:01
@gerashegalov gerashegalov changed the title from "Use Utils.getContextOrSparkClassLoader to load Shims [databricks]" to "Use Spark's Utils.getContextOrSparkClassLoader to load Shims [databricks]" Aug 16, 2022
@gerashegalov gerashegalov merged commit 3b2986c into NVIDIA:branch-22.10 Aug 16, 2022
@gerashegalov gerashegalov deleted the gerashegalov/Utils.getContextOrSparkClassLoader branch August 16, 2022 22:33
@gerashegalov gerashegalov mentioned this pull request Nov 3, 2022
gerashegalov added a commit that referenced this pull request Jul 19, 2023
As of #5646 (22.10), specifying `allowConventionalDistJar` is no longer necessary for newer JDKs. The only legitimate use case is when the user wants to avoid the side effects of dealing with a [multi-shim jar](https://github.com/NVIDIA/spark-rapids/blob/branch-23.08/CONTRIBUTING.md#building-a-distribution-for-a-single-spark-release)

```bash
JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64 mvn clean install  -Dbuildver=341 -DskipTests 

JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64 TEST_PARALLEL=0 SPARK_HOME=~/dist/spark-3.4.1-bin-hadoop3 ./integration_tests/run_pyspark_from_build.sh -s -k array_exists
```

Signed-off-by: Gera Shegalov <gera@apache.org>
Labels
build Related to CI / CD or cleanly building task Work required that improves the product but is not user facing

Successfully merging this pull request may close these issues.

[BUG] ShimLoader.updateSparkClassLoader fails with openjdk Java11
4 participants