Add cloudera shim layer #2423

Merged
merged 16 commits into from
Jun 2, 2021


sririshindra
Contributor

Adding the Cloudera shim layer to support the spark-rapids plugin for Cloudera's version of spark.

@tgravescs
Collaborator

@sririshindra thanks for the PR. You need to sign off on a commit; you can see how to do that here: https://github.com/NVIDIA/spark-rapids/blob/branch-0.6/CONTRIBUTING.md#sign-your-work

@sameerz sameerz added the feature request New feature or request label May 17, 2021
@sririshindra sririshindra force-pushed the branch-0.6-cloudera branch from 29aae57 to 06010f0 Compare May 17, 2021 18:07
rishi and others added 2 commits May 17, 2021 16:04
Signed-off-by: Rishi <sririshindra@gmail.com>
Change-Id: I4fada0dc6c083edef340c56f3f00304b2264537f
@sririshindra sririshindra force-pushed the branch-0.6-cloudera branch from 06010f0 to fa1f30b Compare May 17, 2021 22:49
@sririshindra sririshindra changed the title [WIP] Add cloudera shim layer Add cloudera shim layer May 18, 2021
@pxLi pxLi changed the base branch from branch-0.6 to branch-21.06 May 19, 2021 01:08
@sameerz sameerz linked an issue May 19, 2021 that may be closed by this pull request
@sameerz
Collaborator

sameerz commented May 20, 2021

build

@tgravescs
Collaborator

tests are failing due to:
01:42:58 Cause: java.lang.IllegalArgumentException: Multiple Spark Shim Loaders found: List(com.nvidia.spark.rapids.shims.spark311.SparkShimServiceProvider@63846fa4, com.nvidia.spark.rapids.shims.spark311cdh.SparkShimServiceProvider@206b959c)

My guess is that the version check here is also matching on normal Apache Spark.
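To illustrate how a version check could end up matching normal Apache Spark, here is a minimal sketch. It is not the plugin's code: the object name, the three predicates, and the vendor version string are all made up for illustration. It only shows that a prefix-based vendor check would also claim stock 3.1.1, while a check that requires the vendor-specific part would not.

```scala
// Hypothetical sketch; none of these names come from the spark-rapids source.
object VersionMatchSketch {
  // Version string reported by a stock Apache Spark 3.1.1 build.
  val apacheVersion = "3.1.1"
  // Made-up vendor version string, used only for illustration.
  val vendorVersion = "3.1.1.3.1.0000.0-1"

  // Apache shim: accept only the exact upstream version.
  def apacheMatches(version: String): Boolean = version == "3.1.1"

  // Too-loose vendor check: anything starting with 3.1.1 matches,
  // so stock Apache Spark 3.1.1 is claimed by the vendor shim as well.
  def looseVendorMatches(version: String): Boolean = version.startsWith("3.1.1")

  // Stricter vendor check: require a non-empty vendor part after the base version.
  def strictVendorMatches(version: String): Boolean =
    version.startsWith("3.1.1.") && version.length > "3.1.1.".length

  def main(args: Array[String]): Unit = {
    for (v <- Seq(apacheVersion, vendorVersion)) {
      println(s"$v apache=${apacheMatches(v)} loose=${looseVendorMatches(v)} strict=${strictVendorMatches(v)}")
    }
  }
}
```

With the loose check, both the Apache and the vendor predicate accept "3.1.1", which is the kind of overlap that would produce two candidate shims for one Spark version.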

Collaborator

@tgravescs tgravescs left a comment


Also note, in case you haven't noticed, we renamed our branch to branch-21.06, so you might want to upmerge to that if you haven't.

@sririshindra sririshindra force-pushed the branch-0.6-cloudera branch from c2a6c71 to 827c5bc Compare May 25, 2021 01:06
@tgravescs
Collaborator

build

@tgravescs
Collaborator

My main question here is whether we really need all these files, or whether we can pick up the Spark311Shim versions of them.
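The reuse idea raised here can be sketched as plain inheritance. This is an illustration under assumptions, not the plugin's class hierarchy: the trait, class, and method names below are hypothetical, and the real shim interface is much larger.

```scala
// Hypothetical names; a sketch of the reuse idea only.
trait ShimsSketch {
  def versionLabel: String
  def describeScan(path: String): String
}

// Stand-in for an existing 3.1.1 shim that already implements everything.
class Base311ShimsSketch extends ShimsSketch {
  def versionLabel: String = "3.1.1"
  def describeScan(path: String): String = s"scan($path) via shim $versionLabel"
}

// The vendor shim overrides only what differs and inherits the rest,
// so no duplicate copy of describeScan is needed.
class Vendor311ShimsSketch extends Base311ShimsSketch {
  override def versionLabel: String = "3.1.1-vendor"
}
```

In this arrangement the vendor module would only carry the files whose behavior actually diverges from the base shim.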

@tgravescs
Collaborator

build

@tgravescs
Collaborator

build

@tgravescs
Collaborator

build

api_validation/pom.xml Outdated Show resolved Hide resolved
pom.xml Outdated Show resolved Hide resolved
@tgravescs
Collaborator

The test seems to be detecting multiple shims for some reason; I'll have to investigate more:

HashAggregatesSuite:
16:47:00  *** RUN ABORTED ***
16:47:00    java.lang.ExceptionInInitializerError:
16:47:00    at com.nvidia.spark.rapids.RapidsDriverPlugin.init(Plugin.scala:161)
16:47:00    at org.apache.spark.internal.plugin.DriverPluginContainer.$anonfun$driverPlugins$1(PluginContainer.scala:53)
16:47:00    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
16:47:00    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
16:47:00    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
16:47:00    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
16:47:00    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
16:47:00    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
16:47:00    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
16:47:00    at org.apache.spark.internal.plugin.DriverPluginContainer.<init>(PluginContainer.scala:46)
16:47:00    ...
16:47:00    Cause: java.lang.IllegalArgumentException: Multiple Spark Shim Loaders found: List(com.nvidia.spark.rapids.shims.spark311.SparkShimServiceProvider@248b2091, com.nvidia.spark.rapids.shims.spark311cdh.SparkShimServiceProvider@3b0c38f2)
16:47:00    at com.nvidia.spark.rapids.ShimLoader$.detectShimProvider(ShimLoader.scala:39)
16:47:00    at com.nvidia.spark.rapids.ShimLoader$.findShimProvider(ShimLoader.scala:52)
16:47:00    at com.nvidia.spark.rapids.ShimLoader$.getSparkShims(ShimLoader.scala:64)
16:47:00    at org.apache.spark.sql.rapids.GpuShuffleEnv$.<init>(GpuShuffleEnv.scala:66)
16:47:00    at org.apache.spark.sql.rapids.GpuShuffleEnv$.<clinit>(GpuShuffleEnv.scala)
16:47:00    at com.nvidia.spark.rapids.RapidsDriverPlugin.init(Plugin.scala:161)
16:47:00    at org.apache.spark.internal.plugin.DriverPluginContainer.$anonfun$driverPlugins$1(PluginContainer.scala:53)
16:47:00    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
16:47:00    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
16:47:00    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
16:47:00    ...
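The failure in the trace comes from a step that expects exactly one shim provider to accept the running Spark version. A minimal sketch of that selection rule follows. It is an assumption-laden illustration, not the contents of ShimLoader.scala: the trait, the object, and the "could not find" message are hypothetical, and only the "Multiple Spark Shim Loaders found" wording is taken from the log.

```scala
// Hypothetical sketch of single-provider selection; names are assumptions.
trait ShimProviderSketch {
  def name: String
  def matchesVersion(version: String): Boolean
}

object ShimSelectionSketch {
  // Return the one provider that accepts sparkVersion; fail on zero or several.
  def detect(providers: Seq[ShimProviderSketch], sparkVersion: String): ShimProviderSketch = {
    val matching = providers.filter(_.matchesVersion(sparkVersion))
    matching match {
      case Seq(single) => single
      case Seq() =>
        // Message wording here is illustrative only.
        throw new IllegalArgumentException(s"No shim provider accepts Spark version $sparkVersion")
      case many =>
        val names = many.map(_.name).mkString("List(", ", ", ")")
        throw new IllegalArgumentException(s"Multiple Spark Shim Loaders found: $names")
    }
  }
}
```

Under this rule, two providers whose version checks overlap for the same version string make the lookup fail even though each provider works on its own.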

@tgravescs
Collaborator

ok, please update the pom files and then it looks good.

@tgravescs
Collaborator

@sririshindra can you update the pom files per my comments and upmerge to the latest?

rishi added 2 commits June 1, 2021 10:42
…testing

# Conflicts:
#	pom.xml
#	shims/aggregator/pom.xml
… to under the cdh shim module's pom.xl file
@tgravescs
Collaborator

build

@tgravescs tgravescs merged commit 72cdec4 into NVIDIA:branch-21.06 Jun 2, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add cloudera shim layer

* change Shim layer's name to spark311cdh from spark311cloud

Signed-off-by: Rishi <sririshindra@gmail.com>

* spark-rapids maven ant build fix.

Change-Id: I4fada0dc6c083edef340c56f3f00304b2264537f

* Fix Indentation

* Revert "spark-rapids maven ant build fix."

This reverts commit fa1f30b

* Address Tom's Review comments

* removing unnecessary methods reduced code duplication.

Also addressed other review comments.

* mend

* Checking version numbers for Cloudera Shim.

* Keep only the exec that uses by the ParquetCachedBatchSerializer

* Fixed the scalastyle violation.

* Fixed the version number conflict issue.

* Removing changes from api_validation/pom.xml, moved the cloudera repo to under the cdh shim module's pom.xl file

Co-authored-by: rishi <spothireddi@cloudera.com>
Labels
feature request New feature or request
Development

Successfully merging this pull request may close these issues.

[FEA] Create Cloudera shim layer
3 participants