Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition to v2 shims [Databricks] #4857

Merged
merged 35 commits into from
Mar 8, 2022
Merged

Conversation

razajafri
Copy link
Collaborator

@razajafri razajafri commented Feb 24, 2022

This PR in a nutshell

  • Moved all the SparkShims to a standard package com.nvidia.spark.rapids.shims.v2 so we don't have to dynamically look them up at runtime
  • Combined duplicate classes to reduce code redundancy
  • Removed the need to call the ShimLoader when accessing the Shims

fixes #4474

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
… ShimLoader

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@tgravescs
Copy link
Collaborator

so one thing to be aware of here is that any of our partners that have shim layers are going to have to transition as well. which means more work on their side so lets try to get this done how we want in one release.

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri razajafri changed the title Transition to v2 shims Transition to v2 shims [Databricks] Feb 26, 2022
@razajafri
Copy link
Collaborator Author

build

@sameerz sameerz added this to the Feb 28 - Mar 18 milestone Feb 26, 2022
@sameerz sameerz added the task Work required that improves the product but is not user facing label Feb 26, 2022
| Spark Shim | spark.shuffle.manager value |
| --------------| -------------------------------------------------------- |
| 3.0.1 | com.nvidia.spark.rapids.spark301.RapidsShuffleManager |
| 3.0.2 | com.nvidia.spark.rapids.spark302.RapidsShuffleManager |
Copy link
Collaborator

@gerashegalov gerashegalov Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did we end up solving incompatible ShuffleManager base across Apache Spark versions. I thought having a single facade class across shims required dynamic bytecode generation at Runtime depending on the shim because compile-time method signatures cannot be consistent for each Spark version?

I believe one of the issues in 30x vs 32x

where there is a reference to ShuffleBlockResolver https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala#L92

where in 31x we have a transitive reference to MergedBlockMeta https://github.com/apache/spark/blob/branch-3.2/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockResolver.scala#L56

However there is no MergedBlockMeta in branch-3.0. And we should get a class verification exception

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested everything after 3.1.1+ and it worked. I have reverted the change as it wasn't my original intention to remove the duplicate ShuffleManagers. The original intention was to get rid of the dynamic lookup of the shims at runtime.

@razajafri
Copy link
Collaborator Author

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Will be fixed as a part of a separate patch

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@sameerz
Copy link
Collaborator

sameerz commented Mar 4, 2022

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@tgravescs
Copy link
Collaborator

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri
Copy link
Collaborator Author

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri
Copy link
Collaborator Author

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri
Copy link
Collaborator Author

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri
Copy link
Collaborator Author

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri
Copy link
Collaborator Author

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri
Copy link
Collaborator Author

build

@razajafri razajafri merged commit 9f727bd into NVIDIA:branch-22.04 Mar 8, 2022
@razajafri razajafri deleted the shim-work branch March 8, 2022 19:55
@gerashegalov
Copy link
Collaborator

@razajafri there are more references to non-existent shims dir that will need updating

git grep '[^/]shims/'
CONTRIBUTING.md:There is a known issue with the shims/spark3xx submodules. After being enabled once, a module such as shims/spark312
docs/dev/shims.md:(i.e., in `sql-plugin/src/main/scala` as opposed to be duplicated either in `shims/spark3xx` submodules or over
jenkins/Jenkinsfile-blossom.premerge:                                                                      'sql-plugin/src/main/scala/,shims/spark311/src/main/scala/,' +
jenkins/Jenkinsfile-blossom.premerge:                                                                      'shims/spark301db/src/main/scala/,shims/spark301/src/main/scala/,' +
jenkins/Jenkinsfile-blossom.premerge:                                                                      'shims/spark302/src/main/scala/,shims/spark303/src/main/scala/,' +
jenkins/Jenkinsfile-blossom.premerge:                                                                      'shims/spark304/src/main/scala/,shims/spark312/src/main/scala/,' +
jenkins/Jenkinsfile-blossom.premerge:                                                                      'shims/spark313/src/main/scala/'

@pxLi
Copy link
Collaborator

pxLi commented Mar 9, 2022

we do exactly match for post-fix trigger [databricks] instead of [Databricks] for db build as https://github.com/NVIDIA/spark-rapids/blob/branch-22.04/CONTRIBUTING.md#blossom-ci to avoid accidental triggering

so this one was actually passed CI w/o db tests, created ticket here #4918

FYI also I decide to get the check case-insensitive to avoid cases like this #4919

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transition away from v1 shims
5 participants