
Remove unnecessary copies of ParquetCachedBatchSerializer #2948

Merged: 6 commits merged into NVIDIA:branch-21.08 from remove-serializers on Jul 20, 2021

Conversation

@razajafri (Collaborator) commented Jul 16, 2021

We decided to remove the extra serializers and keep a single version of ParquetCachedBatchSerializer in the 311 shims. The package name will be changed to better suit whatever strategy we come up with as part of supporting Spark 3.2.
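For reference, a minimal sketch of how the serializer would be enabled once there is only a single copy to point at. The fully qualified class name below is an assumption for illustration only, since the final package name is still to be decided:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: the serializer class name is assumed/illustrative, not the
// final package decided by this PR. Check the cache docs for the exact name.
val spark = SparkSession.builder()
  .appName("pcbs-example")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  // spark.sql.cache.serializer is a static conf (Spark 3.1+); it must be set
  // before the session starts, and after this change it points at one class
  // instead of a shim-specific one per Spark version.
  .config("spark.sql.cache.serializer",
    "com.nvidia.spark.ParquetCachedBatchSerializer") // assumed class name
  .getOrCreate()

// Cached data is now serialized through the RAPIDS Parquet-backed serializer.
spark.range(1000).cache().count()
```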

Fixes #2905

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

@razajafri razajafri marked this pull request as draft July 16, 2021 14:59
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri razajafri force-pushed the remove-serializers branch from b51bc2e to 993c61c Compare July 16, 2021 17:34
@razajafri razajafri marked this pull request as ready for review July 16, 2021 17:36
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author) commented:

build

@jlowe (Member) left a comment:

Looks like a spark320 version was missed?

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author) replied:

> Looks like a spark320 version was missed?

Sorry, I left it out of the draft, but forgot to remove it after our meeting.

@razajafri (Collaborator, Author) commented:

build

@jlowe (Member) left a comment:

Actually the documentation does need to be changed here. The cache docs refer to a table showing different classes for each Spark version, and those are being deleted in this PR.

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author) commented:

build

@jlowe (Member) left a comment:

It looks like jenkins/spark-tests.sh also needs to be updated since it tries to pick the shim-specific class.

@sameerz sameerz added this to the July 19 - July 30 milestone Jul 17, 2021
@sameerz sameerz added the task label (work required that improves the product but is not user facing) Jul 17, 2021
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri (Collaborator, Author) commented:

build

Review comment on jenkins/spark-tests.sh (outdated, resolved)
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
@razajafri (Collaborator, Author) commented:

build

@razajafri razajafri merged commit 173a822 into NVIDIA:branch-21.08 Jul 20, 2021
gerashegalov pushed a commit to gerashegalov/spark-rapids that referenced this pull request Jul 25, 2021
* removed unnecessary copies of ParquetCachedBatchSerializer

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* optimized import

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* removed serializer for 320

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* updated docs

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* changed package

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* Update jenkins/spark-tests.sh

Co-authored-by: Jason Lowe <jlowe@nvidia.com>

Co-authored-by: Raza Jafri <rjafri@nvidia.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Labels

task: work required that improves the product but is not user facing
Development

Successfully merging this pull request may close these issues:

Remove Spark version specific Serializers unless they need to override behavior
3 participants