Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jvm-packages] fix spark-rapids compatibility issue #8240

Merged
merged 4 commits into from
Sep 22, 2022

Conversation

wbo4958
Copy link
Contributor

@wbo4958 wbo4958 commented Sep 9, 2022

spark-rapids (from 22.10) has shimmed GpuColumnVector, which means we can't call it directly. So this PR call the UnshimmedGpuColumnVector

spark-rapids (from 22.10) has shimmed GpuColumnVector, which means
we can't call it directly. So this PR call the UnshimmedGpuColumnVector
} catch {
case _: ClassNotFoundException =>
// If it's older version, use the GpuColumnVector
GpuColumnVector.extractColumns(table, types).map(_.copyToHost())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it copying the data to host?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is. I just copied the code from another location to there. Definitely, we can improve this, by leaving the data in GPU and changing the incoming UDFs to GPU ways in the following PR. But this PR does not address these issues.

@wbo4958 wbo4958 changed the title [WIP][jvm-packages] fix spark-rapids compatibility issue [jvm-packages] fix spark-rapids compatibility issue Sep 20, 2022
@wbo4958
Copy link
Contributor Author

wbo4958 commented Sep 20, 2022

The PR NVIDIA/spark-rapids#6534 to expose the API accessing GpuColumnVector is merged in spark-rapids. Now, this PR is ready for review.

I have tested this PR + RAPIDS 22.08 / 22.10-SNAPSHOT locally (with and without Jupiter notebook + spylon), and both of them can work well.

@wbo4958 wbo4958 marked this pull request as ready for review September 20, 2022 12:39
@wbo4958
Copy link
Contributor Author

wbo4958 commented Sep 22, 2022

@trivialfis this pr is ready to merge.

def extractBatchToHost(table: Table, types: Array[DataType]): Array[ColumnVector] = {
// spark-rapids has shimmed the GpuColumnVector from 22.10
try {
val clazz = Utils.classForName("com.nvidia.spark.rapids.GpuColumnVectorUtils")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any plan to address this in spark-rapids? We can merge a workaround like this for now, but it would be great if there are some alternatives being planned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, The PR I recently merged can fix it in spark-rapids, but still need XGBoost to follow that way. So hopefully, we can bump the rapids dependency to 22.10 to fix it after 22.10 is released.

@trivialfis trivialfis merged commit 8d247f0 into dmlc:master Sep 22, 2022
@wbo4958 wbo4958 deleted the spark-rapids-compatibility branch April 23, 2024 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants