[jvm-packages] fix spark-rapids compatibility issue #8240
Starting with 22.10, spark-rapids shims GpuColumnVector, which means we can no longer call it directly. This PR calls UnshimmedGpuColumnVector instead.
    } catch {
      case _: ClassNotFoundException =>
        // If it's older version, use the GpuColumnVector
        GpuColumnVector.extractColumns(table, types).map(_.copyToHost())
Is it copying the data to host?
Yes, it is. I just copied that code here from another location. We can definitely improve this in a follow-up PR by leaving the data on the GPU and changing the UDFs that consume it to work on the GPU, but this PR does not address those issues.
The PR NVIDIA/spark-rapids#6534, which exposes an API for accessing GpuColumnVector, has been merged into spark-rapids, so this PR is now ready for review. I have tested this PR with RAPIDS 22.08 and 22.10-SNAPSHOT locally (with and without
@trivialfis this PR is ready to merge.
    def extractBatchToHost(table: Table, types: Array[DataType]): Array[ColumnVector] = {
      // spark-rapids has shimmed the GpuColumnVector from 22.10
      try {
        val clazz = Utils.classForName("com.nvidia.spark.rapids.GpuColumnVectorUtils")
Any plan to address this in spark-rapids? We can merge a workaround like this for now, but it would be great if there are some alternatives being planned.
Yes. The PR I recently merged fixes it on the spark-rapids side, but XGBoost still needs to adopt that approach. Hopefully we can bump the rapids dependency to 22.10 once 22.10 is released, and fix it properly then.
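For readers following the thread, the version-probing pattern discussed above (look up a newer class by name, and fall back to the older direct call on ClassNotFoundException) can be sketched roughly as follows. This is only an illustration and not the code in this PR: `ReflectiveFallback`, `absVia`, `legacyAbs`, and the class name `com.example.DoesNotExist` are hypothetical, `java.lang.Math` stands in for a class that only exists in newer releases, and plain `Class.forName` is used in place of Spark's `Utils.classForName`.

```scala
object ReflectiveFallback {
  // Stand-in for the older API that can still be called directly.
  def legacyAbs(x: Int): Int = if (x < 0) -x else x

  // Look up a static `abs(int)` method on `className` via reflection.
  // If the class is not on the classpath, fall back to the legacy path.
  // Returns which path was taken together with the result.
  def absVia(className: String, x: Int): (String, Int) =
    try {
      val clazz = Class.forName(className)
      val method = clazz.getMethod("abs", classOf[Int])
      // Static method, so the receiver is null; the Int must be boxed.
      val result = method.invoke(null, Int.box(x)).asInstanceOf[Int]
      ("reflective", result)
    } catch {
      case _: ClassNotFoundException =>
        // Older version: the newer class is absent, use the direct call.
        ("legacy", legacyAbs(x))
    }
}
```

Calling `ReflectiveFallback.absVia("java.lang.Math", -3)` takes the reflective path, while `ReflectiveFallback.absVia("com.example.DoesNotExist", -3)` takes the fallback. Catching only ClassNotFoundException keeps other reflection failures (for example, a missing method) visible instead of silently masking them.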