
[QST] Exposing the GPU operator exec classes to user #3873

Closed
Niharikadutta opened this issue Oct 21, 2021 · 9 comments
Labels
question Further information is requested

Comments

@Niharikadutta

What is your question?
I'm trying to use the following classes for checking the RAPIDS overridden physical plan (for internal validations that Spark jobs are using RAPIDS acceleration as expected):

GpuProjectExec, GpuHashAggregateExec, GpuColumnarToRowExec, GpuRowToColumnarExec, GpuBroadcastHashJoinExec

And I can see these classes as part of the v21.10 jar:

[screenshots: the exec classes listed in the v21.10 jar contents]

However on importing these classes I see the following error:

error: object GpuProjectExec is not a member of package com.nvidia.spark.rapids
       import com.nvidia.spark.rapids.{GpuProjectExec,GpuHashAggregateExec,GpuColumnarToRowExec,GpuRowToColumnarExec}

error: object shims is not a member of package com.nvidia.spark.rapids
       import com.nvidia.spark.rapids.shims.v2.GpuBroadcastHashJoinExec

Can these classes be made loadable by the user, outside of Catalyst?

@Niharikadutta Niharikadutta added ? - Needs Triage Need team to review and classify question Further information is requested labels Oct 21, 2021
@tgravescs
Collaborator

These are not public-facing APIs, so the jar doesn't expose them without first doing some classloading work to load the files for the correct version of Spark. This is part of how we support multiple Spark versions in a single jar.

Can you be more specific about what you are trying to do here, i.e. what setup and environment are you running this from? What is your test trying to compare? Are you using isInstanceOf, or can you just look at the class name? And what are you inspecting: explain() output, sparkPlan, etc.?
The code above looks like you ran this in a Spark interactive shell; is this test something you are going to be running in a notebook?

@Niharikadutta
Author

Niharikadutta commented Oct 25, 2021

I am doing this as part of our regular system runs (in our production environment) to verify that, for GPU scenarios, the expected operators are being overridden properly. I do this as part of batch jobs, by loading the physical plan (queryExecution.executedPlan) and checking that the GPU operator override classes are present.
Do you have another way you can recommend to do this that would not require making these APIs public?

@tgravescs
Collaborator

Do you have to use the actual SparkPlan and Exec classes with isInstanceOf, or can you just compare strings? You can walk the plan, get the class name or nodeName as a string, and check whether it contains GpuProjectExec, for instance.
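As a rough sketch of that string-based approach (not code from this thread): in a real job you would feed in `df.queryExecution.executedPlan.toString`; here a sample plan string and the helper name `missingOperators` are illustrative stand-ins:

```scala
// Sketch: validate that expected GPU operators appear in a physical plan's
// string rendering (e.g. queryExecution.executedPlan.toString in Spark).
// The helper name and the sample plan text are illustrative only.
object GpuPlanCheck {
  // Returns the expected operator names that do NOT occur in the plan text.
  def missingOperators(planText: String, expected: Seq[String]): Seq[String] =
    expected.filterNot(planText.contains)

  def main(args: Array[String]): Unit = {
    // Sample text standing in for real executedPlan output.
    val planText =
      """GpuColumnarToRowExec
        |+- GpuHashAggregateExec(keys=[id], functions=[gpucount(1)])
        |   +- GpuProjectExec [id]
        |      +- GpuRowToColumnarExec""".stripMargin

    val expected = Seq("GpuProjectExec", "GpuHashAggregateExec",
      "GpuColumnarToRowExec", "GpuRowToColumnarExec")
    val missing = missingOperators(planText, expected)
    println(if (missing.isEmpty) "all expected GPU operators present"
            else s"missing: ${missing.mkString(", ")}")
  }
}
```

Because this only compares strings, it never needs to load the Gpu*Exec classes themselves, which sidesteps the shim classloading issue entirely.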

@Salonijain27 Salonijain27 removed the ? - Needs Triage Need team to review and classify label Oct 26, 2021
@Niharikadutta
Author

@tgravescs I can explore doing that, but for now could you walk me through the change needed to make these APIs public? We can do that in our internal repo of Spark RAPIDS, so this change doesn't need to go into the OSS repo.

@tgravescs
Collaborator

sure, you can find some docs related here: https://github.com/NVIDIA/spark-rapids/tree/branch-21.12/dist

Essentially you need to add the classes into https://github.com/NVIDIA/spark-rapids/blob/branch-21.12/dist/unshimmed-common-from-spark301.txt file.
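For illustration only (check the pattern syntax against the file itself and the dist README), the additions would be entries naming the class files of the execs in question, along the lines of:

```
com/nvidia/spark/rapids/GpuProjectExec*
com/nvidia/spark/rapids/GpuHashAggregateExec*
```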

@gerashegalov
Collaborator

gerashegalov commented Oct 27, 2021

I agree with @tgravescs and would also recommend just using nodeName / toString

Going the route of modifying unshimmed-common can quickly turn into classloader-related problems for production jobs, even in a single-shim jar.

I think if this functionality is really needed, you need to create a custom single-shim build, and we can make it easy with #3682, or we can look into providing / generating the required checks. We could add some def ensureXYZ(plan: SparkPlan) to SparkShims (or a dedicated trait), and then calling it will be safe across classloader boundaries.
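To make the ensureXYZ(plan) idea concrete, here is an illustrative-only sketch. PlanNode is a hypothetical stand-in for Spark's SparkPlan, and the trait and method names are invented for this example; the real check would live behind SparkShims inside the plugin, where it is safe across classloader boundaries:

```scala
// Hypothetical stand-in for Spark's SparkPlan tree, for illustration only.
final case class PlanNode(nodeName: String, children: Seq[PlanNode] = Nil) {
  // Collect the nodeName of this node and all descendants.
  def collectNames: Seq[String] = nodeName +: children.flatMap(_.collectNames)
}

// Sketch of a dedicated trait exposing plan assertions instead of the
// Gpu*Exec classes themselves; callers only pass the plan across.
trait GpuPlanAssertions {
  // Throws if the expected GPU exec is absent anywhere in the plan tree.
  def ensureGpuProject(plan: PlanNode): Unit =
    require(plan.collectNames.exists(_.contains("GpuProjectExec")),
      "plan does not contain GpuProjectExec")
}

object GpuPlanAssertions extends GpuPlanAssertions

object EnsureDemo {
  def main(args: Array[String]): Unit = {
    val plan = PlanNode("GpuColumnarToRowExec", Seq(PlanNode("GpuProjectExec")))
    GpuPlanAssertions.ensureGpuProject(plan) // passes silently
    println("GpuProjectExec present")
  }
}
```

The point of the design is that user code never names the shimmed exec classes directly, so no unshimming is required.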

@Niharikadutta
Author

@gerashegalov , @tgravescs Can we look into the below as suggested by Gera:

 We could add some def ensureXYZ(plan: SparkPlan) to SparkShims (or a dedicated trait) and then calling it will be safe across classloader boundaries.

While trying to add the operator exec classes, I see that they depend on many other classes, such as com.nvidia.spark.rapids.GpuExec, com.nvidia.spark.rapids.Arm, etc. I'm not sure what the complete list of classes is that would need to be added to https://github.com/NVIDIA/spark-rapids/blob/branch-21.12/dist/unshimmed-common-from-spark301.txt to get this to work. Could you please help with this for the time being, until we have a more robust long-term solution?

@tgravescs
Collaborator

If there are that many dependencies, you might just end up with most of the classes in the file, which seems like a lot of work.
So how many APIs are you using? I think there are a few options here.

  1. If there are not very many, just changing your test code to compare strings would be less work.
  2. Is this run as a notebook, or are you pre-compiling the tests? If pre-compiling, you might be able to build against the aggregator jar but then run against the official dist jar in your image, because the classloader would have loaded the classes you need by then.

@Niharikadutta
Author

Got it, thanks @tgravescs. We will proceed with validating against node name strings. Closing this issue.
