-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Commonize shim module join and shuffle files #3559
Conversation
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
build |
databricks build failed because of #3556 |
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
<source>${project.basedir}/src/main/311until320-apache/scala</source> | ||
<source>${project.basedir}/src/main/301+-nondb/scala</source> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: here and elsewhere now that we touch this section often let us keep the lines sorted for easier reviews. In IDEA "Edit->Sort Lines" produces the order
<source>${project.basedir}/src/main/301+-nondb/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
<source>${project.basedir}/src/main/301until320-nondb/scala</source>
<source>${project.basedir}/src/main/311+-all/scala</source>
<source>${project.basedir}/src/main/311+-nondb/scala</source>
<source>${project.basedir}/src/main/311until320-all/scala</source>
<source>${project.basedir}/src/main/311until320-apache/scala</source>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry missed this one, I'll fix it in the next pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also note I have older ide which doesn't have that option. I'll look at upgrading later, but I'm not sure I agree with its sorting going by the other ones, for instance:
<source>${project.basedir}/src/main/301until310-nondb/scala</source>
<source>${project.basedir}/src/main/301until320-nondb/scala</source>
<source>${project.basedir}/src/main/301until320-all/scala</source>
<source>${project.basedir}/src/main/301until310-all/scala</source>
<source>${project.basedir}/src/main/301+-nondb/scala</source>
Why is it putting nondb before all, doesn't seem like by alphabetical? As long as we are consistent I'm fine with it but don't want to require certain version of IDE either though. Or perhaps those weren't sorted?
@@ -14,9 +14,10 @@ | |||
* limitations under the License. | |||
*/ | |||
|
|||
package com.nvidia.spark.rapids.shims.spark311 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that GpuBroadcastNestedLoopJoinExec
is no longer shimmed in any way, should this be moved into the same file as GpuBroadcastNestedLoopJoinExecBase
? Is there still a reason for the latter class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is definitely more things to move around and commonize, I was trying to keep it simple here and easy to review by just moving the files out of shim modules without a lot of code changes that could add bugs and make it harder to review. We can then follow up with more changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with followups as long as we're tracking somewhere the things we want to do so they don't get dropped.
@@ -356,7 +356,7 @@ abstract class GpuBroadcastExchangeExecBase( | |||
} | |||
} | |||
} | |||
GpuBroadcastExchangeExec.executionContext.submit[Broadcast[Any]](task) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Would like to see this file named GpuBroadcastExchangeExecBase
or something more accurate to what it contains.
@@ -230,7 +230,7 @@ abstract class GpuShuffleExchangeExecBase( | |||
} | |||
} | |||
|
|||
object GpuShuffleExchangeExec { | |||
object GpuShuffleExchangeExecBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Update filename to GpuShuffleExchangeExecBase.scala.
build |
very odd > 07:22:08 [ERROR] /home/ubuntu/spark-rapids/tests/src/test/scala/com/nvidia/spark/rapids/SerializationSuite.scala:20: object lang3 is not a member of package org.apache.commons builds fine locally and didn't change anything that would cause this |
that failed on databricks and I bet is from #3504 |
build |
Looks like the test was implicitly picking up commons lang3 from Spark and should have explicitly specified the dependency. I'll post a PR to fix. |
Many of the join and shuffle related files in the shim module were just there to build against the specific version of spark. Moves those files into the sql-plugin. Most of those had differences between 3.0.x and 3.1.x and then many times databricks had slight variations, so consolidated the shims module files into 3 or 4 in the sql-plugin directory.
Also note more to commonize later, doing it in pieces to hopefully keep easier to review.
I ran tests on databricks and some Apache but not all.