-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the incomplete capture of SubqueryBroadcast #4630
Fix the incomplete capture of SubqueryBroadcast #4630
Conversation
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
build |
I verified this fixes the query 5 crash, so this is definitely better than where we are now. However I do wonder about the case where Spark tries to reuse a GpuBroadcastExec when the subsequent AdaptiveSparkPlan will not be on the GPU and it's not because of DPP. I suspect Spark will inject a ColumnarToRow to try to smooth that transition, but will the plugin realize it needs to replace that with a GpuColumnarToRow transition to make that actually work? And in that case, it's a bit unfortunate we would transfer the shuffle to the host, send it to the device, then send it right back to the host for further processing. It would make more sense to convert the GPU-formatted columnar data on the host to row-based data on the host directly, but we could optimize that in the future. My primary concern is whether the scenario will work at all or whether we'll end up with a Spark ColumnarToRowExec trying to parse data from a GPU exchange. |
Hi @jlowe, I checked the Spark code of the main branch. So far, a BroadcastExchange will be introduced under two conditions:
I don't think we need to worry about the first condition, because BroadcastJoinMetas will fallback the build side once the join can not work on GPU. |
I'm worried about reuse in light of AQE. Are you saying it's impossible for a GpuBroadcastExchange to be reused as a BroadcastExchange in the plan? If so, doesn't that imply the plan to compute the broadcast ends up getting re-executed in that case (and thus somewhat relates to #3709)? |
Signed-off-by: sperlingxx lovedreamf@gmail.com
Fixes #4625
This PR is to fix the logic on searching/matching potential
SubqueryBroadcastExec
from partition filters of scans. In previous, we missed a possible condition: As a subquery,SubqueryBroadcastExec
may be reused. And this missing will lead to a crash when AQE is on.