[FEA] Consider creating combined GpuCoalesceBatches and GpuShuffleExchange operator #719
Labels: feature request, P2, performance
Is your feature request related to a problem? Please describe.
When AQE is enabled and we are planning a new query stage, we must return an operator that implements ShuffleExchangeLike (since Spark 3.0.1), so we remove any GpuCoalesceBatches operator and insert it later around the GpuCustomShuffleReader that will read the shuffle output.
I think it is worth exploring an alternate approach where, instead of removing the GpuCoalesceBatches operator, we create a new operator that combines GpuCoalesceBatches and GpuShuffleExchangeExec and returns that as the new query stage.
The benefit of this approach, if it works, is that it makes the AQE and non-AQE plans more consistent and removes some complexity. It may also improve performance if it means that the shuffle reader is now reading coalesced batches, but I'm not 100% sure I am understanding this correctly, so I could do with a second opinion on this.
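To make the two options easier to compare, here is a rough, self-contained sketch of the plan shapes involved. It is a toy model only: none of these types are the real Spark or plugin classes, and the names CoalescedShuffleExchange, currentStageRoot, wrapReader, and proposedStageRoot are hypothetical, made up for illustration.

```scala
// Toy model only: these types mimic the shape of the plan,
// they are NOT the real Spark or plugin classes.
object CombinedShuffleSketch {
  sealed trait PlanNode

  // Stand-in for the ShuffleExchangeLike requirement: under AQE the root
  // of a new query stage has to be one of these.
  trait ShuffleExchangeLike extends PlanNode

  case class Scan(name: String) extends PlanNode
  case class CoalesceBatches(child: PlanNode) extends PlanNode
  case class ShuffleExchange(child: PlanNode) extends ShuffleExchangeLike
  case class ShuffleReader(stage: PlanNode) extends PlanNode

  // Hypothetical combined operator: a shuffle exchange that also carries
  // the coalesce step, so it can be returned directly as the query stage.
  case class CoalescedShuffleExchange(child: PlanNode) extends ShuffleExchangeLike

  // Current approach, step 1: drop the coalesce so that the stage root
  // satisfies the ShuffleExchangeLike requirement.
  def currentStageRoot(plan: PlanNode): PlanNode = plan match {
    case CoalesceBatches(shuffle: ShuffleExchange) => shuffle
    case other => other
  }

  // Current approach, step 2: re-insert the coalesce around the reader.
  def wrapReader(reader: ShuffleReader): PlanNode = CoalesceBatches(reader)

  // Proposed approach: fold coalesce and shuffle into a single node
  // instead of removing the coalesce and adding it back later.
  def proposedStageRoot(plan: PlanNode): PlanNode = plan match {
    case CoalesceBatches(ShuffleExchange(child)) => CoalescedShuffleExchange(child)
    case other => other
  }
}
```

In this model both approaches produce a stage root that is a ShuffleExchangeLike, but only the current approach needs the second wrapReader step. Whether a real combined operator would let the shuffle reader see coalesced batches is the open question above, and the sketch does not answer it.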
Describe the solution you'd like
See the previous section.
Describe alternatives you've considered
The alternative is the current design of coalescing after the shuffle reader.
Additional context
N/A