Improve UX for UNION vs UNION ALL (introduce a LogicalPlan::Distinct) #2573
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The current support for `UNION` vs `UNION ALL` is confusing to me, in the context of trying to map this to another query engine's support for these operators.
- The logical plan has a `Union` operator and this is currently assumed to represent `UNION ALL`.
- There is no operator that represents `UNION`. It is possible to manually wrap a union in a distinct to achieve this but that might not be obvious to users. Also, the documentation for `distinct` is incorrect and says that it performs a `union`.
DataFusion Logical Plan
Spark Logical Plan
Spark Physical Plan
Describe the solution you'd like
I think what we want is:
- Introduce a `Distinct` operator in the logical plan
- Have the `distinct` function create the `Distinct` operator instead of the aggregate query
- Offer both `union` (existing method representing `UNION ALL`) and `union_distinct`
- Translate `Distinct` to an aggregate query
Describe alternatives you've considered
Leave things as they are and improve the documentation.
Additional context
For users of DataFusion for SQL query planning, it would be easier to map union/union all to other engines rather than trying to reverse engineer the aggregate query wrapping the union.
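To make the proposal concrete, here is a minimal, self-contained sketch of the shape it could take. Everything in it is hypothetical and simplified for illustration: `Plan`, `PlanBuilder`, `GroupByAll`, `rewrite_distinct` and `is_union_distinct` are stand-in names, not DataFusion's actual types or API. It only shows that an explicit `Distinct` node keeps `UNION` recognizable until a later rewrite lowers it to an aggregate.

```rust
// Hypothetical, simplified plan type for illustration only.
// This is NOT DataFusion's actual LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Read a named table.
    Scan(String),
    /// Concatenate inputs, keeping duplicates (UNION ALL semantics).
    Union(Vec<Plan>),
    /// Proposed explicit operator: remove duplicate rows from the input.
    Distinct(Box<Plan>),
    /// Stand-in for an aggregate that groups on every column and
    /// computes no aggregate expressions.
    GroupByAll(Box<Plan>),
}

/// Hypothetical builder mirroring the methods named in the proposal.
struct PlanBuilder {
    plan: Plan,
}

impl PlanBuilder {
    fn scan(table: &str) -> Self {
        PlanBuilder { plan: Plan::Scan(table.to_string()) }
    }

    /// Existing behaviour: UNION ALL.
    fn union(self, other: Plan) -> Self {
        PlanBuilder { plan: Plan::Union(vec![self.plan, other]) }
    }

    /// Emit a Distinct node rather than an aggregate query.
    fn distinct(self) -> Self {
        PlanBuilder { plan: Plan::Distinct(Box::new(self.plan)) }
    }

    /// UNION (without ALL): a union wrapped in Distinct.
    fn union_distinct(self, other: Plan) -> Self {
        self.union(other).distinct()
    }

    fn build(self) -> Plan {
        self.plan
    }
}

/// Later rewrite step (assumed here to run after planning):
/// lower every Distinct node to the group-by-all aggregate.
fn rewrite_distinct(plan: Plan) -> Plan {
    match plan {
        Plan::Distinct(input) => Plan::GroupByAll(Box::new(rewrite_distinct(*input))),
        Plan::GroupByAll(input) => Plan::GroupByAll(Box::new(rewrite_distinct(*input))),
        Plan::Union(inputs) => Plan::Union(inputs.into_iter().map(rewrite_distinct).collect()),
        scan @ Plan::Scan(_) => scan,
    }
}

/// With an explicit Distinct node, spotting UNION is a simple pattern match,
/// with no need to reverse engineer an aggregate wrapping the union.
fn is_union_distinct(plan: &Plan) -> bool {
    matches!(plan, Plan::Distinct(inner) if matches!(inner.as_ref(), Plan::Union(_)))
}
```

In this sketch, `PlanBuilder::scan("a").union_distinct(Plan::Scan("b".to_string())).build()` yields a `Distinct` over a `Union`, which a consumer targeting another engine can match directly, while `rewrite_distinct` covers execution paths that still want the aggregate form.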