Inconsistent behavior in HashJoin Projections #10978
Comments
You can see my temporary possible fix here: hstack@a4ab67d
It seems to be a bug related to me, thanks for catching it. I will take a look later.
@adragomir could you provide an example of the failure / inconsistency via physical plan construction or SQL statement? From what I see,
Describe the bug
We ran into problems with projections inside HashJoin.
Each schema in the join (left / right) has:
The projection is [0, 2] - the struct column from left, and the struct column from right.
The join column is not specified in the output. When trying to optimize the join and reverse the order, the projection is swapped as [2, 0], however there is no column with index 2 in the output, as the output contains only the 2 structs.
To Reproduce
(key, value)
key
value
fields
Expected behavior
The hash join optimization works, even when swapping the join order (and wrapping in a ProjectionExec)
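The index mismatch from the bug description can be shown in miniature. The following is only a sketch in plain Rust, not DataFusion code: it assumes each side of the join has two columns, and both helper functions (projection_in_bounds, swap_projection) are hypothetical names made up for illustration.

```rust
/// True when every index in `projection` addresses a column of a schema
/// that has `width` columns.
fn projection_in_bounds(projection: &[usize], width: usize) -> bool {
    projection.iter().all(|&i| i < width)
}

/// Hypothetical stand-in for the swap step: re-express indices into the
/// join schema (left columns followed by right columns) after the two
/// inputs trade places.
fn swap_projection(projection: &[usize], left_width: usize, right_width: usize) -> Vec<usize> {
    projection
        .iter()
        .map(|&i| {
            if i < left_width {
                // A left column now sits after all the right columns.
                i + right_width
            } else {
                // A right column now sits at the front.
                i - left_width
            }
        })
        .collect()
}
```

With two columns per side (an assumption), swap_projection(&[0, 2], 2, 2) gives [2, 0]. That result is in bounds for the 4-column join schema, but out of bounds for the 2-column output schema, which is the inconsistency reported here.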
Additional context
Reading the comment for HashJoinExec::projection, it says "The projection indices of the columns in the output schema of join", however:
- try_new: it seems to be checked against the join schema
- with_projection: it seems to be checked against the output schema
- swap_join_projection function - as it uses the left and right schemas
I tried taking a stab at it, but the meaning of what is passed in projections is unclear.
For now, I am fixing it surgically when swapping the order - I am rewriting the projections to be relative to the output schema when wrapping the join with a ProjectionExec.
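One possible reading of "relative to the output schema" is sketched below. This is not the code from the linked commit and not DataFusion's API: to_output_indices is a hypothetical helper, and the sketch assumes the join's embedded projection is a list of join-schema indices whose order defines the join's output columns.

```rust
/// Hypothetical helper: `join_projection` lists the join-schema indices the
/// join emits, in output order. `wanted` lists join-schema indices in the
/// order the wrapping projection should produce them. The result gives, for
/// each wanted column, its position in the join's output schema, or None if
/// a wanted column is not emitted by the join at all.
fn to_output_indices(join_projection: &[usize], wanted: &[usize]) -> Option<Vec<usize>> {
    wanted
        .iter()
        .map(|w| join_projection.iter().position(|p| p == w))
        .collect()
}
```

For example, if the swapped join emits join-schema columns [0, 2] and the wrapper wants them in the order [2, 0], the indices relative to the 2-column output are [1, 0], both of which exist. Asking for a column the join does not emit yields None instead of an out-of-range index.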