Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6300] Function MAP_VALUES/MAP_KEYS gives exception when mapVauleType and mapKeyType not equals map Biggest mapKeytype or mapValueType #3847

Closed
wants to merge 0 commits into from

Conversation

caicancai
Copy link
Member

@caicancai caicancai commented Jul 6, 2024

@caicancai caicancai changed the title [CALCITE-6300] Function MAP_VALUES/MAP_KEYS gives exception when mapV…auleType and mapKeyType not equals map Biggest mapKeytype or mapValueType [CALCITE-6300] Function MAP_VALUES/MAP_KEYS gives exception when mapVauleType and mapKeyType not equals map Biggest mapKeytype or mapValueType Jul 6, 2024
@caicancai
Copy link
Member Author

@mihaibudiu Hello, I converted the callbinding after parameter verification, which only applies to the map function of spark.
I debugged the spark source code and found that the type has been converted before entering the logical plan.

What do you think

@@ -58,4 +58,10 @@ public static RelDataType getKeyTypeOrThrow(RelDataType type) {
return requireNonNull(type.getKeyType(),
() -> "keyType is null for " + type);
}

@API(since = "1.37", status = API.Status.EXPERIMENTAL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this annotation necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -8182,6 +8182,19 @@ void checkRegexpExtract(SqlOperatorFixture f0, FunctionAlias functionAlias) {
"CHAR(3) NOT NULL ARRAY NOT NULL");
f1.checkScalar("map_keys(map('foo', 1, null, 2))", "[foo, null]",
"CHAR(3) ARRAY NOT NULL");

f1.checkScalar("map_keys(map('foo', 1, null, 2))", "[foo, null]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming these have been checked against spark.
Perhaps you can add a comment, also listing the spark version - Spark has been known to change function behaviors in the past.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, I tested all versions after spark3.4


// If the key and value types are not unknown,
// adjust their types to suit the Map function's constructor.
if (mapKeyType.getSqlTypeName() != SqlTypeName.UNKNOWN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have discussed this in the past: I don't think that the type checker is supposed to modify the node that is being checked.

There are type checking strategies that will backtrack (e.g., SqlOperandTypeChecker.or), and they do not expect the checked data structure to change.

I think this pattern is very dangerous, and can lead to hard to diagnose bugs.

On the other hand, I don't know how this should be done properly. The map function does need to introduce type casts. But maybe the typechecking stage is not the right stage.

Copy link
Member Author

@caicancai caicancai Jul 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply, I was sick these days.

Currently, this type check is specially prepared for Spark's map function, but there may be some problems that are difficult to eliminate as you said. However, Spark currently performs some type conversions during parameter checking. I have to admit that Spark does not do this very well.

I believe that after I have a deeper understanding of Calcite and the type system in the future, I can improve this part. I hope that day will not be too far away.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Julian said in the Jira comments, you should insert casts during validation, before type checking.

Copy link
Member Author

@caicancai caicancai Jul 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thank you.
done

Copy link

sonarcloud bot commented Jul 14, 2024

@@ -1532,6 +1532,7 @@ private static class MapKeyOperandTypeChecker extends SameOperandTypeChecker {
}
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new line seems an unnecessary change, please undo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review, I will fix it and resolve the conflict tonight

@rubenada
Copy link
Contributor

@mihaibudiu do you have further comments here? IMO the proposed solution seems a bit ad-hoc, but seems to do the trick (and the new code is very well located). Shall we consider this for 1.38? (after resolving conflicts, squashing commits, etc.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants