-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce OrdinalReturnTypeInferenceV2 to infer RexCall's return type #481
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aastha25 for this PR!
Could you kick off a regression test for it for safety?
And have we tested on the target view? Could you add the test results in the testing done?
...ive/src/main/java/com/linkedin/coral/hive/hive2rel/functions/StaticHiveFunctionRegistry.java
Show resolved
Hide resolved
...ive/src/main/java/com/linkedin/coral/hive/hive2rel/functions/StaticHiveFunctionRegistry.java
Show resolved
Hide resolved
|
@aastha25 Thanks for confirmation. |
unfortunately the view will be created & tested by the user next week, after Coral & Spark enable support for it. In case of any issues, we'll need to work on a potential follow up PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
return rexCall; | ||
} | ||
} | ||
|
||
RelDataType fieldType = rexCall.getType(); | ||
boolean isNullable = SchemaUtilities.isFieldNullable(rexCall, inputSchema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the reason OrdinalReturnTypeInference
does not work out of the box?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[OrdinalReturnTypeInference](https://github.com/linkedin/linkedin-calcite/blob/li-1.21.0/core/src/main/java/org/apache/calcite/sql/type/OrdinalReturnTypeInference.java#L25)
cannot be directly used because the private field 'ordinal' has no public accessor method. This class supports type derivation via method RelDataType inferReturnType(SqlOperatorBinding opBinding)
. To leverage this API, we need to make coral-schema complaint with Coral IR and enable type derivation in coral-schema, similar to our ongoing work in coral-spark & coral-trino
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So once we append a RexCall field, what gets used at the end to infer its type if not inferReturnType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we append the field AFTER extracting the ordinal here and deriving its field type here
Since this is a rexShuttle, the field types from the inputSchema are available in the RelNode representation.
SchemaUtilities.isFieldNullable(rexCall, inputSchema)
derives incorrect nullability (always nullable) for the top level fields and SchemaUtilities.makeNullable
here derives incorrect nullability (always nullable) for the inner fields.
By introducing the if
condition above in line 445, we avoid those codepaths
What changes are proposed in this pull request, and why are they necessary?
Given a SQL definition -
SELECT udf(StructCol) AS modCol FROM foo
Prior to this PR, Coral-Schema generates a nullable schema for
modCol
with top level and inner fields set to nullable even whenStructCol
is non-nullable. Sinceudf
's return type nullability inference is unavailable toRelToAvroSchemaConverter
, this shuttle assumes the schema after applying th UDF is nullable.This PR introduces a new
SqlReturnTypeInference
-OrdinalReturnTypeInferenceV2
which has been leveraged inRelToAvroSchemaConverter
to derive udf -cast_nullability
's return type schema (and its nullability info) more accurately. This inference strategy has been applied when the SqlCall's return type is same as one of the operands type AND that operand refers to a field from the input schema.How was this patch tested?
./gradlew build
added new UTs.