-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify expressions swallows a cast expression #13481
Comments
To reiterate, the real check for a fix here should be passing test like this #[test]
fn test_simplify_cast_list() {
use datafusion_functions_nested::expr_fn::make_array;
let element_field = Arc::new(Field::new("element", DataType::Int32, true));
let items_field =
Field::new("items", DataType::List(element_field.clone()), true);
let schema = Schema::new(vec![items_field.clone()])
.to_dfschema()
.unwrap();
let input = cast(
make_array(vec![lit(2), lit(3)]),
DataType::List(element_field.clone()),
);
let output = simplify(input.clone());
assert_eq!(
input.data_type_and_nullable(&schema).unwrap(),
output.data_type_and_nullable(&schema).unwrap(),
);
} In other words we'd want the type (and null-ability?) to remain the same after simplification. |
Ok I think that whereas #13468 restores the data type equivalence after simplification (and thus unblocks the delta-rs issue) it does not help with nullabilty matching as well. Arguably nullability equivalence should not be a blocker, and could be tracked separately I think (if at all). |
Indeed digging a bit more suggests that non-matching nullability is sort of ok, as it can be explained away like this
Incidentally the first one matches the actual nullability in the schema, whereas the second doesn't/ |
Sorry for the confusion, the tldr is that :
So I think we should merge it once everything's been addressed. |
Describe the bug
It seems
SimplifyExpressions
will sometimes omit a cast which was introduced by theTypeCoercion
analyzer rule, leading to a loss in the precision of the output DataType.To Reproduce
Run
The cast clause gets stripped since the constant expression gets evaluated, so the first assert (expectedly) fails with
However the second assert will also (unexpectedly) fail with
Meaning that the custom list item field name is lost now.
Effects
I believe this in turn causes delta-io/delta-rs#2886.
Basically
TypeCoercion
injects the CAST, whileSimplifyExpressions
omits it without replacing the precise data type. ConsequentlyOptimizeProjections
folds the two projections, but the schema is computed from the outermost projection expressions, which now lack the proper type info (i.e. the non-standard list item field names).Minimal repro:
Fails with
Error: Context("Optimizer rule 'optimize_projections' failed", Context("optimize_projections", Internal("Failed due to a difference in schemas, original schema: DFSchema { inner: Schema { fields: [Field { name: \"items\", data_type: List(Field { name: \"element\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None], functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: DFSchema { inner: Schema { fields: [Field { name: \"items\", data_type: List(Field { name: \"item\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None], functional_dependencies: FunctionalDependencies { deps: [] } }")))
Expected behavior
SimplifyExpressions
should retain exact output type, and there should be no optimizer errors in the second (high-level) example.Additional context
No response
The text was updated successfully, but these errors were encountered: