Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support new IntervalCompound and IntervalDay update #288

Merged
merged 8 commits into from
Sep 20, 2024

Conversation

Blizzara
Copy link
Contributor

@Blizzara Blizzara commented Aug 21, 2024

Bumps substrait to latest to get substrait-io/substrait#665 and addresses the breaks, adding IntervalCompound type and literal and adapting IntervalDay type and literal to have precision. For IntervalDay, backwards-compatibility is provided for reading existing protobufs using microseconds, but not for generating new protobufs, ie new plans will always use precision and subseconds.

Requires substrait-io/substrait#687

CSV support from substrait-io/substrait#646 is not implemented yet

@Blizzara
Copy link
Contributor Author

@vbarua @jacques-n even with the fixed functions_datetime.yml (see substrait-io/substrait#687), substrait-java fails to read the extensions, caused by the additions to functions_arithmetic_decimal.yaml that have form of decimal<P1,0>. The error is

Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unable to parse string DECIMAL<P1,0>
 at [Source: (sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream); line: 120, column: 20] (through reference chain: io.substrait.extension.ImmutableSimpleExtension$ExtensionSignatures$Json["scalar_functions"]->java.util.ArrayList[6]->io.substrait.extension.ImmutableSimpleExtension$ScalarFunction$Json["impls"]->java.util.ArrayList[0]->io.substrait.extension.ImmutableSimpleExtension$ScalarFunctionVariant$Json["args"]->java.util.ArrayList[0]->io.substrait.extension.ImmutableSimpleExtension$ValueArgument$Json["value"])
	at app//com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:276)
...
Caused by: java.lang.UnsupportedOperationException: This construct can only be used in Type Expressions.
	at io.substrait.type.parser.ParseToPojo$Visitor.checkExpression(ParseToPojo.java:74)
	at io.substrait.type.parser.ParseToPojo$Visitor.visitDecimal(ParseToPojo.java:264)
	at io.substrait.type.parser.ParseToPojo$Visitor.visitDecimal(ParseToPojo.java:36)

so coming from

. I guess it fails as the logic above expects both parameters to be either string or int, but isn't suited for case where one is string and other is int. Could you please help me with how to fix it?

@vbarua
Copy link
Member

vbarua commented Aug 30, 2024

Running this locally, the Unable to parse string DECIMAL<P1,0> error seems to be coming from

@Override
public TypeExpression visitDecimal(final SubstraitTypeParser.DecimalContext ctx) {
boolean nullable = ctx.isnull != null;
Object precision = i(ctx.precision);
Object scale = i(ctx.scale);
if (precision instanceof Integer && scale instanceof Integer) {
return withNull(nullable).decimal((int) precision, (int) scale);
}
if (precision instanceof String && scale instanceof String) {
checkParameterizedOrExpression();
return withNullP(nullable).decimalE((String) precision, (String) scale);
}
checkExpression();
return withNullE(nullable).decimalE(ctx.precision.accept(this), ctx.scale.accept(this));
}

as you pointed out.

the logic above expects both parameters to be either string or int, but isn't suited for case where one is string and other is int.

It think this logic is too strict and we should make it looser. DECIMAL<P1,0> seems like a perfectly reasonable signature we should allow.

@jacques-n what do you think?

long milliseconds =
expr.precision() > 3
? (expr.subseconds() / (int) Math.pow(10, expr.precision() - 3))
: (expr.subseconds() * (int) Math.pow(10, 3 - expr.precision()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two branches produce the same number don't they?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but with integer operations

@andrew-coleman
Copy link
Contributor

@vbarua @Blizzara to continue the discussion....

It think this logic is too strict and we should make it looser. DECIMAL<P1,0> seems like a perfectly reasonable signature we should allow.

@jacques-n what do you think?

I'm really keen we get this PR merged so that we can move forward with other features that require the latest protobuf additions. I'm not sure what the 'proper' fix is for this, but perhaps we do an interim fix that passes the tests and defer the proper fix to a later PR? The following solution worked for me:

andrew-coleman@4501a78

Feel free to cherrypick this commit for this PR, if you wish.

Thanks!

typeToMatch.isInstanceOf[Type.IntervalDay] || typeToMatch.isInstanceOf[ParameterizedType.IntervalDay]

override def visit(`type`: Type.IntervalCompound): Boolean =
typeToMatch.isInstanceOf[Type.IntervalCompound] || typeToMatch.isInstanceOf[ParameterizedType.IntervalCompound]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta: I wonder if we can't just re-use the Java visitor here. Not for this PR though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeaah frankly I'm overall not a huge fan of the way this lib works, there feels to be too much hand-rolled replication of the proto classes. I guess the reason is the proto classes are not very ergonomic to work with, but then it leads to these manual boilerplate things where there's a high risk of bugs and missing features..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the manual boilerplate transformations are easy to screw up, but the improved ergonomics for building and working with plans outweigh that for me. There's a lot of transformations we perform over substrait trees that I think would be significantly harder over the generated protobuf. I actually find myself missing the niceness of the Java POJOs in other language.

@vbarua
Copy link
Member

vbarua commented Sep 18, 2024

I think the changes @andrew-coleman proposed in #288 (comment) seem like a good fix for the test failures.

Aside from that, there's one piece of handling that is wrong in the ProtoTypeConverter that I I think we should address as well.

@Blizzara
Copy link
Contributor Author

I think the changes @andrew-coleman proposed in #288 (comment) seem like a good fix for the test failures.

I pulled those in (thanks @andrew-coleman !). However I'm not sure if they do the right thing - I guess they make the "0" treated as a parameter name, like "P" or "S" would be? Not sure, maybe there is something later on that turns the number-string into a proper number. Anyways it does make the tests pass so I'm fine with it to unblock this PR 🤷

Aside from that, there's one piece of handling that is wrong in the ProtoTypeConverter that I I think we should address as well.

Thanks, fixed!

@jacques-n
Copy link
Collaborator

DECIMAL<P1,0> seems like a perfectly reasonable signature we should allow.

Agreed

@vbarua vbarua merged commit e24ce6f into substrait-io:main Sep 20, 2024
12 checks passed
mbwhite pushed a commit to mbwhite/substrait-java that referenced this pull request Oct 1, 2024
…strait-io#288)

BREAKING CHANGE: IntervalDay now has "subsecond" and "precision" fields instead
of "microseconds". Old protobufs should be still read correctly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants