Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support alias for rel #571

Closed
my-vegetable-has-exploded opened this issue Nov 7, 2023 · 3 comments · Fixed by #649
Closed

Support alias for rel #571

my-vegetable-has-exploded opened this issue Nov 7, 2023 · 3 comments · Fixed by #649

Comments

@my-vegetable-has-exploded
Copy link

my-vegetable-has-exploded commented Nov 7, 2023

Since substrait always use field indices to refer to fields,it may cause some problem when revert from substrait to logical plan.

Taking datafusion-6867 for example, datafusion use table+field to distinguish fields.

SELECT d1.a FROM data d1 JOIN data d2 ON d1.e <> d2.e

Without alias for rel, it can't distinguish between left input and right input in self join.

I think alias for rel is not complicated to implement and improves ease of use.

A simple design is given below.

message AliasRel {
  RelCommon common = 1;
  Rel input = 2;
  // alias name
  string names = 3;
  substrait.extensions.AdvancedExtension advanced_extension = 10;
}
@westonpace
Copy link
Member

Thanks for creating this issue. The concept of aliases has come up in community discussions in the past but I can't find a good concrete issue for this idea.

I think it would be a good idea to add aliases as an "optional extension" or an "official hint" of sorts. We have also talked in the past about having some way to annotate parts of the plan with extra information.

My concern with your proposed AliasRel is that it will cause the plan to become unreadable by engines that do not understand it. Or, to put it another way, it forces all consumers to handle aliases whether they care about them or not.

I think something using AdvancedExtension might be an easier path forward. In particular, the enhancement type of advanced extension specifically says that the information can be ignored by consumers if they choose to do so. It's also something that could be done without any change to the Substrait spec.

I'm also going to CC @jacques-n who has talked about "annotations" before and I think was very close to writing up some kind of proposal / idea doc at one point in time.

@my-vegetable-has-exploded
Copy link
Author

Thanks for your patience!

I think it would be a good idea to add aliases as an "optional extension" or an "official hint" of sorts. We have also talked in the past about having some way to annotate parts of the plan with extra information.

That's really a better idea.

I think something using AdvancedExtension might be an easier path forward. In particular, the enhancement type of advanced extension specifically says that the information can be ignored by consumers if they choose to do so. It's also something that could be done without any change to the Substrait spec.

Thank you for your advice. Do you mean that we can put the alias name into AdvancedExtension? By the way, the comment shows that enhancement cannot be ignored by a consumer which iis a little different from your description.

  // An enhancement alter semantics. Cannot be ignored by a consumer.
  google.protobuf.Any enhancement = 2;

@westonpace
Copy link
Member

Do you mean that we can put the alias name into AdvancedExtension?

Yes

By the way, the comment shows that enhancement cannot be ignored by a consumer which iis a little different from your description.

Good catch, you are right, I got it backwards. optimization is the one to use.

EpsilonPrime pushed a commit that referenced this issue Jul 3, 2024
Same goal as in #648 - to support what Spark's and DataFusion's
SubqueryAlias relation does.

As Substrait mostly only referes to columns by their index, there is no
inherent need for table name/qualifiers within Substrait. However, some
consumers, e.g. DataFusion, require column names to be either unique or
qualified for joins, which is troublesome w/o the possibility to qualify
relations.

Also, for debugging failed plans and for roundtrip testing of X ->
Substrait -> X conversions, it would be convenient to have proper,
human-readable names to refer to.

Closes #571
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants