Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable user defined display_name for ScalarUDF #10417

Merged
merged 3 commits into from
May 8, 2024
Merged

Conversation

yyy1000
Copy link
Contributor

@yyy1000 yyy1000 commented May 7, 2024

Which issue does this PR close?

Closes #10376 .

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label May 7, 2024
@yyy1000 yyy1000 changed the title enable user defined display_name Enable user defined display_name for ScalarUDF May 7, 2024
@jayzhan211
Copy link
Contributor

Do you mind adding display_name for get_field that shows what I expect in the example given in #10376?

@yyy1000
Copy link
Contributor Author

yyy1000 commented May 8, 2024

Do you mind adding display_name for get_field that shows what I expect in the example given in #10376?

Cool, will do that.

@yyy1000 yyy1000 marked this pull request as draft May 8, 2024 02:49
@github-actions github-actions bot added the core Core DataFusion crate label May 8, 2024
@yyy1000 yyy1000 marked this pull request as ready for review May 8, 2024 03:49
@yyy1000
Copy link
Contributor Author

yyy1000 commented May 8, 2024

It's good now. @jayzhan211 You can review it when you are available. :)
In this example

async fn test_udaf_returning_struct_subquery() {
let TestContext { ctx, test_state: _ } = TestContext::new();
let sql = "select sq.first['value'], sq.first['time'] from (SELECT first(value, time) as first from t) as sq";
let expected = [
"+-----------------+----------------------------+",
"| sq.first[value] | sq.first[time] |",
"+-----------------+----------------------------+",
"| 2.0 | 1970-01-01T00:00:00.000002 |",
"+-----------------+----------------------------+",
];
assert_batches_eq!(expected, &execute(&ctx, sql).await.unwrap());
}
, after apply_function_rewrites, the logical plan would not create an alias because the display_name of get_field is the same as sq.first[value]

In this PR:
"| initial_logical_plan | Projection: (sq.first)[value], (sq.first)[time] |",
"| | SubqueryAlias: sq |",
"| | Projection: first(t.value,t.time) AS first |",
"| | Aggregate: groupBy=[[]], aggr=[[first(t.value, t.time)]] |",
"| | TableScan: t |",
"| logical_plan after apply_function_rewrites | Projection: get_field(sq.first, Utf8("value")), get_field(sq.first, Utf8("time")) |",
"| | SubqueryAlias: sq |",
"| | Projection: first(t.value,t.time) AS first |",
"| | Aggregate: groupBy=[[]], aggr=[[first(t.value, t.time)]] |",
"| | TableScan: t |",

Before this PR:
"| initial_logical_plan | Projection: (sq.first)[value], (sq.first)[time] |",
"| | SubqueryAlias: sq |",
"| | Projection: first(t.value,t.time) AS first |",
"| | Aggregate: groupBy=[[]], aggr=[[first(t.value, t.time)]] |",
"| | TableScan: t |",
"| logical_plan after apply_function_rewrites | Projection: get_field(sq.first, Utf8("value")) AS sq.first[value], get_field(sq.first, Utf8("time")) AS sq.first[time] |",
"| | SubqueryAlias: sq |",
"| | Projection: first(t.value,t.time) AS first |",
"| | Aggregate: groupBy=[[]], aggr=[[first(t.value, t.time)]] |",
"| | TableScan: t |",
"| logical_plan after inline_table_scan | SAME TEXT AS ABOVE


Ok(format!("{}[{}]", args[0].display_name()?, name))
}

Copy link
Contributor

@jayzhan211 jayzhan211 May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cli in this PR

DataFusion CLI v38.0.0
> select struct('a');
+-------------------+
| struct(Utf8("a")) |
+-------------------+
| {c0: a}           |
+-------------------+
1 row(s) fetched. 
Elapsed 0.023 seconds.

> select get_field(struct('a'), 'c0');
+-----------------------+
| struct(Utf8("a"))[c0] |
+-----------------------+
| a                     |
+-----------------------+
1 row(s) fetched. 
Elapsed 0.007 seconds.

main branch:

DataFusion CLI v38.0.0
> select struct('a');
+-------------------+
| struct(Utf8("a")) |
+-------------------+
| {c0: a}           |
+-------------------+
1 row(s) fetched. 
Elapsed 0.014 seconds.

> select get_field(struct('a'), 'c0');
+-----------------------------------------+
| get_field(struct(Utf8("a")),Utf8("c0")) |
+-----------------------------------------+
| a                                       |
+-----------------------------------------+
1 row(s) fetched. 
Elapsed 0.004 seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this example has better view. 🙌

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 7df085e into apache:main May 8, 2024
23 checks passed
@alamb
Copy link
Contributor

alamb commented May 8, 2024

Thanks @yyy1000 and @jayzhan211

@yyy1000 yyy1000 deleted the display_name branch May 8, 2024 17:11
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* enable user defined display_name

* add display_name to get_field

* add physical name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support user defined display for UDF
3 participants