
fix: Dialect requires derived table alias #12994

Merged


@peasee (Contributor) commented Oct 18, 2024

Which issue does this PR close?

Closes #12993.

Rationale for this change

Fixes a bug that prevents SQL unparsed from logical plans from running in MySQL when it contains derived tables, because MySQL requires every derived table to have an alias:

The [AS] tbl_name clause is mandatory because every table in a FROM clause must have a name.

What changes are included in this PR?

Includes a dialect update that specifies whether derived tables require aliases. For dialects that do, a static alias is set: I've simply used the name of the operation prefixed with derived_, e.g. derived_sort.
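A minimal, self-contained Rust sketch of the mechanism described above, assuming a simplified model rather than DataFusion's actual unparser: a dialect trait exposes a `requires_derived_table_alias` flag that defaults to `false`, and a hypothetical `wrap_derived` helper appends `AS derived_<operation>` only when the flag is set. The trait, structs, and helper are illustrative stand-ins, not the real `datafusion-sql` types.

```rust
/// Simplified stand-in for an unparser dialect (hypothetical; not DataFusion's trait).
pub trait Dialect {
    /// Whether every derived table in a FROM clause must carry an alias.
    /// Defaults to `false` for dialects that accept unnamed subqueries.
    fn requires_derived_table_alias(&self) -> bool {
        false
    }
}

/// Accepts unnamed subqueries, so it keeps the default.
pub struct GenericDialect;
/// Requires `[AS] tbl_name` on every derived table, so it opts in.
pub struct MySqlDialect;

impl Dialect for GenericDialect {}

impl Dialect for MySqlDialect {
    fn requires_derived_table_alias(&self) -> bool {
        true
    }
}

/// Hypothetical helper: render `inner_sql` as a derived table for a FROM clause.
/// If the dialect requires it, attach a static alias made from the plan
/// operation that forced the subquery, e.g. "sort" -> "derived_sort".
/// Identifier quoting is omitted to keep the sketch short.
pub fn wrap_derived(dialect: &dyn Dialect, operation: &str, inner_sql: &str) -> String {
    if dialect.requires_derived_table_alias() {
        format!("({inner_sql}) AS derived_{operation}")
    } else {
        format!("({inner_sql})")
    }
}
```

With this model, `wrap_derived(&MySqlDialect, "sort", "SELECT 1")` yields `(SELECT 1) AS derived_sort`, while `GenericDialect` leaves the subquery unnamed. Keeping the flag as a defaulted trait method means existing dialects keep their behavior unless they opt in.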

Are these changes tested?

Yes

Are there any user-facing changes?

No

* fix: Add Dialect option for requiring table aliases

* feat: Add CustomDialectBuilder for requires_table_alias

* docs: Spelling

* refactor: rename requires_derived_table_alias

* refactor: rename requires_derived_table_alias
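The commit list mentions a `CustomDialectBuilder` option for this flag. A hedged, self-contained sketch of that builder pattern follows; `CustomDialect`, `CustomDialectBuilder`, and the setter name `with_requires_derived_table_alias` are assumptions chosen to mirror the flag's name, not a copy of the library's code.

```rust
/// Hypothetical configurable dialect that stores the flag as plain data.
#[derive(Debug, Clone, Default)]
pub struct CustomDialect {
    requires_derived_table_alias: bool,
}

impl CustomDialect {
    /// Read back the configured flag.
    pub fn requires_derived_table_alias(&self) -> bool {
        self.requires_derived_table_alias
    }
}

/// Builder whose defaults mirror the trait default (`false`).
#[derive(Debug, Default)]
pub struct CustomDialectBuilder {
    requires_derived_table_alias: bool,
}

impl CustomDialectBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Opt in to (or out of) aliasing every derived table.
    pub fn with_requires_derived_table_alias(mut self, value: bool) -> Self {
        self.requires_derived_table_alias = value;
        self
    }

    /// Produce the configured dialect.
    pub fn build(self) -> CustomDialect {
        CustomDialect {
            requires_derived_table_alias: self.requires_derived_table_alias,
        }
    }
}
```

A builder lets users enable the behavior for a target database without defining a new dialect type, e.g. `CustomDialectBuilder::new().with_requires_derived_table_alias(true).build()`.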
@goldmedal (Contributor) left a comment

Thanks @peasee for working on this. I left some suggestions here.

datafusion/sql/src/unparser/plan.rs
expected:
// top projection sort gets derived into a subquery
// for MySQL, this subquery needs an alias
"SELECT `j1_min` FROM (SELECT min(`ta`.`j1_id`) AS `j1_min`, min(`ta`.`j1_id`) FROM `j1` AS `ta` ORDER BY min(`ta`.`j1_id`) ASC) AS `derived_sort` LIMIT 10",
Contributor

The test only covers the derived_sort case; other cases, such as derived_projection and derived_limit, aren't covered.
Could you add more tests for them?

Contributor Author

I added another test case with a more complex SQL query that results in two nested derived tables, derived_sort and derived_distinct, in a single query.

I'm not sure what SQL would trigger derived_limit or derived_projection, though. Any ideas I could include?

Contributor

For derived_projection, I found that this SQL triggers it:

select j1_id from (select 1 as j1_id)
 -> SELECT `j1_id` FROM (SELECT 1 AS `j1_id`) AS `derived_projection`

However, I suspect it isn't valid SQL for MySQL, right? I tried it in DataFusion and DuckDB, and both accept it, but Postgres doesn't allow it. This could be a case of converting from the DataFusion dialect to the MySQL dialect.

For derived_limit, a similar case can trigger it:

select * from (select * from j1 limit 10) 
  -> SELECT * FROM (SELECT * FROM `j1` LIMIT 10) AS `derived_limit`
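The alias names in these expected outputs follow one rule: `derived_` plus the name of the plan node that was turned into a subquery. A sketch of that mapping over a hypothetical `DerivedFrom` enum (its four variants are just the cases discussed in this thread, not an exhaustive list from the unparser), reproducing the two expected MySQL strings above:

```rust
/// Plan operations that, per this thread, can force a derived table (hypothetical enum).
#[derive(Debug, Clone, Copy)]
pub enum DerivedFrom {
    Projection,
    Sort,
    Limit,
    Distinct,
}

impl DerivedFrom {
    /// Static alias: `derived_` followed by the operation name.
    pub fn alias(self) -> &'static str {
        match self {
            DerivedFrom::Projection => "derived_projection",
            DerivedFrom::Sort => "derived_sort",
            DerivedFrom::Limit => "derived_limit",
            DerivedFrom::Distinct => "derived_distinct",
        }
    }
}

/// Wrap `inner` as an aliased derived table and select `outer` from it,
/// quoting the alias with MySQL-style backticks.
pub fn select_from_derived(outer: &str, inner: &str, op: DerivedFrom) -> String {
    format!("SELECT {outer} FROM ({inner}) AS `{}`", op.alias())
}
```

Because the alias is static, nested derived tables of different kinds (such as derived_sort inside derived_distinct) still get distinct names.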

Contributor Author

It is valid for MySQL, I just tested it out in my CLI. I've added both as new cases, thanks!

github-actions bot added the sql SQL Planner label Oct 18, 2024
@goldmedal (Contributor)

By the way, I tried pulling your branch and running the tests on my laptop, but they kept failing. After merging in the latest main branch, the tests passed.

@goldmedal (Contributor) left a comment

It's a good improvement for derived tables. 👍 It makes sense to me.
I noticed some databases don't support this kind of unnamed subquery; see #12896 (comment).
This PR would be helpful for those.

@goldmedal (Contributor)

Maybe @phillipleblanc and @sgrebnov want to take a look at this PR.

@phillipleblanc (Contributor)

Looks good to me, thanks @goldmedal and @peasee

@alamb (Contributor) commented Oct 20, 2024

@goldmedal perhaps you would like to merge this PR as a way to verify your permissions are configured correctly?

@goldmedal (Contributor)

> @goldmedal perhaps you would like to merge this PR as a way to verify your permissions are configured correctly?

Sure, I plan to merge this PR if no more comments tomorrow.

@peasee (Contributor Author) commented Oct 21, 2024

> > @goldmedal perhaps you would like to merge this PR as a way to verify your permissions are configured correctly?
>
> Sure, I plan to merge this PR if no more comments tomorrow.

Let's hold off on merging this for now. I think I've introduced a regression that I'd like to investigate before merging.

@peasee (Contributor Author) commented Oct 21, 2024


Okay, it looks like my PR didn't introduce the regression, but there's a somewhat related bug currently on main. Take the following SQL:

SELECT j1_id FROM (SELECT ta.j1_id AS j1_id FROM j1 ta);

This gets rewritten in the GenericDialect to:

SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 AS ta)

Which is invalid, because the subquery is un-aliased. I tested it in DuckDB to confirm:

D SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ta);
Binder Error: Referenced table "ta" not found!
Candidate tables: "unnamed_subquery"
LINE 1: SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ...
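To make the scoping problem concrete, here is a toy Rust sketch (not DataFusion or DuckDB code; `Column` and `resolve` are hypothetical names). Once the inner query is wrapped as a derived table, only the derived table's own name is visible to the outer SELECT, so the qualifier `ta` from inside the subquery no longer resolves. With no alias at all, there is no referable name, which is why DuckDB lists "unnamed_subquery" as the only candidate.

```rust
/// A column reference with an optional table qualifier (toy model).
pub struct Column<'a> {
    pub qualifier: Option<&'a str>,
    pub name: &'a str,
}

/// Resolve a column's qualifier against the relation names visible in the
/// outer FROM clause. For `FROM (subquery) AS derived_projection` only
/// `derived_projection` is visible; the inner alias `ta` is scoped to the
/// subquery. Unqualified names are not checked in this toy model.
pub fn resolve(col: &Column<'_>, visible: &[&str]) -> Result<(), String> {
    match col.qualifier {
        None => Ok(()),
        Some(q) if visible.contains(&q) => Ok(()),
        Some(q) => Err(format!("unknown table qualifier `{q}` for column `{}`", col.name)),
    }
}
```

Under this model, `ta.j1_id` fails against `["derived_projection"]`, while `derived_projection.j1_id` or an unqualified `j1_id` passes.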

Let's merge my PR, and I can raise a new issue for this other bug @goldmedal ?

@goldmedal (Contributor)

> This gets rewritten in the GenericDialect to:
>
> SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 AS ta)

It's valid in DataFusion:

> create table j1(j1_id int);
0 row(s) fetched. 
Elapsed 0.085 seconds.

> SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 AS ta);
+-------+
| j1_id |
+-------+
+-------+
0 row(s) fetched. 
Elapsed 0.042 seconds.

GenericDialect generally follows DataFusion syntax, so this behavior makes sense to me.

> Which is invalid, because the subquery is un-aliased. I tested it in DuckDB to confirm:
>
> D SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ta);
> Binder Error: Referenced table "ta" not found!
> Candidate tables: "unnamed_subquery"
> LINE 1: SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ...
>
> Let's merge my PR, and I can raise a new issue for this other bug @goldmedal ?

How about MySqlDialect? It enables requires_derived_table_alias. Which of these would the result be?

SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ta) derived_projection

or

SELECT j1_id FROM (SELECT ta.j1_id FROM j1 ta) `derived_projection`

If it's the first one, I think it's an issue because it isn't valid. We can file an issue for it.
If it's the second one, I think the user can customize dialect::requires_derived_table_alias to get DuckDB-compatible syntax.

@peasee (Contributor Author) commented Oct 21, 2024

> If it's the first one, I think it's an issue because it isn't valid. We can file an issue for it.

My bad, I should've clarified better. It is the first one with MySqlDialect. I had just used GenericDialect as the example, but it affects both:

SELECT `ta`.`j1_id` FROM (SELECT `ta`.`j1_id` FROM `j1` AS `ta`) AS `derived_projection`

@goldmedal (Contributor)

> > If it's the first one, I think it's an issue because it isn't valid. We can file an issue for it.
>
> My bad, I should've clarified better. It is the first one with MySqlDialect. I had just used GenericDialect as the example, but it affects both:
>
> SELECT `ta`.`j1_id` FROM (SELECT `ta`.`j1_id` FROM `j1` AS `ta`) AS `derived_projection`

I see. I think it's an issue but we can fix it in the follow-up PR. Let's merge this PR first.
Could you help to file an issue to trace it?

@peasee (Contributor Author) commented Oct 21, 2024

> I see. I think it's an issue but we can fix it in the follow-up PR. Let's merge this PR first. Could you help to file an issue to trace it?

#13027

@goldmedal merged commit b42d9b8 into apache:main Oct 21, 2024
24 checks passed
@goldmedal (Contributor)

Thanks @peasee and @phillipleblanc @alamb for reviewing. 👍

Labels
sql SQL Planner
Development

Successfully merging this pull request may close these issues.

MySQL does not support derived tables without aliases
4 participants