Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rust] COUNT(*) schema name is COUNT(UInt8) #104

Closed
alamb opened this issue Apr 26, 2021 · 7 comments · Fixed by #7757
Closed

[Rust] COUNT(*) schema name is COUNT(UInt8) #104

alamb opened this issue Apr 26, 2021 · 7 comments · Fixed by #7757
Labels
datafusion Changes in the datafusion crate enhancement New feature or request help wanted Extra attention is needed

Comments

@alamb
Copy link
Contributor

alamb commented Apr 26, 2021

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-10170

Currently, the query {{COUNT()}} is being converted to {{COUNT(1)}} during logical planning, which causes the field's name to be {{COUNT(1)}}, not {{COUNT()}}

 One idea is to delay the map for the physical planning, when naming is no longer an issue.

 

@alamb alamb added the datafusion Changes in the datafusion crate label Apr 26, 2021
@houqp houqp added enhancement New feature or request help wanted Extra attention is needed labels Oct 18, 2021
@qrilka
Copy link
Contributor

qrilka commented Oct 4, 2023

Hi @alamb it's an old ticket but could I ask for some more details so I could try working on it?
Do I understand it's right that the problem could be seen in datafusion-cli as

❯ select count(*) from (select null);
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 1               |
+-----------------+
1 row in set. Query took 0.001 seconds.

with output showing COUNT(UInt8(1)) as field name?

I have found CountWildcardRule applied in Analyzer, if the idea is to remove it from there then what could be a more suitable place (closer to physical plan) for this step of replacing wildcard with 1?

@alamb
Copy link
Contributor Author

alamb commented Oct 4, 2023

Hi @qrilka 👋 thanks!

I think the idea was to follow the same convention as the rest of the code so the column name more closely matches the SQL expression that was typed in

So in this case, I think the expected behavior would be

❯ select count(*) from (select null);
+-----------------+
| COUNT(*)        |
+-----------------+
| 1               |
+-----------------+
1 row in set. Query took 0.001 seconds.

Or something similar

@alamb
Copy link
Contributor Author

alamb commented Oct 4, 2023

This mirrors what happens with non wildcard queries

❯ select count(x) from (select null as x);
+----------+
| COUNT(x) |
+----------+
| 1        |
+----------+
1 row in set. Query took 0.003 seconds.

@qrilka
Copy link
Contributor

qrilka commented Oct 4, 2023

Thanks @alamb
Could you help with finding out where to move the logic of CountWildcardRule into? It looks like to keep the wildcard we need to put it after AnalyzerRules but at the same time it's not one of OptimizerRules and it doesn't look like it could be done in physical plan...

@alamb
Copy link
Contributor Author

alamb commented Oct 5, 2023

Could you help with finding out where to move the logic of CountWildcardRule into? It looks like to keep the wildcard we need to put it after AnalyzerRules but at the same time it's not one of OptimizerRules and it doesn't look like it could be done in physical plan...

I wonder if you can simply add an alias like count(1) AS "COUNT(*)" as part of whatever does the initial rewrite (is it in the SQL planner?)

@qrilka
Copy link
Contributor

qrilka commented Oct 5, 2023

Oh, cool idea, I didn't think about that - I think it could be just added into CountWildcardRule. I'll check it out

@qrilka
Copy link
Contributor

qrilka commented Oct 6, 2023

@alamb after some debugging I have found that we could just change COUNT_STAR_NAME to "COUNT(*)" and everything works fine and tests need to be fixed.
BTW the problem comes specifically from aggregate_statistics and if it doesn't get applied we get a correct column name. E.g.:

❯ select count(*) from (select 1 union select 2);
+----------+
| COUNT(*) |
+----------+
| 2        |
+----------+
1 row in set. Query took 0.022 seconds.

Will create a PR with this fix

qrilka added a commit to qrilka/arrow-datafusion that referenced this issue Oct 6, 2023
This fixes the issue of column name COUNT(UInt8) appearing instead
Resolves apache#104
alamb pushed a commit that referenced this issue Oct 6, 2023
This fixes the issue of column name COUNT(UInt8) appearing instead
Resolves #104
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants