[Rust] COUNT(*) schema name is COUNT(UInt8) #104

alamb · 2021-04-26T13:19:43Z

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-10170

Currently, the query {{COUNT()}} is being converted to {{COUNT(1)}} during logical planning, which causes the field's name to be {{COUNT(1)}}, not {{COUNT()}}

One idea is to delay the map for the physical planning, when naming is no longer an issue.

qrilka · 2023-10-04T14:25:00Z

Hi @alamb it's an old ticket but could I ask for some more details so I could try working on it?
Do I understand it's right that the problem could be seen in datafusion-cli as

❯ select count(*) from (select null);
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 1               |
+-----------------+
1 row in set. Query took 0.001 seconds.

with output showing COUNT(UInt8(1)) as field name?

I have found CountWildcardRule applied in Analyzer, if the idea is to remove it from there then what could be a more suitable place (closer to physical plan) for this step of replacing wildcard with 1?

alamb · 2023-10-04T15:55:51Z

Hi @qrilka 👋 thanks!

I think the idea was to follow the same convention as the rest of the code so the column name more closely matches the SQL expression that was typed in

So in this case, I think the expected behavior would be

❯ select count(*) from (select null);
+-----------------+
| COUNT(*)        |
+-----------------+
| 1               |
+-----------------+
1 row in set. Query took 0.001 seconds.

Or something similar

alamb · 2023-10-04T15:56:32Z

This mirrors what happens with non wildcard queries

❯ select count(x) from (select null as x);
+----------+
| COUNT(x) |
+----------+
| 1        |
+----------+
1 row in set. Query took 0.003 seconds.

qrilka · 2023-10-04T18:24:42Z

Thanks @alamb
Could you help with finding out where to move the logic of CountWildcardRule into? It looks like to keep the wildcard we need to put it after AnalyzerRules but at the same time it's not one of OptimizerRules and it doesn't look like it could be done in physical plan...

alamb · 2023-10-05T18:35:21Z

Could you help with finding out where to move the logic of CountWildcardRule into? It looks like to keep the wildcard we need to put it after AnalyzerRules but at the same time it's not one of OptimizerRules and it doesn't look like it could be done in physical plan...

I wonder if you can simply add an alias like count(1) AS "COUNT(*)" as part of whatever does the initial rewrite (is it in the SQL planner?)

qrilka · 2023-10-05T18:53:48Z

Oh, cool idea, I didn't think about that - I think it could be just added into CountWildcardRule. I'll check it out

qrilka · 2023-10-06T13:47:35Z

@alamb after some debugging I have found that we could just change COUNT_STAR_NAME to "COUNT(*)" and everything works fine and tests need to be fixed.
BTW the problem comes specifically from aggregate_statistics and if it doesn't get applied we get a correct column name. E.g.:

❯ select count(*) from (select 1 union select 2);
+----------+
| COUNT(*) |
+----------+
| 2        |
+----------+
1 row in set. Query took 0.022 seconds.

Will create a PR with this fix

This fixes the issue of column name COUNT(UInt8) appearing instead Resolves apache#104

This fixes the issue of column name COUNT(UInt8) appearing instead Resolves #104

alamb added the datafusion Changes in the datafusion crate label Apr 26, 2021

houqp added enhancement New feature or request help wanted Extra attention is needed labels Oct 18, 2021

qrilka added a commit to qrilka/arrow-datafusion that referenced this issue Oct 6, 2023

Fix column name for COUNT(*) set by AggregateStatistics

49b39f0

This fixes the issue of column name COUNT(UInt8) appearing instead Resolves apache#104

qrilka mentioned this issue Oct 6, 2023

Fix column name for COUNT(*) set by AggregateStatistics #7757

Merged

alamb closed this as completed in #7757 Oct 6, 2023

alamb pushed a commit that referenced this issue Oct 6, 2023

Fix column name for COUNT(*) set by AggregateStatistics (#7757)

c0409a7

This fixes the issue of column name COUNT(UInt8) appearing instead Resolves #104

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Rust] COUNT(*) schema name is COUNT(UInt8) #104

[Rust] COUNT(*) schema name is COUNT(UInt8) #104

alamb commented Apr 26, 2021

qrilka commented Oct 4, 2023

alamb commented Oct 4, 2023

alamb commented Oct 4, 2023 •

edited

Loading

qrilka commented Oct 4, 2023

alamb commented Oct 5, 2023

qrilka commented Oct 5, 2023

qrilka commented Oct 6, 2023

[Rust] COUNT(*) schema name is COUNT(UInt8) #104

[Rust] COUNT(*) schema name is COUNT(UInt8) #104

Comments

alamb commented Apr 26, 2021

qrilka commented Oct 4, 2023

alamb commented Oct 4, 2023

alamb commented Oct 4, 2023 • edited Loading

qrilka commented Oct 4, 2023

alamb commented Oct 5, 2023

qrilka commented Oct 5, 2023

qrilka commented Oct 6, 2023

alamb commented Oct 4, 2023 •

edited

Loading