Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interval coercion:date_bin('1 hour',...) does not work but date_bin(interval '1 hour', ... does #4853

Closed
Tracked by #3148
alamb opened this issue Jan 8, 2023 · 5 comments · Fixed by #5117
Closed
Tracked by #3148
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented Jan 8, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Users of IOx are now using date_bin and hit a UX issue.

date_bin with a string value does not work and gives a hard to understand message:

select date_bin('1 hour', column1, TIMESTAMP '2001-01-01 00:00:00Z') 
from (values 
  (timestamp '2022-01-01 00:00:00'), 
  (timestamp '2022-01-01 01:00:00'), 
(timestamp '2022-01-02 00:00:00')
) as sq;
Plan("Coercion from [Utf8, Timestamp(Nanosecond, None), Timestamp(Nanosecond, None)] to the signature Exact([Interval(DayTime), Timestamp(Nanosecond, None), Timestamp(Nanosecond, None)]) failed.")

But it does work when the string is explicitly cast to an interval, with interval '1 hour'

select date_bin(interval '1 hour', column1, TIMESTAMP '2001-01-01 00:00:00Z') 
from (values 
  (timestamp '2022-01-01 00:00:00'), 
  (timestamp '2022-01-01 01:00:00'), 
(timestamp '2022-01-02 00:00:00')
) as sq;
+-----------------------------------------------------------------------------+
| datebin(IntervalDayTime("3600000"),sq.column1,Utf8("2001-01-01 00:00:00Z")) |
+-----------------------------------------------------------------------------+
| 2022-01-01T00:00:00                                                         |
| 2022-01-01T01:00:00                                                         |
| 2022-01-02T00:00:00                                                         |
+-----------------------------------------------------------------------------+
3 rows in set. Query took 0.015 seconds.

Describe the solution you'd like

I would like the string coerced to an interval, like postgres:

postgres=# select date_bin('1 hour', column1, TIMESTAMP '2001-01-01 00:00:00Z') 
from (values 
  (timestamp '2022-01-01 00:00:00'), 
  (timestamp '2022-01-01 01:00:00'), 
(timestamp '2022-01-02 00:00:00')
) as sq;
      date_bin       
---------------------
 2022-01-01 00:00:00
 2022-01-01 01:00:00
 2022-01-02 00:00:00
(3 rows)

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
It looks to me like utf8 --> interval coercion simply needs to be added to the functional coercion rule:

https://github.com/apache/arrow-datafusion/blob/f9b72f4230687b884a92f79d21762578d3d56281/datafusion/expr/src/type_coercion/functions.rs#L177-L179

Marking this as a good first issue as it should be straightforward and a good way to learn about the datafusion codebase

@alamb alamb added enhancement New feature or request good first issue Good for newcomers labels Jan 8, 2023
@alamb
Copy link
Contributor Author

alamb commented Jan 8, 2023

cc @waitingkuo and @Jefffrey as I see you working in this area reacently

@alamb alamb changed the title Interval coercion:date_bin('1 hour',...) works but date_bin(interval '1 hour', ... does not Interval coercion:date_bin('1 hour',...) does not work but date_bin(interval '1 hour', ... does Jan 8, 2023
@waitingkuo
Copy link
Contributor

waitingkuo commented Jan 10, 2023

thank you @alamb , I can take it if there's no one working on this.

@alamb
Copy link
Contributor Author

alamb commented Jan 11, 2023

That would be amazing @waitingkuo . Thank you 🙏

Not sure if you have seen the new sqllogictets but it might make writing tests easy for you. For example https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sqllogictests/test_files/timestamps.slt

@alamb
Copy link
Contributor Author

alamb commented Jan 30, 2023

I am planning to work on this as we need it downstream in IOx

@alamb
Copy link
Contributor Author

alamb commented Jan 30, 2023

One issue here is that interval coercion from string --> interval is not supported in the arrow cast kernels

DataFusion has its own interval parsing logic here: https://github.com/apache/arrow-datafusion/blob/67b1da8a71397185a49a6418af3e1cfe9329f6c5/datafusion/common/src/parsers.rs#L115

But the type coercion code relies on upstream arrow interval support. I think I can special case the coerction temporarily in datafusion and then go contribute the proper coercion logic back upstream to arrow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants