Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Timezone to Scalar::Time* types, and better timezone awareness to Datafusion's time types #1455

Merged
merged 10 commits into from
Dec 21, 2021

Conversation

maxburke
Copy link
Contributor

Closes #1454

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Dec 16, 2021
@alamb alamb changed the title Add a little bit of timezone awareness to Datafusion's time types Add Timezone to Scalar::Time* types, and better timezone awareness to Datafusion's time types Dec 17, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @maxburke -- this looks like a great improvement!

Would it be possible to add some basic sql end to end tests in sql.rs??

Perhaps taking advantage of make_timestamp_table to test: https://github.com/apache/arrow-datafusion/blob/0052667afae33ba9e549256d0d5d47e2f45e6ffb/datafusion/tests/sql.rs#L4106-L4109

I am specifically thinking about coverage for the coercion logic you added (perhaps showing some basic comparison on timestamp types as wwell as min and max for timestamp with timezone types?)

Also, I am trying out this branch with IOx and so far it is going well 👍 : https://github.com/influxdata/influxdb_iox/pull/3407

@@ -974,7 +974,7 @@ impl std::fmt::Display for Expr {
ref left,
ref right,
ref op,
} => write!(f, "{} {} {}", left, op, right),
} => write!(f, "{:?} {} {:?}", left, op, right),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change to use Debug formatting rather than Display formatting ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this was part of my own debugging when trying to sort out why my queries were failing. I'll remove.

match (lhs_type, rhs_type) {
(Utf8, Date32) => Some(Date32),
(Date32, Utf8) => Some(Date32),
(Utf8, Date64) => Some(Date64),
(Date64, Utf8) => Some(Date64),
(Timestamp(lhs_unit, lhs_tz), Timestamp(rhs_unit, rhs_tz)) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +137 to +140
(l, r) => {
assert_eq!(l, r);
l.clone()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(l, r) => {
assert_eq!(l, r);
l.clone()
}
(l, r) if l == r => {
l.clone()
}
_ => unreachable!()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this but we need an else block in order to make the types align, so this becomes:

(l, r) => if l ==r { l.clone() } else { unreachable!() }

which is semantically identical, but the version with assert_eq! will provide more contextual information when the condition doesn't hold

@maxburke
Copy link
Contributor Author

I am specifically thinking about coverage for the coercion logic you added (perhaps showing some basic comparison on timestamp types as wwell as min and max for timestamp with timezone types?)

Done!

@liukun4515
Copy link
Contributor

it's a great fix.

@houqp houqp added the bug Something isn't working label Dec 20, 2021
@@ -6615,3 +6622,357 @@ async fn csv_query_with_decimal_by_sql() -> Result<()> {
assert_batches_eq!(expected, &actual);
Ok(())
}

#[tokio::test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about add some cases with different TIME_ZONE, for example UTC-7 or UTC+8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change doesn't do any computation with the timezones, it just passes them through from the underlying data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, among other things I also don't think arrow-rs respects timezones fully (and it doesn't have a full set of arithmetic on time/date/timestamp types either)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great -- thank you @maxburke

#[tokio::test]
async fn timestamp_coercion() -> Result<()> {
{
let mut ctx = ExecutionContext::new();
let table_a = make_timestamp_table::<TimestampSecondType>()?;
let table_b = make_timestamp_table::<TimestampMillisecondType>()?;
let table_a =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Datafusion should not strip out timezone information from existing types
4 participants