Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constant fold / optimize to_timestamp function during planning #387

Merged
merged 7 commits into from
May 25, 2021

Conversation

msathis
Copy link
Contributor

@msathis msathis commented May 22, 2021

Which issue does this PR close?

Fixes #383

Rationale for this change

Optimize to_timestamp function

What changes are included in this PR?

Optimize to_timestamp function

Are there any user-facing changes?

N/A

@msathis msathis changed the title optimize to_timestamp ptimize to_timestamp May 22, 2021
@msathis msathis changed the title ptimize to_timestamp Optimize to_timestamp function May 22, 2021
@codecov-commenter
Copy link

codecov-commenter commented May 22, 2021

Codecov Report

Merging #387 (7961343) into master (db4f098) will decrease coverage by 0.09%.
The diff coverage is 61.70%.

❗ Current head 7961343 differs from pull request most recent head 2b9b9b7. Consider uploading reports for the commit 2b9b9b7 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #387      +/-   ##
==========================================
- Coverage   74.94%   74.84%   -0.10%     
==========================================
  Files         146      146              
  Lines       24314    24553     +239     
==========================================
+ Hits        18221    18377     +156     
- Misses       6093     6176      +83     
Impacted Files Coverage Δ
...ta/rust/core/src/execution_plans/shuffle_reader.rs 0.00% <0.00%> (ø)
datafusion-cli/src/main.rs 0.00% <0.00%> (ø)
datafusion/src/logical_plan/expr.rs 84.34% <ø> (ø)
datafusion/src/physical_plan/group_scalar.rs 65.88% <ø> (ø)
benchmarks/src/bin/tpch.rs 30.84% <11.11%> (+0.01%) ⬆️
datafusion/src/physical_optimizer/pruning.rs 89.73% <38.46%> (-0.88%) ⬇️
datafusion-cli/src/print_format.rs 84.44% <58.82%> (-5.97%) ⬇️
datafusion/src/scalar.rs 58.31% <63.26%> (+2.83%) ⬆️
datafusion/src/physical_plan/repartition.rs 82.45% <70.96%> (-1.89%) ⬇️
datafusion/src/physical_plan/parquet.rs 81.76% <83.33%> (+0.49%) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db4f098...2b9b9b7. Read the comment docs.

@alamb alamb added the datafusion Changes in the datafusion crate label May 23, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @msathis ! This is a great start.

I think this PR needs some tests -- there are some good examples later in the module.

cc @NGA-TRAN

@@ -217,6 +218,29 @@ impl<'a> ExprRewriter for ConstantRewriter<'a> {
.query_execution_start_time
.timestamp_nanos(),
))),
Expr::ScalarFunction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching on the literal argument is good as the rewrite pass happens after all children are visited 👍

datafusion/src/optimizer/constant_folding.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@NGA-TRAN NGA-TRAN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much @msathis for the work. It will help us to push predicate with this type of constant down the plan in InfluxDB_IOx. I just have a few minor comments

args,
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happen if the args.is_empty()? Do we use a default value or return syntax error we simply return false for this expression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of all error cases including empty, we don't optimise. So the default non-optimised flow continues.

datafusion/src/optimizer/constant_folding.rs Show resolved Hide resolved
datafusion/src/optimizer/constant_folding.rs Show resolved Hide resolved
let table_scan = test_table_scan().unwrap();
let proj = vec![Expr::ScalarFunction {
args: vec![Expr::Literal(ScalarValue::Utf8(Some(
"I'M NOT A TIMESTAMP".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great @msathis -- thank you (and thanks to @NGA-TRAN for the reviews)!

@alamb alamb changed the title Optimize to_timestamp function Constant fold / optimize to_timestamp function during planning May 24, 2021
@alamb
Copy link
Contributor

alamb commented May 24, 2021

Oh no -- we have some clippy errors

https://github.com/apache/arrow-datafusion/pull/387/checks?check_run_id=2658215529


error: useless use of `format!`
   --> datafusion/src/optimizer/constant_folding.rs:680:24
    |
680 |           let expected = format!(
    |  ________________________^
681 | |             "Projection: TimestampNanosecond(1599566400000000000)\
682 | |             \n  TableScan: test projection=None",
683 | |         );
    | |_________^
    |
    = note: `-D clippy::useless-format` implied by `-D warnings`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_format
help: consider using `.to_string()`
    |
680 |         let expected = "Projection: TimestampNanosecond(1599566400000000000)\
681 |             \n  TableScan: test projection=None".to_string();
    |

error: useless use of `format!`
   --> datafusion/src/optimizer/constant_folding.rs:703:24
    |
703 |           let expected = format!(
    |  ________________________^
704 | |             "Projection: totimestamp(Utf8(\"I\'M NOT A TIMESTAMP\"))\
705 | |             \n  TableScan: test projection=None",
706 | |         );
    | |_________^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_format
help: consider using `.to_string()`

@msathis
Copy link
Contributor Author

msathis commented May 25, 2021

Oh no -- we have some clippy errors

https://github.com/apache/arrow-datafusion/pull/387/checks?check_run_id=2658215529


error: useless use of `format!`
   --> datafusion/src/optimizer/constant_folding.rs:680:24
    |
680 |           let expected = format!(
    |  ________________________^
681 | |             "Projection: TimestampNanosecond(1599566400000000000)\
682 | |             \n  TableScan: test projection=None",
683 | |         );
    | |_________^
    |
    = note: `-D clippy::useless-format` implied by `-D warnings`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_format
help: consider using `.to_string()`
    |
680 |         let expected = "Projection: TimestampNanosecond(1599566400000000000)\
681 |             \n  TableScan: test projection=None".to_string();
    |

error: useless use of `format!`
   --> datafusion/src/optimizer/constant_folding.rs:703:24
    |
703 |           let expected = format!(
    |  ________________________^
704 | |             "Projection: totimestamp(Utf8(\"I\'M NOT A TIMESTAMP\"))\
705 | |             \n  TableScan: test projection=None",
706 | |         );
    | |_________^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_format
help: consider using `.to_string()`

For some strange reason cargo clippy on my local doesn't throw these errors.

@alamb
Copy link
Contributor

alamb commented May 25, 2021

I pushed the clippy change -- hopefully that will get this PR to green

@alamb
Copy link
Contributor

alamb commented May 25, 2021

For some strange reason cargo clippy on my local doesn't throw these errors.

It may be related to what version of rust is used. I also have some troubles with clippy sometimes not finding errors. I run the following locally (I am somewhat embarrassed but it does work for me 🤷 )

find . -name '*.rs' | grep -v target | xargs touch && cargo clippy --all --all-targets -- -D warnings

@msathis
Copy link
Contributor Author

msathis commented May 25, 2021

For some strange reason cargo clippy on my local doesn't throw these errors.

It may be related to what version of rust is used. I also have some troubles with clippy sometimes not finding errors. I run the following locally (I am somewhat embarrassed but it does work for me 🤷 )

find . -name '*.rs' | grep -v target | xargs touch && cargo clippy --all --all-targets -- -D warnings

Thanks. That's nice. I'll copy & use it 👍

@Dandandan
Copy link
Contributor

Dandandan commented May 25, 2021

For some strange reason cargo clippy on my local doesn't throw these errors.

It may be related to what version of rust is used. I also have some troubles with clippy sometimes not finding errors. I run the following locally (I am somewhat embarrassed but it does work for me )

find . -name '*.rs' | grep -v target | xargs touch && cargo clippy --all --all-targets -- -D warnings

Touching the files (or cleaning the build) should not be needed anymore since 1.52 https://blog.rust-lang.org/2021/05/06/Rust-1.52.0.html

Previously, running cargo check followed by cargo clippy wouldn't actually run Clippy: the build caching in Cargo didn't differentiate between the two. In 1.52, however, this has been fixed, which means that users will get the expected behavior independent of the order in which they run the two commands.

@alamb
Copy link
Contributor

alamb commented May 25, 2021

Thanks again @msathis !

@alamb alamb merged commit d9b0447 into apache:master May 25, 2021
jimexist pushed a commit to jimexist/arrow-datafusion that referenced this pull request May 26, 2021
…che#387)

* optimize to_timestamp

* cargo fmt

* fix clippy

* Added testcase & removed optimization for invalid timestamps

* Added negative testcase

* Fix clippy

* fix clippy

Co-authored-by: Sathis Kumar <sathis.kumar@udemy.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@houqp houqp added the performance Make DataFusion faster label Jul 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate performance Make DataFusion faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support constant folding for to_timestamp(constant time) function
6 participants