[GLUTEN-3668][CH] Fix to_date function performance #3701
Run Gluten Clickhouse CI
@@ -2227,5 +2227,28 @@ class GlutenClickHouseTPCHParquetSuite extends GlutenClickHouseTPCHAbstractSuite
    compareResultsAgainstVanillaSpark(select_sql, true, { _ => })
    spark.sql("drop table test_tbl_3521")
  }

  test("GLUTEN-3135 revert: Bug fix to_date") {
Please reuse the existing UT "GLUTEN-3135: Bug fix to_date" and re-enable it instead of creating a new one.
done
to_date supports specifying a date format. Has that case been considered?
I'll look into it.
The explicit-format case does not need to be handled by this change: to_date($col, 'yyyy-MM-dd') keeps its original behavior and is still parsed with parseDateTimeInJodaSyntaxOrNull.
What changes were proposed in this pull request?
(Fixes: #3668)
How was this patch tested?
Tested by unit tests (UT).
Performance test results
30 million rows of valid data
Test SQL:
select count(1) from $test_tbl where to_date($col) > '1990-01-01'
Time before this PR: 2.983s, 2.686s, 2.804s
Time after this PR: 2.94s, 2.861s, 2.842s
30 million rows (25 million NULL, 5 million valid)
Test SQL:
select count(1) from $test_tbl where to_date($col) > '1990-01-01'
Time before this PR: 0.621s, 0.614s, 0.677s
Time after this PR: 0.631s, 0.641s, 0.692s
30 million rows (25 million random strings that do not match any date format, 5 million valid)
Test SQL:
select count(1) from $test_tbl where to_date($col) > '1990-01-01'
Time before this PR: 6.148s, 6.018s, 5.845s
Time after this PR: 3.188s, 3.055s, 3.08s
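Comparison: with valid data (and with mostly NULL data) performance is about the same, while in some abnormal scenarios performance improves; when 25 million of the 30 million rows are unparseable strings, the query time drops from about 6.0s to about 3.1s.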
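Averaging the three runs of each case makes the comparison concrete. These are plain arithmetic means of the timings reported above; no variance or other statistics were reported, so small differences should not be over-read.

```latex
\begin{aligned}
\text{valid data:}\quad & \tfrac{2.983 + 2.686 + 2.804}{3} \approx 2.824\,\mathrm{s} \;\to\; \tfrac{2.94 + 2.861 + 2.842}{3} \approx 2.881\,\mathrm{s} \\
\text{25M NULL:}\quad & \tfrac{0.621 + 0.614 + 0.677}{3} \approx 0.637\,\mathrm{s} \;\to\; \tfrac{0.631 + 0.641 + 0.692}{3} \approx 0.655\,\mathrm{s} \\
\text{25M invalid strings:}\quad & \tfrac{6.148 + 6.018 + 5.845}{3} \approx 6.004\,\mathrm{s} \;\to\; \tfrac{3.188 + 3.055 + 3.08}{3} \approx 3.108\,\mathrm{s},
\qquad \tfrac{6.004}{3.108} \approx 1.93
\end{aligned}
```

The first two cases differ by only a few percent, while the invalid-string case runs about 1.93x faster (roughly 48% less time).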