[GLUTEN-3668][CH]Fix to_date function performance #3701

Merged: 8 commits, Nov 17, 2023

@KevinyhZou KevinyhZou commented Nov 14, 2023

What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

(Fixes: #3668)

How was this patch tested?

Tested by unit tests (UT).

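Performance test data

30 million rows of normal data
Test SQL: select count(1) from $test_tbl where to_date($col) > '1990-01-01'
Time before this PR: 2.983s, 2.686s, 2.804s
Time after this PR: 2.94s, 2.861s, 2.842s

30 million rows (25 million rows are NULL, 5 million rows are normal data)
Test SQL: select count(1) from $test_tbl where to_date($col) > '1990-01-01'
Time before this PR: 0.621s, 0.614s, 0.677s
Time after this PR: 0.631s, 0.641s, 0.692s

30 million rows (25 million rows are random strings that do not match the date format, 5 million rows are normal data)
Test SQL: select count(1) from $test_tbl where to_date($col) > '1990-01-01'
Time before this PR: 6.148s, 6.018s, 5.845s
Time after this PR: 3.188s, 3.055s, 3.08s

Comparing the runs, performance on normal data is essentially unchanged, while in some abnormal cases (here, many strings that do not match the date format) it improves, with run time roughly halved.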
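
For readers who want the third scenario as a single number, here is a minimal Scala sketch that averages the three runs before and after the change and computes the ratio. The object and method names (`ToDateBenchSummary`, `mean`, `speedup`) are made up for illustration; only the timings come from this PR description.

```scala
// Timings in seconds copied from the third benchmark in the PR description
// (25 million malformed strings + 5 million valid dates).
// The object and method names are hypothetical, not part of Gluten.
object ToDateBenchSummary {
  val before: Seq[Double] = Seq(6.148, 6.018, 5.845)
  val after: Seq[Double] = Seq(3.188, 3.055, 3.08)

  // Arithmetic mean of a non-empty list of run times.
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Ratio of the mean run time before the change to the mean after it.
  def speedup(b: Seq[Double], a: Seq[Double]): Double = mean(b) / mean(a)

  def main(args: Array[String]): Unit = {
    println(f"mean before: ${mean(before)}%.3f s")
    println(f"mean after:  ${mean(after)}%.3f s")
    println(f"speedup:     ${speedup(before, after)}%.2fx")
  }
}
```

With these inputs the means are about 6.00 s and 3.11 s, a speedup of roughly 1.9x. Three runs per case is a small sample, so this is best read as an indication and not a precise measurement.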

#3135

Run Gluten Clickhouse CI

@@ -2227,5 +2227,28 @@ class GlutenClickHouseTPCHParquetSuite extends GlutenClickHouseTPCHAbstractSuite
compareResultsAgainstVanillaSpark(select_sql, true, { _ => })
spark.sql("drop table test_tbl_3521")
}

test("GLUTEN-3135 revert: Bug fix to_date") {
Contributor

Please reuse the existing UT "GLUTEN-3135: Bug fix to_date" and re-enable it; don't create a new one.

Contributor Author

Done.

@KevinyhZou KevinyhZou changed the title [GLUTEN-3135][CH]Fix to_date function performance [GLUTEN-3668][CH]Fix to_date function performance Nov 14, 2023
#3668

@lgbo-ustc
Contributor

to_date supports specifying a date format. Has that been taken into account?

@KevinyhZou
Contributor Author

> to_date supports specifying a date format. Has that been taken into account?

I'll look into this.

@zzcclp zzcclp requested a review from liuneng1994 November 15, 2023 01:57
@KevinyhZou
Contributor Author

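> to_date supports specifying a date format. Has that been taken into account?

No extra handling is needed for to_date with an explicit date format. When it is called as to_date($col, 'yyyy-MM-dd'), it follows the original path and is parsed with parseDateTimeInJodaSyntaxOrNull.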
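
To illustrate the "parse or return NULL" behaviour that the function name parseDateTimeInJodaSyntaxOrNull suggests, here is a minimal JVM-side sketch. It is not the ClickHouse implementation: `java.time` stands in for the Joda-syntax parser, `None` stands in for SQL NULL, and `ToDateOrNullSketch` and `parseOrNone` are hypothetical names.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical stand-in for "parse with a format, or yield NULL".
// This uses java.time, not ClickHouse or Gluten code.
object ToDateOrNullSketch {
  // Same pattern string as in the comment above.
  private val fmt: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")

  // Returns Some(date) when the input matches the pattern, None otherwise.
  // None plays the role of SQL NULL for unparseable input.
  def parseOrNone(s: String): Option[LocalDate] =
    Try(LocalDate.parse(s, fmt)).toOption

  def main(args: Array[String]): Unit = {
    println(parseOrNone("2023-11-14"))
    println(parseOrNone("not-a-date"))
  }
}
```

Under this reading, rows that parse to NULL fail a filter such as to_date($col) > '1990-01-01' and are excluded from the count, which matches how the benchmark query treats malformed strings.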

@liuneng1994 liuneng1994 merged commit ad23092 into apache:main Nov 17, 2023
7 checks passed
Successfully merging this pull request may close these issues.

[CH] Performance regresses seriously after PR 3169 merge when executing convert string to date