
[SPARK-46482][SQL] Revert SPARK-43049 due to performance regression of using CLOB #44452

Closed
wants to merge 1 commit

Conversation

sadikovi
Contributor

@sadikovi sadikovi commented Dec 21, 2023

What changes were proposed in this pull request?

This PR reverts 529f2d5 (PR #40683) due to a performance regression when writing to an Oracle database: string columns were being created as CLOB instead of VARCHAR2.

The regression was confirmed by an internal benchmark that writes 20 string fields to an Oracle database. Previously the SQL statement finished within 2 minutes; on the master branch it takes over 10 minutes (the job was cancelled at that point, so it could have run longer).
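The change being reverted can be sketched as follows. This is a minimal illustration, not Spark's actual `OracleDialect` code; the object and method names here are hypothetical:

```scala
// Minimal sketch (hypothetical names, not Spark's actual API): the Oracle
// JDBC dialect chooses an Oracle column type for Spark's StringType when
// creating tables. SPARK-43049 switched this mapping from VARCHAR2(255)
// to CLOB; this PR reverts it because CLOB writes proved far slower in
// the benchmark above.
object OracleTypeMappingSketch {
  def oracleTypeFor(sparkType: String, useClob: Boolean): String =
    sparkType match {
      case "StringType" => if (useClob) "CLOB" else "VARCHAR2(255)"
      case other        => sys.error(s"not sketched: $other")
    }
}
```

With the revert applied, the mapping behaves as in the `useClob = false` branch, restoring the pre-SPARK-43049 behavior.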

Why are the changes needed?

Fixes performance regression.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

N/A. This is a revert of an existing change.

Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44442

@sadikovi
Contributor Author

We might need to revert this in branch-3.5 as well; the patch was also merged there.

@sadikovi
Contributor Author

cc @cloud-fan @yaooqinn

@sadikovi sadikovi closed this Dec 22, 2023
dongjoon-hyun pushed a commit that referenced this pull request Dec 24, 2023
…string

### What changes were proposed in this pull request?

Revert SPARK-43049 and return to using Oracle VARCHAR2(255) for string columns, for performance reasons.

### Why are the changes needed?

Writing string columns as CLOB is significantly slower than VARCHAR2; see the benchmark in the PR description above.

### Does this PR introduce _any_ user-facing change?

Yes. When storing strings in an Oracle table defined by Spark DDL with string columns, users will get an error if a string value exceeds 255 characters:

```scala
org.apache.spark.SparkRuntimeException: [EXCEED_LIMIT_LENGTH] Exceeds char/varchar type length limitation: 255. SQLSTATE: 54006
[info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.exceedMaxLimit(QueryExecutionErrors.scala:2512)
```
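The behavior behind that error can be sketched as a simple length check. This is a hypothetical illustration (the object and method names are not Spark's actual API), showing only the limit semantics:

```scala
// Hypothetical sketch of the char/varchar length check that produces the
// EXCEED_LIMIT_LENGTH error when a string value does not fit the declared
// VARCHAR2(255) column. Spark's real check lives in its char/varchar
// handling and raises SparkRuntimeException; a plain RuntimeException
// stands in for it here.
object CharVarcharCheckSketch {
  val OracleVarcharLimit = 255

  def checkLength(value: String, limit: Int = OracleVarcharLimit): String =
    if (value.length <= limit) value
    else throw new RuntimeException(
      s"[EXCEED_LIMIT_LENGTH] Exceeds char/varchar type length limitation: $limit. SQLSTATE: 54006")
}
```

Values of 255 characters or fewer pass through unchanged; longer values fail the write with the error shown above.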

### How was this patch tested?

Revised unit tests.

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #44452

Closes #44442 from yaooqinn/SPARK-46478.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
yaooqinn added a commit to yaooqinn/spark that referenced this pull request Dec 26, 2023