
[SPARK-46482][SQL] Revert SPARK-43049 due to performance regression of using CLOB #44452

Closed
wants to merge 1 commit

Conversation

sadikovi
Contributor

@sadikovi sadikovi commented Dec 21, 2023

What changes were proposed in this pull request?

This PR reverts 529f2d5 (PR #40683) due to a performance regression when writing to an Oracle database: string columns were being created as CLOB instead of VARCHAR2.

The regression was confirmed by an internal benchmark that writes 20 string fields to an Oracle database. Previously the SQL statement finished within 2 minutes; on the master branch it takes over 10 minutes (the job was cancelled at that point, so it could have run longer).
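The change being reverted can be sketched as follows. This is a minimal illustration, not Spark's actual `OracleDialect` code; the object and method names here are hypothetical:

```scala
// Minimal sketch (hypothetical names, not Spark's actual API): the Oracle
// JDBC dialect chooses an Oracle column type for Spark's StringType when
// creating tables. SPARK-43049 switched this mapping from VARCHAR2(255)
// to CLOB; this PR reverts it because CLOB writes proved far slower in
// the benchmark above.
object OracleTypeMappingSketch {
  def oracleTypeFor(sparkType: String, useClob: Boolean): String =
    sparkType match {
      case "StringType" => if (useClob) "CLOB" else "VARCHAR2(255)"
      case other        => sys.error(s"not sketched: $other")
    }
}
```

With the revert applied, the mapping behaves as in the `useClob = false` branch, restoring the pre-SPARK-43049 behavior.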

Why are the changes needed?

Fixes performance regression.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

N/A. This is a revert of an existing change.

Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44442

@sadikovi
Contributor Author

We might need to revert this in branch-3.5 as well; the patch was also merged there.

@sadikovi
Contributor Author

cc @cloud-fan @yaooqinn

@sadikovi sadikovi closed this Dec 22, 2023
dongjoon-hyun pushed a commit that referenced this pull request Dec 24, 2023
…string

### What changes were proposed in this pull request?

Revert SPARK-43049 and return to using Oracle VARCHAR2(255) for string columns, for performance reasons.

### Why are the changes needed?

Writing string columns as CLOB is significantly slower than VARCHAR2; see the benchmark in the PR description above.

### Does this PR introduce _any_ user-facing change?

Yes. When storing strings in an Oracle table defined by Spark DDL with string columns, users will get an error if a string value exceeds 255 characters:

```scala
org.apache.spark.SparkRuntimeException: [EXCEED_LIMIT_LENGTH] Exceeds char/varchar type length limitation: 255. SQLSTATE: 54006
[info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.exceedMaxLimit(QueryExecutionErrors.scala:2512)
```
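The behavior behind that error can be sketched as a simple length check. This is a hypothetical illustration (the object and method names are not Spark's actual API), showing only the limit semantics:

```scala
// Hypothetical sketch of the char/varchar length check that produces the
// EXCEED_LIMIT_LENGTH error when a string value does not fit the declared
// VARCHAR2(255) column. Spark's real check lives in its char/varchar
// handling and raises SparkRuntimeException; a plain RuntimeException
// stands in for it here.
object CharVarcharCheckSketch {
  val OracleVarcharLimit = 255

  def checkLength(value: String, limit: Int = OracleVarcharLimit): String =
    if (value.length <= limit) value
    else throw new RuntimeException(
      s"[EXCEED_LIMIT_LENGTH] Exceeds char/varchar type length limitation: $limit. SQLSTATE: 54006")
}
```

Values of 255 characters or fewer pass through unchanged; longer values fail the write with the error shown above.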

### How was this patch tested?

Revised unit tests.

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #44452

Closes #44442 from yaooqinn/SPARK-46478.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
yaooqinn added a commit to yaooqinn/spark that referenced this pull request Dec 26, 2023