[SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions #37887

srielau · 2022-09-14T23:38:55Z

What changes were proposed in this pull request?

This PR introduces the following error classes:

PARTITIONS_ALREADY_EXIST
Cannot ADD or RENAME TO partition(s) <partitionList> in table <tableName> because they already exist.
Choose a different name, drop the existing partition, or add the IF NOT EXISTS clause to tolerate a pre-existing partition
PARTITIONS_NOT_FOUND
The partition(s) <partitionList> cannot be found in table <tableName>.
Verify the partition specification and table name.
To tolerate the error on drop use ALTER TABLE … DROP IF EXISTS PARTITION.
ROUTINE_ALREADY_EXISTS
Cannot create the function <routineName> because it already exists.
Choose a different name, drop or replace the existing function, or add the IF NOT EXISTS clause to tolerate a pre-existing function
ROUTINE_NOT_FOUND
The function <routineName> cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP FUNCTION IF EXISTS
SCHEMA_ALREADY_EXISTS
Cannot create schema <schemaName> because it already exists.
Choose a different name, drop the existing schema, or add the IF NOT EXISTS clause to tolerate pre-existing schema
SCHEMA_NOT_EMPTY
Cannot drop a schema <schemaName> because it contains objects.
Use DROP SCHEMA ... CASCADE to drop the schema and all its objects.
SCHEMA_NOT_FOUND
The schema <schemaName> cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
To tolerate the error on drop use DROP SCHEMA IF EXISTS.
TABLE_OR_VIEW_ALREADY_EXISTS
Cannot create table or view <relationName> because it already exists.
Choose a different name, drop or replace the existing object, or add the IF NOT EXISTS clause to tolerate pre-existing objects
TABLE_OR_VIEW_NOT_FOUND
The table or view <relationName> cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.
TEMP_TABLE_OR_VIEW_ALREADY_EXISTS
Cannot create the temporary view <relationName> because it already exists.
Choose a different name, drop or replace the existing view, or add the IF NOT EXISTS clause to tolerate pre-existing views.

Also (for JDBC data sources):

INDEX_ALREADY_EXISTS
Cannot create the index because it already exists. <message>.
INDEX_NOT_FOUND
Cannot find the index. <message>.
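As a rough illustration of how an error class maps to a rendered message, here is a minimal Python sketch. It is not Spark's implementation: the `ERROR_CLASSES` dict, the `render_error` function, and the `[ERROR_CLASS]` prefix format are assumptions made for this example. Only the template wording is taken from the error classes listed in this PR.

```python
import re

# Hypothetical in-memory stand-in for entries of error-classes.json.
# The template wording follows this PR's description; the dict layout is an assumption.
ERROR_CLASSES = {
    "TABLE_OR_VIEW_NOT_FOUND": (
        "The table or view <relationName> cannot be found. "
        "Verify the spelling and correctness of the schema and catalog."
    ),
    "PARTITIONS_NOT_FOUND": (
        "The partition(s) <partitionList> cannot be found in table <tableName>."
    ),
}

# Matches placeholders of the form <name>.
_PLACEHOLDER = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>")


def render_error(error_class: str, params: dict) -> str:
    """Fill every <name> placeholder and prefix the result with [ERROR_CLASS]."""
    template = ERROR_CLASSES[error_class]

    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing parameter {name!r} for {error_class}")
        return params[name]

    return f"[{error_class}] " + _PLACEHOLDER.sub(substitute, template)


print(render_error("TABLE_OR_VIEW_NOT_FOUND", {"relationName": "`db`.`t`"}))
```

Keeping the class name and the parameters separate from the wording is what lets the text be reworded later without changing how callers identify the error.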

Some background:

We use ROUTINE over FUNCTION to be future-proof, if/when PROCEDUREs appear.
We coarsen the errors around TABLE_OR_VIEW_NOT_FOUND and TABLE_OR_VIEW_ALREADY_EXISTS (getting rid of dedicated reasons such as RENAME TABLE, etc.).
We combine PARTITION and PARTITIONS errors.
I use SCHEMA religiously. A debate can be had whether/how/when to return NAMESPACE.

There is currently one failure caused by:

https://issues.apache.org/jira/browse/SPARK-40521
Hive-based ALTER TABLE ADD PARTITION returns too many partitions in the case of PARTITIONS_ALREADY_EXIST.

Why are the changes needed?

We want to convert all errors to use the error-class framework.

Does this PR introduce any user-facing change?

Yes, we are moving away from "free text" messages and consolidating errors in error-classes.json.
This hardens the QA and code, allowing us to improve error messages without breaking changes.
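To make the "without breaking changes" point concrete, the sketch below shows a test that asserts only on the error class and its parameters, so rewording the message does not break it. `ClassifiedError` and `check_error` are hypothetical names invented for this example; they are not Spark's exception types or test helpers.

```python
class ClassifiedError(Exception):
    """Hypothetical exception carrying an error class and its parameters."""

    def __init__(self, error_class, parameters, message):
        super().__init__(message)
        self.error_class = error_class
        self.parameters = dict(parameters)


def check_error(exc, error_class, parameters):
    """Assert on the stable identifiers only; the message text is ignored."""
    assert exc.error_class == error_class, exc.error_class
    assert exc.parameters == parameters, exc.parameters


# Same error class and parameters, two different wordings of the message.
old = ClassifiedError(
    "SCHEMA_NOT_FOUND", {"schemaName": "`s`"}, "The schema `s` cannot be found."
)
new = ClassifiedError(
    "SCHEMA_NOT_FOUND", {"schemaName": "`s`"}, "Schema `s` does not exist. Check the catalog."
)

# Both wordings satisfy the same check.
for e in (old, new):
    check_error(e, "SCHEMA_NOT_FOUND", {"schemaName": "`s`"})
```

A check that compared the full message string would pass for only one of the two wordings.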

How was this patch tested?

Ran the existing QA suite.

AmplabJenkins · 2022-09-17T03:22:51Z

Can one of the admins verify this patch?

core/src/main/resources/error/error-classes.json

core/src/test/scala/org/apache/spark/SparkFunSuite.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala

srielau · 2022-10-14T16:27:21Z

@MaxGekk @cloud-fan
Since the last review I have:

eliminated tolerance for not having context and added all the contexts
removed or replaced template error codes
addressed minor comments
added namespace related error messages in error-classes.json

What I have NOT done and need help with:

One remaining failure where I don't know how to generate the correct fragment in a multimode test. I have reached out to @gengliangwang to assist
Pull checkErrorTableNotFound() out of SparkFunSuite. In fact, in a way, it's now worse because of the new overloading for context
Generate NAMESPACE error codes

cloud-fan · 2022-10-18T06:06:05Z

thanks, merging to master!

anchovYu · 2022-11-04T17:17:59Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala


 class IndexAlreadyExistsException(message: String, cause: Option[Throwable] = None)
-  extends AnalysisException(message, cause = cause)
+  extends AnalysisException(errorClass = "INDEX_NOT_FOUND",


Should this be INDEX_ALREADY_EXISTS?

MaxGekk · 2022-11-04

This has been fixed in another PR, as far as I remember. Please check the master branch.

### Description

Supports new error messages.

In `SparkAdapter.get_columns_in_relation`, the adapter checks the error message raised when the specified table or view doesn't exist:

https://github.com/dbt-labs/dbt-spark/blob/c87b6b2c48bcefb0ce52cd64984d3129d6f14ea0/dbt/adapters/spark/impl.py#L223

However, Spark will change that error message in a future release (apache/spark#37887), which causes the function to raise `dbt.exceptions.RuntimeException` instead of returning an empty list. The function should also check whether the error message contains `[TABLE_OR_VIEW_NOT_FOUND]`. This will be reverted once dbt-labs/dbt-spark#515 is resolved.
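The check described above could look roughly like the following sketch. It is not the actual dbt-spark code: `get_columns_or_empty` and `is_missing_relation_error` are hypothetical helpers, `RuntimeError` stands in for `dbt.exceptions.RuntimeException`, and the two legacy substrings are assumptions. Only `[TABLE_OR_VIEW_NOT_FOUND]` comes from the description.

```python
# Substrings treated as "the relation does not exist".
# Only "[TABLE_OR_VIEW_NOT_FOUND]" comes from the description above;
# the two legacy strings are assumptions used for illustration.
TABLE_OR_VIEW_NOT_FOUND_MESSAGES = (
    "[TABLE_OR_VIEW_NOT_FOUND]",
    "Table or view not found",
    "NoSuchTableException",
)


def is_missing_relation_error(errmsg: str) -> bool:
    """Return True if the message matches any known 'not found' marker."""
    return any(marker in errmsg for marker in TABLE_OR_VIEW_NOT_FOUND_MESSAGES)


def get_columns_or_empty(fetch_columns):
    """Call fetch_columns(); return [] if it fails because the relation is missing."""
    try:
        return fetch_columns()
    except RuntimeError as e:  # stand-in for dbt.exceptions.RuntimeException
        if is_missing_relation_error(str(e)):
            return []
        raise  # any other failure is propagated unchanged
```

Matching on the bracketed error class is less fragile than matching on prose, since the class name is meant to stay stable while the wording may change.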

### What changes were proposed in this pull request?

This PR introduces the following error classes:

- PARTITIONS_ALREADY_EXIST
  Cannot ADD or RENAME TO partition(s) <partitionList> in table <tableName> because they already exist.
  Choose a different name, drop the existing partition, or add the IF NOT EXISTS clause to tolerate a pre-existing partition
- PARTITIONS_NOT_FOUND
  The partition(s) <partitionList> cannot be found in table <tableName>.
  Verify the partition specification and table name.
  To tolerate the error on drop use ALTER TABLE … DROP IF EXISTS PARTITION.
- ROUTINE_ALREADY_EXISTS
  Cannot create the function <routineName> because it already exists.
  Choose a different name, drop or replace the existing function, or add the IF NOT EXISTS clause to tolerate a pre-existing function
- ROUTINE_NOT_FOUND
  The function <routineName> cannot be found. Verify the spelling and correctness of the schema and catalog.
  If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
  To tolerate the error on drop use DROP FUNCTION IF EXISTS
- SCHEMA_ALREADY_EXISTS
  Cannot create schema <schemaName> because it already exists.
  Choose a different name, drop the existing schema, or add the IF NOT EXISTS clause to tolerate pre-existing schema
- SCHEMA_NOT_EMPTY
  Cannot drop a schema <schemaName> because it contains objects.
  Use DROP SCHEMA ... CASCADE to drop the schema and all its objects.
- SCHEMA_NOT_FOUND
  The schema <schemaName> cannot be found. Verify the spelling and correctness of the schema and catalog.
  If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
  To tolerate the error on drop use DROP SCHEMA IF EXISTS.
- TABLE_OR_VIEW_ALREADY_EXISTS
  Cannot create table or view <relationName> because it already exists.
  Choose a different name, drop or replace the existing object, or add the IF NOT EXISTS clause to tolerate pre-existing objects
- TABLE_OR_VIEW_NOT_FOUND
  The table or view <relationName> cannot be found. Verify the spelling and correctness of the schema and catalog.
  If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
  To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.
- TEMP_TABLE_OR_VIEW_ALREADY_EXISTS
  Cannot create the temporary view <relationName> because it already exists.
  Choose a different name, drop or replace the existing view, or add the IF NOT EXISTS clause to tolerate pre-existing views.

Also (for JDBC data sources):

- INDEX_ALREADY_EXISTS
  Cannot create the index because it already exists. <message>.
- INDEX_NOT_FOUND
  Cannot find the index. <message>.

Some background:

* We use ROUTINE over FUNCTION to be future-proof, if/when PROCEDUREs appear.
* We coarsen the errors around TABLE_OR_VIEW_NOT_FOUND and TABLE_OR_VIEW_ALREADY_EXISTS (getting rid of dedicated reasons such as RENAME TABLE, etc.).
* We combine PARTITION and PARTITIONS errors.
* I use SCHEMA religiously. A debate can be had whether/how/when to return NAMESPACE.

There is currently one failure caused by:
https://issues.apache.org/jira/browse/SPARK-40521
Hive-based ALTER TABLE ADD PARTITION returns too many partitions in the case of PARTITIONS_ALREADY_EXIST.

### Why are the changes needed?

We want to convert all errors to use the error-class framework.

### Does this PR introduce _any_ user-facing change?

Yes, we are moving away from "free text" messages and consolidating errors in error-classes.json.
This hardens the QA and code, allowing us to improve error messages without breaking changes.

### How was this patch tested?

Ran the existing QA suite.

Closes apache#37887 from srielau/SPARK-40360-Convert-some-ddl-mesages.

Authored-by: Serge Rielau <serge.rielau@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

github-actions bot added CORE R SQL STRUCTURED STREAMING labels Sep 14, 2022

srielau marked this pull request as draft September 14, 2022 23:39

srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch from f863c6e to 5d19e8b Compare September 14, 2022 23:40

srielau changed the title ~~[SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions~~ [SPARK-40360] [WIP] ALREADY_EXISTS and NOT_FOUND exceptions Sep 15, 2022

srielau marked this pull request as ready for review September 15, 2022 01:49

srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch 8 times, most recently from f60a91a to 6017e6e Compare September 16, 2022 18:56

srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch 6 times, most recently from 37a838e to ecdf639 Compare September 20, 2022 18:30

srielau changed the title ~~[SPARK-40360] [WIP] ALREADY_EXISTS and NOT_FOUND exceptions~~ [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions Sep 21, 2022

srielau commented Sep 21, 2022

core/src/main/resources/error/error-classes.json (outdated, resolved)

srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch from 974e83e to 573538e Compare September 22, 2022 16:50

cloud-fan reviewed Sep 26, 2022

core/src/test/scala/org/apache/spark/SparkFunSuite.scala (outdated, resolved)

cloud-fan reviewed Sep 26, 2022

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala (outdated, resolved)

cloud-fan reviewed Sep 26, 2022

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala (outdated, resolved)

srielau mentioned this pull request Oct 1, 2022

[SPARK-40603][SQL] Throw the original error from catalog implementations #38039

Closed

srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch 4 times, most recently from a1ab9d6 to 0f84136 Compare October 14, 2022 06:26

[SPARK-40360] *_ALREADY_EXISTS and *_NOT_FOUND errors

07da9c8

srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch from 0f84136 to 07da9c8 Compare October 14, 2022 16:17

srielau requested review from MaxGekk and cloud-fan and removed request for MaxGekk and cloud-fan October 14, 2022 16:28

Fix SQLViewTestSuite

962689a

srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch from be3e31b to 962689a Compare October 15, 2022 03:04

cloud-fan approved these changes Oct 18, 2022

cloud-fan closed this in e7fbefe Oct 18, 2022

ulysses-you mentioned this pull request Oct 20, 2022

[TEST] Fix Spark nightly build apache/kyuubi#3671

Closed

9 tasks

anchovYu reviewed Nov 4, 2022

This was referenced Nov 14, 2022

[CT-1503] [Feature] Support new error messages in the future Spark. dbt-labs/dbt-spark#515

Closed

Support new error messages. databricks/dbt-databricks#226

Merged

ueshin mentioned this pull request Nov 16, 2022

Supports new error messages. dbt-labs/dbt-spark#520

Merged

6 tasks

MaxGekk Nov 4, 2022
