[SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions #37887

Closed

Contributor

@srielau srielau commented Sep 14, 2022

What changes were proposed in this pull request?

This PR introduces the following error classes:

  • PARTITIONS_ALREADY_EXIST
    Cannot ADD or RENAME TO partition(s) <partitionList> in table <tableName> because they already exist.
    Choose a different name, drop the existing partition, or add the IF NOT EXISTS clause to tolerate a pre-existing partition.

  • PARTITIONS_NOT_FOUND
    The partition(s) <partitionList> cannot be found in table <tableName>.
    Verify the partition specification and table name.
    To tolerate the error on drop use ALTER TABLE … DROP IF EXISTS PARTITION.

  • ROUTINE_ALREADY_EXISTS
    Cannot create the function <routineName> because it already exists.
    Choose a different name, drop or replace the existing function, or add the IF NOT EXISTS clause to tolerate a pre-existing function.

  • ROUTINE_NOT_FOUND
    The function <routineName> cannot be found. Verify the spelling and correctness of the schema and catalog.
    If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
    To tolerate the error on drop use DROP FUNCTION IF EXISTS.

  • SCHEMA_ALREADY_EXISTS
    Cannot create schema <schemaName> because it already exists.
    Choose a different name, drop the existing schema, or add the IF NOT EXISTS clause to tolerate a pre-existing schema.

  • SCHEMA_NOT_EMPTY
    Cannot drop a schema <schemaName> because it contains objects.
    Use DROP SCHEMA ... CASCADE to drop the schema and all its objects.

  • SCHEMA_NOT_FOUND
    The schema <schemaName> cannot be found. Verify the spelling and correctness of the schema and catalog.
    If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
    To tolerate the error on drop use DROP SCHEMA IF EXISTS.

  • TABLE_OR_VIEW_ALREADY_EXISTS
    Cannot create table or view <relationName> because it already exists.
    Choose a different name, drop or replace the existing object, or add the IF NOT EXISTS clause to tolerate pre-existing objects.

  • TABLE_OR_VIEW_NOT_FOUND
    The table or view <relationName> cannot be found. Verify the spelling and correctness of the schema and catalog.
    If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
    To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.

  • TEMP_TABLE_OR_VIEW_ALREADY_EXISTS
    Cannot create the temporary view <relationName> because it already exists.
    Choose a different name, drop or replace the existing view, or add the IF NOT EXISTS clause to tolerate pre-existing views.

Also (for JDBC data sources):

  • INDEX_ALREADY_EXISTS
    Cannot create the index because it already exists. <message>.

  • INDEX_NOT_FOUND
    Cannot find the index. <message>.
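
To illustrate how a message template with `<param>` placeholders could turn into the text a user sees, here is a minimal Python sketch. It is not Spark's implementation: the `ERROR_CLASSES` dictionary is a simplified stand-in for error-classes.json (the real file layout may differ), and `render_error` is a hypothetical helper. The templates are shortened versions of the ones in this PR, and the bracketed `[ERROR_CLASS]` prefix follows the form quoted in the dbt-databricks commit referenced further down.

```python
import re

# Simplified stand-in for entries in error-classes.json (assumed layout).
# The template text is shortened from the PR description.
ERROR_CLASSES = {
    "TABLE_OR_VIEW_NOT_FOUND": "The table or view <relationName> cannot be found.",
    "PARTITIONS_NOT_FOUND": (
        "The partition(s) <partitionList> cannot be found in table <tableName>."
    ),
}


def render_error(error_class, params):
    """Hypothetical helper: fill <name> placeholders and prefix the class name."""
    template = ERROR_CLASSES[error_class]

    def substitute(match):
        name = match.group(1)
        if name not in params:
            # Fail loudly instead of emitting a message with a hole in it.
            raise KeyError(f"missing parameter {name!r} for {error_class}")
        return params[name]

    body = re.sub(r"<(\w+)>", substitute, template)
    return f"[{error_class}] {body}"


if __name__ == "__main__":
    print(render_error("TABLE_OR_VIEW_NOT_FOUND", {"relationName": "`db`.`t`"}))
```

Failing on a missing parameter is a deliberate choice in this sketch: it keeps a template and its call sites from drifting apart silently.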

Some background:

  • We use ROUTINE over FUNCTION to be future proof, if/when PROCEDUREs appear.
  • We coarsen around TABLE_OR_VIEW_NOT_FOUND and TABLE_OR_VIEW_ALREADY_EXISTS (getting rid of dedicated reasons such as RENAME TABLE, etc.).
  • We combine PARTITION and PARTITIONS errors.
  • I use SCHEMA religiously. A debate can be had whether/how/when to return NAMESPACE.

There is currently one failure caused by:

https://issues.apache.org/jira/browse/SPARK-40521
Hive-based ALTER TABLE ADD PARTITION returns too many partitions in case of PARTITIONS_ALREADY_EXIST.

Why are the changes needed?

We want to convert all errors to use the error-class framework.

Does this PR introduce any user-facing change?

Yes, we are moving away from "free text" and consolidating errors in error-classes.json.
This hardens the QA and the code, allowing us to improve error messages without breaking changes.
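
As a rough illustration of why this helps testing, the Python sketch below asserts on an error class and its parameters instead of on the message wording, so the wording can change without breaking the check. Everything in it is hypothetical: `ClassifiedError`, `drop_table`, and `check_error` are made-up names for this example and are not Spark APIs.

```python
class ClassifiedError(Exception):
    """Hypothetical exception that carries an error class and its parameters."""

    def __init__(self, error_class, params, text):
        super().__init__(f"[{error_class}] {text}")
        self.error_class = error_class
        self.params = params


def drop_table(catalog, name):
    """Toy DROP TABLE against a set of table names (illustration only)."""
    if name not in catalog:
        raise ClassifiedError(
            "TABLE_OR_VIEW_NOT_FOUND",
            {"relationName": name},
            f"The table or view {name} cannot be found.",
        )
    catalog.remove(name)


def check_error(fn, error_class, params):
    """Return True only if fn raises the expected class with the expected parameters."""
    try:
        fn()
    except ClassifiedError as e:
        # The human-readable text is deliberately not compared.
        return e.error_class == error_class and e.params == params
    return False


if __name__ == "__main__":
    ok = check_error(
        lambda: drop_table({"a"}, "b"),
        "TABLE_OR_VIEW_NOT_FOUND",
        {"relationName": "b"},
    )
    print(ok)
```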

How was this patch tested?

Ran the existing QA suite.

@srielau srielau marked this pull request as draft September 14, 2022 23:39
@srielau srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch from f863c6e to 5d19e8b Compare September 14, 2022 23:40
@srielau srielau changed the title [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions [SPARK-40360] [WIP] ALREADY_EXISTS and NOT_FOUND exceptions Sep 15, 2022
@srielau srielau marked this pull request as ready for review September 15, 2022 01:49
@srielau srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch 8 times, most recently from f60a91a to 6017e6e Compare September 16, 2022 18:56
@AmplabJenkins

Can one of the admins verify this patch?

@srielau srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch 6 times, most recently from 37a838e to ecdf639 Compare September 20, 2022 18:30
@srielau srielau changed the title [SPARK-40360] [WIP] ALREADY_EXISTS and NOT_FOUND exceptions [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions Sep 21, 2022
@srielau srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch from 974e83e to 573538e Compare September 22, 2022 16:50
@srielau srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch 4 times, most recently from a1ab9d6 to 0f84136 Compare October 14, 2022 06:26
@srielau srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch from 0f84136 to 07da9c8 Compare October 14, 2022 16:17
@srielau
Contributor Author

srielau commented Oct 14, 2022

@MaxGekk @cloud-fan
Since the last review I have:

  • eliminated tolerance for not having context and added all the contexts
  • removed replaced template error codes
  • addressed minor comments
  • added namespace related error messages in error-classes.json

What I have NOT done and need help with:

  • One remaining failure where I don't know how to generate the correct fragment in a multimode test. I have reached out to @gengliangwang to assist
  • Pull checkErrorTableNotFound() out of SparkFunSuite. In fact, in a way, it's now worse because of the new overloading for context
  • Generate NAMESPACE error codes

@srielau srielau requested review from MaxGekk and cloud-fan and removed request for MaxGekk and cloud-fan October 14, 2022 16:28
@srielau srielau force-pushed the SPARK-40360-Convert-some-ddl-mesages branch from be3e31b to 962689a Compare October 15, 2022 03:04
@cloud-fan
Contributor

thanks, merging to master!


  class IndexAlreadyExistsException(message: String, cause: Option[Throwable] = None)
-   extends AnalysisException(message, cause = cause)
+   extends AnalysisException(errorClass = "INDEX_NOT_FOUND",
Contributor

Should this be INDEX_ALREADY_EXISTS?

Member

This has been fixed in another PR as far as I remember. Please, check the master branch.
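
A mismatch like the one raised in this review thread could in principle be caught mechanically. The Python sketch below is only an illustration of that idea, under the assumption that an exception name is expected to map onto its error class by converting CamelCase to UPPER_SNAKE_CASE; Spark does not necessarily follow that convention everywhere. `expected_error_class` and `mismatches` are hypothetical helpers, and `IndexNotFoundException` is a made-up name used only as a passing case.

```python
import re


def expected_error_class(exception_name):
    """Map e.g. 'IndexAlreadyExistsException' to 'INDEX_ALREADY_EXISTS'."""
    suffix = "Exception"
    stem = exception_name
    if stem.endswith(suffix):
        stem = stem[: -len(suffix)]
    # Split the CamelCase stem into its capitalised words.
    words = re.findall(r"[A-Z][a-z0-9]*", stem)
    return "_".join(word.upper() for word in words)


def mismatches(declared):
    """Return the entries whose declared error class differs from the naming rule."""
    return {
        name: error_class
        for name, error_class in declared.items()
        if expected_error_class(name) != error_class
    }


if __name__ == "__main__":
    declared = {
        "IndexAlreadyExistsException": "INDEX_NOT_FOUND",  # the case flagged above
        "IndexNotFoundException": "INDEX_NOT_FOUND",  # hypothetical name
    }
    print(mismatches(declared))
```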

ueshin added a commit to databricks/dbt-databricks that referenced this pull request Nov 15, 2022
### Description

Supports new error messages.

In `SparkAdapter.get_columns_in_relation`, it checks the error message when the specified table or view doesn't exist:

https://github.com/dbt-labs/dbt-spark/blob/c87b6b2c48bcefb0ce52cd64984d3129d6f14ea0/dbt/adapters/spark/impl.py#L223

However, Spark will change this error message in a future release (apache/spark#37887), which would cause the function to raise `dbt.exceptions.RuntimeException` instead of returning an empty list.

The function should also check whether the error message contains `[TABLE_OR_VIEW_NOT_FOUND]`.

This will be reverted once dbt-labs/dbt-spark#515 is resolved.
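
The check described in this commit message might look roughly like the Python sketch below. It is not the dbt-spark or dbt-databricks code: `is_missing_relation` and `get_columns_or_empty` are hypothetical names, the built-in `RuntimeError` stands in for `dbt.exceptions.RuntimeException`, and the two legacy marker strings are assumptions about the older message wording. Only the `[TABLE_OR_VIEW_NOT_FOUND]` marker is taken from the text above.

```python
# Assumed wording of the pre-change messages; verify against the adapter source.
LEGACY_MARKERS = ("Table or view not found", "NoSuchTableException")
# Marker named in the commit message above.
NEW_MARKER = "[TABLE_OR_VIEW_NOT_FOUND]"


def is_missing_relation(errmsg):
    """True if the message looks like a missing table or view, old or new style."""
    return NEW_MARKER in errmsg or any(marker in errmsg for marker in LEGACY_MARKERS)


def get_columns_or_empty(fetch_columns):
    """Call fetch_columns(); treat a missing relation as 'no columns'."""
    try:
        return fetch_columns()
    except RuntimeError as e:  # stand-in for dbt.exceptions.RuntimeException
        if is_missing_relation(str(e)):
            return []
        raise  # any other failure is still an error


if __name__ == "__main__":
    def missing():
        raise RuntimeError("[TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found.")

    print(get_columns_or_empty(missing))
```

Matching on the bracketed class name rather than on a sentence fragment should be less fragile, since the PR's stated goal is to let message wording improve without breaking callers.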
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
### What changes were proposed in this pull request?

This PR introduces the following error classes:

- PARTITIONS_ALREADY_EXIST
  Cannot ADD or RENAME TO partition(s) <partitionList> in table <tableName> because they already exist.
  Choose a different name, drop the existing partition, or add the IF NOT EXISTS clause to tolerate a pre-existing partition

- PARTITIONS_NOT_FOUND
  The partition(s) <partitionList> cannot be found in table <tableName>.
  Verify the partition specification and table name.
  To tolerate the error on drop use ALTER TABLE … DROP IF EXISTS PARTITION.

-  ROUTINE_ALREADY_EXISTS
  Cannot create the function <routineName> because it already exists.
  Choose a different name, drop or replace the existing function, or add the IF NOT EXISTS clause to tolerate a pre-existing function

- ROUTINE_NOT_FOUND
  The function <routineName> cannot be found. Verify the spelling and correctness of the schema and catalog.
  If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
  To tolerate the error on drop use DROP FUNCTION IF EXISTS

- SCHEMA_ALREADY_EXISTS
  Cannot create schema <schemaName> because it already exists.
  Choose a different name, drop the existing schema, or add the IF NOT EXISTS clause to tolerate pre-existing schema

- SCHEMA_NOT_EMPTY
  Cannot drop a schema <schemaName> because it contains objects.
  Use DROP SCHEMA ... CASCADE to drop the schema and all its objects.

- SCHEMA_NOT_FOUND
  The schema <schemaName> cannot be found. Verify the spelling and correctness of the schema and catalog.
  If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
  To tolerate the error on drop use DROP SCHEMA IF EXISTS.

- TABLE_OR_VIEW_ALREADY_EXISTS
  Cannot create table or view <relationName> because it already exists.
  Choose a different name, drop or replace the existing object, or add the IF NOT EXISTS clause to tolerate pre-existing objects

- TABLE_OR_VIEW_NOT_FOUND
  The table or view <relationName> cannot be found. Verify the spelling and correctness of the schema and catalog.
  If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
  To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.

- TEMP_TABLE_OR_VIEW_ALREADY_EXISTS
  Cannot create the temporary view <relationName> because it already exists.
  Choose a different name, drop or replace the existing view,  or add the IF NOT EXISTS clause to tolerate pre-existing views.

Also (for JDBC data sources):

- INDEX_ALREADY_EXISTS
  Cannot create the index because it already exists. <message>.

- INDEX_NOT_FOUND
  Cannot find the index. <message>.

Some background:
* We use ROUTINE over FUNCTION to be future proof, if/when PROCEDUREs appear.
* We coarsen around TABLE_OR_VIEW_NOT_FOUND and TABLE_OR_VIEW_ALREADY_EXISTS (getting rid of dedicated reasons such as RENAME TABLE, etc.).
* We combine PARTITION and PARTITIONS errors.
* I use SCHEMA religiously. A debate can be had whether/how/when to return NAMESPACE.

There is currently one failure caused by:

https://issues.apache.org/jira/browse/SPARK-40521
Hive-based ALTER TABLE ADD PARTITION returns too many partitions in case of PARTITIONS_ALREADY_EXIST.

### Why are the changes needed?
We want to convert all errors to use the error-class framework.

### Does this PR introduce _any_ user-facing change?

Yes, we are moving away from "free text" and consolidating errors in error-classes.json.
This hardens the QA and the code, allowing us to improve error messages without breaking changes.

### How was this patch tested?

Run existing QA suite

Closes apache#37887 from srielau/SPARK-40360-Convert-some-ddl-mesages.

Authored-by: Serge Rielau <serge.rielau@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
5 participants