[SPARK-48668][SQL] Support ALTER NAMESPACE ... UNSET PROPERTIES in v2 #47038

Closed
wants to merge 14 commits into from

Conversation

panbingkun
Contributor

@panbingkun panbingkun commented Jun 20, 2024

What changes were proposed in this pull request?

This PR adds support for ALTER NAMESPACE ... UNSET PROPERTIES in v2.

Why are the changes needed?

  • For tables and views, properties can be added, updated, or deleted using the following SQL:
ALTER (TABLE | VIEW) ... SET TBLPROPERTIES ...
ALTER (TABLE | VIEW) ... UNSET TBLPROPERTIES (IF EXISTS)? ...

| ALTER (TABLE | VIEW) identifierReference
    SET TBLPROPERTIES propertyList                          #setTableProperties
| ALTER (TABLE | VIEW) identifierReference
    UNSET TBLPROPERTIES (IF EXISTS)? propertyList           #unsetTableProperties

  • At the SQL level, however, namespaces only have SET syntax; there is no UNSET syntax:
ALTER namespace ... SET (DBPROPERTIES | PROPERTIES) ...

| ALTER namespace identifierReference
    SET (DBPROPERTIES | PROPERTIES) propertyList            #setNamespaceProperties

  • In addition, the underlying SupportsNamespaces interface already supports deleting properties. I propose adding SQL syntax so that users can manipulate namespace properties through SQL instead of relying solely on APIs:
    /**
     * Apply a set of metadata changes to a namespace in the catalog.
     *
     * @param namespace a multi-part namespace
     * @param changes a collection of changes to apply to the namespace
     * @throws NoSuchNamespaceException If the namespace does not exist (optional)
     * @throws UnsupportedOperationException If namespace properties are not supported
     */
    void alterNamespace(
        String[] namespace,
        NamespaceChange... changes) throws NoSuchNamespaceException;

    static NamespaceChange removeProperty(String property) {
      return new RemoveProperty(property);
    }
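
To make the mapping concrete, here is a minimal, self-contained Java sketch of how the key list of an UNSET PROPERTIES clause could be turned into one removal change per key and applied through an `alterNamespace`-style call. `ToyRemoveProperty` and `ToyNamespaceCatalog` are hypothetical stand-ins written for this illustration; they are not Spark classes, and the real command may be wired differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a "remove property" namespace change; not Spark's class.
final class ToyRemoveProperty {
    final String property;

    ToyRemoveProperty(String property) {
        this.property = property;
    }
}

// Hypothetical in-memory catalog that holds the properties of a single namespace.
final class ToyNamespaceCatalog {
    private final Map<String, String> properties = new LinkedHashMap<>();

    ToyNamespaceCatalog(Map<String, String> initial) {
        properties.putAll(initial);
    }

    // Same shape as alterNamespace(namespace, changes...): apply each change in order.
    void alterNamespace(ToyRemoveProperty... changes) {
        for (ToyRemoveProperty change : changes) {
            properties.remove(change.property);
        }
    }

    // Return a defensive copy of the current properties.
    Map<String, String> properties() {
        return new LinkedHashMap<>(properties);
    }

    // Turn the key list of an UNSET PROPERTIES clause into one removal per key.
    static ToyRemoveProperty[] toChanges(List<String> keys) {
        List<ToyRemoveProperty> changes = new ArrayList<>();
        for (String key : keys) {
            changes.add(new ToyRemoveProperty(key));
        }
        return changes.toArray(new ToyRemoveProperty[0]);
    }
}
```

In this sketch, `ALTER NAMESPACE ns UNSET PROPERTIES ('k1')` would correspond to `catalog.alterNamespace(ToyNamespaceCatalog.toChanges(List.of("k1")))`.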

Does this PR introduce any user-facing change?

Yes. End users can now delete namespace properties through SQL.

How was this patch tested?

Added new unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jun 20, 2024
@panbingkun
Contributor Author

panbingkun commented Jun 21, 2024

@panbingkun panbingkun marked this pull request as ready for review June 21, 2024 07:05
case class UnsetNamespaceProperties(
namespace: LogicalPlan,
propertyKeys: Seq[String],
ifExists: Boolean) extends UnaryCommand {
Contributor

We can probably use RunnableCommand to simplify the v2 command implementation. We can add a UnaryRunnableCommand, and then do

case class UnsetNamespacePropertiesCommand(...) extends UnaryRunnableCommand {
  ...
  def run(...) {
    val ResolvedIdentifier(catalog, ident) = child
    ...
  }
}

Contributor Author

Okay, let me give it a try, thanks.

Contributor Author

@cloud-fan I have improved it in the way you suggested.

@@ -51,6 +51,7 @@ trait RunnableCommand extends Command {
}

trait LeafRunnableCommand extends RunnableCommand with LeafLike[LogicalPlan]
trait UnaryRunnableCommand extends RunnableCommand with UnaryLike[LogicalPlan]
Contributor Author

Because RunnableCommand is defined in the sql/core module, adding UnaryRunnableCommand here seems more appropriate.

@@ -2659,12 +2659,12 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase with Compilat
}

def unsetNonExistentPropertiesError(
- properties: Seq[String], table: TableIdentifier): Throwable = {
+ properties: Seq[String], nameParts: Seq[String]): Throwable = {
Contributor Author

This reuses the UNSET_NONEXISTENT_PROPERTIES error condition.

@@ -98,3 +83,20 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
}
}
}

object ResolveCatalogs {
def resolveNamespace(
Contributor Author

Extracted the resolveNamespace method from class ResolveCatalogs into object ResolveCatalogs so that it can be reused in UnsetNamespacePropertiesCommand.

@@ -4231,7 +4231,7 @@
},
"UNSET_NONEXISTENT_PROPERTIES" : {
"message" : [
- "Attempted to unset non-existent properties [<properties>] in table <table>."
+ "Attempted to unset non-existent properties [<properties>] in relation <relationId>."
Contributor Author

I want to reuse it for the namespace case.

Contributor

But a namespace is not a relation... maybe just say: table or namespace <name>

Contributor Author

Yea, okay.
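
As a rough illustration of why a neutral placeholder name matters when one error condition serves both tables and namespaces, the sketch below fills `<param>` placeholders in a message template. `ToyErrorMessages` is a hypothetical helper written for this example, not Spark's error framework, and the template used in the usage note is the original table wording quoted in the diff above; the final wording chosen in the PR is not shown in this thread.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of Spark.
final class ToyErrorMessages {
    // Replace each <key> placeholder in the template with its parameter value.
    static String format(String template, Map<String, String> params) {
        String message = template;
        for (Map.Entry<String, String> entry : params.entrySet()) {
            message = message.replace("<" + entry.getKey() + ">", entry.getValue());
        }
        return message;
    }
}
```

For example, formatting `"Attempted to unset non-existent properties [<properties>] in table <table>."` with `properties` and `table` parameters always produces the word "table", which would be misleading if the same condition were raised for a namespace.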

* ALTER (DATABASE|SCHEMA|NAMESPACE) ... UNSET (DBPROPERTIES|PROPERTIES) [IF EXISTS] ...;
* }}}
*/
case class UnsetNamespacePropertiesCommand(
Contributor Author

I have to define UnsetNamespacePropertiesCommand in the sql/core module because it extends UnaryRunnableCommand, which in turn extends RunnableCommand, and RunnableCommand cannot be accessed from the sql/catalyst module (sql/core depends on sql/catalyst, not the other way around).

@@ -1098,4 +1098,24 @@ class SparkSqlAstBuilder extends AstBuilder {

(ctx.LOCAL != null, finalStorage, Some(DDLUtils.HIVE_PROVIDER))
}

Contributor Author

I have to put the parsing logic for UnsetNamespacePropertiesCommand here, because the command can only be defined in the sql/core module and AstBuilder cannot access classes in that module.

override def run(sparkSession: SparkSession): Seq[Row] = {
val ResolvedIdentifier(catalog, ident) = child
val ns = ResolveCatalogs.resolveNamespace(
catalog, ident.namespace.toSeq :+ ident.name, fetchMetadata = true)
Contributor

oh, we should leave this to the analyzer. We should put UnresolvedNamespace inside UnsetNamespacePropertiesCommand, instead of UnresolvedIdentifier

Contributor Author

@panbingkun panbingkun Jun 25, 2024

Okay, let me fix it.

Also, I found an issue while unifying the v1 and v2 tests for ALTER TABLE ... UNSET TBLPROPERTIES IF EXISTS.
The v2 command for ALTER TABLE ... UNSET TBLPROPERTIES ... (AlterTableExec) seems to ignore the IF EXISTS parameter.
I think we should fix it. WDYT?

Contributor Author

oh, we should leave this to the analyzer. We should put UnresolvedNamespace inside UnsetNamespacePropertiesCommand, instead of UnresolvedIdentifier

Thanks, updated.

Contributor

I think we should fix it. WDYT?

Yes, we should fix it, but in a separate PR.

Contributor Author

Okay, I will do it in a separate PR.
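
For readers unfamiliar with the IF EXISTS distinction discussed in this thread, the following self-contained Java sketch shows the intended difference: without IF EXISTS, unsetting a key that is not present is an error; with IF EXISTS, missing keys are skipped. `ToyUnsetProperties` is a hypothetical helper written for this illustration, and the exception type and message are assumptions, not what AlterTableExec or the new namespace command actually throws.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Spark.
final class ToyUnsetProperties {
    // Return a copy of `current` without `keys`. Without IF EXISTS, any key that is
    // not present is reported as an error instead of being silently skipped.
    static Map<String, String> unset(
            Map<String, String> current, List<String> keys, boolean ifExists) {
        if (!ifExists) {
            List<String> missing = new ArrayList<>();
            for (String key : keys) {
                if (!current.containsKey(key)) {
                    missing.add(key);
                }
            }
            if (!missing.isEmpty()) {
                // Assumed exception type and wording, chosen for the sketch.
                throw new IllegalArgumentException(
                    "Attempted to unset non-existent properties " + missing);
            }
        }
        Map<String, String> result = new LinkedHashMap<>(current);
        for (String key : keys) {
            result.remove(key);
        }
        return result;
    }
}
```

A command that ignores the `ifExists` flag behaves as if it were always `true`, which matches the symptom described above.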

@@ -4411,6 +4411,11 @@
"<functionName> with AES-<mode> does not support initialization vectors (IVs)."
]
},
"ALTER_NAMESPACE_PROPERTY" : {
Contributor Author

Renamed SET_NAMESPACE_PROPERTY to ALTER_NAMESPACE_PROPERTY; we will reuse it in SparkSqlParser.

Contributor

In SQL, SET also means update, so I think the existing name is fine.

Contributor Author

The error is UNSUPPORTED_FEATURE.SET_NAMESPACE_PROPERTY in both of the following cases, and I was concerned that this could cause confusion.

test reserved properties

sql(s"CREATE NAMESPACE $ns")
val sqlText = s"ALTER NAMESPACE $ns SET PROPERTIES ('$key'='dummyVal')"
sql(s"CREATE NAMESPACE $ns")
val sqlText = s"ALTER NAMESPACE $ns UNSET PROPERTIES ('$key')"

Okay, let me restore it

Contributor

what are the corresponding errors for table properties?

Contributor Author

The corresponding error for table properties is UNSUPPORTED_FEATURE.SET_TABLE_PROPERTY, e.g.:

ALTER TABLE ... SET TBLPROPERTIES ('$reservedKey'='...')
ALTER TABLE ... UNSET TBLPROPERTIES ('$reservedKey')
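
The reserved-property check discussed here can be sketched as follows. This is a self-contained Java illustration, not Spark's parser code: `ToyReservedProperties` is hypothetical, the set of reserved keys is an assumption made for the example, and the message simply combines the error name and the "is a reserved namespace property" text quoted in this thread.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Spark.
final class ToyReservedProperties {
    // Assumed reserved keys for the sketch; the authoritative list lives in Spark.
    static final Set<String> RESERVED = Set.of("comment", "location", "owner");

    // Reject any reserved key, whether it appears in a SET or an UNSET clause.
    static void checkNotReserved(List<String> keys) {
        for (String key : keys) {
            if (RESERVED.contains(key)) {
                throw new UnsupportedOperationException(
                    "UNSUPPORTED_FEATURE.SET_NAMESPACE_PROPERTY: "
                        + key + " is a reserved namespace property");
            }
        }
    }
}
```

Because the same check guards both SET and UNSET in this sketch, a single error name covers both statements, which is the situation the naming question above is about.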

sql(s"ALTER NAMESPACE $ns SET PROPERTIES ('$key'='dummyVal')")
}
assert(exception.getMessage.contains(s"$key is a reserved namespace property"))
val sqlText = s"ALTER NAMESPACE $ns SET PROPERTIES ('$key'='dummyVal')"
Contributor Author

Not directly related to this PR; this change only switches the test to checkErrorMatchPVals to check the error condition.

import org.apache.spark.sql.errors.QueryCompilationErrors

/**
* A command that ALTER NAMESPACE UNSET PROPERTIES command.
Contributor

@cloud-fan cloud-fan Jun 25, 2024

please fix the grammar.

Contributor Author

Done

@github-actions github-actions bot added the DOCS label Jun 27, 2024
@@ -25,7 +25,7 @@ license: |
`DATABASE`, `SCHEMA` and `NAMESPACE` are interchangeable and one can be used in place of the others. An error message
is issued if the database is not found in the system.

- ### ALTER PROPERTIES
+ ### ALTER SET PROPERTIES
Contributor

since the entire page is for ALTER DATABASE, I think SET PROPERTIES is a better section title.

@@ -43,6 +43,23 @@ ALTER { DATABASE | SCHEMA | NAMESPACE } database_name

Specifies the name of the database to be altered.

### ALTER UNSET PROPERTIES
Contributor

ditto

Contributor

Can we also add examples for this new syntax, in the ### Examples section?

Contributor Author

Done

* **database_name**

Specifies the name of the database to be altered.

### ALTER LOCATION
Contributor

ditto

/**
* Create a [[UnsetNamespacePropertiesCommand]] command.
*
* For example:
Contributor

Suggested change
- * For example:
+ * Expected format:

* For example:
* {{{
* ALTER (DATABASE|SCHEMA|NAMESPACE) database
* UNSET (DBPROPERTIES | PROPERTIES) ('comment', 'key');
Contributor

Suggested change
- * UNSET (DBPROPERTIES | PROPERTIES) ('comment', 'key');
+ * UNSET (DBPROPERTIES | PROPERTIES) ('key1', 'key2');

@panbingkun
Contributor Author

@cloud-fan
All suggestions have been updated.
Thank you for your patient review!

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 3bf7de0 Jun 28, 2024
asl3 pushed a commit to asl3/spark that referenced this pull request Jul 1, 2024
### What changes were proposed in this pull request?
This PR adds support for `ALTER NAMESPACE ... UNSET PROPERTIES` in `v2`.

### Why are the changes needed?
- For `table` and `view`, properties can be added, updated, or deleted using the following SQL:
```
ALTER (TABLE | VIEW) ... SET TBLPROPERTIES ...
ALTER (TABLE | VIEW) ... UNSET TBLPROPERTIES (IF EXISTS)? ...
```
https://github.com/apache/spark/blob/3469ec6b41967b1b4c7b2549174ed0c199815977/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L148-L151

- At the SQL level, however, `namespace` only has `SET` syntax, not `UNSET` syntax:
```
ALTER namespace ... SET (DBPROPERTIES | PROPERTIES) ...
```
https://github.com/apache/spark/blob/3469ec6b41967b1b4c7b2549174ed0c199815977/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L106-L107

- In addition, the underlying `SupportsNamespaces` interface already supports deleting properties. I propose adding SQL syntax so that users can manipulate `namespace` properties through SQL instead of relying solely on APIs.
https://github.com/apache/spark/blob/3469ec6b41967b1b4c7b2549174ed0c199815977/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java#L127-L137
https://github.com/apache/spark/blob/3469ec6b41967b1b4c7b2549174ed0c199815977/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/NamespaceChange.java#L59-L61

### Does this PR introduce _any_ user-facing change?
Yes, end users can delete the properties of `namespace` through SQL.

### How was this patch tested?
Add new UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47038 from panbingkun/SPARK-48668.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
}
checkError(e,
errorClass = "SCHEMA_NOT_FOUND",
parameters = Map("schemaName" -> s"`$ns`"))
Contributor

@cloud-fan cloud-fan Jul 9, 2024

just noticed one thing: why does the error message omit the catalog name? For table not found, the error message contains the user-given table name in the SQL statement.

Contributor Author

Let me investigate it

Contributor Author

From the user's perspective (for example, when they use multiple catalogs for federated queries and run into tables that do not exist), we should fix it. WDYT?

Contributor Author

@panbingkun panbingkun Jul 9, 2024

Also, should we fix similar scenarios (when the table does not exist), such as the ones below, at the same time?

private def requireTableExists(name: TableIdentifier): Unit = {
  if (!tableExists(name)) {
    throw new NoSuchTableException(db = name.database.get, table = name.table)
  }
}
private def requireTableNotExists(name: TableIdentifier): Unit = {
  if (tableExists(name)) {
    throw new TableAlreadyExistsException(db = name.database.get, table = name.table)
  }
}

Contributor Author

@cloud-fan Here is an initial PR that just prepends the catalog name to the namespace. If this approach is OK, I will follow up with the table-related cases as well.
#47276
Thanks!
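
The idea of reporting `catalog.namespace` rather than just `namespace` can be sketched in a few lines of Java. `ToyQualifiedName` is a hypothetical helper written for this illustration; the backtick quoting follows the style of the `schemaName` parameter in the test above, but the exact string produced by the follow-up PR is not shown in this thread, so the output format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Spark.
final class ToyQualifiedName {
    // Quote one name part with backticks, doubling any embedded backtick.
    static String quote(String part) {
        return "`" + part.replace("`", "``") + "`";
    }

    // Prefix the namespace parts with the catalog name and join everything with dots.
    static String qualified(String catalog, List<String> namespace) {
        List<String> parts = new ArrayList<>();
        parts.add(quote(catalog));
        for (String part : namespace) {
            parts.add(quote(part));
        }
        return String.join(".", parts);
    }
}
```

With a catalog named `spark_catalog` and a single-part namespace `ns`, this sketch produces a two-part quoted name, so a user working with several catalogs can tell which one the missing schema was looked up in.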

cloud-fan pushed a commit that referenced this pull request Jul 11, 2024
…pace` to `catalog.namespace`

### What changes were proposed in this pull request?
The pr aims to change the value of `SCHEMA_NOT_FOUND` from `namespace` to `catalog.namespace`.

### Why are the changes needed?
As discussed in #47038 (comment), we should provide a friendlier and clearer error message.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Updated existing UTs and passed GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47276 from panbingkun/db_with_catalog.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
biruktesf-db pushed a commit to biruktesf-db/spark that referenced this pull request Jul 11, 2024
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024