[SPARK-49611][SQL] Introduce TVF collations() & remove the SHOW COLLATIONS command #48087
@@ -18,6 +18,7 @@
package org.apache.spark.sql.catalyst.expressions

import scala.collection.mutable
import scala.jdk.CollectionConverters.CollectionHasAsScala
Review comment: why this import?

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
@@ -28,7 +29,7 @@ import org.apache.spark.sql.catalyst.expressions.codegen._
import org.apache.spark.sql.catalyst.expressions.codegen.Block._
import org.apache.spark.sql.catalyst.plans.logical.{FunctionSignature, InputParameter}
import org.apache.spark.sql.catalyst.trees.TreePattern.{GENERATOR, TreePattern}
import org.apache.spark.sql.catalyst.util.{ArrayData, MapData}
import org.apache.spark.sql.catalyst.util.{ArrayData, CollationFactory, MapData}
import org.apache.spark.sql.catalyst.util.SQLKeywordUtils._
import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryExecutionErrors}
import org.apache.spark.sql.internal.SQLConf
@@ -618,3 +619,45 @@ case class SQLKeywords() extends LeafExpression with Generator with CodegenFallb

  override def prettyName: String = "sql_keywords"
}

@ExpressionDescription(
  usage = """_FUNC_() - Get Spark SQL all collations""",
  examples = """
    Examples:
      > SELECT * FROM _FUNC_() LIMIT 2;
       SYSTEM BUILTIN UTF8_BINARY NULL NULL ACCENT_SENSITIVE CASE_SENSITIVE NO_PAD NULL
       SYSTEM BUILTIN UTF8_LCASE NULL NULL ACCENT_SENSITIVE CASE_INSENSITIVE NO_PAD NULL
Review comment: Is this output deterministic?
Reply: It should be, listCollationMeta enforces UTF8_* collations first.
Reply: I would replace this
Reply: Updated.
  """,
  since = "4.0.0",
  group = "generator_funcs")
case class AllCollations() extends LeafExpression with Generator with CodegenFallback {
  override def elementSchema: StructType = new StructType()
    .add("COLLATION_CATALOG", StringType, nullable = false)
    .add("COLLATION_SCHEMA", StringType, nullable = false)
    .add("COLLATION_NAME", StringType, nullable = false)
Review comment: do we really need the
Reply: Also, I noticed an issue. Do we need to register it as a
  spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala, lines 370 to 371 in 9fc58aa
  spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala, lines 1155 to 1156 in 9fc58aa
so we can use it as follows: SELECT all_collations(); or SELECT * FROM all_collations();
Reply: Using generator functions in SELECT is not recommended. We only keep them for backward compatibility.
Reply: Thank you, I see.
    .add("LANGUAGE", StringType)
    .add("COUNTRY", StringType)
    .add("ACCENT_SENSITIVITY", StringType, nullable = false)
    .add("CASE_SENSITIVITY", StringType, nullable = false)
    .add("PAD_ATTRIBUTE", StringType, nullable = false)
    .add("ICU_VERSION", StringType)

  override def eval(input: InternalRow): IterableOnce[InternalRow] = {
    CollationFactory.listCollations().asScala.map(CollationFactory.loadCollationMeta).map { m =>
      InternalRow(
        UTF8String.fromString(m.catalog),
        UTF8String.fromString(m.schema),
        UTF8String.fromString(m.collationName),
        if (m.language != null) UTF8String.fromString(m.language) else null,
Review comment:
  public static UTF8String fromString(String str) {
    return str == null ? null : fromBytes(str.getBytes(StandardCharsets.UTF_8));
  }
Reply: Updated.
        if (m.country != null) UTF8String.fromString(m.country) else null,
        UTF8String.fromString(
          if (m.accentSensitivity) "ACCENT_SENSITIVE" else "ACCENT_INSENSITIVE"),
        UTF8String.fromString(
          if (m.caseSensitivity) "CASE_SENSITIVE" else "CASE_INSENSITIVE"),
        UTF8String.fromString(m.padAttribute),
        if (m.icuVersion != null) UTF8String.fromString(m.icuVersion) else null)
    }
  }

  override def prettyName: String = "all_collations"
}
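
Note on the null guards in eval: the review comment on the LANGUAGE argument quotes UTF8String.fromString, which already returns null for a null input, so the surrounding "if (x != null) ... else null" checks appear redundant. The sketch below illustrates that equivalence in isolation. It is only a sketch under stated assumptions: fromString here is a stand-in that copies the quoted null check and returns raw bytes instead of a UTF8String, and NullSafeFromString and guarded are hypothetical names, not Spark APIs.

```scala
import java.nio.charset.StandardCharsets

object NullSafeFromString {
  // Hypothetical stand-in for UTF8String.fromString, mirroring the quoted
  // Java method: a null input yields null instead of throwing.
  def fromString(str: String): Array[Byte] =
    if (str == null) null else str.getBytes(StandardCharsets.UTF_8)

  // The guarded form used in the diff for language, country and icuVersion.
  def guarded(str: String): Array[Byte] =
    if (str != null) fromString(str) else null

  def main(args: Array[String]): Unit = {
    val inputs: Seq[String] = Seq("en", null)
    inputs.foreach { s =>
      val a = guarded(s)
      val b = fromString(s)
      // Both forms agree: null maps to null, non-null maps to the same bytes.
      val same = (a == null && b == null) || java.util.Arrays.equals(a, b)
      println(s"input=$s equivalent=$same")
    }
  }
}
```

If the real method behaves as quoted, each guarded argument could presumably be reduced to a plain UTF8String.fromString(...) call, which is what the "Updated." reply seems to indicate.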
Review comment: given we have sql_keywords, shall we call it string_collations?
Reply: Okay
Reply: nit: To me this sounds weird. It suggests that collations might exist for some other type. For keywords, it is more plausible that other kinds of keywords exist (Python, Scala, ...). @cloud-fan
Reply: oh, not only string type? then all_collations is good
Reply: Okay, let's restore to all_collations.
Reply: all_collation is fine, although I'm not sure why "all" is necessary. But no strong feelings.
Reply: It's all_collation, not all_collations, right?
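
On the earlier question of whether the LIMIT 2 example output is deterministic: the reply says the UTF8_* collations are listed first. One way such an ordering could be made stable is sketched below. This is an assumption about the general approach, not the actual CollationFactory logic; DeterministicCollationOrder and ordered are hypothetical names, and the collation names are illustrative input only.

```scala
object DeterministicCollationOrder {
  // Hypothetical helper: put UTF8_* names first (sorted), then all other
  // names (sorted), so the result does not depend on the input order.
  def ordered(names: Iterable[String]): Seq[String] = {
    val (utf8, others) = names.toSeq.distinct.partition(_.startsWith("UTF8_"))
    utf8.sorted ++ others.sorted
  }

  def main(args: Array[String]): Unit = {
    // A Set has no guaranteed iteration order, which is the situation
    // the reviewer's question is worried about.
    val sample = Set("UNICODE_CI", "UTF8_LCASE", "UNICODE", "UTF8_BINARY")
    // Prints UTF8_BINARY, then UTF8_LCASE.
    ordered(sample).take(2).foreach(println)
  }
}
```

With an ordering like this, a docstring example that takes the first two rows would keep showing the same two collations regardless of how the underlying list is built.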