-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Coral-Hive] Extend VersionedSqlUserDefinedFunction to support version-specific function names #508
Conversation
This convention is brittle. Why is shading indicated by an underscore? Can we use something that utilizes patterns/templates or maybe suffixes? We will have to be explicit about how to resolve conflicts in case of multiple names/classes match. By the way, resolving conflicts could happen in the "underscore" based approach, so it is not unique to the more generic solution suggested above. One resolution strategy is to fail loading the view if there is a collision. We need a representation that does not open the door for too generic names like |
@wmoustafa The shading prefix resolution only happens for static registered UDFs in |
b61b0f3
to
b1ab580
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First high-level review.
@@ -26,7 +28,7 @@ public SourceOperatorMatchSqlCallTransformer(String sourceOpName, int numOperand | |||
|
|||
@Override | |||
protected boolean condition(SqlCall sqlCall) { | |||
return sourceOpName.equalsIgnoreCase(sqlCall.getOperator().getName()) | |||
return sourceOpName.equalsIgnoreCase(FunctionUtils.removeShadingPrefix(sqlCall.getOperator().getName())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not leak this shading business to a standard operator like this. Is there any way to handle this outside this operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to CoralRegistryOperatorRenameSqlCallTransformer
in coral-trino.
|
||
/** | ||
* Checks if the given class name has a shading prefix. | ||
* A class name is considered shaded if the prefix before the first dot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove references to shading
and use versioning/versioned
in the description and the code? Using shading terminology unnecessarily complicates the description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, modified.
|
||
|
||
public class FunctionUtils { | ||
public static final String CORAL_VERSIONED_UDF_PREFIX = "coralversionedudf_(\\d+|x)_(\\d+|x)_(\\d+|x)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coral_udf_version_..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified.
/** | ||
* Generates a versioned function name based on the given function name and class name. | ||
* For example, if the function name is "myFunction" and the class name is "coralversionedudf_1_0_0.com.linkedin.MyClass", | ||
* the versioned function name will be "myFunction_1_0_0". If the class name is not shaded, such as "com.linkedin.MyClass", | ||
* the versioned function name will be "myFunction". | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is not this done at the view definition level? Sounds like we are adding too much use case specific code to the core Coral logic. I am guessing because doing it in the view will hurt Trino, but this is not a valid reason for adding custom code like this. FYI @phd3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also seeing the only use is in Spark (and the behavior is Spark specific), not sure if this fits in coral-common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to VersionedSqlUserDefinedFunction
.
@@ -66,15 +67,15 @@ protected SqlCall transform(SqlCall sqlCall) { | |||
throw new UnsupportedUDFException(operatorName); | |||
} | |||
final String viewDependentFunctionName = operator.getViewDependentFunctionName(); | |||
String versionedFunctionName = FunctionUtils.getVersionedFunctionName(viewDependentFunctionName, operatorName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we leverage VersionedSqlUserDefinedFunction
jointly with this new code, is there a way to encapsulate all this functionality in this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to a new method getVersionedViewDependentFunctionName
in VersionedSqlUserDefinedFunction
.
7d6a600
to
606473e
Compare
// we want to see class name in the exception message for coverage testing | ||
// so throw exception here | ||
throw new UnknownSqlFunctionException(funcClassName); | ||
} | ||
|
||
return dynamicResolvedFunctions; | ||
} else { | ||
if (isVersioningUDF(funcClassName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here should be:
- If it has the necessary properties (and hence Dali UDF)
- If it is on the format
[name]+_version
ordb_t_+[name]
, thenshort name = [name]
.function name = unversioned class name
long name = [name]+_version or db_t_+[name] or whatever original name is
- else a special case of the above
- If it is on the format
Ideally Dali views should have function-identifier
property that makes it possible to use this directly in the registry or in function name
here and not derive function name from the class name but the latter is acceptable in the interim, especially everything is encapsulated in this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in case short name = [name]
is not useful for Trino, we can have short name
depend on function name
through a hardcoded dependency. Intention of short name
is to be used in engines that cannot leverage versioning like Trino.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified the code by adding a new method getTrinoFunctionName
to VersionedSqlUserDefinedFunction
, which returns the Trino function name as how we did before:
- Try to retrieve the Trino function name from the registered map
TRINO_FUNC_NAME_MAP
based on the UDF class name - If the UDF class name doesn't exist in the map, follow the current approach by converting the camel case class name to lowercase underscore function name.
Since we don't have a standard for version in view function name (we only have a standard for versioning prefix in class name), and view function name may not be related to the registered Trino function name, I would prefer to keep the current approach of generating Trino function name, which would also avoid regressions.
0c9ba56
to
5372271
Compare
2e851af
to
56e0e4d
Compare
* the short function name. | ||
*/ | ||
public String getShortFunctionName() { | ||
String unversionedClassName = getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add java doc about the nature of the function name (e.g., its format etc). You can say this is just a convention and other naming approaches are still valid as long as they identify the type inference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -356,6 +356,14 @@ public void testDaliUDFCall() { | |||
assertEquals(RelOptUtil.toString(rel), expectedPlan); | |||
} | |||
|
|||
@Test | |||
public void testUDFWithVersioningClassName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify the unit test name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified
@@ -44,6 +44,7 @@ | |||
* Class to resolve hive function names in SQL to Function. | |||
*/ | |||
public class HiveFunctionResolver { | |||
private static final String CORAL_VERSIONED_UDF_PREFIX = "coral_udf_version_(\\d+|x)_(\\d+|x)_(\\d+|x)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove CORAL_
since VERSIONED_UDF
is clear enough that it is related to VersionedUDF class?
Should use CLASS_NAME_PREFIX
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified
@@ -218,7 +222,7 @@ public void addDynamicFunctionToTheRegistry(String funcClassName, Function funct | |||
new SqlUserDefinedFunction(new SqlIdentifier(funcClassName, ZERO), | |||
new HiveGenericUDFReturnTypeInference(funcClassName, hiveTable.getDaliUdfDependencies()), null, | |||
createSqlOperandTypeChecker(numOfOperands), null, null), | |||
hiveTable.getDaliUdfDependencies(), functionName)); | |||
hiveTable.getDaliUdfDependencies(), functionName, funcClassName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point we deal with functionName
as a class name (e.g., String unversionedClassName = getName();
). Is there a way to stick to one terminology througout the whole code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified to use functionName
to refer to the function name in view text, and funcClassName
to refer to the function class string.
public String getUDFClassName() { | ||
return udfClassName; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FunctionName vs UDF Name.. can we fix all terminology and unify them to minimize confusion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified
coral-hive/src/main/java/com/linkedin/coral/hive/hive2rel/functions/HiveFunctionResolver.java
Outdated
Show resolved
Hide resolved
coral-hive/src/main/java/com/linkedin/coral/hive/hive2rel/functions/HiveFunctionResolver.java
Outdated
Show resolved
Hide resolved
...rc/main/java/com/linkedin/coral/hive/hive2rel/functions/VersionedSqlUserDefinedFunction.java
Outdated
Show resolved
Hide resolved
...rc/main/java/com/linkedin/coral/hive/hive2rel/functions/VersionedSqlUserDefinedFunction.java
Outdated
Show resolved
Hide resolved
...rc/main/java/com/linkedin/coral/hive/hive2rel/functions/VersionedSqlUserDefinedFunction.java
Outdated
Show resolved
Hide resolved
coral-hive/src/main/java/com/linkedin/coral/hive/hive2rel/functions/HiveFunctionResolver.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @ljfgem for working on this. Looks great now. Could you please update the title and description to match the new approach/sematics? I do not "shading" is a thing anymore.
It is probably along the lines of Extend VersionedSqlUserDefinedFunction to support version-specific function names
.
Please make sure Trino tests pass both regression and server side.
@wmoustafa Thank you for your reviews and suggestions on this PR! Modified the title and description, also finished the required tests. |
What changes are proposed in this pull request, and why are they necessary?
This PR is to support versioning UDF resolution for one unversioned UDF static registration.
For example, user can register
com.linkedin.abc
once and then use variations with standard versioning prefix in Hive view definitions. A UDF class is considered versioned if the prefix before the first dot followscoral_udf_version_(\d+|x)_(\d+|x)_(\d+|x)
format.To achieve the goal, we made the following changes:
HiveFunctionResolver
to allow versioning UDF resolution while static registry only contains unversioned class name.functionClassName
andgetShortFunctionName()
toVersionedSqlUserDefinedFunction
.After successful resolution in coral-hive, coral-spark would be able to return the UDF with versioning class name (
functionClassName
) so that Spark is able to register the versioning UDF class well. And coral-trino will use registered short name or fall back to convert the camel case class name to lowercase underscore function name (getShortFunctionName()
).How was this patch tested?