feat: Add companionFunction to function metadata #9250

Open
wants to merge 1 commit into
base: main

pramodsatya
Collaborator

@pramodsatya pramodsatya commented Mar 26, 2024

Adds a boolean field, isCompanionFunction, to VectorFunctionMetadata,
AggregateFunctionMetadata, and WindowFunction::Metadata, to indicate
whether the respective scalar, aggregate, and window functions are companion
functions in Velox.
This field would be used to check for and exclude companion functions from
the function metadata returned by the v1/functions endpoint in the Presto
C++ sidecar. Currently, this is being done by searching for specific suffixes in
the registered companion functions' names.
Related discussion: #11011.
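
To illustrate the motivation, here is a minimal sketch of flag-based filtering. It is not the Velox or sidecar implementation: `FunctionMetadataSketch`, `nonCompanionFunctions`, and the example names (including the `_partial` suffix) are hypothetical stand-ins, and the registry is modelled as a plain `std::map`.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a function's metadata; the real Velox structs
// carry more fields than this.
struct FunctionMetadataSketch {
  bool companionFunction{false};
};

// Returns the names of all non-companion functions in sorted order. The
// decision is based on the metadata flag, not on the function name.
inline std::vector<std::string> nonCompanionFunctions(
    const std::map<std::string, FunctionMetadataSketch>& registry) {
  std::vector<std::string> names;
  for (const auto& [name, metadata] : registry) {
    if (!metadata.companionFunction) {
      names.push_back(name);
    }
  }
  return names;
}
```

With a flag, a regular function whose name merely happens to end in a companion-like suffix is kept, which a suffix search could not guarantee.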

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 26, 2024

netlify bot commented Mar 26, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 6b91785
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6758124066ee6c0008099ea3

@aditi-pandit aditi-pandit changed the title [WIP] Support retrieving function metadata [WIP] Add FunctionRegistry APIs to retrieve function metadata. Mar 26, 2024
@aditi-pandit
Collaborator

aditi-pandit commented Mar 26, 2024

@pramodsatya: How do we ensure that we get only Prestissimo functions from the registry? If a Prestissimo function and a Spark function have the same name, how do we disambiguate?

@pramodsatya
Collaborator Author

pramodsatya commented Mar 27, 2024

Thanks for the feedback @aditi-pandit, @czentgr. We do not check that only Prestissimo functions are retrieved from the registry. Instead, we rely on the Prestissimo worker having registered only the Presto functions, such as in this function, so there are no Spark functions in the registry. Please let me know whether this is fine or whether we should add another way to distinguish between Presto and Spark functions in the registry.

@pramodsatya pramodsatya force-pushed the get_fn_metadata branch 3 times, most recently from c2011b1 to 8a28ec9 Compare March 29, 2024 02:49
@pramodsatya pramodsatya changed the title [WIP] Add FunctionRegistry APIs to retrieve function metadata. Add FunctionRegistry APIs to retrieve function metadata. Jun 7, 2024
@pramodsatya
Collaborator Author

Hi @aditi-pandit, @czentgr, could you please take another look at this PR?

Collaborator Author

@pramodsatya pramodsatya left a comment

Thanks for the feedback @aditi-pandit, addressed the comments. Could you please take another look?

@pramodsatya pramodsatya marked this pull request as ready for review August 2, 2024 00:31
Collaborator

@aditi-pandit aditi-pandit left a comment

Thanks @pramodsatya

@pramodsatya pramodsatya changed the title Add FunctionRegistry APIs to retrieve function metadata. Expose helper functions to retrieve registered functions from FunctionRegistry Aug 3, 2024
Collaborator Author

@pramodsatya pramodsatya left a comment

Thanks @aditi-pandit, addressed the comments. Could you please take another look?

Collaborator

@aditi-pandit aditi-pandit left a comment

Thanks @pramodsatya. Changes look good.

@Yuhta Yuhta requested a review from mbasmanova August 15, 2024 16:10
@pramodsatya pramodsatya force-pushed the get_fn_metadata branch 2 times, most recently from 8bdded7 to 2b18863 Compare August 27, 2024 03:08
@pramodsatya pramodsatya changed the title Expose helper functions to retrieve registered functions from FunctionRegistry Modify FunctionRegistry APIs to retrieve function metadata for coverage map Aug 27, 2024
@pramodsatya
Collaborator Author

Thanks for the feedback @aditi-pandit. @czentgr suggested that, because this API modification is centered on detecting and removing companion functions, it would be better to record whether a function is a companion function in its metadata. Since the concept of companion functions is specific to Velox and not Prestissimo, it would be better to get this information from the Velox function metadata, detect companion functions from that metadata in Prestissimo, and handle them there as required. This also avoids a dependency on velox_exec in velox_function_registry, and detects companion functions more cleanly than searching for specific substrings in the registered function names.

I have updated the PR accordingly to include companion function information in the function metadata (in VectorFunctionMetadata for vector functions, in AggregateFunctionMetadata for aggregates, and in WindowFunction::Metadata for window functions). Could you please review the updated changes and share whether this approach is fine, @aditi-pandit @mbasmanova?

@pramodsatya pramodsatya changed the title Modify FunctionRegistry APIs to retrieve function metadata for coverage map Add isCompanionFunction to function metadata Aug 27, 2024
Collaborator

@czentgr czentgr left a comment

Thanks, looks good now.

@@ -413,10 +423,11 @@ bool CompanionFunctionsRegistrar::registerMergeExtractFunction(

auto mergeExtractFunctionName =
CompanionSignatures::mergeExtractFunctionName(name);
-  return registerAggregateFunction(
+  return registerMergeExtractFunctionImpl(
Collaborator

Maybe rename to registerMergeExtractFunctionInternal.

@aditi-pandit
Collaborator

@pramodsatya, @czentgr: This is a reasonable solution as well, if we are okay with exposing the companion function concept to the services using Velox.

With the other approach, an API that returns all functions and takes a boolean parameter such as 'skipInternalFunctions', a single function could be used to skip companion functions and any other internal functions added by the Velox framework.

@mbasmanova: wdyt?

Contributor

@mbasmanova mbasmanova left a comment

@pramodsatya Pramod, would you update PR description to explain this change?

@pramodsatya
Collaborator Author

@pramodsatya Pramod, would you update PR description to explain this change?

Apologies @mbasmanova, since this PR went through many iterations I had left it empty until the related discussion was concluded. I have updated the description now, could you please take another look?

@mbasmanova
Contributor

@pramodsatya

whether the respective scalar, aggregate, and window functions are companion functions

Can a window function be a companion function? I assume not. If so, let's remove isCompanionFunction from the window function metadata.

@@ -59,6 +62,11 @@ class VectorFunctionMetadataBuilder {
return *this;
}

VectorFunctionMetadataBuilder& isCompanionFunction(bool isCompanionFunction) {
Contributor

naming: drop 'is';

builder.companionFunction(true)
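
A minimal sketch of a builder following this naming suggestion might look as follows. The `Sketch` types are hypothetical and are not the actual `VectorFunctionMetadataBuilder`; only `defaultNullBehavior` and the companion flag are taken from the diff hunks in this conversation.

```cpp
// Hypothetical, reduced version of the metadata struct under review.
struct VectorFunctionMetadataSketch {
  bool defaultNullBehavior{true};
  bool companionFunction{false};
};

// Fluent builder: each setter is named after the field, without an 'is'
// prefix, and returns *this so that calls can be chained.
class VectorFunctionMetadataBuilderSketch {
 public:
  VectorFunctionMetadataBuilderSketch& defaultNullBehavior(bool value) {
    metadata_.defaultNullBehavior = value;
    return *this;
  }

  VectorFunctionMetadataBuilderSketch& companionFunction(bool value) {
    metadata_.companionFunction = value;
    return *this;
  }

  VectorFunctionMetadataSketch build() const {
    return metadata_;
  }

 private:
  VectorFunctionMetadataSketch metadata_;
};
```

A caller would then write something like `VectorFunctionMetadataBuilderSketch().companionFunction(true).build()`.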

Contributor

naming: drop 'is';

if (auto func = getAggregateFunctionEntry(sanitizedName)) {
return func->metadata;
} else {
VELOX_USER_FAIL("Metadata not found for aggregate function: {}", name);
Contributor

The issue here is that aggregate function doesn't exist, not that it is missing metadata. Let's clarify.

VELOX_USER_FAIL("Aggregate function not found: {}", name);

const auto sanitizedName = sanitizeName(name);
if (auto func = getAggregateFunctionEntry(sanitizedName)) {
return func->metadata;
} else {
Contributor

nit: drop else after return
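
The two suggestions on this lookup (the clearer error message and no `else` after `return`) could be combined roughly as below. This is only a sketch under stated assumptions: the registry is a plain `std::unordered_map`, name sanitization is omitted, `std::invalid_argument` stands in for `VELOX_USER_FAIL`, and all `Sketch` names are hypothetical.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical, reduced version of the aggregate metadata struct.
struct AggregateFunctionMetadataSketch {
  bool orderSensitive{true};
  bool companionFunction{false};
};

// Returns the metadata for 'name'. Since the success path returns early, the
// failure path needs no 'else'. The error says that the function itself is
// missing, not its metadata.
inline const AggregateFunctionMetadataSketch& getAggregateFunctionMetadataSketch(
    const std::unordered_map<std::string, AggregateFunctionMetadataSketch>&
        registry,
    const std::string& name) {
  auto it = registry.find(name);
  if (it != registry.end()) {
    return it->second;
  }
  throw std::invalid_argument("Aggregate function not found: " + name);
}
```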

@@ -43,14 +43,15 @@ class WindowFunction {
kRows,
};

-  /// Indicates whether this is an aggregate window function and its process
-  /// unit.
+  /// Indicates whether this is an aggregate window function, whether it is a
Contributor

nit:

...whether the corresponding aggregate function is a companion function....

struct Metadata {
ProcessMode processMode;
bool isAggregate;
bool isCompanionFunction;
Contributor

Perhaps, clarify that this can be true iff isAggregate is true.

@@ -40,6 +40,9 @@ struct VectorFunctionMetadata {
/// In this case, 'rows' in VectorFunction::apply will point only to positions
/// for which all arguments are not null.
bool defaultNullBehavior{true};

/// Indicates if this is a companion function.
bool isCompanionFunction{false};
Contributor

consistency: drop 'is' to match the other booleans here

@@ -476,6 +476,9 @@ struct AggregateFunctionMetadata {
/// True if results of the aggregation depend on the order of inputs. For
/// example, array_agg is order sensitive while count is not.
bool orderSensitive{true};

/// Indicates if this is a companion function.
bool isCompanionFunction{false};
Contributor

consistency: drop 'is' to match the other boolean

@@ -75,7 +75,7 @@ void registerRowNumber(const std::string& name, TypeKind resultTypeKind) {
exec::registerWindowFunction(
name,
std::move(signatures),
-      {exec::WindowFunction::ProcessMode::kRows, false},
+      {exec::WindowFunction::ProcessMode::kRows, false, false},
Contributor

It doesn't seem right to allow specifying isCompanion for a window function.

Collaborator Author

Reverted, thanks.

@@ -142,7 +142,7 @@ void registerAverageAggregate(
}
}
},
-      {false /*orderSensitive*/},
+      {false /*orderSensitive*/, false /*isCompanionFunction*/},
Contributor

It is strange to have an API that has both isCompanion and withCompanionFunctions

Collaborator Author

Yes, it seems a bit strange, but isCompanionFunction is part of the function metadata and indicates whether the function currently being registered is a companion function, whereas withCompanionFunctions indicates whether the aggregate should be registered together with its companion functions. So both fields are needed.
Could you please share how this could be made more readable?
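
The distinction between the two flags can be sketched as follows. This is a toy model rather than the Velox registration code: the registry is a `std::map`, the `_partial` and `_merge` suffixes are illustrative assumptions, and every `Sketch` name is hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical, reduced version of the aggregate metadata struct.
struct AggregateMetadataSketch {
  bool orderSensitive{true};
  bool companionFunction{false};
};

using AggregateRegistrySketch = std::map<std::string, AggregateMetadataSketch>;

// 'metadata.companionFunction' describes the function registered under
// 'name'. 'withCompanionFunctions' is an instruction to the registrar to
// additionally register companion entries derived from it.
inline void registerAggregateSketch(
    AggregateRegistrySketch& registry,
    const std::string& name,
    AggregateMetadataSketch metadata,
    bool withCompanionFunctions) {
  registry[name] = metadata;
  if (!withCompanionFunctions) {
    return;
  }
  // Companion entries inherit the metadata but are flagged as companions.
  AggregateMetadataSketch companion = metadata;
  companion.companionFunction = true;
  for (const char* suffix : {"_partial", "_merge"}) {
    registry[name + suffix] = companion;
  }
}
```

In this model a caller never sets the companion flag directly; it is derived during registration, which is one possible way to avoid exposing both options at the same call site.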

@@ -356,6 +357,23 @@ TEST_F(FunctionRegistryTest, isDeterministic) {
ASSERT_FALSE(isDeterministic("not_found_function").has_value());
}

TEST_F(FunctionRegistryTest, isCompanionFunction) {
functions::prestosql::registerAllScalarFunctions();
// extract aggregate companion functions are registered as vector functions.
Contributor

typos

Collaborator Author

@pramodsatya pramodsatya left a comment

Thanks for the feedback @mbasmanova. The companion function metadata added for window functions is now reverted and a check is added in registerAggregateFunction to ensure aggregate companion functions are no longer registered as window functions.
Could you please take another look?
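
The check described above might look roughly like this. The sketch assumes that registering an aggregate also makes it available as a window function unless it is a companion; the two `std::set` registries and all `Sketch` names are hypothetical, not the actual `registerAggregateFunction` code.

```cpp
#include <set>
#include <string>

// Hypothetical, reduced version of the aggregate metadata struct.
struct CompanionAwareMetadataSketch {
  bool orderSensitive{true};
  bool companionFunction{false};
};

// Toy model of the two registries touched by aggregate registration.
struct WindowRegistriesSketch {
  std::set<std::string> aggregates;
  std::set<std::string> windowFunctions;
};

// Every aggregate goes into the aggregate registry, but only non-companion
// aggregates are also registered as window functions.
inline void registerAggregateFunctionSketch(
    WindowRegistriesSketch& registries,
    const std::string& name,
    const CompanionAwareMetadataSketch& metadata) {
  registries.aggregates.insert(name);
  if (!metadata.companionFunction) {
    registries.windowFunctions.insert(name);
  }
}
```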

@pramodsatya pramodsatya changed the title Add isCompanionFunction to function metadata feat: Add companionFunction to function metadata Dec 10, 2024