Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor greatest and least Presto functions using simple function API #9308

Closed
wants to merge 16 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion velox/functions/prestosql/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,6 @@ add_library(
FindFirst.cpp
FromUnixTime.cpp
FromUtf8.cpp
GreatestLeast.cpp
InPredicate.cpp
JsonFunctions.cpp
Map.cpp
Expand Down
207 changes: 0 additions & 207 deletions velox/functions/prestosql/GreatestLeast.cpp

This file was deleted.

100 changes: 100 additions & 0 deletions velox/functions/prestosql/GreatestLeast.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
/*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This header is missing #pragma once.

I'll fix this, but wanted to let you know.

#include <functions/Macros.h>
mbasmanova marked this conversation as resolved.
Show resolved Hide resolved
#include <cmath>

namespace facebook::velox::functions {
namespace details {
/**
* This class implements two functions:
*
* greatest(value1, value2, ..., valueN) → [same as input]
* Returns the largest of the provided values.
*
* least(value1, value2, ..., valueN) → [same as input]
* Returns the smallest of the provided values.
*
* For DOUBLE and REAL type, NaN is considered as the biggest according to
* https://github.com/prestodb/presto/issues/22391
**/
template <typename TExec, typename T, bool isLeast>
struct ExtremeValueFunction {
VELOX_DEFINE_FUNCTION_TYPES(TExec);

FOLLY_ALWAYS_INLINE void call(
out_type<T>& result,
const arg_type<T>& firstElement,
const arg_type<Variadic<T>>& remainingElement) {
mbasmanova marked this conversation as resolved.
Show resolved Hide resolved
auto currentValue = firstElement;

for (auto element : remainingElement) {
auto candidateValue = element.value();

if constexpr (isLeast) {
if (smallerThan(candidateValue, currentValue)) {
currentValue = candidateValue;
}
} else {
if (greaterThan(candidateValue, currentValue)) {
currentValue = candidateValue;
}
}
}

result = currentValue;
}

private:
template <typename K>
bool greaterThan(const K& lhs, const K& rhs) const {
if constexpr (std::is_same_v<K, double> || std::is_same_v<K, float>) {
if (std::isnan(lhs)) {
return true;
}

if (std::isnan(rhs)) {
return false;
}
}

return lhs > rhs;
}

template <typename K>
bool smallerThan(const K& lhs, const K& rhs) const {
if constexpr (std::is_same_v<K, double> || std::is_same_v<K, float>) {
if (std::isnan(lhs)) {
return false;
}

if (std::isnan(rhs)) {
return true;
}
}

return lhs < rhs;
}
};
} // namespace details

template <typename TExec, typename T>
using LeastFunction = details::ExtremeValueFunction<TExec, T, true>;

template <typename TExec, typename T>
using GreatestFunction = details::ExtremeValueFunction<TExec, T, false>;

} // namespace facebook::velox::functions
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,36 @@
#include "velox/functions/Registerer.h"
#include "velox/functions/lib/IsNull.h"
#include "velox/functions/prestosql/Cardinality.h"
#include "velox/functions/prestosql/GreatestLeast.h"
#include "velox/functions/prestosql/InPredicate.h"

namespace facebook::velox::functions {

template <typename T>
inline void registerGreatestLeastFunction(const std::string& prefix) {
registerFunction<ParameterBinder<GreatestFunction, T>, T, T, Variadic<T>>(
{prefix + "greatest"});

registerFunction<ParameterBinder<LeastFunction, T>, T, T, Variadic<T>>(
{prefix + "least"});
}

inline void registerAllGreatestLeastFunctions(const std::string& prefix) {
registerGreatestLeastFunction<bool>(prefix);
registerGreatestLeastFunction<int8_t>(prefix);
registerGreatestLeastFunction<int16_t>(prefix);
registerGreatestLeastFunction<int32_t>(prefix);
registerGreatestLeastFunction<int64_t>(prefix);
registerGreatestLeastFunction<int128_t>(prefix);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed. int128_t is used for LONG_DECIMAL type, which you are registering on L44.

I'll fix this, but wanted to let you know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova
image
image
According to type definitions, it seems like ShortDecimal is using int64_t, and the ShortDecimal is also being registered. I am curious in this case do we need to remove registerGreatestLeastFunction<int64_t>(prefix); in addition?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Real-Chen-Happy Function registry stores a mapping from function signature to function implementation. Function signature is defined using logical type, e.g. f(integer) is different from f(date) even though both integer and date types are backed by INTEGER. Hence, we need a register call for each signature we wish to support. Since we want to support least(bigint,...), we need to call registerFunction<int64_t>.

Now, there is only one type backed by int128_t, that is long decimal. It actually seems like a bug that ``registerFunction<int128_t>` call is allowed. I think it should not be allowed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova I am thinking registerFunction<int128_t> may be reserved for HUGEINT?

Also, it seems like we are using logical types (ShortDecimal ...), Physical types (Varchar), and C++ types (int8_t ...) in function registrations, which is a bit confusing. Is there any future plan where we can just use physical types and logical types for function registrations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is actually no HUGEINT type in Velox. There is TypeKind::HUGEINT though. Things could definitely be clearer. Too many things to fix / improve. Not enough time.

registerGreatestLeastFunction<float>(prefix);
registerGreatestLeastFunction<double>(prefix);
registerGreatestLeastFunction<Varchar>(prefix);
registerGreatestLeastFunction<LongDecimal<P1, S1>>(prefix);
registerGreatestLeastFunction<ShortDecimal<P1, S1>>(prefix);
registerGreatestLeastFunction<Date>(prefix);
registerGreatestLeastFunction<Timestamp>(prefix);
}

extern void registerSubscriptFunction(
const std::string& name,
bool enableCaching);
Expand Down Expand Up @@ -47,12 +74,10 @@ void registerGeneralFunctions(const std::string& prefix) {
VELOX_REGISTER_VECTOR_FUNCTION(udf_transform, prefix + "transform");
VELOX_REGISTER_VECTOR_FUNCTION(udf_reduce, prefix + "reduce");
VELOX_REGISTER_VECTOR_FUNCTION(udf_array_filter, prefix + "filter");

VELOX_REGISTER_VECTOR_FUNCTION(udf_least, prefix + "least");
VELOX_REGISTER_VECTOR_FUNCTION(udf_greatest, prefix + "greatest");

VELOX_REGISTER_VECTOR_FUNCTION(udf_typeof, prefix + "typeof");

registerAllGreatestLeastFunctions(prefix);

registerFunction<CardinalityFunction, int64_t, Array<Any>>(
{prefix + "cardinality"});
registerFunction<CardinalityFunction, int64_t, Map<Any, Any>>(
Expand Down
Loading