-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor greatest and least Presto functions using simple function API #9308
Conversation
Hi @Real-Chen-Happy! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
✅ Deploy Preview for meta-velox canceled.
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
Aware of the decimal compatibility errors. Fix in progress |
The error happens in one of the circleci check |
@Real-Chen-Happy The error you are seeing is a limitation of the CI check: #9240 CC @kgpai |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Real-Chen-Happy Thank you for the refactoring. Looks great overall. Some comments.
In Presto, greatest and least functions allow array and struct inputs as well. It would be nice to add support for these in a follow-up.
|
||
namespace facebook::velox::functions { | ||
|
||
template <typename TExec, typename TInput, bool isLeast> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TInput -> T for readability
Put ExtremeValueFunction into facebook::velox::functions::details namespace to signal that it shouldn't be used outside of this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
struct ExtremeValueFunction { | ||
VELOX_DEFINE_FUNCTION_TYPES(TExec); | ||
|
||
// For double, presto should throw error if input is Nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments should be full sentences. Start with a capital letter and end with a period. Please, fix throughout the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// For double, presto should throw error if input is Nan | ||
template <typename T> | ||
void checkNan(const T& value) const { | ||
if constexpr (std::is_same_v<T, TypeTraits<TypeKind::DOUBLE>::NativeType>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior is quite surprising. One would expect this functionality to apply to both DOUBLE and REAL types, not only to DOUBLE. It might be helpful to open an issue in PrestoDB repo to ask whether this logic is intentional.
Replace TypeTraits<TypeKind::DOUBLE>::NativeType
with double
for readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue created
prestodb/presto#22391
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova Based on their reply, could we safely remove the NaN checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Real-Chen-Happy Yes, we can remove NaN checks, but then we need to make sure that NaN is considered larger than any non-NaN value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
} | ||
} | ||
|
||
// expect all input to be not null, else the result is null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is redundant. Let's remove.
callNullFree doesn't do anything when input types are not complex. Use 'call'.
This function should return 'void' because it should never return null for non-null inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
out_type<TInput>& result, | ||
const null_free_arg_type<Variadic<TInput>>& inputs) { | ||
// ensure that input size is greater than 0 | ||
if (inputs.size() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an assert as the signature of the function should not allow no inputs. To fix that you need to change the registration code:
registerFunction<ParameterBinder<GreatestFunction, T>, T, T, Variadic<T>>(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -48,8 +62,33 @@ void registerGeneralFunctions(const std::string& prefix) { | |||
VELOX_REGISTER_VECTOR_FUNCTION(udf_reduce, prefix + "reduce"); | |||
VELOX_REGISTER_VECTOR_FUNCTION(udf_array_filter, prefix + "filter"); | |||
|
|||
VELOX_REGISTER_VECTOR_FUNCTION(udf_least, prefix + "least"); | |||
VELOX_REGISTER_VECTOR_FUNCTION(udf_greatest, prefix + "greatest"); | |||
registerGreatestFunction<bool>(prefix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot of code. Perhaps, replace registerGreatestFunction with registerGreatestAndLeastFunction to reduce the size of this code in half.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Sure! For clarification, do you mean adding these support in this PR or in a separate PR? |
Separate PR would be better, I think. |
@mbasmanova Could you help review the updated code again? For the new version, I make the following changes:
Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Real-Chen-Happy Looks great % a couple nits.
ASSERT_TRUE(result.has_value()); | ||
ASSERT_TRUE(result.value()); | ||
TEST_F(GreatestLeastTest, greatestNanInput) { | ||
auto greatestFloatTestThreeArgs = [&](float a, float b, float c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps, shorten to greatestFloat and greatestDouble for readability
} | ||
|
||
TEST_F(GreatestLeastTest, leastNanInput) { | ||
auto leastFloatTestThreeArgs = [&](float a, float b, float c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
EXPECT_EQ(leastFloatTestThreeArgs(1.0, std::nanf("1"), 0.5), 0.5); | ||
EXPECT_EQ( | ||
leastFloatTestThreeArgs( | ||
std::nanf("1"), 1.0, -std::numeric_limits<float>::infinity()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use kNegativeInf32 and kNegativeInf64 (or similar) constants for readability
@@ -48,8 +59,19 @@ void registerGeneralFunctions(const std::string& prefix) { | |||
VELOX_REGISTER_VECTOR_FUNCTION(udf_reduce, prefix + "reduce"); | |||
VELOX_REGISTER_VECTOR_FUNCTION(udf_array_filter, prefix + "filter"); | |||
|
|||
VELOX_REGISTER_VECTOR_FUNCTION(udf_least, prefix + "least"); | |||
VELOX_REGISTER_VECTOR_FUNCTION(udf_greatest, prefix + "greatest"); | |||
registerGreatestLeastFunction<bool>(prefix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps, extract this code into registerGreatestLeastFunctions(prefix) helper function; otherwise, as more functions are added in the same manner, this method will become very large.
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mbasmanova Just updated the code to fix the nit issues. Thank you for helping review! Also, may I know how to get invited to Velox Slack channel? I sent my request a week ago to velox@meta.com, but still not hear back anything yet. Let me know if there is anything I need to do prior to joining. Thanks! |
#include "velox/functions/prestosql/InPredicate.h" | ||
|
||
namespace facebook::velox::functions { | ||
|
||
template <typename T> | ||
inline void registerGreatestLeastFunctionHelper(const std::string& prefix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use names like xxxHelper. Any chance you could rename this function to registerGreatestLeastFunction and the other one to ___s (registerGreatestLeastFunctions)? Or something along these lines. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about registerGreatestLeastFunction and registerAllGreatestLeastFunctions ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Thank you for the reminder @mbasmanova . @Real-Chen-Happy I just sent you an invite. Can you check if you can access slack now? |
Yes I have the access now! Thank you so much! @sanumandla |
@mbasmanova Just updated the code to rename the registration function. Thank you again for helping review! |
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This header is missing #pragma once.
I'll fix this, but wanted to let you know.
registerGreatestLeastFunction<int16_t>(prefix); | ||
registerGreatestLeastFunction<int32_t>(prefix); | ||
registerGreatestLeastFunction<int64_t>(prefix); | ||
registerGreatestLeastFunction<int128_t>(prefix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be removed. int128_t is used for LONG_DECIMAL type, which you are registering on L44.
I'll fix this, but wanted to let you know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova
According to type definitions, it seems like ShortDecimal is using int64_t, and the ShortDecimal is also being registered. I am curious in this case do we need to remove registerGreatestLeastFunction<int64_t>(prefix);
in addition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Real-Chen-Happy Function registry stores a mapping from function signature to function implementation. Function signature is defined using logical type, e.g. f(integer) is different from f(date) even though both integer and date types are backed by INTEGER. Hence, we need a register call for each signature we wish to support. Since we want to support least(bigint,...), we need to call registerFunction<int64_t>
.
Now, there is only one type backed by int128_t, that is long decimal. It actually seems like a bug that ``registerFunction<int128_t>` call is allowed. I think it should not be allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova I am thinking registerFunction<int128_t>
may be reserved for HUGEINT?
Also, it seems like we are using logical types (ShortDecimal ...), Physical types (Varchar), and C++ types (int8_t ...) in function registrations, which is a bit confusing. Is there any future plan where we can just use physical types and logical types for function registrations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is actually no HUGEINT type in Velox. There is TypeKind::HUGEINT though. Things could definitely be clearer. Too many things to fix / improve. Not enough time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Real-Chen-Happy I'm seeing errors internally when trying to land this change. I'll work on resolving this, but there might be a delay.
@mbasmanova merged this pull request in e29cde7. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
@mbasmanova Thank you again for helping review and fix errors. I am very excited to submit my first commit to Velox! |
@Real-Chen-Happy Thank you for the contribution. |
This pull request has been reverted by fd5643a. |
…nctions using simple function API"" Summary: Re-introducing this as the issue that initiated the backout is resolved. Original PR: facebookincubator#9308 Differential Revision: D56548695
…nctions using simple function API"" Summary: Re-introducing this as the issue that initiated the backout is resolved. Original PR: #9308 Reviewed By: mbasmanova, s4ayub Differential Revision: D56548695 fbshipit-source-id: d0a9032f5cc958c8f4a3124c1ad81f290e31800b
facebookincubator#9308) Summary: Refactor the greatest/least functions using simple function API. Also, add support for NaN comparisons for DOUBLE and REAL. NaN is the biggest according to prestodb/presto#22391 Fixes facebookincubator#3728 Pull Request resolved: facebookincubator#9308 Reviewed By: xiaoxmeng Differential Revision: D55793910 Pulled By: mbasmanova fbshipit-source-id: c389bad91197f00ced549d816a15efab5a2dd910
…nctions using simple function API"" Summary: Re-introducing this as the issue that initiated the backout is resolved. Original PR: facebookincubator#9308 Reviewed By: mbasmanova, s4ayub Differential Revision: D56548695 fbshipit-source-id: d0a9032f5cc958c8f4a3124c1ad81f290e31800b
Refactor the greatest/least functions using simple function API.
Also, add support for NaN comparisons for DOUBLE and REAL. NaN is the biggest according to prestodb/presto#22391
Fixes #3728