Refactor greatest and least Presto functions using simple function API #9308

Real-Chen-Happy · 2024-03-29T20:45:17Z

Refactor the greatest/least functions using simple function API.

Also, add support for NaN comparisons for DOUBLE and REAL. NaN is treated as the largest value, per prestodb/presto#22391.

facebook-github-bot · 2024-03-29T20:45:23Z

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

netlify · 2024-03-29T20:45:33Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`7f76957`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/6610402589bcf10008dc4320

facebook-github-bot · 2024-03-29T22:05:30Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

Real-Chen-Happy · 2024-03-30T00:17:55Z

Aware of the decimal compatibility errors. A fix is in progress.

Real-Chen-Happy · 2024-03-30T07:19:08Z

The error occurs in one of the CircleCI checks.
The previous signatures were least(DECIMAL(precision,scale)...) -> DECIMAL(precision,scale) and greatest(DECIMAL(precision,scale)...) -> DECIMAL(precision,scale).
The current signatures are least(decimal(i1,i5)...) -> decimal(i1,i5) and greatest(decimal(i1,i5)...) -> decimal(i1,i5).
I investigated the errors, but I am not sure what is going wrong.
I also checked the previous PR #9096, and the same error seems to happen there.
@mbasmanova May I know if the error can be ignored? If not, do you have a recommended way to fix it? Thanks.

mbasmanova · 2024-04-01T11:31:01Z

@Real-Chen-Happy The error you are seeing is a limitation of the CI check: #9240

CC @kgpai

mbasmanova

@Real-Chen-Happy Thank you for the refactoring. Looks great overall. Some comments.

In Presto, greatest and least functions allow array and struct inputs as well. It would be nice to add support for these in a follow-up.

velox/functions/prestosql/GreatestLeast.h

mbasmanova · 2024-04-01T11:44:18Z

velox/functions/prestosql/GreatestLeast.h

+
+namespace facebook::velox::functions {
+
+template <typename TExec, typename TInput, bool isLeast>


TInput -> T for readability

Put ExtremeValueFunction into facebook::velox::functions::details namespace to signal that it shouldn't be used outside of this file.

mbasmanova · 2024-04-01T11:44:57Z

velox/functions/prestosql/GreatestLeast.h

+struct ExtremeValueFunction {
+  VELOX_DEFINE_FUNCTION_TYPES(TExec);
+
+  // For double, presto should throw error if input is Nan


Comments should be full sentences. Start with a capital letter and end with a period. Please, fix throughout the PR.

mbasmanova · 2024-04-01T11:45:48Z

velox/functions/prestosql/GreatestLeast.h

+  // For double, presto should throw error if input is Nan
+  template <typename T>
+  void checkNan(const T& value) const {
+    if constexpr (std::is_same_v<T, TypeTraits<TypeKind::DOUBLE>::NativeType>) {


This behavior is quite surprising. One would expect this functionality to apply to both DOUBLE and REAL types, not only to DOUBLE. It might be helpful to open an issue in PrestoDB repo to ask whether this logic is intentional.

Replace TypeTraits<TypeKind::DOUBLE>::NativeType with double for readability.

Issue created
prestodb/presto#22391

@mbasmanova Based on their reply, could we safely remove the NaN checks?

@Real-Chen-Happy Yes, we can remove NaN checks, but then we need to make sure that NaN is considered larger than any non-NaN value.
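The ordering requirement above (NaN larger than any non-NaN value) can be sketched in plain C++. This is only an illustration, not the code from this PR: `lessNanLargest`, `greatestOf`, and `leastOf` are hypothetical names, and the sketch assumes a non-empty input list.

```cpp
#include <cmath>
#include <initializer_list>

// Hypothetical helper, not the Velox implementation: returns true when `a`
// orders strictly before `b`, treating NaN as larger than every other value.
template <typename T>
bool lessNanLargest(T a, T b) {
  if (std::isnan(a)) {
    return false; // NaN is never smaller than anything.
  }
  if (std::isnan(b)) {
    return true; // Any non-NaN value is smaller than NaN.
  }
  return a < b;
}

// Assumes `values` is non-empty.
template <typename T>
T greatestOf(std::initializer_list<T> values) {
  auto it = values.begin();
  T result = *it;
  for (++it; it != values.end(); ++it) {
    if (lessNanLargest(result, *it)) {
      result = *it;
    }
  }
  return result;
}

// Assumes `values` is non-empty.
template <typename T>
T leastOf(std::initializer_list<T> values) {
  auto it = values.begin();
  T result = *it;
  for (++it; it != values.end(); ++it) {
    if (lessNanLargest(*it, result)) {
      result = *it;
    }
  }
  return result;
}
```

Under this ordering, the greatest of any list containing a NaN is NaN, while the least is NaN only when every input is NaN.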

mbasmanova · 2024-04-01T11:47:10Z

velox/functions/prestosql/GreatestLeast.h

+    }
+  }
+
+  // expect all input to be not null, else the result is null


This comment is redundant. Let's remove.

callNullFree doesn't do anything when input types are not complex. Use 'call'.

This function should return 'void' because it should never return null for non-null inputs.

mbasmanova · 2024-04-01T11:47:57Z

velox/functions/prestosql/GreatestLeast.h

+      out_type<TInput>& result,
+      const null_free_arg_type<Variadic<TInput>>& inputs) {
+    // ensure that input size is greater than 0
+    if (inputs.size() == 0) {


This should be an assert as the signature of the function should not allow no inputs. To fix that you need to change the registration code:

registerFunction<ParameterBinder<GreatestFunction, T>, T, T, Variadic<T>>(...)
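The idea behind a `T, Variadic<T>` signature, one mandatory argument followed by zero or more of the same type, can be illustrated outside Velox with an ordinary variadic template. This is a rough analogy only: `greatestOfAtLeastOne` is a hypothetical name, and the sketch does not use the simple function API.

```cpp
#include <algorithm>
#include <initializer_list>

// Hypothetical stand-in for a (T, Variadic<T>) signature: the first argument
// is mandatory, so a call with no arguments does not compile and no runtime
// check for an empty input is needed.
template <typename T, typename... Rest>
T greatestOfAtLeastOne(T first, Rest... rest) {
  T result = first;
  // The trailing arguments are converted to T; the pack may be empty.
  for (T value : std::initializer_list<T>{static_cast<T>(rest)...}) {
    result = std::max(result, value);
  }
  return result;
}
```

With this shape, the "no inputs" case is rejected by the signature itself rather than by a check inside the function body.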

mbasmanova · 2024-04-01T11:49:25Z

velox/functions/prestosql/registration/GeneralFunctionsRegistration.cpp

@@ -48,8 +62,33 @@ void registerGeneralFunctions(const std::string& prefix) {
  VELOX_REGISTER_VECTOR_FUNCTION(udf_reduce, prefix + "reduce");
  VELOX_REGISTER_VECTOR_FUNCTION(udf_array_filter, prefix + "filter");

-  VELOX_REGISTER_VECTOR_FUNCTION(udf_least, prefix + "least");
-  VELOX_REGISTER_VECTOR_FUNCTION(udf_greatest, prefix + "greatest");
+  registerGreatestFunction<bool>(prefix);


This is a lot of code. Perhaps, replace registerGreatestFunction with registerGreatestAndLeastFunction to reduce the size of this code in half.

Real-Chen-Happy · 2024-04-01T17:14:38Z

@Real-Chen-Happy Thank you for the refactoring. Looks great overall. Some comments.

In Presto, greatest and least functions allow array and struct inputs as well. It would be nice to add support for these in a follow-up.

Sure! For clarification, do you mean adding this support in this PR or in a separate PR?

mbasmanova · 2024-04-01T17:21:14Z

do you mean adding this support in this PR or in a separate PR?

Separate PR would be better, I think.

Real-Chen-Happy · 2024-04-05T05:51:44Z

@mbasmanova Could you help review the updated code again? In the new version, I made the following changes:

Refactor the tests

Thank you!

mbasmanova

@Real-Chen-Happy Looks great % a couple nits.

mbasmanova · 2024-04-05T09:55:22Z

velox/functions/prestosql/tests/GreatestLeastTest.cpp

-  ASSERT_TRUE(result.has_value());
-  ASSERT_TRUE(result.value());
+TEST_F(GreatestLeastTest, greatestNanInput) {
+  auto greatestFloatTestThreeArgs = [&](float a, float b, float c) {


nit: perhaps, shorten to greatestFloat and greatestDouble for readability

mbasmanova · 2024-04-05T09:55:51Z

velox/functions/prestosql/tests/GreatestLeastTest.cpp

+}
+
+TEST_F(GreatestLeastTest, leastNanInput) {
+  auto leastFloatTestThreeArgs = [&](float a, float b, float c) {


mbasmanova · 2024-04-05T09:57:36Z

velox/functions/prestosql/tests/GreatestLeastTest.cpp

+  EXPECT_EQ(leastFloatTestThreeArgs(1.0, std::nanf("1"), 0.5), 0.5);
+  EXPECT_EQ(
+      leastFloatTestThreeArgs(
+          std::nanf("1"), 1.0, -std::numeric_limits<float>::infinity()),


nit: use kNegativeInf32 and kNegativeInf64 (or similar) constants for readability

mbasmanova · 2024-04-05T09:58:34Z

velox/functions/prestosql/registration/GeneralFunctionsRegistration.cpp

@@ -48,8 +59,19 @@ void registerGeneralFunctions(const std::string& prefix) {
  VELOX_REGISTER_VECTOR_FUNCTION(udf_reduce, prefix + "reduce");
  VELOX_REGISTER_VECTOR_FUNCTION(udf_array_filter, prefix + "filter");

-  VELOX_REGISTER_VECTOR_FUNCTION(udf_least, prefix + "least");
-  VELOX_REGISTER_VECTOR_FUNCTION(udf_greatest, prefix + "greatest");
+  registerGreatestLeastFunction<bool>(prefix);


nit: perhaps, extract this code into registerGreatestLeastFunctions(prefix) helper function; otherwise, as more functions are added in the same manner, this method will become very large.

facebook-github-bot · 2024-04-05T10:00:45Z

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Real-Chen-Happy · 2024-04-05T17:20:44Z

@mbasmanova Just updated the code to fix the nit issues. Thank you for helping review!

Also, may I know how to get invited to the Velox Slack channel? I sent my request to velox@meta.com a week ago, but I have not heard back yet. Let me know if there is anything I need to do prior to joining. Thanks!

mbasmanova · 2024-04-05T17:28:57Z

velox/functions/prestosql/registration/GeneralFunctionsRegistration.cpp

 #include "velox/functions/prestosql/InPredicate.h"

 namespace facebook::velox::functions {
+
+template <typename T>
+inline void registerGreatestLeastFunctionHelper(const std::string& prefix) {


We don't use names like xxxHelper. Any chance you could rename this function to registerGreatestLeastFunction and the other one to ___s (registerGreatestLeastFunctions)? Or something along these lines. Thanks.

How about registerGreatestLeastFunction and registerAllGreatestLeastFunctions ?

That works. Thanks.

facebook-github-bot · 2024-04-05T17:30:07Z

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

sanumandla · 2024-04-05T17:38:35Z

Thank you for the reminder @mbasmanova . @Real-Chen-Happy I just sent you an invite. Can you check if you can access slack now?

Real-Chen-Happy · 2024-04-05T17:49:40Z

Yes I have the access now! Thank you so much! @sanumandla

Real-Chen-Happy · 2024-04-05T18:18:09Z

@mbasmanova Just updated the code to rename the registration function. Thank you again for helping review!

facebook-github-bot · 2024-04-05T18:22:44Z

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mbasmanova · 2024-04-08T10:53:27Z

velox/functions/prestosql/GreatestLeast.h

+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+


This header is missing #pragma once.

I'll fix this, but wanted to let you know.

mbasmanova · 2024-04-08T10:54:05Z

velox/functions/prestosql/registration/GeneralFunctionsRegistration.cpp

+  registerGreatestLeastFunction<int16_t>(prefix);
+  registerGreatestLeastFunction<int32_t>(prefix);
+  registerGreatestLeastFunction<int64_t>(prefix);
+  registerGreatestLeastFunction<int128_t>(prefix);


This should be removed. int128_t is used for LONG_DECIMAL type, which you are registering on L44.

I'll fix this, but wanted to let you know.

@mbasmanova

According to the type definitions, ShortDecimal is backed by int64_t, and ShortDecimal is also being registered. In that case, do we also need to remove registerGreatestLeastFunction<int64_t>(prefix);?

@Real-Chen-Happy Function registry stores a mapping from function signature to function implementation. Function signature is defined using logical type, e.g. f(integer) is different from f(date) even though both integer and date types are backed by INTEGER. Hence, we need a register call for each signature we wish to support. Since we want to support least(bigint,...), we need to call registerFunction<int64_t>.
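The point about signatures being keyed by logical type can be sketched with a toy registry. This is not Velox's registry: `ToyRegistry`, `leastInt64`, `makeToyRegistry`, and the signature strings are all hypothetical, chosen only to show two logical types that share one physical type needing two separate entries.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model, not Velox's registry: maps a signature string (written in terms
// of logical types) to an implementation over the physical type.
using Impl = std::function<int64_t(const std::vector<int64_t>&)>;

class ToyRegistry {
 public:
  void add(const std::string& signature, Impl impl) {
    impls_[signature] = std::move(impl);
  }

  bool has(const std::string& signature) const {
    return impls_.count(signature) > 0;
  }

  // Throws std::out_of_range if the signature was never registered.
  int64_t call(
      const std::string& signature,
      const std::vector<int64_t>& args) const {
    return impls_.at(signature)(args);
  }

 private:
  std::map<std::string, Impl> impls_;
};

// One physical implementation over int64_t; assumes `args` is non-empty.
inline int64_t leastInt64(const std::vector<int64_t>& args) {
  return *std::min_element(args.begin(), args.end());
}

// Both logical types are backed by int64_t here, yet each signature needs
// its own registration.
inline ToyRegistry makeToyRegistry() {
  ToyRegistry registry;
  registry.add("least(bigint,...)", leastInt64);
  registry.add("least(short_decimal,...)", leastInt64);
  return registry;
}
```

Registering the bigint signature does not make any other int64_t-backed signature resolvable; a lookup for an unregistered signature simply finds nothing.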

Now, there is only one type backed by int128_t, that is long decimal. It actually seems like a bug that a `registerFunction<int128_t>` call is allowed. I think it should not be allowed.

@mbasmanova I am thinking registerFunction<int128_t> may be reserved for HUGEINT?

Also, it seems like we are using logical types (ShortDecimal ...), physical types (Varchar), and C++ types (int8_t ...) in function registrations, which is a bit confusing. Is there any plan to use only physical types and logical types for function registrations?

There is actually no HUGEINT type in Velox. There is TypeKind::HUGEINT though. Things could definitely be clearer. Too many things to fix / improve. Not enough time.

mbasmanova

@Real-Chen-Happy I'm seeing errors internally when trying to land this change. I'll work on resolving this, but there might be a delay.

facebook-github-bot · 2024-04-08T15:52:28Z

@mbasmanova merged this pull request in e29cde7.

conbench-facebook · 2024-04-08T16:17:56Z

Conbench analyzed the 1 benchmark run on commit e29cde7b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Real-Chen-Happy · 2024-04-08T17:49:56Z

@mbasmanova Thank you again for helping review and fix errors. I am very excited to submit my first commit to Velox!

mbasmanova · 2024-04-08T17:58:32Z

@Real-Chen-Happy Thank you for the contribution.

facebook-github-bot · 2024-04-16T19:41:43Z

This pull request has been reverted by fd5643a.

…nctions using simple function API"" Summary: Re-introducing this as the issue that initiated the backout is resolved. Original PR: facebookincubator#9308 Differential Revision: D56548695

…nctions using simple function API"" Summary: Re-introducing this as the issue that initiated the backout is resolved. Original PR: #9308 Reviewed By: mbasmanova, s4ayub Differential Revision: D56548695 fbshipit-source-id: d0a9032f5cc958c8f4a3124c1ad81f290e31800b

facebookincubator#9308) Summary: Refactor the greatest/least functions using simple function API. Also, add support for NaN comparisons for DOUBLE and REAL. NaN is the biggest according to prestodb/presto#22391 Fixes facebookincubator#3728 Pull Request resolved: facebookincubator#9308 Reviewed By: xiaoxmeng Differential Revision: D55793910 Pulled By: mbasmanova fbshipit-source-id: c389bad91197f00ced549d816a15efab5a2dd910

…nctions using simple function API"" Summary: Re-introducing this as the issue that initiated the backout is resolved. Original PR: facebookincubator#9308 Reviewed By: mbasmanova, s4ayub Differential Revision: D56548695 fbshipit-source-id: d0a9032f5cc958c8f4a3124c1ad81f290e31800b

Real-Chen-Happy added 5 commits March 29, 2024 02:29

Refactor greatest/least functions using simple API

f8998f9

Add support for Decimal Type and fix compilation issues

4222947

Fix the comments issues

8bfa725

Remove the unused include

b285c80

Refactor the checkNan as generic function

f5187fe

Real-Chen-Happy mentioned this pull request Mar 29, 2024

Rewrite Presto function greatest/least using simple function interface #3728

Closed

Return null if input size is 0

aff75bb

Real-Chen-Happy closed this Mar 29, 2024

Real-Chen-Happy reopened this Mar 29, 2024

Real-Chen-Happy changed the title ~~Refactor greatest/least functions using simple API~~ Refactor presto greatest/least functions using simple API Mar 29, 2024

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 29, 2024

Real-Chen-Happy changed the title ~~Refactor presto greatest/least functions using simple API~~ Refactor presto greatest/least functions using simple function API Mar 29, 2024

Real-Chen-Happy changed the title ~~Refactor presto greatest/least functions using simple function API~~ [WIP] Refactor presto greatest/least functions using simple function API Mar 30, 2024

Real-Chen-Happy changed the title ~~[WIP] Refactor presto greatest/least functions using simple function API~~ Refactor presto greatest/least functions using simple function API Mar 30, 2024

mbasmanova changed the title ~~Refactor presto greatest/least functions using simple function API~~ Refactor greatest and least Presto functions using simple function API Apr 1, 2024

mbasmanova reviewed Apr 1, 2024

Real-Chen-Happy force-pushed the main branch from 431c69f to aff75bb Compare April 2, 2024 06:02

Real-Chen-Happy added 5 commits April 1, 2024 23:16

Refactoring code based on the code reviews

1b257c5

Support comparison for NaN for DOUBLE and REAL data types

d030968

Remove unused includes

2d2d55a

Move extreme function to details scope

74d8652

Adding comma to the comments

8c8a8ad

Using std::nan for double

3903abb

mbasmanova approved these changes Apr 5, 2024

Fix nit issues in code review

c4f9751

mbasmanova reviewed Apr 5, 2024

Refactor register function name

7f76957

mbasmanova reviewed Apr 8, 2024

facebook-github-bot closed this in e29cde7 Apr 8, 2024

facebook-github-bot added the Merged label Apr 8, 2024

facebook-github-bot added the Reverted label Apr 16, 2024

bikramSingh91 mentioned this pull request Apr 24, 2024

Back out "Back out "[velox][PR] Refactor greatest and least Presto functions using simple function API"" #9613

Closed

