Refactor greatest and least Presto functions using simple function API #9308

Closed
wants to merge 16 commits into from

Real-Chen-Happy
Contributor

@Real-Chen-Happy Real-Chen-Happy commented Mar 29, 2024

Refactor the greatest/least functions using simple function API.

Also, add support for NaN comparisons for DOUBLE and REAL. Following prestodb/presto#22391, NaN is treated as greater than any non-NaN value.

Fixes #3728

@facebook-github-bot
Contributor

Hi @Real-Chen-Happy!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

netlify bot commented Mar 29, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit: 7f76957
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/6610402589bcf10008dc4320

@Real-Chen-Happy Real-Chen-Happy changed the title Refactor greatest/least functions using simple API Refactor presto greatest/least functions using simple API Mar 29, 2024
@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 29, 2024
@Real-Chen-Happy Real-Chen-Happy changed the title Refactor presto greatest/least functions using simple API Refactor presto greatest/least functions using simple function API Mar 29, 2024
@Real-Chen-Happy
Contributor Author

Real-Chen-Happy commented Mar 30, 2024

I'm aware of the decimal compatibility errors. A fix is in progress.

@Real-Chen-Happy Real-Chen-Happy changed the title Refactor presto greatest/least functions using simple function API [WIP] Refactor presto greatest/least functions using simple function API Mar 30, 2024
@Real-Chen-Happy
Contributor Author

Real-Chen-Happy commented Mar 30, 2024

The error happens in one of the CircleCI checks.
The previous signatures were least(DECIMAL(precision,scale)...) -> DECIMAL(precision,scale) and greatest(DECIMAL(precision,scale)...) -> DECIMAL(precision,scale).
The current signatures are least(decimal(i1,i5)...) -> decimal(i1,i5) and greatest(decimal(i1,i5)...) -> decimal(i1,i5).
I investigated the errors, but I am not sure what's going wrong.
I also checked the previous PR #9096, and the same error seems to happen there.
@mbasmanova Can this error be ignored? If not, is there a recommended way to fix it? Thanks

@Real-Chen-Happy Real-Chen-Happy changed the title [WIP] Refactor presto greatest/least functions using simple function API Refactor presto greatest/least functions using simple function API Mar 30, 2024
@mbasmanova
Contributor

@Real-Chen-Happy The error you are seeing is a limitation of the CI check: #9240

CC @kgpai

@mbasmanova mbasmanova changed the title Refactor presto greatest/least functions using simple function API Refactor greatest and least Presto functions using simple function API Apr 1, 2024
Contributor

@mbasmanova mbasmanova left a comment

@Real-Chen-Happy Thank you for the refactoring. Looks great overall. Some comments.

In Presto, greatest and least functions allow array and struct inputs as well. It would be nice to add support for these in a follow-up.

velox/functions/prestosql/GreatestLeast.h

namespace facebook::velox::functions {

template <typename TExec, typename TInput, bool isLeast>
Contributor

TInput -> T for readability

Put ExtremeValueFunction into facebook::velox::functions::details namespace to signal that it shouldn't be used outside of this file.
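A minimal standalone sketch of the two suggestions above, assuming a simplified struct without Velox's `TExec` parameter or `VELOX_DEFINE_FUNCTION_TYPES` machinery: the value type is named `T`, the struct lives in a nested `details` namespace to mark it as internal, and only the public aliases are meant for use elsewhere. The `pick` member is a hypothetical helper for illustration; the alias names echo `GreatestFunction` from this review, but their shape here is simplified.

```cpp
#include <cassert>

namespace facebook::velox::functions {
namespace details {

// Internal implementation shared by greatest and least; not intended for use
// outside this header. isLeast == true selects least(), false selects greatest().
template <typename T, bool isLeast>
struct ExtremeValueFunction {
  // Hypothetical helper: returns whichever of a and b should be kept.
  static const T& pick(const T& a, const T& b) {
    if constexpr (isLeast) {
      return b < a ? b : a;
    } else {
      return b > a ? b : a;
    }
  }
};

} // namespace details

// Public names (simplified): callers use these, never details:: directly.
template <typename T>
using GreatestFunction = details::ExtremeValueFunction<T, false>;

template <typename T>
using LeastFunction = details::ExtremeValueFunction<T, true>;

} // namespace facebook::velox::functions
```

A single struct parameterized by `isLeast` keeps the two functions from drifting apart, while the namespace makes the intended visibility obvious to readers.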

Contributor Author

Fixed

struct ExtremeValueFunction {
VELOX_DEFINE_FUNCTION_TYPES(TExec);

// For double, presto should throw error if input is Nan
Contributor

Comments should be full sentences. Start with a capital letter and end with a period. Please, fix throughout the PR.

Contributor Author

Fixed

// For double, presto should throw error if input is Nan
template <typename T>
void checkNan(const T& value) const {
if constexpr (std::is_same_v<T, TypeTraits<TypeKind::DOUBLE>::NativeType>) {
Contributor

This behavior is quite surprising. One would expect this functionality to apply to both DOUBLE and REAL types, not only to DOUBLE. It might be helpful to open an issue in the PrestoDB repo to ask whether this logic is intentional.

Replace TypeTraits<TypeKind::DOUBLE>::NativeType with double for readability.
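A standalone sketch of what the reviewer's suggestion could look like, assuming the check keeps its original throwing behavior: `std::is_floating_point_v<T>` covers both `float` (REAL) and `double` (DOUBLE) without spelling out `TypeTraits<...>::NativeType`. Throwing `std::invalid_argument` is a stand-in for Velox's own error-reporting macros, which are not shown here. (Later in this thread the check is dropped in favor of treating NaN as the largest value.)

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <type_traits>

// Applies to every floating-point type (float for REAL, double for DOUBLE)
// and compiles away entirely for integer, boolean, and other inputs.
template <typename T>
void checkNan(const T& value) {
  if constexpr (std::is_floating_point_v<T>) {
    if (std::isnan(value)) {
      // Assumed stand-in for Velox's user-error reporting.
      throw std::invalid_argument("Invalid argument to greatest/least: NaN");
    }
  }
}
```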

Contributor Author

Issue created
prestodb/presto#22391

Contributor Author

@mbasmanova Based on their reply, could we safely remove the NaN checks?

Contributor

@Real-Chen-Happy Yes, we can remove NaN checks, but then we need to make sure that NaN is considered larger than any non-NaN value.
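One way to meet that requirement, sketched with plain `std::vector` inputs rather than Velox's `Variadic<T>` argument type: a single comparator orders NaN above every non-NaN value (including +infinity), and both directions use it. With this ordering, `greatest` returns NaN whenever any input is NaN, and `least` returns NaN only if every input is NaN. `nanAwareGreater`, `greatestOf`, and `leastOf` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>
#include <vector>

// Returns true if a > b under an ordering where NaN is greater than any
// non-NaN value and all NaNs compare equal to each other.
template <typename T>
bool nanAwareGreater(const T& a, const T& b) {
  if constexpr (std::is_floating_point_v<T>) {
    if (std::isnan(a)) {
      return !std::isnan(b);
    }
    if (std::isnan(b)) {
      return false;
    }
  }
  return a > b;
}

// Both functions assume inputs is non-empty (the signature requires >= 1 arg).
template <typename T>
T greatestOf(const std::vector<T>& inputs) {
  T result = inputs[0];
  for (std::size_t i = 1; i < inputs.size(); ++i) {
    if (nanAwareGreater(inputs[i], result)) {
      result = inputs[i];
    }
  }
  return result;
}

template <typename T>
T leastOf(const std::vector<T>& inputs) {
  T result = inputs[0];
  for (std::size_t i = 1; i < inputs.size(); ++i) {
    if (nanAwareGreater(result, inputs[i])) {
      result = inputs[i];
    }
  }
  return result;
}
```

Routing both directions through one comparator keeps the NaN rule in a single place, and the `if constexpr` branch means non-floating-point types pay nothing for it.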

Contributor Author

Fixed

}
}

// expect all input to be not null, else the result is null
Contributor

This comment is redundant. Let's remove.

callNullFree doesn't do anything when input types are not complex. Use 'call'.

This function should return 'void' because it should never return null for non-null inputs.

Contributor Author

Fixed

out_type<TInput>& result,
const null_free_arg_type<Variadic<TInput>>& inputs) {
// ensure that input size is greater than 0
if (inputs.size() == 0) {
Contributor

This should be an assert, since the function signature should not allow zero inputs. To fix that, you need to change the registration code:

registerFunction<ParameterBinder<GreatestFunction, T>, T, T, Variadic<T>>(...)
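To illustrate why this registration change turns the empty-input check into an invariant, here is a standalone C++ analogue (not Velox code): declaring the first parameter separately from the variadic pack, in the same spirit as the `T, Variadic<T>` argument list above, means a zero-argument call fails to compile, so a runtime size check could only ever catch a programming error. `greatestOfArgs` is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

// The first argument is mandatory; Rest may be empty. Calling
// greatestOfArgs() with no arguments does not compile.
template <typename T, typename... Rest>
T greatestOfArgs(const T& first, const Rest&... rest) {
  static_assert(
      (std::is_same_v<T, Rest> && ...), "all arguments must have the same type");
  T result = first;
  // Fold over the remaining arguments, keeping the largest value seen so far.
  ((result = (rest > result ? rest : result)), ...);
  return result;
}
```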

Contributor Author

@Real-Chen-Happy Real-Chen-Happy Apr 2, 2024

Got it

Contributor Author

Fixed

@@ -48,8 +62,33 @@ void registerGeneralFunctions(const std::string& prefix) {
VELOX_REGISTER_VECTOR_FUNCTION(udf_reduce, prefix + "reduce");
VELOX_REGISTER_VECTOR_FUNCTION(udf_array_filter, prefix + "filter");

VELOX_REGISTER_VECTOR_FUNCTION(udf_least, prefix + "least");
VELOX_REGISTER_VECTOR_FUNCTION(udf_greatest, prefix + "greatest");
registerGreatestFunction<bool>(prefix);
Contributor

This is a lot of code. Perhaps replace registerGreatestFunction with registerGreatestAndLeastFunction to cut the size of this code in half.

Contributor Author

Fixed

@Real-Chen-Happy
Contributor Author

@Real-Chen-Happy Thank you for the refactoring. Looks great overall. Some comments.

In Presto, greatest and least functions allow array and struct inputs as well. It would be nice to add support for these in a follow-up.

Sure! To clarify, do you mean adding this support in this PR or in a separate PR?

@mbasmanova
Contributor

do you mean adding this support in this PR or in a separate PR?

Separate PR would be better, I think.

@Real-Chen-Happy
Contributor Author

@mbasmanova Could you help review the updated code again? In the new version, I made the following change:

  1. Refactored the tests

Thank you!

Contributor

@mbasmanova mbasmanova left a comment

@Real-Chen-Happy Looks great % a couple nits.

ASSERT_TRUE(result.has_value());
ASSERT_TRUE(result.value());
TEST_F(GreatestLeastTest, greatestNanInput) {
auto greatestFloatTestThreeArgs = [&](float a, float b, float c) {
Contributor

nit: perhaps, shorten to greatestFloat and greatestDouble for readability

}

TEST_F(GreatestLeastTest, leastNanInput) {
auto leastFloatTestThreeArgs = [&](float a, float b, float c) {
Contributor

ditto

EXPECT_EQ(leastFloatTestThreeArgs(1.0, std::nanf("1"), 0.5), 0.5);
EXPECT_EQ(
leastFloatTestThreeArgs(
std::nanf("1"), 1.0, -std::numeric_limits<float>::infinity()),
Contributor

nit: use kNegativeInf32 and kNegativeInf64 (or similar) constants for readability
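A small sketch of the constants this nit suggests, assuming they sit at the top of the test file in an anonymous namespace. `kNegativeInf32` and `kNegativeInf64` follow the reviewer's naming; `kNaN32` and `kNaN64` are hypothetical additions in the same style.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

namespace {
// Negative infinity for REAL (32-bit) and DOUBLE (64-bit) inputs.
constexpr float kNegativeInf32 = -std::numeric_limits<float>::infinity();
constexpr double kNegativeInf64 = -std::numeric_limits<double>::infinity();
// Hypothetical NaN constants in the same naming style.
constexpr float kNaN32 = std::numeric_limits<float>::quiet_NaN();
constexpr double kNaN64 = std::numeric_limits<double>::quiet_NaN();
} // namespace
```

With these, an expectation such as the one above could read `EXPECT_EQ(leastFloat(kNaN32, 1.0, kNegativeInf32), kNegativeInf32);`, assuming the `leastFloat` rename suggested earlier in the review.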

@@ -48,8 +59,19 @@ void registerGeneralFunctions(const std::string& prefix) {
VELOX_REGISTER_VECTOR_FUNCTION(udf_reduce, prefix + "reduce");
VELOX_REGISTER_VECTOR_FUNCTION(udf_array_filter, prefix + "filter");

VELOX_REGISTER_VECTOR_FUNCTION(udf_least, prefix + "least");
VELOX_REGISTER_VECTOR_FUNCTION(udf_greatest, prefix + "greatest");
registerGreatestLeastFunction<bool>(prefix);
Contributor

nit: perhaps, extract this code into registerGreatestLeastFunctions(prefix) helper function; otherwise, as more functions are added in the same manner, this method will become very large.

@facebook-github-bot
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@Real-Chen-Happy
Contributor Author

@mbasmanova Just updated the code to fix the nit issues. Thank you for helping review!

Also, how can I get invited to the Velox Slack channel? I sent a request to velox@meta.com a week ago but haven't heard back yet. Let me know if there is anything I need to do before joining. Thanks!

#include "velox/functions/prestosql/InPredicate.h"

namespace facebook::velox::functions {

template <typename T>
inline void registerGreatestLeastFunctionHelper(const std::string& prefix) {
Contributor

We don't use names like xxxHelper. Any chance you could rename this function to registerGreatestLeastFunction and the other one to ___s (registerGreatestLeastFunctions)? Or something along these lines. Thanks.

Contributor Author

How about registerGreatestLeastFunction and registerAllGreatestLeastFunctions?

Contributor

That works. Thanks.

Contributor Author

Fixed
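A standalone sketch of the structure agreed on in this thread, using a toy registry in place of Velox's function registry: `registerGreatestLeastFunction<T>` registers both names for one type, and `registerAllGreatestLeastFunctions` calls it once per supported type so `registerGeneralFunctions` stays short. `toyRegistry`, `typeName`, and the `"name(type)"` signature strings are hypothetical; the real code calls Velox's `registerFunction` with `ParameterBinder`, as in the snippet earlier in this review.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Toy stand-in for a function registry: stores "name(type)" signatures.
std::set<std::string>& toyRegistry() {
  static std::set<std::string> registry;
  return registry;
}

// Hypothetical mapping from C++ type to a SQL type name.
template <typename T>
std::string typeName();
template <>
inline std::string typeName<bool>() { return "boolean"; }
template <>
inline std::string typeName<std::int64_t>() { return "bigint"; }
template <>
inline std::string typeName<double>() { return "double"; }

// Registers both greatest and least for a single type.
template <typename T>
void registerGreatestLeastFunction(const std::string& prefix) {
  toyRegistry().insert(prefix + "greatest(" + typeName<T>() + ")");
  toyRegistry().insert(prefix + "least(" + typeName<T>() + ")");
}

// Registers both functions for every supported type.
void registerAllGreatestLeastFunctions(const std::string& prefix) {
  registerGreatestLeastFunction<bool>(prefix);
  registerGreatestLeastFunction<std::int64_t>(prefix);
  registerGreatestLeastFunction<double>(prefix);
}
```

Adding a new type then costs one line in `registerAllGreatestLeastFunctions` instead of two registration calls in the general registration file.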

@facebook-github-bot
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@sanumandla

Thank you for the reminder @mbasmanova . @Real-Chen-Happy I just sent you an invite. Can you check if you can access slack now?

@Real-Chen-Happy
Contributor Author

Yes, I have access now! Thank you so much, @sanumandla!

@Real-Chen-Happy
Contributor Author

Real-Chen-Happy commented Apr 5, 2024

@mbasmanova Just updated the code to rename the registration function. Thank you again for helping review!

@facebook-github-bot
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

* See the License for the specific language governing permissions and
* limitations under the License.
*/

Contributor

This header is missing #pragma once.

I'll fix this, but wanted to let you know.

registerGreatestLeastFunction<int16_t>(prefix);
registerGreatestLeastFunction<int32_t>(prefix);
registerGreatestLeastFunction<int64_t>(prefix);
registerGreatestLeastFunction<int128_t>(prefix);
Contributor

This should be removed. int128_t is used for LONG_DECIMAL type, which you are registering on L44.

I'll fix this, but wanted to let you know.

Contributor Author

@mbasmanova
According to the type definitions, it seems that ShortDecimal is backed by int64_t, and ShortDecimal is also being registered. In that case, do we also need to remove registerGreatestLeastFunction<int64_t>(prefix);?

Contributor

@Real-Chen-Happy Function registry stores a mapping from function signature to function implementation. Function signature is defined using logical type, e.g. f(integer) is different from f(date) even though both integer and date types are backed by INTEGER. Hence, we need a register call for each signature we wish to support. Since we want to support least(bigint,...), we need to call registerFunction<int64_t>.

Now, there is only one type backed by int128_t: long decimal. It actually seems like a bug that the `registerFunction<int128_t>` call is allowed. I think it should not be allowed.
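To make the logical-versus-physical distinction concrete, here is a toy model (not Velox's registry): signatures are keyed by logical type name, so `least(integer)` and `least(date)` are separate entries even though both are backed by the same physical C++ type, `int32_t`. Each registration call adds exactly one signature, which is why a call is needed per logical type. `LogicalType`, `kInteger`, `kDate`, `kBigint`, `signatures`, and `registerLeast` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// A logical type pairs a SQL-facing name with its physical C++ type.
struct LogicalType {
  std::string name;
  std::type_index physical;
};

const LogicalType kInteger{"integer", typeid(std::int32_t)};
const LogicalType kDate{"date", typeid(std::int32_t)};
const LogicalType kBigint{"bigint", typeid(std::int64_t)};

// Toy registry: signature string -> physical type used by the implementation.
std::map<std::string, std::type_index>& signatures() {
  static std::map<std::string, std::type_index> registry;
  return registry;
}

// One call registers exactly one signature, keyed by the logical type name.
void registerLeast(const LogicalType& type) {
  signatures().emplace("least(" + type.name + ")", type.physical);
}
```

Registering `kInteger` and `kDate` yields two distinct signatures that share one physical implementation type, matching the reviewer's point that the register call is per signature, not per C++ type.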

Contributor Author

@mbasmanova Could registerFunction<int128_t> be intended for HUGEINT?

Also, function registrations currently mix logical types (ShortDecimal, ...), physical types (Varchar), and C++ types (int8_t, ...), which is a bit confusing. Are there plans to register functions using only logical and physical types in the future?

Contributor

There is actually no HUGEINT type in Velox. There is TypeKind::HUGEINT though. Things could definitely be clearer. Too many things to fix / improve. Not enough time.

Contributor

@mbasmanova mbasmanova left a comment

@Real-Chen-Happy I'm seeing errors internally when trying to land this change. I'll work on resolving this, but there might be a delay.

@facebook-github-bot
Contributor

@mbasmanova merged this pull request in e29cde7.

Conbench analyzed the 1 benchmark run on commit e29cde7b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@Real-Chen-Happy
Contributor Author

@mbasmanova Thank you again for helping review and fix errors. I am very excited to submit my first commit to Velox!

@mbasmanova
Contributor

@Real-Chen-Happy Thank you for the contribution.

@facebook-github-bot
Contributor

This pull request has been reverted by fd5643a.

bikramSingh91 pushed a commit to bikramSingh91/velox that referenced this pull request Apr 24, 2024
…nctions using simple function API""

Summary:
Re-introducing this as the issue that initiated the backout is resolved.

Original PR: facebookincubator#9308

Differential Revision: D56548695
facebook-github-bot pushed a commit that referenced this pull request Apr 25, 2024
…nctions using simple function API""

Summary:
Re-introducing this as the issue that initiated the backout is resolved.

Original PR: #9308

Reviewed By: mbasmanova, s4ayub

Differential Revision: D56548695

fbshipit-source-id: d0a9032f5cc958c8f4a3124c1ad81f290e31800b
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
facebookincubator#9308)

Summary:
Refactor the greatest/least functions using simple function API.

Also, add support for NaN comparisons for DOUBLE and REAL. NaN is the biggest according to prestodb/presto#22391

Fixes facebookincubator#3728

Pull Request resolved: facebookincubator#9308

Reviewed By: xiaoxmeng

Differential Revision: D55793910

Pulled By: mbasmanova

fbshipit-source-id: c389bad91197f00ced549d816a15efab5a2dd910
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…nctions using simple function API""

Summary:
Re-introducing this as the issue that initiated the backout is resolved.

Original PR: facebookincubator#9308

Reviewed By: mbasmanova, s4ayub

Differential Revision: D56548695

fbshipit-source-id: d0a9032f5cc958c8f4a3124c1ad81f290e31800b
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged Reverted
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rewrite Presto function greatest/least using simple function interface
4 participants