Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Spark first/last aggregate functions intermediate type from T to row(T, boolean) #4873

Closed
wants to merge 2 commits into from

Conversation

Yohahaha
Copy link
Contributor

@Yohahaha Yohahaha commented May 9, 2023

Fix first/last Spark aggregate functions' intermediate type from T to row(T, boolean) to align with Spark, the second column in signature is just a place holder.

oap-project#245
apache/incubator-gluten#1581

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 9, 2023
@netlify
Copy link

netlify bot commented May 9, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 5c574db
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/64a65f34c2e9dd00085fa87b

@Yohahaha
Copy link
Contributor Author

Yohahaha commented May 9, 2023

Hi @mbasmanova could you help take a look?

This PR fix first/last aggregate function's intermediate type from T to row(T, boolean), second column just a place holder.

@Yohahaha Yohahaha changed the title Fix Spark First/Last aggregate functions intermediate signature Fix Spark First/Last aggregate functions intermediate type May 9, 2023
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yuhta @kagamiori Jimmy, Wei, would you help review this PR?

@Yohahaha Yohahaha marked this pull request as draft May 10, 2023 02:26
@Yohahaha Yohahaha marked this pull request as ready for review May 10, 2023 16:28
@Yohahaha Yohahaha marked this pull request as draft May 10, 2023 16:29
@Yohahaha Yohahaha changed the title Fix Spark First/Last aggregate functions intermediate type Fix Spark First/Last aggregate functions intermediate type and support decimal May 11, 2023
@Yohahaha Yohahaha marked this pull request as ready for review May 11, 2023 07:59
@Yohahaha
Copy link
Contributor Author

Yohahaha commented May 11, 2023

Hi @Yuhta @kagamiori could you help review this PR?

@Yohahaha
Copy link
Contributor Author

Hi @mbasmanova @Yuhta @kagamiori could you help take a look?

@Yohahaha
Copy link
Contributor Author

Yohahaha commented Jun 5, 2023

Hi @mbasmanova @Yuhta @kagamiori would you help take a look on this PR? Thanks!

@Yohahaha
Copy link
Contributor Author

This PR fix first/last aggregate function's intermediate type from T to row(T, boolean), and add decimal support.

Could someone help take a look? thanks! @mbasmanova @kagamiori @Yuhta @kgpai

@Yohahaha Yohahaha changed the title Fix Spark First/Last aggregate functions intermediate type and support decimal Fix Spark First/Last aggregate functions intermediate type and add decimal type support Jun 28, 2023
@Yohahaha
Copy link
Contributor Author

Hi @mbasmanova @Yuhta @majetideepak @kagamiori, this pr has already merged in oap/velox and tested by oap/gluten, could someone help take a look?

@mbasmanova mbasmanova requested a review from majetideepak June 29, 2023 07:42
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

@Yohahaha Yohahaha force-pushed the first_last_meta branch 2 times, most recently from dbf4a6d to c24f24b Compare June 30, 2023 06:46
@Yohahaha Yohahaha changed the title Fix Spark First/Last aggregate functions intermediate type and add decimal type support Fix Spark first/last aggregate functions intermediate type from T to row(T, boolean) Jun 30, 2023
@Yohahaha
Copy link
Contributor Author

Updates:

  1. Separate decimal support to another PR, update PR title and description.
  2. Remove dynamic_cast.
  3. Use DecodedVector::isNullAt.

@mbasmanova could you review again? thanks!

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yohahaha Thank you for iterating on this PR. Some further comments.

@Yohahaha
Copy link
Contributor Author

Yohahaha commented Jul 6, 2023

Updates:

  1. Remove exec::
  2. Minor refactor for updateValue to avoid duplicate casting
  3. Minor refactor for reuse DecodedVector

@mbasmanova could you help review again? thanks!

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good % one question.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

Would you rebase on top of latest main so I could merge?

fix

Trigger CI

fix format
@Yohahaha
Copy link
Contributor Author

Yohahaha commented Jul 6, 2023

Would you rebase on top of latest main so I could merge?

Sure, already rebased, thanks!

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

Looks like Spark Fuzzer failed. Please, take a look.

@Yohahaha
Copy link
Contributor Author

Yohahaha commented Jul 6, 2023

Looks like Spark Fuzzer failed. Please, take a look.

Thanks! I will fix it and run a long fuzzer.

@Yohahaha
Copy link
Contributor Author

Yohahaha commented Jul 6, 2023

./spark_aggregation_fuzzer_test --seed 2677297971 --duration_sec 1800 --logtostderr=1 --minloglevel=0 --repro_persist_path=/tmp/spark_aggregate_fuzzer_repro

==============================> Done with iteration 4645
 Total functions tested: 5
 Total masked aggregations: 744 (16.01%)
 Total global aggregations: 388 (8.35%)
 Total group-by aggregations: 3391 (72.99%)
 Total distinct aggregations: 452 (9.73%)
 Total window expressions: 415 (8.93%)
 Total aggregations verified against DuckDB: 1747 (37.60%)
 Total failed aggregations: 0 (0.00%)

@Yohahaha
Copy link
Contributor Author

Yohahaha commented Jul 6, 2023

The root cause was I passed decoded.index(i) to updateValue in addRawInputs, it's lead index overflow error.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in 9fdf36e.

@conbench-facebook
Copy link

Conbench analyzed the 1 benchmark run on commit 9fdf36ed.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@Yohahaha Yohahaha deleted the first_last_meta branch April 30, 2024 02:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants