Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Gluten-1.2] Backport facebookincubator#9025 and facebookincubator#10979 to fix the result mismatch in RowsStreamingWindowBuild #499

Merged
merged 4 commits into from
Sep 20, 2024

Conversation

ccat3z and others added 4 commits September 18, 2024 17:38
…kincubator#9025)

Summary:
Unlike `StreamingWindowBuild`, `RowLevelStreamingWindowBuild ` in this PR is capable of processing window functions as rows arrive within a single partition, without the need to wait for the entire partition to be ready. This approach can significantly reduce memory usage, especially when a single partition contains a large amount of data. It is particularly suited for optimizing `rank `and `row_number `functions, as well as aggregate window functions with a default frame.

The detailed discussions is [here](facebookincubator#8975). The design doc is [here](https://docs.google.com/document/d/17ONSJHK8XP5Lixm8XBl01RMNl4ntpixiVFe693ahw6k/edit?usp=sharing).

Pull Request resolved: facebookincubator#9025

Test Plan: Run through 10hrs fuzzer testing

Reviewed By: kagamiori

Differential Revision: D61473798

Pulled By: xiaoxmeng

fbshipit-source-id: 569a752770395330c48a3521bd5421eb89f5623d
…r#10979)

Summary:
For a Range frame, it is necessary to ensure that the peer is ready before commencing the window function computation

Pull Request resolved: facebookincubator#10979

Reviewed By: kagamiori

Differential Revision: D62622816

Pulled By: xiaoxmeng

fbshipit-source-id: 1a9911da416c867c9e295242a05d0f33fbc2e22d
@ccat3z ccat3z changed the title [Gluten-1.2] Backport facebookincubator#9025 and facebookincubator#10979 to fix the result mismatch in RowsStreamingWindowBuild (apache/incubator-gluten#7194) [Gluten-1.2] Backport facebookincubator#9025 and facebookincubator#10979 to fix the result mismatch in RowsStreamingWindowBuild Sep 18, 2024
@ccat3z
Copy link
Author

ccat3z commented Sep 19, 2024

@weiting-chen @JkSelf Can you help to review this PR?

@@ -132,7 +132,7 @@ TEST_F(WindowTest, rowBasedStreamingWindowOOM) {

VELOX_ASSERT_THROW(
readCursor(params, [](Task*) {}),
"Exceeded memory pool capacity after attempt to grow capacity through arbitration.");
"Exceeded memory pool cap of");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ccat3z Why need this change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

facebookincubator#10506 changed exception message, which not included in branch 1.2

@weiting-chen weiting-chen self-requested a review September 20, 2024 03:45
@weiting-chen
Copy link
Collaborator

The code looks fine to me.
The PR has also run pass 333 velox tests in jenkins with 1 failed test related to ParquetTableScanTest.timestampFilter test report empty parquet file, which is not related to the PR.
The issue is related to facebookincubator#4680 and not related to this PR.
And I will approval to merge it.

@weiting-chen
Copy link
Collaborator

After merge the code, I will trigger another test in Gluten for branch-1.2 to see if there is any other issue.

@weiting-chen weiting-chen merged commit 4397808 into oap-project:branch-1.2 Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants