Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] SortOrder supports arrays #2470

Open
wbo4958 opened this issue May 21, 2021 · 1 comment
Open

[FEA] SortOrder supports arrays #2470

wbo4958 opened this issue May 21, 2021 · 1 comment
Labels
cudf_dependency An issue or PR with this label depends on a new feature in cudf feature request New feature or request

Comments

@wbo4958
Copy link
Collaborator

wbo4958 commented May 21, 2021

Is your feature request related to a problem? Please describe.

        SELECT
            a,
            b,
            c,
            d,
            LEAD(d, 1) OVER (PARTITION BY a ORDER BY b,c,d ASC) lead_array_1
        FROM table

if the type of column d is Array, then any expressions and plans with SortOrder will not be accelerated by GPU

!Exec <WindowExec> cannot run on GPU because not all expressions can be replaced
  @Expression <Alias> lead(d#3, 1, null) windowspecdefinition(a#0L, b#1L ASC NULLS FIRST, c#2L ASC NULLS FIRST, d#3 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS lead_array_1#15 could run on GPU
    @Expression <WindowExpression> lead(d#3, 1, null) windowspecdefinition(a#0L, b#1L ASC NULLS FIRST, c#2L ASC NULLS FIRST, d#3 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) could run on GPU
      @Expression <Lead> lead(d#3, 1, null) could run on GPU
      !Expression <WindowSpecDefinition> windowspecdefinition(a#0L, b#1L ASC NULLS FIRST, c#2L ASC NULLS FIRST, d#3 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) cannot run on GPU because expression SortOrder d#3 ASC NULLS FIRST produces an unsupported type ArrayType(IntegerType,true)
        @Expression <AttributeReference> a#0L could run on GPU
        @Expression <SortOrder> b#1L ASC NULLS FIRST could run on GPU
          @Expression <AttributeReference> b#1L could run on GPU
        @Expression <SortOrder> c#2L ASC NULLS FIRST could run on GPU
@wbo4958 wbo4958 added feature request New feature or request ? - Needs Triage Need team to review and classify labels May 21, 2021
wbo4958 added a commit to wbo4958/spark-rapids that referenced this issue May 21, 2021
If some rows of order-by columns (it's `a,b,c` in the test) are equal,
then it may fail because CPU and GPU can't guarantee the order for the
same rows, while lead/lag is typically depending on row's order.

The solution is we should add the aggregation column `d` and the default column
`d_default` columns into the order-by to guarantee the order. But for now,
sorting on array has not been supported yet, see
NVIDIA#2470.

So this PR just skip the test

Signed-off-by: Bobby Wang <wbo4958@gmail.com>
pxLi pushed a commit that referenced this issue May 21, 2021
If some rows of order-by columns (it's `a,b,c` in the test) are equal,
then it may fail because CPU and GPU can't guarantee the order for the
same rows, while lead/lag is typically depending on row's order.

The solution is we should add the aggregation column `d` and the default column
`d_default` columns into the order-by to guarantee the order. But for now,
sorting on array has not been supported yet, see
#2470.

So this PR just skip the test

Signed-off-by: Bobby Wang <wbo4958@gmail.com>
@sameerz sameerz added cudf_dependency An issue or PR with this label depends on a new feature in cudf and removed ? - Needs Triage Need team to review and classify labels May 25, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this issue Jun 9, 2021
If some rows of order-by columns (it's `a,b,c` in the test) are equal,
then it may fail because CPU and GPU can't guarantee the order for the
same rows, while lead/lag is typically depending on row's order.

The solution is we should add the aggregation column `d` and the default column
`d_default` columns into the order-by to guarantee the order. But for now,
sorting on array has not been supported yet, see
NVIDIA#2470.

So this PR just skip the test

Signed-off-by: Bobby Wang <wbo4958@gmail.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this issue Jun 9, 2021
If some rows of order-by columns (it's `a,b,c` in the test) are equal,
then it may fail because CPU and GPU can't guarantee the order for the
same rows, while lead/lag is typically depending on row's order.

The solution is we should add the aggregation column `d` and the default column
`d_default` columns into the order-by to guarantee the order. But for now,
sorting on array has not been supported yet, see
NVIDIA#2470.

So this PR just skip the test

Signed-off-by: Bobby Wang <wbo4958@gmail.com>
@ttnghia
Copy link
Collaborator

ttnghia commented Nov 30, 2022

Sorting non-nested arrays is now supported in libcudf so I believe that this can be partially fixed if needed. Support for sorting nested arrays may need to wait further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf_dependency An issue or PR with this label depends on a new feature in cudf feature request New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants