-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable sort for single-level nesting struct columns on GPU #1883
Enable sort for single-level nesting struct columns on GPU #1883
Conversation
Signed-off-by: Gera Shegalov <gera@apache.org>
Signed-off-by: Gera Shegalov <gera@apache.org>
Status: TEST_PARALLEL=1 SPARK_HOME=~/spark-3.1.1-bin-hadoop3.2 integration_tests/run_pyspark_from_build.sh \
-k "test_single_orderby and struct"
ai.rapids.cudf.CudfException: cuDF failure at: /home/gshegalov/gits/cudf/cpp/src/sort/sort_column.cu:112: Column type must be relationally comparable
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's here looks good, but there are some things missing.
At some point this needs to be built with mvn verify
to generate the changes to supported_ops.md
and those changes should be in this PR.
It would be nice to have some negative testing for this, i.e.: testing that we don't try to place sorts of deeply nested struct key columns or non-default struct sort ordering on the GPU.
Signed-off-by: Gera Shegalov <gera@apache.org>
Signed-off-by: Gera Shegalov <gera@apache.org>
build |
$ TEST_PARALLEL=1 COVERAGE_SUBMIT_FLAGS=$JDBSTR SPARK_HOME=~/spark-3.1.1-bin-hadoop3.2 integration_tests/run_pyspark_from_build.sh -k "test_single_struct and orderby"
=========================== short test summary info ============================
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a ASC NULLS FIRST'>0-Struct(['child0', Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Null])])]
second-level structs are not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a ASC NULLS FIRST'>0-Array(String)]
arrays are not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a ASC NULLS FIRST'>0-Map(String(not_null),Map(String(not_null),String))]
maps are not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a ASC NULLS FIRST'>1-Struct(['child0', Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Null])])]
second-level structs are not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a ASC NULLS FIRST'>1-Array(String)]
arrays are not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a ASC NULLS FIRST'>1-Map(String(not_null),Map(String(not_null),String))]
maps are not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a ASC NULLS LAST'>-Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Null])]
opposite null order not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a ASC NULLS LAST'>-Struct(['child0', Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Null])])]
opposite null order not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a ASC NULLS LAST'>-Array(String)]
opposite null order not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a ASC NULLS LAST'>-Map(String(not_null),Map(String(not_null),String))]
opposite null order not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a DESC NULLS LAST'>0-Struct(['child0', Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Null])])]
second-level structs are not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a DESC NULLS LAST'>0-Array(String)]
arrays are not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a DESC NULLS LAST'>0-Map(String(not_null),Map(String(not_null),String))]
maps are not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a DESC NULLS FIRST'>-Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Null])]
opposite null order not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a DESC NULLS FIRST'>-Struct(['child0', Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Null])])]
opposite null order not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a DESC NULLS FIRST'>-Array(String)]
opposite null order not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a DESC NULLS FIRST'>-Map(String(not_null),Map(String(not_null),String))]
opposite null order not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a DESC NULLS LAST'>1-Struct(['child0', Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Null])])]
second-level structs are not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a DESC NULLS LAST'>1-Array(String)]
arrays are not supported
XFAIL ../../src/main/python/sort_test.py::test_single_nested_order_orderby[Column<'a DESC NULLS LAST'>1-Map(String(not_null),Map(String(not_null),String))]
maps are not supported
=============== 4 passed, 4848 deselected, 20 xfailed in 43.43s ================ |
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/datetimeExpressions.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: Gera Shegalov <gera@apache.org>
- Create a single config for RAT in parent. - Update RAT exlusion list to filter out hidden diretories Signed-off-by: Gera Shegalov <gera@apache.org>
Signed-off-by: Gera Shegalov <gera@apache.org>
Signed-off-by: Gera Shegalov <gera@apache.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GPU Sort uses upperbound, but not in all cases.
spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSortExec.scala
Line 435 in ec2dcd7
withResource(sorter.upperBound(mergedBatch, cutoffCb)) { result => |
without rapidsai/cudf#7690 that means we have a ticking time bomb in the code.
Also the single partition work around makes no since to me at all. Until that is also fixed I see no way for us to merge this in.
I understand the concern about GPU Sort. As for the single partition workaround, It's just incremental delivery which mimics the progress we have on the cuDF side. It would make sense to me as long as it works as documented. |
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks better, but I didn't see a test added for the ability to sort a table containing arrays but not using arrays as the key. Is that going to be done as a followup then?
@jlowe these existing tests cover the scenarios you are calling out. they are passing now locally |
build |
Great, thanks @gerashegalov! |
Adds single-level struct columns to sort. This PR contributes to NVIDIA#1605 The following limitations apply with this PR for a total sort, and will be resolved in follow-up PR's - only if the number of partitions is 1 - only if spark.rapids.sql.stableSort.enabled is true Signed-off-by: Gera Shegalov <gera@apache.org>
Adds single-level struct columns to sort. This PR contributes to NVIDIA#1605 The following limitations apply with this PR for a total sort, and will be resolved in follow-up PR's - only if the number of partitions is 1 - only if spark.rapids.sql.stableSort.enabled is true Signed-off-by: Gera Shegalov <gera@apache.org>
This PR contributes to #1605. It depends on rapidsai/cudf#7422
Signed-off-by: Gera Shegalov gera@apache.org