Describe the bug
Performing a collect_set aggregation on a lists column of lists of ints returns the wrong resulting scalar type when the input lists column contains all nulls. Instead of returning a LIST<LIST<INT32>> it returns a LIST<INT32>. When there are non-null entries in the input it properly returns LIST<LIST<INT32>>.
Steps/Code to reproduce bug
Compile and run the following program.
#include <cudf_test/debug_utilities.hpp>
#include <cudf_test/column_wrapper.hpp>
#include <cudf/aggregation.hpp>
#include <cudf/reduction.hpp>
#include <cudf/scalar/scalar.hpp>
#include <iostream>

using LCW = cudf::test::lists_column_wrapper<int>;

// Print the size, null count, and (up to two levels of) child types of a lists column.
void debug(cudf::lists_column_view const& lists) {
  std::cout << "Lists size: " << lists.size() << " null count: " << lists.null_count() << std::endl;
  auto child = lists.child();
  int id = static_cast<int>(child.type().id());
  std::cout << "Lists child type: " << id << " size: " << child.size()
            << " null count: " << child.null_count() << std::endl;
  if (child.type().id() == cudf::type_id::LIST) {
    child = cudf::lists_column_view(child).child();
    id = static_cast<int>(child.type().id());
    std::cout << "Lists child child type: " << id << " size: " << child.size()
              << " null count: " << child.null_count() << std::endl;
  }
}

// Run a collect_set reduction and print the structure of the input and of the result.
void collect_set(cudf::lists_column_view const& lists) {
  std::cout << "Before reduction:" << std::endl;
  debug(lists);
  auto agg = cudf::make_collect_set_aggregation<cudf::reduce_aggregation>(
    cudf::null_policy::EXCLUDE, cudf::null_equality::UNEQUAL, cudf::nan_equality::ALL_EQUAL);
  auto r = cudf::reduce(lists.parent(), *agg, cudf::data_type{cudf::type_id::LIST});
  std::cout << "After reduction:" << std::endl;
  auto rl = dynamic_cast<cudf::list_scalar*>(r.get());
  debug(rl->view());
}

int main(int argc, char** argv) {
  // Row 0 is null; every other row is valid.
  auto valids = cudf::detail::make_counting_transform_iterator(0, [](auto i) { return i != 0; });
  cudf::lists_column_view lists =
    cudf::test::lists_column_wrapper<int32_t, int32_t>{{LCW{{1, 2, 3}}, LCW{{1, 2, 3}}}, valids};
  std::cout << "=== With null list and non-null list ===" << std::endl;
  collect_set(lists);
  std::cout << "=== With only null list ===" << std::endl;
  lists = cudf::test::lists_column_wrapper<int32_t, int32_t>{{LCW{{1, 2, 3}}}, valids};
  collect_set(lists);
  return 0;
}
Expected behavior
The collect_set aggregation should consistently return the same output type for a given input type, regardless of the input data for that type. In this specific case, it should have returned LIST<LIST<INT32>> instead of LIST<INT32>.
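To make the expected invariant concrete, here is a minimal toy sketch in plain C++ that does not use libcudf at all. `ToyColumn`, `type_string`, `make_empty_like`, and `toy_collect` are hypothetical names invented for illustration, and building the all-null result by recursively copying the input's type tree is an assumption about one way to satisfy the invariant, not a description of how libcudf implements the reduction.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a column's nested type tree. This is NOT libcudf's column class.
enum class TypeId { INT32, LIST };

struct ToyColumn {
  TypeId type;
  int size;                         // number of rows
  int null_count;                   // number of null rows
  std::vector<ToyColumn> children;  // in this toy, a LIST has exactly one child
};

// Render the nested type as text, e.g. "LIST<LIST<INT32>>".
std::string type_string(ToyColumn const& col) {
  if (col.type == TypeId::INT32) { return "INT32"; }
  assert(col.children.size() == 1);
  return "LIST<" + type_string(col.children.front()) + ">";
}

// Build a zero-row column that keeps the full nested type of `col`.
ToyColumn make_empty_like(ToyColumn const& col) {
  ToyColumn out{col.type, 0, 0, {}};
  for (auto const& child : col.children) {
    out.children.push_back(make_empty_like(child));
  }
  return out;
}

// Hypothetical reduction: the result type depends only on the input type,
// never on how many rows happen to be null.
ToyColumn toy_collect(ToyColumn const& input) {
  if (input.null_count == input.size) { return make_empty_like(input); }
  // Placeholder: a real reduction would gather and deduplicate the non-null rows.
  return input;
}
```

With this model, calling `type_string(toy_collect(col))` on an all-null `LIST<LIST<INT32>>` column and on a partially null one yields the same string, which is the property the report asks for.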
…5243)
This fixes a bug in the reduction code that shows up specifically in `collect_list`/`collect_set` reductions on lists columns. The output of these reduction ops should be a list scalar holding a column that has exactly the same type structure as the input. However, when the input column contains only nulls, the output list scalar holds an empty column with the wrong type structure.
Closes #14924.
Authors:
- Nghia Truong (https://github.com/ttnghia)
Approvers:
- David Wendt (https://github.com/davidwendt)
- Bradley Dice (https://github.com/bdice)
URL: #15243
Running the reproduction program shows the result types changing based on whether the input has all nulls or not.