[BUG] `groupby` on `STRUCT` / `LIST` input passes silently, returns incorrect grouping.
#8905
Comments
Here's the corresponding repro case for `LIST` input:

    TEST_F(UtilitiesTest, groupby_lists_bork)
    {
      using namespace cudf;
      using ints      = fixed_width_column_wrapper<int32_t>;
      using int_lists = lists_column_wrapper<int32_t>;

      // Six rows of keys, but only two distinct lists.
      auto lists = int_lists{{1, 2, 3}, {4, 5, 6}, {1, 2, 3}, {4, 5, 6}, {1, 2, 3}, {4, 5, 6}};
      auto vals  = ints{0, 1, 2, 3, 4, 5};

      auto expect_lists_keys = int_lists{{1, 2, 3}, {4, 5, 6}};
      ints expect_vals{6, 9};  // per-key sums of vals: 0+2+4 and 1+3+5

      std::cout << "Lists input to groupby (keys)!" << std::endl;
      print(lists);
      std::cout << "Expected output (grouped) lists keys!" << std::endl;
      print(expect_lists_keys);

      // A single SUM aggregation over vals.
      auto requests = std::vector<groupby::aggregation_request>{};
      requests.emplace_back(groupby::aggregation_request{});
      auto& agg_request  = requests.front();
      agg_request.values = vals;
      agg_request.aggregations.push_back(cudf::make_sum_aggregation());

      auto gby    = groupby::groupby{table_view{{lists}}, null_policy::EXCLUDE, sorted::NO, {}, {}};
      auto result = gby.aggregate(requests);

      std::cout << "Actual output (grouped) lists keys: " << std::endl;
      print(result.first->view().column(0));
    }
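To make the expected result of the repro concrete, here is a minimal host-side sketch of what grouping on list keys with a SUM aggregation should produce. It does not use libcudf: `list_key` and `reference_groupby_sum` are hypothetical names introduced for illustration, and an ordered map stands in for the grouping step.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical host-side stand-in for one row of a LIST<INT32> key column.
using list_key = std::vector<int>;

// Reference only, not libcudf code: sums `values` per distinct list key.
// std::vector compares lexicographically, so equal lists land in the same map entry.
std::map<list_key, int> reference_groupby_sum(std::vector<list_key> const& keys,
                                              std::vector<int> const& values)
{
  assert(keys.size() == values.size());
  std::map<list_key, int> sums;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    sums[keys[i]] += values[i];
  }
  return sums;
}
```

For the keys and values in the repro, this yields two groups, `{1, 2, 3}` with sum 6 and `{4, 5, 6}` with sum 9, rather than one output row per input row.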
Update: The remaining sticking point is grouping on `LIST` keys.
With a little bit of digging, one sees that
However, hash-based aggregations still pass silently, except in debug-builds:
It only fails when it attempts to construct a hash value for a `cudf::list_view`. Rather than depend on the benevolence of
Fixes rapidsai#8905.

Attempting groupby aggregations with LIST keys leads to silent failures and bad results. For instance, attempting hash-based groupby aggregations with LIST keys only fails on DEBUG builds, thus:

```
/home/myth/dev/cudf/2/cpp/include/cudf/table/row_operators.cuh:447: unsigned int cudf::element_hasher_with_seed<hash_function, has_nulls>::operator()(cudf::column_device_view, signed int) const [with T = cudf::list_view; void *<anonymous> = (void *)nullptr; hash_function = default_hash; __nv_bool has_nulls = false]: block: [0,0,0], thread: [0,0,0] Assertion `false && "Unsupported type in hash."` failed.
```

In RELEASE builds, a copy of the input LIST column is returned, causing each output row to be interpreted as its own group.

This commit adds an explicit failure for unsupported LIST groupby keys.
After #9024,
Fixes #8905.

Attempting groupby aggregations with `LIST` keys leads to silent failures and bad results. For instance, attempting hash-based `groupby` aggregations with `LIST` keys only fails on DEBUG builds, thus:

```
/home/myth/dev/cudf/2/cpp/include/cudf/table/row_operators.cuh:447: unsigned int cudf::element_hasher_with_seed<hash_function, has_nulls>::operator()(cudf::column_device_view, signed int) const [with T = cudf::list_view; void *<anonymous> = (void *)nullptr; hash_function = default_hash; __nv_bool has_nulls = false]: block: [0,0,0], thread: [0,0,0] Assertion `false && "Unsupported type in hash."` failed.
```

In RELEASE builds, a copy of the input `LIST` column is returned, causing each output row to be interpreted as its own group.

This commit adds an explicit failure for unsupported groupby key types, i.e. those that don't support equality comparisons (like `LIST`).

Authors:
- MithunR (https://github.com/mythrocks)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Robert Maynard (https://github.com/robertmaynard)
- Jake Hemstad (https://github.com/jrhemstad)

URL: #9227
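The idea of failing explicitly on key types that lack equality comparison can be sketched in isolation as below. This is only an illustration, not the code from the commit: `key_type`, `supports_equality`, and `validate_groupby_keys` are hypothetical names, the enum is a stand-in rather than libcudf's real type ids, and treating `LIST` as the only unsupported key type is an assumption made for the sketch.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a column's type id; not libcudf's actual enum.
enum class key_type { INT32, STRING, STRUCT, LIST };

// Assumption for this sketch: LIST is the only key type without equality comparison.
bool supports_equality(key_type t) { return t != key_type::LIST; }

// Rejects unsupported key columns up front, in every build type,
// instead of relying on a debug-only assertion deep inside the hasher.
void validate_groupby_keys(std::vector<key_type> const& key_columns)
{
  for (auto t : key_columns) {
    if (!supports_equality(t)) {
      throw std::invalid_argument(
        "Unsupported groupby key type: does not support equality comparison");
    }
  }
}
```

Running a check like this when the groupby is constructed means DEBUG and RELEASE builds fail the same way, before any aggregation runs.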
When a `groupby` is constructed on a `STRUCT` or `LIST` input, it does not currently fail, as one might expect of an unsupported type.

The `groupby.aggregate()` seems to return the input groupby column unchanged, implying that each row is its own group. I think this also contributes to issues like #8887, where the `grouped_rolling_window()` returns unexpected output.

It might be worth considering a `CUDF_FAIL()` for unsupported input types, at least until support is added.

The repro case for `STRUCT` input follows: