This issue was discovered as part of an audit of all the comparison and ordering behaviors for NaN across Presto functions (related to #21936 and #21877).
While there are many inconsistencies in how NaN is handled that need to be addressed, map_top_n can produce definitely wrong results when NaN values show up in the map.
According to the documentation, map_top_n "Truncates map items. Keeps only the top N elements by value. n must be a non-negative integer"
In the presence of NaN values, NaN seems to "reset" the search for the top-n entries:
select map_top_n(map(array['a', 'b', 'c'], array[nan(), 3, 2]),1);
_col0
---------
{b=3.0}
(1 row)
-- BUG! Regardless of how NaN is interpreted, 2 is always less than 3, so the result of the next query is definitely incorrect.
select map_top_n(map(array['a', 'b', 'c'], array[3, nan(), 2]),1);
_col0
---------
{c=2.0}
(1 row)
select map_top_n(map(array['a', 'b', 'c'], array[3, 2, nan()]),1);
_col0
---------
{c=NaN}
(1 row)
It's not a bug in array_sort itself, but in the lambda function that we are passing to it. array_sort without a function argument sorts with NaN as the largest value. map_top_n passes a function that compares the values element by element, but < and = comparisons with NaN always return false.
The relevant part of the sql function is here:
`array_sort(map_entries(map_filter(input, (k, v) -> v is not null)), (x, y) -> IF(x[2] < y[2], 1, IF(x[2] = y[2], IF(x[1] < y[1], 1, -1), -1)))`
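To make the failure concrete, here is a small Python model of that lambda. It is only a sketch, not Presto code: the function name `sql_lambda_cmp` and the `(key, value)` tuple layout are assumptions made for illustration, and it relies on Python floats following the same rule that `<` and `=` against NaN are false. It shows that the comparator claims each entry sorts before the other as soon as one value is NaN, so the result depends on the order in which the sort happens to compare entries.

```python
import math


def sql_lambda_cmp(x, y):
    """Python model of the SQL lambda; x and y are (key, value) pairs.

    SQL's x[1] is the key (x[0] here) and SQL's x[2] is the value (x[1] here).
    """
    if x[1] < y[1]:        # IF(x[2] < y[2], 1, ...)
        return 1
    if x[1] == y[1]:       # IF(x[2] = y[2], ...)
        return 1 if x[0] < y[0] else -1
    return -1              # reached whenever either value is NaN


nan_entry = ("a", math.nan)
num_entry = ("b", 3.0)

# Both comparisons against NaN are false, so both calls fall through to -1.
print(sql_lambda_cmp(nan_entry, num_entry))  # -1: NaN entry sorts before 3.0
print(sql_lambda_cmp(num_entry, nan_entry))  # -1: 3.0 sorts before NaN entry
```

A valid comparator must return opposite signs when its arguments are swapped; this one returns -1 both ways.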
A side note: array_sort with a lambda is quite dangerous. It is very easy to write a lambda that doesn't match its implicit requirements (as in this case): https://velox-lib.io/blog/array-sort
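As a rough illustration of what a comparator that does meet those requirements could look like, here is a Python sketch. It assumes NaN should rank as the largest value, matching what array_sort does without a function argument; whether that is the right semantics for map_top_n is a separate decision. The names `nan_aware_cmp` and `top_n` are hypothetical, and entries are modeled as `(key, value)` tuples.

```python
import math
from functools import cmp_to_key
from itertools import permutations


def nan_aware_cmp(x, y):
    """Order (key, value) pairs by value descending, with NaN treated as largest."""
    x_nan, y_nan = math.isnan(x[1]), math.isnan(y[1])
    if x_nan != y_nan:
        return -1 if x_nan else 1          # the NaN entry sorts first
    if not x_nan and x[1] != y[1]:
        return 1 if x[1] < y[1] else -1    # larger value sorts first
    if x[0] == y[0]:
        return 0                           # identical entries compare equal
    return 1 if x[0] < y[0] else -1        # tie-break: larger key first


def top_n(entries, n):
    """Return the first n entries under nan_aware_cmp."""
    return sorted(entries, key=cmp_to_key(nan_aware_cmp))[:n]


entries = [("a", math.nan), ("b", 3.0), ("c", 2.0)]

# Every input order yields the same top-2 keys, unlike the three queries above.
results = {tuple(k for k, _ in top_n(list(p), 2)) for p in permutations(entries)}
print(results)  # {('a', 'b')}
```

Because the comparator checks for NaN explicitly before using `<`, the result no longer depends on where the NaN sits in the input.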