Optimize min_count for all numpy
For pure numpy arrays, min_count=1 (the xarray default) gives the same
result as min_count=None, provided the right fill_value is set. Skipping
it avoids one useless pass over the data, and one useless copy.

We need to always accumulate count with dask, to make sure we
get the right values at the end.
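
The equivalence claimed above can be illustrated with plain numpy. This is a minimal sketch, not flox's implementation: the function names, the integer group codes, and the restriction to data without missing values are all assumptions made for the example. It compares a grouped sum that masks on an explicit per-group count with one that only marks which groups were seen and fills the rest.

```python
import numpy as np


def grouped_sum_with_count(values, codes, ngroups, fill_value, min_count=1):
    """Grouped sum that counts members per group, then masks (hypothetical helper)."""
    sums = np.bincount(codes, weights=values, minlength=ngroups)
    # Extra reduction over the data: number of members in each group.
    counts = np.bincount(codes, minlength=ngroups)
    # Extra copy: a masked version of the result.
    return np.where(counts >= min_count, sums, fill_value)


def grouped_sum_fill_only(values, codes, ngroups, fill_value):
    """Grouped sum that relies on fill_value alone (hypothetical helper)."""
    sums = np.bincount(codes, weights=values, minlength=ngroups)
    # Mark the groups that received at least one value; no count is accumulated.
    seen = np.zeros(ngroups, dtype=bool)
    seen[codes] = True
    sums[~seen] = fill_value
    return sums


values = np.array([1.0, 2.0, 3.0, 4.0])
codes = np.array([0, 0, 2, 2])  # group 1 is empty

with_count = grouped_sum_with_count(values, codes, ngroups=3, fill_value=np.nan)
fill_only = grouped_sum_fill_only(values, codes, ngroups=3, fill_value=np.nan)

# assert_array_equal treats NaNs in matching positions as equal.
np.testing.assert_array_equal(with_count, fill_only)
```

With a single in-memory array, an empty group is known as soon as the reduction finishes, so the fill can be applied directly. With dask, each block sees only part of the data, which is why the counts still have to be accumulated there.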
dcherian committed May 2, 2024
1 parent c398f4e commit 2f4f6ec
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions flox/core.py
@@ -2486,6 +2486,12 @@ def groupby_reduce(
         return (result, groups)

     elif not has_dask:
+        if min_count_ == 1:
+            # optimize for pure numpy groupby
+            # We set the fill_value appropriately anyway
+            agg.min_count = None
+            agg.numpy = agg.numpy[:-1]
+
         results = _reduce_blockwise(
             array, by_, agg, expected_groups=expected_, reindex=reindex, sort=sort, **kwargs
         )
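
The branch added in this hunk can be read as "drop the trailing count stage when no dask array is involved". The sketch below reproduces only that control flow on a toy object. `ToyAggregation` and `drop_count_stage` are hypothetical stand-ins, not flox's classes, and the assumption that the last entry of `agg.numpy` is the count reduction is inferred from the `[:-1]` slice rather than confirmed by the diff.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyAggregation:
    """Hypothetical stand-in for an aggregation spec; not flox's class."""

    numpy: Tuple[str, ...]  # reduction stages; last one assumed to be the count
    min_count: Optional[int]


def drop_count_stage(agg: ToyAggregation, min_count: int, has_dask: bool) -> ToyAggregation:
    """Mirror the new branch: skip counting only for the pure numpy path."""
    if not has_dask and min_count == 1:
        agg.min_count = None
        agg.numpy = agg.numpy[:-1]  # remove the assumed trailing count stage
    return agg


numpy_agg = drop_count_stage(ToyAggregation(("nansum", "nanlen"), 1), min_count=1, has_dask=False)
dask_agg = drop_count_stage(ToyAggregation(("nansum", "nanlen"), 1), min_count=1, has_dask=True)

print(numpy_agg)
print(dask_agg)
```

Only the numpy path loses the count stage; the dask path is left untouched, matching the commit message's note that counts must always be accumulated there.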