
Flox can't handle cftime objects #6613

Closed
4 tasks done
aulemahal opened this issue May 16, 2022 · 2 comments · Fixed by xarray-contrib/flox#108
Comments

aulemahal (Contributor) commented May 16, 2022

What happened?

I use resampling to count the number of timesteps within time periods. The simple way is da.time.resample(time='YS').count(). With the current master, a non-standard calendar, and flox installed, this fails: flox can't handle the cftime objects of the time coordinate.

What did you expect to happen?

I expected the count of elements for each period to be returned.

Minimal Complete Verifiable Example

import xarray as xr

timeNP = xr.DataArray(xr.date_range('2009-01-01', '2012-12-31', use_cftime=False), dims=('time',), name='time')

timeCF = xr.DataArray(xr.date_range('2009-01-01', '2012-12-31', use_cftime=True), dims=('time',), name='time')

timeNP.resample(time='YS').count() # works

timeCF.resample(time='YS').count() # Fails

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 a.resample(time='YS').count()

File ~/Python/myxarray/xarray/core/_reductions.py:5456, in DataArrayResampleReductions.count(self, dim, keep_attrs, **kwargs)
   5401 """
   5402 Reduce this DataArray's data by applying ``count`` along some dimension(s).
   5403 
   (...)
   5453   * time     (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
   5454 """
   5455 if flox and OPTIONS["use_flox"] and contains_only_dask_or_numpy(self._obj):
-> 5456     return self._flox_reduce(
   5457         func="count",
   5458         dim=dim,
   5459         # fill_value=fill_value,
   5460         keep_attrs=keep_attrs,
   5461         **kwargs,
   5462     )
   5463 else:
   5464     return self.reduce(
   5465         duck_array_ops.count,
   5466         dim=dim,
   5467         keep_attrs=keep_attrs,
   5468         **kwargs,
   5469     )

File ~/Python/myxarray/xarray/core/resample.py:44, in Resample._flox_reduce(self, dim, **kwargs)
     41 labels = np.repeat(self._unique_coord.data, repeats)
     42 group = DataArray(labels, dims=(self._group_dim,), name=self._unique_coord.name)
---> 44 result = super()._flox_reduce(dim=dim, group=group, **kwargs)
     45 result = self._maybe_restore_empty_groups(result)
     46 result = result.rename({RESAMPLE_DIM: self._group_dim})

File ~/Python/myxarray/xarray/core/groupby.py:661, in GroupBy._flox_reduce(self, dim, **kwargs)
    658     expected_groups = (self._unique_coord.values,)
    659     isbin = False
--> 661 result = xarray_reduce(
    662     self._original_obj.drop_vars(non_numeric),
    663     group,
    664     dim=dim,
    665     expected_groups=expected_groups,
    666     isbin=isbin,
    667     **kwargs,
    668 )
    670 # Ignore error when the groupby reduction is effectively
    671 # a reduction of the underlying dataset
    672 result = result.drop_vars(unindexed_dims, errors="ignore")

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/xarray.py:308, in xarray_reduce(obj, func, expected_groups, isbin, sort, dim, split_out, fill_value, method, engine, keep_attrs, skipna, min_count, reindex, *by, **finalize_kwargs)
    305 input_core_dims = _get_input_core_dims(group_names, dim, ds, grouper_dims)
    306 input_core_dims += [input_core_dims[-1]] * (len(by) - 1)
--> 308 actual = xr.apply_ufunc(
    309     wrapper,
    310     ds.drop_vars(tuple(missing_dim)).transpose(..., *grouper_dims),
    311     *by,
    312     input_core_dims=input_core_dims,
    313     # for xarray's test_groupby_duplicate_coordinate_labels
    314     exclude_dims=set(dim),
    315     output_core_dims=[group_names],
    316     dask="allowed",
    317     dask_gufunc_kwargs=dict(output_sizes=group_sizes),
    318     keep_attrs=keep_attrs,
    319     kwargs={
    320         "func": func,
    321         "axis": axis,
    322         "sort": sort,
    323         "split_out": split_out,
    324         "fill_value": fill_value,
    325         "method": method,
    326         "min_count": min_count,
    327         "skipna": skipna,
    328         "engine": engine,
    329         "reindex": reindex,
    330         "expected_groups": tuple(expected_groups),
    331         "isbin": isbin,
    332         "finalize_kwargs": finalize_kwargs,
    333     },
    334 )
    336 # restore non-dim coord variables without the core dimension
    337 # TODO: shouldn't apply_ufunc handle this?
    338 for var in set(ds.variables) - set(ds.dims):

File ~/Python/myxarray/xarray/core/computation.py:1170, in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, *args)
   1168 # feed datasets apply_variable_ufunc through apply_dataset_vfunc
   1169 elif any(is_dict_like(a) for a in args):
-> 1170     return apply_dataset_vfunc(
   1171         variables_vfunc,
   1172         *args,
   1173         signature=signature,
   1174         join=join,
   1175         exclude_dims=exclude_dims,
   1176         dataset_join=dataset_join,
   1177         fill_value=dataset_fill_value,
   1178         keep_attrs=keep_attrs,
   1179     )
   1180 # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
   1181 elif any(isinstance(a, DataArray) for a in args):

File ~/Python/myxarray/xarray/core/computation.py:460, in apply_dataset_vfunc(func, signature, join, dataset_join, fill_value, exclude_dims, keep_attrs, *args)
    455 list_of_coords, list_of_indexes = build_output_coords_and_indexes(
    456     args, signature, exclude_dims, combine_attrs=keep_attrs
    457 )
    458 args = [getattr(arg, "data_vars", arg) for arg in args]
--> 460 result_vars = apply_dict_of_variables_vfunc(
    461     func, *args, signature=signature, join=dataset_join, fill_value=fill_value
    462 )
    464 if signature.num_outputs > 1:
    465     out = tuple(
    466         _fast_dataset(*args)
    467         for args in zip(result_vars, list_of_coords, list_of_indexes)
    468     )

File ~/Python/myxarray/xarray/core/computation.py:402, in apply_dict_of_variables_vfunc(func, signature, join, fill_value, *args)
    400 result_vars = {}
    401 for name, variable_args in zip(names, grouped_by_name):
--> 402     result_vars[name] = func(*variable_args)
    404 if signature.num_outputs > 1:
    405     return _unpack_dict_tuples(result_vars, signature.num_outputs)

File ~/Python/myxarray/xarray/core/computation.py:750, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
    745     if vectorize:
    746         func = _vectorize(
    747             func, signature, output_dtypes=output_dtypes, exclude_dims=exclude_dims
    748         )
--> 750 result_data = func(*input_data)
    752 if signature.num_outputs == 1:
    753     result_data = (result_data,)

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/xarray.py:291, in xarray_reduce.<locals>.wrapper(array, func, skipna, *by, **kwargs)
    288     if "nan" not in func and func not in ["all", "any", "count"]:
    289         func = f"nan{func}"
--> 291 result, *groups = groupby_reduce(array, *by, func=func, **kwargs)
    292 return result

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/core.py:1553, in groupby_reduce(array, func, expected_groups, sort, isbin, axis, fill_value, min_count, split_out, method, engine, reindex, finalize_kwargs, *by)
   1550 agg = _initialize_aggregation(func, array.dtype, fill_value, min_count, finalize_kwargs)
   1552 if not has_dask:
-> 1553     results = _reduce_blockwise(
   1554         array, by, agg, expected_groups=expected_groups, reindex=reindex, **kwargs
   1555     )
   1556     groups = (results["groups"],)
   1557     result = results[agg.name]

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/core.py:1008, in _reduce_blockwise(array, by, agg, axis, expected_groups, fill_value, engine, sort, reindex)
   1005     finalize_kwargs = (finalize_kwargs,)
   1006 finalize_kwargs = finalize_kwargs + ({},) + ({},)
-> 1008 results = chunk_reduce(
   1009     array,
   1010     by,
   1011     func=agg.numpy,
   1012     axis=axis,
   1013     expected_groups=expected_groups,
   1014     # This fill_value should only apply to groups that only contain NaN observations
   1015     # BUT there is funkiness when axis is a subset of all possible values
   1016     # (see below)
   1017     fill_value=agg.fill_value["numpy"],
   1018     dtype=agg.dtype["numpy"],
   1019     kwargs=finalize_kwargs,
   1020     engine=engine,
   1021     sort=sort,
   1022     reindex=reindex,
   1023 )  # type: ignore
   1025 if _is_arg_reduction(agg):
   1026     results["intermediates"][0] = np.unravel_index(results["intermediates"][0], array.shape)[-1]

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/core.py:677, in chunk_reduce(array, by, func, expected_groups, axis, fill_value, dtype, reindex, engine, kwargs, sort)
    675     result = reduction(group_idx, array, **kwargs)
    676 else:
--> 677     result = generic_aggregate(
    678         group_idx, array, axis=-1, engine=engine, func=reduction, **kwargs
    679     ).astype(dt, copy=False)
    680 if np.any(props.nanmask):
    681     # remove NaN group label which should be last
    682     result = result[..., :-1]

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/aggregations.py:49, in generic_aggregate(group_idx, array, engine, func, axis, size, fill_value, dtype, **kwargs)
     44 else:
     45     raise ValueError(
     46         f"Expected engine to be one of ['flox', 'numpy', 'numba']. Received {engine} instead."
     47     )
---> 49 return method(
     50     group_idx, array, axis=axis, size=size, fill_value=fill_value, dtype=dtype, **kwargs
     51 )

File /opt/miniconda3/envs/xclim-pip/lib/python3.9/site-packages/flox/aggregate_flox.py:86, in nanlen(group_idx, array, *args, **kwargs)
     85 def nanlen(group_idx, array, *args, **kwargs):
---> 86     return sum(group_idx, (~np.isnan(array)).astype(int), *args, **kwargs)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
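The root cause at the bottom of the traceback can be reproduced with plain NumPy: np.isnan has no loop for object-dtype arrays, which is exactly what cftime coordinates use. A minimal sketch, using stdlib datetime objects as a stand-in for cftime:

```python
import datetime
import numpy as np

# An object-dtype array of datetime-like objects, as a cftime coordinate would be.
arr = np.array([datetime.datetime(2009, 1, 1)], dtype=object)

try:
    np.isnan(arr)  # flox's nanlen applies this to the values being counted
    raised = False
except TypeError:
    # Same TypeError as in the traceback: isnan cannot coerce object dtype.
    raised = True

print("isnan raised TypeError:", raised)  # isnan raised TypeError: True
```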

Anything else we need to know?

I was able to resolve this by modifying xarray.core.utils.contains_only_dask_or_numpy so that it returns False if the input's dtype is 'O'. This check seems to be used only when choosing between flox and the old algorithms. Does this make sense?
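The guard described above might look roughly like this. This is a hypothetical sketch, not the actual xarray patch; the function name and signature are assumptions:

```python
import numpy as np

def dtype_is_flox_compatible(dtype) -> bool:
    # Sketch of the proposed guard: object-dtype ('O') arrays, such as
    # cftime coordinates, cannot go through flox, so report False for them
    # and let the caller fall back to the old reduction path.
    return np.dtype(dtype).kind != "O"

print(dtype_is_flox_compatible("datetime64[ns]"))  # True: numpy datetimes are fine
print(dtype_is_flox_compatible(object))            # False: cftime objects are not
```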

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.17.5-arch1-2
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.utf8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 2022.3.1.dev16+g3ead17ea
pandas: 1.4.2
numpy: 1.21.6
scipy: 1.7.1
netCDF4: 1.5.7
pydap: None
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.10.0
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: 3.4.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2021.07.0
cupy: None
pint: 0.18
sparse: None
flox: 0.5.1
numpy_groupies: 0.9.16
setuptools: 57.4.0
pip: 21.2.4
conda: None
pytest: 6.2.5
IPython: 8.2.0
sphinx: 4.1.2

aulemahal added the "bug" and "needs triage" labels on May 16, 2022
dcherian added the "upstream issue" label and removed "needs triage" on May 16, 2022
dcherian (Contributor) commented

Nice find!

Yeah, we could either skip these arrays using common.contains_datetime_like_objects (I think),

or refactor this type of logic into a helper function and use that in groupby._flox_reduce:

if array.dtype.kind in "Mm":
    offset = _datetime_nanmin(array)
    # xarray always uses np.datetime64[ns] for np.datetime64 data
    dtype = "timedelta64[ns]"
    return (
        _mean(
            datetime_to_numeric(array, offset), axis=axis, skipna=skipna, **kwargs
        ).astype(dtype)
        + offset
    )
elif _contains_cftime_datetimes(array):
    if is_duck_dask_array(array):
        raise NotImplementedError(
            "Computing the mean of an array containing "
            "cftime.datetime objects is not yet implemented on "
            "dask arrays."
        )
    offset = min(array)
    timedeltas = datetime_to_numeric(array, offset, datetime_unit="us")
    mean_timedeltas = _mean(timedeltas, axis=axis, skipna=skipna, **kwargs)
    return _to_pytimedelta(mean_timedeltas, unit="us") + offset
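A standalone sketch of the detection side of that refactor, assuming the convention that cftime-like values live in object-dtype arrays and expose a year attribute; the helper name is hypothetical:

```python
import datetime
import numpy as np

def looks_datetime_like(array: np.ndarray) -> bool:
    # Hypothetical helper: object-dtype arrays whose first element has a
    # 'year' attribute are treated as datetime-like (e.g. cftime) and
    # would be routed away from flox by the caller.
    if array.dtype.kind != "O" or array.size == 0:
        return False
    return hasattr(array.ravel()[0], "year")

times = np.array([datetime.datetime(2009, 1, 1)], dtype=object)
print(looks_datetime_like(times))           # True
print(looks_datetime_like(np.arange(3.0)))  # False: float dtype
```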

dcherian (Contributor) commented

Update: count should work now, but mean still fails.
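Until mean is supported, the mean of object-dtype datetimes can be computed by hand with the offset-and-timedelta pattern quoted above. A sketch with stdlib datetimes standing in for cftime objects; all names here are illustrative:

```python
import numpy as np
from datetime import datetime, timedelta

def object_datetime_mean(times):
    # Subtract an offset, average the elapsed microseconds, add the offset
    # back, mirroring the datetime_to_numeric / _to_pytimedelta pattern above.
    offset = min(times)
    micros = np.array([(t - offset) / timedelta(microseconds=1) for t in times])
    return offset + timedelta(microseconds=float(micros.mean()))

print(object_datetime_mean([datetime(2000, 1, 1), datetime(2000, 1, 3)]))
# 2000-01-02 00:00:00
```

In xarray itself, the non-flox code path can also be forced with xr.set_options(use_flox=False).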

dcherian added a commit to xarray-contrib/flox that referenced this issue Jun 2, 2022
Copy over a bunch of xarray code.

Closes pydata/xarray#6613