Handle NaNs when decoding times (failures on riscv64) #7096

andreas-schwab · 2022-09-28T09:19:08Z

What happened?

FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_roundtrip_numpy_datetime_data
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_roundtrip_numpy_datetime_data
FAILED xarray/tests/test_backends.py::TestGenericNetCDFData::test_roundtrip_numpy_datetime_data
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_roundtrip_numpy_datetime_data
= 4 failed, 4636 passed, 5632 skipped, 19 xfailed, 22 xpassed, 38 warnings in 266.18s (0:04:26) =

What did you expect to happen?

No failures

Minimal Complete Verifiable Example

pytest-3.10 -n auto /usr/lib/python3.10/site-packages/xarray

MVCE confirmation

Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
Complete example — the example is self-contained, including all data and the text of any traceback.
Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

=================================== FAILURES ===================================
___________ TestScipyInMemoryData.test_roundtrip_numpy_datetime_data ___________
[gw2] linux -- Python 3.10.7 /usr/bin/python3.10

num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian', use_cftime = None

    def decode_cf_datetime(num_dates, units, calendar=None, use_cftime=None):
        """Given an array of numeric dates in netCDF format, convert it into a
        numpy array of date time objects.
    
        For standard (Gregorian) calendars, this function uses vectorized
        operations, which makes it much faster than cftime.num2date. In such a
        case, the returned array will be of type np.datetime64.
    
        Note that time unit in `units` must not be smaller than microseconds and
        not larger than days.
    
        See Also
        --------
        cftime.num2date
        """
        num_dates = np.asarray(num_dates)
        flat_num_dates = num_dates.ravel()
        if calendar is None:
            calendar = "standard"
    
        if use_cftime is None:
            try:
>               dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:270: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

flat_num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian'

    def _decode_datetime_with_pandas(flat_num_dates, units, calendar):
        if not _is_standard_calendar(calendar):
            raise OutOfBoundsDatetime(
                "Cannot decode times from a non-standard calendar, {!r}, using "
                "pandas.".format(calendar)
            )
    
        delta, ref_date = _unpack_netcdf_time_units(units)
        delta = _netcdf_to_numpy_timeunit(delta)
        try:
            ref_date = pd.Timestamp(ref_date)
        except ValueError:
            # ValueError is raised by pd.Timestamp for non-ISO timestamp
            # strings, in which case we fall back to using cftime
            raise OutOfBoundsDatetime
    
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", "invalid value encountered", RuntimeWarning)
            pd.to_timedelta(flat_num_dates.min(), delta) + ref_date
            pd.to_timedelta(flat_num_dates.max(), delta) + ref_date
    
        # To avoid integer overflow when converting to nanosecond units for integer
        # dtypes smaller than np.int64 cast all integer and unsigned integer dtype
        # arrays to np.int64 (GH 2002, GH 6589).  Note this is safe even in the case
        # of np.uint64 values, because any np.uint64 value that would lead to
        # overflow when converting to np.int64 would not be representable with a
        # timedelta64 value, and therefore would raise an error in the lines above.
        if flat_num_dates.dtype.kind in "iu":
            flat_num_dates = flat_num_dates.astype(np.int64)
    
        # Cast input ordinals to integers of nanoseconds because pd.to_timedelta
        # works much faster when dealing with integers (GH 1399).
        flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
            np.int64
        )
    
        # Use pd.to_timedelta to safely cast integer values to timedeltas,
        # and add those to a Timestamp to safely produce a DatetimeIndex.  This
        # ensures that we do not encounter integer overflow at any point in the
        # process without raising OutOfBoundsDatetime.
>       return (pd.to_timedelta(flat_num_dates_ns_int, "ns") + ref_date).values

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:245: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00')

    @wraps(method)
    def new_method(self, other):
    
        if is_cmp and isinstance(self, ABCIndex) and isinstance(other, ABCSeries):
            # For comparison ops, Index does *not* defer to Series
            pass
        else:
            for cls in [ABCDataFrame, ABCSeries, ABCIndex]:
                if isinstance(self, cls):
                    break
                if isinstance(other, cls):
                    return NotImplemented
    
        other = item_from_zerodim(other)
    
>       return method(self, other)

/usr/lib64/python3.10/site-packages/pandas/core/ops/common.py:70: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00')

    @unpack_zerodim_and_defer("__add__")
    def __add__(self, other):
>       return self._arith_method(other, operator.add)

/usr/lib64/python3.10/site-packages/pandas/core/arraylike.py:100: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00'), op = <built-in function add>

    def _arith_method(self, other, op):
        if (
            isinstance(other, Index)
            and is_object_dtype(other.dtype)
            and type(other) is not Index
        ):
            # We return NotImplemented for object-dtype index *subclasses* so they have
            # a chance to implement ops before we unwrap them.
            # See https://github.com/pandas-dev/pandas/issues/31109
            return NotImplemented
    
>       return super()._arith_method(other, op)

/usr/lib64/python3.10/site-packages/pandas/core/indexes/base.py:6734: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00'), op = <built-in function add>

    def _arith_method(self, other, op):
        res_name = ops.get_op_result_name(self, other)
    
        lvalues = self._values
        rvalues = extract_array(other, extract_numpy=True, extract_range=True)
        rvalues = ops.maybe_prepare_scalar_for_op(rvalues, lvalues.shape)
        rvalues = ensure_wrapped_if_datetimelike(rvalues)
    
        with np.errstate(all="ignore"):
>           result = ops.arithmetic_op(lvalues, rvalues, op)

/usr/lib64/python3.10/site-packages/pandas/core/base.py:1295: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

left = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
right = Timestamp('2000-01-01 00:00:00'), op = <built-in function add>

    def arithmetic_op(left: ArrayLike, right: Any, op):
        """
        Evaluate an arithmetic operation `+`, `-`, `*`, `/`, `//`, `%`, `**`, ...
    
        Note: the caller is responsible for ensuring that numpy warnings are
        suppressed (with np.errstate(all="ignore")) if needed.
    
        Parameters
        ----------
        left : np.ndarray or ExtensionArray
        right : object
            Cannot be a DataFrame or Index.  Series is *not* excluded.
        op : {operator.add, operator.sub, ...}
            Or one of the reversed variants from roperator.
    
        Returns
        -------
        ndarray or ExtensionArray
            Or a 2-tuple of these in the case of divmod or rdivmod.
        """
        # NB: We assume that extract_array and ensure_wrapped_if_datetimelike
        #  have already been called on `left` and `right`,
        #  and `maybe_prepare_scalar_for_op` has already been called on `right`
        # We need to special-case datetime64/timedelta64 dtypes (e.g. because numpy
        # casts integer dtypes to timedelta64 when operating with timedelta64 - GH#22390)
    
        if (
            should_extension_dispatch(left, right)
            or isinstance(right, (Timedelta, BaseOffset, Timestamp))
            or right is NaT
        ):
            # Timedelta/Timestamp and other custom scalars are included in the check
            # because numexpr will fail on it, see GH#31457
>           res_values = op(left, right)

/usr/lib64/python3.10/site-packages/pandas/core/ops/array_ops.py:216: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
other = Timestamp('2000-01-01 00:00:00')

    @wraps(method)
    def new_method(self, other):
    
        if is_cmp and isinstance(self, ABCIndex) and isinstance(other, ABCSeries):
            # For comparison ops, Index does *not* defer to Series
            pass
        else:
            for cls in [ABCDataFrame, ABCSeries, ABCIndex]:
                if isinstance(self, cls):
                    break
                if isinstance(other, cls):
                    return NotImplemented
    
        other = item_from_zerodim(other)
    
>       return method(self, other)

/usr/lib64/python3.10/site-packages/pandas/core/ops/common.py:70: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
other = Timestamp('2000-01-01 00:00:00')

    @unpack_zerodim_and_defer("__add__")
    def __add__(self, other):
        other_dtype = getattr(other, "dtype", None)
    
        # scalar others
        if other is NaT:
            result = self._add_nat()
        elif isinstance(other, (Tick, timedelta, np.timedelta64)):
            result = self._add_timedeltalike_scalar(other)
        elif isinstance(other, BaseOffset):
            # specifically _not_ a Tick
            result = self._add_offset(other)
        elif isinstance(other, (datetime, np.datetime64)):
>           result = self._add_datetimelike_scalar(other)

/usr/lib64/python3.10/site-packages/pandas/core/arrays/datetimelike.py:1264: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
other = Timestamp('2000-01-01 00:00:00')

    def _add_datetimelike_scalar(self, other) -> DatetimeArray:
        # adding a timedeltaindex to a datetimelike
        from pandas.core.arrays import DatetimeArray
    
        assert other is not NaT
        other = Timestamp(other)
        if other is NaT:
            # In this case we specifically interpret NaT as a datetime, not
            # the timedelta interpretation we would get by returning self + NaT
            result = self.asi8.view("m8[ms]") + NaT.to_datetime64()
            return DatetimeArray(result)
    
        i8 = self.asi8
>       result = checked_add_with_arr(i8, other.value, arr_mask=self._isnan)

/usr/lib64/python3.10/site-packages/pandas/core/arrays/timedeltas.py:482: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arr = array([                  0, 9223372036854775807]), b = 946684800000000000
arr_mask = array([False, False]), b_mask = None

    def checked_add_with_arr(
        arr: np.ndarray,
        b,
        arr_mask: npt.NDArray[np.bool_] | None = None,
        b_mask: npt.NDArray[np.bool_] | None = None,
    ) -> np.ndarray:
        """
        Perform array addition that checks for underflow and overflow.
    
        Performs the addition of an int64 array and an int64 integer (or array)
        but checks that they do not result in overflow first. For elements that
        are indicated to be NaN, whether or not there is overflow for that element
        is automatically ignored.
    
        Parameters
        ----------
        arr : array addend.
        b : array or scalar addend.
        arr_mask : np.ndarray[bool] or None, default None
            array indicating which elements to exclude from checking
        b_mask : np.ndarray[bool] or None, default None
            array or scalar indicating which element(s) to exclude from checking
    
        Returns
        -------
        sum : An array for elements x + b for each element x in arr if b is
              a scalar or an array for elements x + y for each element pair
              (x, y) in (arr, b).
    
        Raises
        ------
        OverflowError if any x + y exceeds the maximum or minimum int64 value.
        """
        # For performance reasons, we broadcast 'b' to the new array 'b2'
        # so that it has the same size as 'arr'.
        b2 = np.broadcast_to(b, arr.shape)
        if b_mask is not None:
            # We do the same broadcasting for b_mask as well.
            b2_mask = np.broadcast_to(b_mask, arr.shape)
        else:
            b2_mask = None
    
        # For elements that are NaN, regardless of their value, we should
        # ignore whether they overflow or not when doing the checked add.
        if arr_mask is not None and b2_mask is not None:
            not_nan = np.logical_not(arr_mask | b2_mask)
        elif arr_mask is not None:
            not_nan = np.logical_not(arr_mask)
        elif b_mask is not None:
            # Argument 1 to "__call__" of "_UFunc_Nin1_Nout1" has incompatible type
            # "Optional[ndarray[Any, dtype[bool_]]]"; expected
            # "Union[_SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[An
            # y]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool,
            # int, float, complex, str, bytes]]]"  [arg-type]
            not_nan = np.logical_not(b2_mask)  # type: ignore[arg-type]
        else:
            not_nan = np.empty(arr.shape, dtype=bool)
            not_nan.fill(True)
    
        # gh-14324: For each element in 'arr' and its corresponding element
        # in 'b2', we check the sign of the element in 'b2'. If it is positive,
        # we then check whether its sum with the element in 'arr' exceeds
        # np.iinfo(np.int64).max. If so, we have an overflow error. If it
        # it is negative, we then check whether its sum with the element in
        # 'arr' exceeds np.iinfo(np.int64).min. If so, we have an overflow
        # error as well.
        i8max = lib.i8max
        i8min = iNaT
    
        mask1 = b2 > 0
        mask2 = b2 < 0
    
        if not mask1.any():
            to_raise = ((i8min - b2 > arr) & not_nan).any()
        elif not mask2.any():
            to_raise = ((i8max - b2 < arr) & not_nan).any()
        else:
            to_raise = ((i8max - b2[mask1] < arr[mask1]) & not_nan[mask1]).any() or (
                (i8min - b2[mask2] > arr[mask2]) & not_nan[mask2]
            ).any()
    
        if to_raise:
>           raise OverflowError("Overflow in int64 addition")
E           OverflowError: Overflow in int64 addition

/usr/lib64/python3.10/site-packages/pandas/core/algorithms.py:1114: OverflowError

During handling of the above exception, another exception occurred:

data = <xarray.backends.scipy_.ScipyArrayWrapper object at 0x40238999c0>
units = 'days since 2000-01-01 00:00:00', calendar = 'proleptic_gregorian'
use_cftime = None

    def _decode_cf_datetime_dtype(data, units, calendar, use_cftime):
        # Verify that at least the first and last date can be decoded
        # successfully. Otherwise, tracebacks end up swallowed by
        # Dataset.__repr__ when users try to view their lazily decoded array.
        values = indexing.ImplicitToExplicitIndexingAdapter(indexing.as_indexable(data))
        example_value = np.concatenate(
            [first_n_items(values, 1) or [0], last_item(values) or [0]]
        )
    
        try:
>           result = decode_cf_datetime(example_value, units, calendar, use_cftime)

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:180: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian', use_cftime = None

    def decode_cf_datetime(num_dates, units, calendar=None, use_cftime=None):
        """Given an array of numeric dates in netCDF format, convert it into a
        numpy array of date time objects.
    
        For standard (Gregorian) calendars, this function uses vectorized
        operations, which makes it much faster than cftime.num2date. In such a
        case, the returned array will be of type np.datetime64.
    
        Note that time unit in `units` must not be smaller than microseconds and
        not larger than days.
    
        See Also
        --------
        cftime.num2date
        """
        num_dates = np.asarray(num_dates)
        flat_num_dates = num_dates.ravel()
        if calendar is None:
            calendar = "standard"
    
        if use_cftime is None:
            try:
                dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)
            except (KeyError, OutOfBoundsDatetime, OutOfBoundsTimedelta, OverflowError):
>               dates = _decode_datetime_with_cftime(
                    flat_num_dates.astype(float), units, calendar
                )

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:272: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian'

    def _decode_datetime_with_cftime(num_dates, units, calendar):
        if cftime is None:
>           raise ModuleNotFoundError("No module named 'cftime'")
E           ModuleNotFoundError: No module named 'cftime'

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:199: ModuleNotFoundError

During handling of the above exception, another exception occurred:

self = <xarray.tests.test_backends.TestScipyInMemoryData object at 0x4010bfceb0>

    @arm_xfail
    def test_roundtrip_numpy_datetime_data(self):
        times = pd.to_datetime(["2000-01-01", "2000-01-02", "NaT"])
        expected = Dataset({"t": ("t", times), "t0": times[0]})
        kwargs = {"encoding": {"t0": {"units": "days since 1950-01-01"}}}
>       with self.roundtrip(expected, save_kwargs=kwargs) as actual:

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/tests/test_backends.py:510: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.10/contextlib.py:135: in __enter__
    return next(self.gen)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/tests/test_backends.py:286: in roundtrip
    with self.open(path, **open_kwargs) as ds:
/usr/lib64/python3.10/contextlib.py:135: in __enter__
    return next(self.gen)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/tests/test_backends.py:312: in open
    with open_dataset(path, engine=self.engine, **kwargs) as ds:
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/backends/api.py:531: in open_dataset
    backend_ds = backend.open_dataset(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/backends/scipy_.py:285: in open_dataset
    ds = store_entrypoint.open_dataset(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/backends/store.py:29: in open_dataset
    vars, attrs, coord_names = conventions.decode_cf_variables(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/conventions.py:521: in decode_cf_variables
    new_vars[k] = decode_cf_variable(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/conventions.py:369: in decode_cf_variable
    var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:682: in decode
    dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

data = <xarray.backends.scipy_.ScipyArrayWrapper object at 0x40238999c0>
units = 'days since 2000-01-01 00:00:00', calendar = 'proleptic_gregorian'
use_cftime = None

    def _decode_cf_datetime_dtype(data, units, calendar, use_cftime):
        # Verify that at least the first and last date can be decoded
        # successfully. Otherwise, tracebacks end up swallowed by
        # Dataset.__repr__ when users try to view their lazily decoded array.
        values = indexing.ImplicitToExplicitIndexingAdapter(indexing.as_indexable(data))
        example_value = np.concatenate(
            [first_n_items(values, 1) or [0], last_item(values) or [0]]
        )
    
        try:
            result = decode_cf_datetime(example_value, units, calendar, use_cftime)
        except Exception:
            calendar_msg = (
                "the default calendar" if calendar is None else f"calendar {calendar!r}"
            )
            msg = (
                f"unable to decode time units {units!r} with {calendar_msg!r}. Try "
                "opening your dataset with decode_times=False or installing cftime "
                "if it is not installed."
            )
>           raise ValueError(msg)
E           ValueError: unable to decode time units 'days since 2000-01-01 00:00:00' with "calendar 'proleptic_gregorian'". Try opening your dataset with decode_times=False or installing cftime if it is not installed.

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:190: ValueError

Anything else we need to know?

https://build.opensuse.org/package/live_build_log/openSUSE:Factory:RISCV/python-xarray/standard/riscv64

import xarray as xr
import numpy as np
import pandas as pd

num_dates = np.asarray([0., np.nan])
flat_num_dates = num_dates.ravel()
# Same cast as in _decode_datetime_with_pandas: days -> integer nanoseconds
flat_num_dates_ns_int = (flat_num_dates * (int(1e9) * 60 * 60 * 24)).astype(np.int64)
flat_num_dates_ns_int
# array([ 0, 9223372036854775807])
pd.to_timedelta(flat_num_dates_ns_int, "ns")
# TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
pd.to_timedelta(flat_num_dates, "ns")
# TimedeltaIndex(['0 days', NaT], dtype='timedelta64[ns]', freq=None)

Environment

/usr/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.10.7 (main, Sep 11 2022, 08:41:56) [GCC]
python-bits: 64
OS: Linux
OS-release: 5.19.10-1-default
machine: riscv64
processor: riscv64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2022.6.0
pandas: 1.4.4
numpy: 1.21.6
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: None
distributed: None
matplotlib: 3.5.3
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.2.0
pip: 22.0.4
conda: None
pytest: 7.1.2
IPython: 8.5.0
sphinx: None

max-sixty · 2022-09-28T17:34:03Z

It looks like many of these occur in pandas code; do pandas tests pass?

andreas-schwab · 2022-09-28T18:31:14Z

On Sep 28 2022, Maximilian Roos wrote: It looks like many of these occur in pandas code; do pandas tests pass?

That's because xarray is passing bogus values.

max-sixty · 2022-09-28T20:07:57Z

What are the bogus values?

Please could you answer the question on whether pandas tests pass?

andreas-schwab · 2022-09-28T20:35:34Z

array([ 0, 9223372036854775807])
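
To spell out why this value ends in `OverflowError`, the arithmetic can be checked with plain integers. This is only an illustrative sketch using the standard library; the names `bogus`, `ref_ns`, and `NS_PER_DAY` are made up here, and the numbers are taken from the traceback above.

```python
from datetime import datetime, timezone

INT64_MAX = 2**63 - 1          # largest value an int64 can hold
NS_PER_DAY = 86_400 * 10**9    # nanoseconds in one day

# Reference date from the units string 'days since 2000-01-01 00:00:00',
# expressed as nanoseconds since the Unix epoch (the `b` argument in the traceback).
ref = datetime(2000, 1, 1, tzinfo=timezone.utc)
ref_ns = int(ref.timestamp()) * 10**9

# Value that the NaN entry was cast to on riscv64.
bogus = 9223372036854775807

print(bogus == INT64_MAX)           # True: the NaN became the int64 maximum
print(bogus // NS_PER_DAY)          # 106751: the day count shown in the TimedeltaIndex
print(ref_ns)                       # 946684800000000000
print(bogus + ref_ns > INT64_MAX)   # True: adding the reference date cannot fit in int64
```

Because the element is an ordinary integer rather than NaT, the `arr_mask` passed to `checked_add_with_arr` is `False` for it, so the overflow check applies and raises.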

max-sixty · 2022-09-28T21:26:09Z

I'm not sure what that has to do with xarray though? Does this give the same result?

import numpy as np
import pandas as pd

num_dates = np.asarray([0., np.nan])
flat_num_dates = num_dates.ravel()
flat_num_dates_ns_int = (flat_num_dates * (int(1e9) * 60 * 60 * 24)).astype(np.int64)
flat_num_dates_ns_int
# array([ 0, 9223372036854775807])

Please could you answer the question on whether pandas tests pass?

We're here helping as volunteers; we can only engage on issues if you reciprocate our good faith. Please could you answer this?

max-sixty · 2022-10-02T02:47:22Z

Closing but please feel free to reopen

felixonmars · 2022-10-02T13:23:06Z

Hi, we are getting similar failures when building xarray for Arch Linux riscv64.

I'm not sure what that has to do with xarray though? Does this give the same result?

import numpy as np
import pandas as pd

num_dates = np.asarray([0., np.nan])
flat_num_dates = num_dates.ravel()
flat_num_dates_ns_int = (flat_num_dates * (int(1e9) * 60 * 60 * 24)).astype(np.int64)
flat_num_dates_ns_int
# array([ 0, 9223372036854775807])

I got the same result on riscv64. My guess is that the sign bit of NaN is not preserved during conversions. More details can be found at: https://sourceware.org/pipermail/libc-alpha/2022-September/142011.html

Repeating the same steps results in array([0, -9223372036854775808]) on x86_64 and array([0, 0]) on aarch64.
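
For anyone who wants to check another machine, here is a small sketch that reports what the local platform produces for this cast. The helper name `nan_to_int64` and the `REPORTED` table are made up for illustration; the table only lists the three results mentioned in this thread and is not exhaustive.

```python
import warnings

import numpy as np

# Results reported in this thread for casting NaN to int64 (not an exhaustive list).
REPORTED = {
    np.iinfo(np.int64).max: "riscv64",
    np.iinfo(np.int64).min: "x86_64",
    0: "aarch64",
}


def nan_to_int64():
    """Return whatever this platform produces when NaN is cast to int64."""
    with warnings.catch_warnings():
        # Newer NumPy versions emit a RuntimeWarning for invalid values in casts.
        warnings.simplefilter("ignore", RuntimeWarning)
        return int(np.array([np.nan]).astype(np.int64)[0])


value = nan_to_int64()
print(value, "matches the result reported for:", REPORTED.get(value, "no platform in this thread"))
```

Since the result differs between architectures, code that needs a portable outcome probably should not rely on any of these values.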

Please could you answer the question on whether pandas tests pass?

I have tried pandas' tests and got many failures like:

E       AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
E
E       Attribute "dtype" are different
E       [left]:  float64
E       [right]: datetime64[ns]

or

E           AssertionError: numpy array are different
E
E           numpy array values are different (50.0 %)
E           [index]: [0, 1]
E           [left]:  [1036713600000, -9223372036854775808]
E           [right]: [1036713600000000000, -9223372036854775808]

Quite a few of the tests involve NaN as well, so you are probably right that the problem lies in pandas or numpy.

max-sixty · 2022-10-02T17:27:33Z

I got the same result in riscv64. One thing I could guess is that the sign bit of NaN is not kept during conversions. Some more details could be found at

Thanks for trying that. Notably, that code doesn't involve xarray. So I'm keen to be part of the solution, but it doesn't look to be a problem with xarray code specifically. Let me know if that makes sense.

dcherian · 2022-10-03T16:22:19Z

As in #7098

I think the real solution here is to explicitly handle NaNs during the decoding step. We do want these to be NaT in the output.
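
As a rough illustration of what explicit NaN handling could look like, the sketch below masks NaN entries before the float-to-integer cast and writes NaT into those positions afterwards. It is not the code from xarray or from any pull request; `decode_days_nan_safe` is a hypothetical helper, it only handles units of days, and it omits the overflow checks that the real decoding path performs.

```python
import numpy as np

NS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day


def decode_days_nan_safe(num_dates, ref_date="2000-01-01"):
    """Hypothetical helper: decode float 'days since ref_date' to datetime64[ns].

    NaN inputs become NaT instead of going through a float -> int64 cast,
    whose result for NaN differs between platforms.
    """
    num_dates = np.asarray(num_dates, dtype=np.float64)
    nan_mask = np.isnan(num_dates)

    # Substitute a harmless placeholder so the integer cast never sees NaN.
    safe = np.where(nan_mask, 0.0, num_dates)
    ns = (safe * NS_PER_DAY).astype(np.int64)

    result = np.datetime64(ref_date, "ns") + ns.astype("timedelta64[ns]")
    # Restore missing values as NaT.
    result[nan_mask] = np.datetime64("NaT")
    return result


decoded = decode_days_nan_safe([0.0, np.nan, 1.5])
print(decoded)
```

With this approach the integer array never contains a platform-dependent stand-in for NaN, so the later addition of the reference date has nothing spurious to overflow on.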

kmuehlbauer · 2023-09-12T16:39:52Z

@felixonmars If you are still working on this, I'd appreciate it if you could test this against #7827. Thanks.

felixonmars · 2023-09-12T17:13:14Z

@kmuehlbauer Sure. I have verified that the tests are passing on #7827 and failing on the current main branch.

andreas-schwab added the "bug" and "needs triage" labels on Sep 28, 2022

max-sixty added the "needs mcve" label and removed the "bug" and "needs triage" labels on Sep 29, 2022

max-sixty closed this as completed on Oct 2, 2022

dcherian reopened this on Oct 3, 2022

dcherian changed the title from "Testsuite failures on riscv64" to "Handle NaNs when decoding times (failures on riscv64)" on Oct 3, 2022

dcherian added the "bug" label and removed the "needs mcve" label on Oct 3, 2022

kmuehlbauer mentioned this issue on Sep 12, 2023 in "Preserve nanosecond resolution when encoding/decoding times" #7827 (merged)

kmuehlbauer closed this as completed in #7827 Sep 17, 2023
