Handle NaNs when decoding times (failures on riscv64) #7096

Closed
4 tasks done
andreas-schwab opened this issue Sep 28, 2022 · 11 comments · Fixed by #7827

@andreas-schwab

What happened?

FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_roundtrip_numpy_datetime_data
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_roundtrip_numpy_datetime_data
FAILED xarray/tests/test_backends.py::TestGenericNetCDFData::test_roundtrip_numpy_datetime_data
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_roundtrip_numpy_datetime_data
= 4 failed, 4636 passed, 5632 skipped, 19 xfailed, 22 xpassed, 38 warnings in 266.18s (0:04:26) =

What did you expect to happen?

No failures

Minimal Complete Verifiable Example

pytest-3.10 -n auto /usr/lib/python3.10/site-packages/xarray

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

=================================== FAILURES ===================================
___________ TestScipyInMemoryData.test_roundtrip_numpy_datetime_data ___________
[gw2] linux -- Python 3.10.7 /usr/bin/python3.10

num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian', use_cftime = None

    def decode_cf_datetime(num_dates, units, calendar=None, use_cftime=None):
        """Given an array of numeric dates in netCDF format, convert it into a
        numpy array of date time objects.
    
        For standard (Gregorian) calendars, this function uses vectorized
        operations, which makes it much faster than cftime.num2date. In such a
        case, the returned array will be of type np.datetime64.
    
        Note that time unit in `units` must not be smaller than microseconds and
        not larger than days.
    
        See Also
        --------
        cftime.num2date
        """
        num_dates = np.asarray(num_dates)
        flat_num_dates = num_dates.ravel()
        if calendar is None:
            calendar = "standard"
    
        if use_cftime is None:
            try:
>               dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:270: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

flat_num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian'

    def _decode_datetime_with_pandas(flat_num_dates, units, calendar):
        if not _is_standard_calendar(calendar):
            raise OutOfBoundsDatetime(
                "Cannot decode times from a non-standard calendar, {!r}, using "
                "pandas.".format(calendar)
            )
    
        delta, ref_date = _unpack_netcdf_time_units(units)
        delta = _netcdf_to_numpy_timeunit(delta)
        try:
            ref_date = pd.Timestamp(ref_date)
        except ValueError:
            # ValueError is raised by pd.Timestamp for non-ISO timestamp
            # strings, in which case we fall back to using cftime
            raise OutOfBoundsDatetime
    
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", "invalid value encountered", RuntimeWarning)
            pd.to_timedelta(flat_num_dates.min(), delta) + ref_date
            pd.to_timedelta(flat_num_dates.max(), delta) + ref_date
    
        # To avoid integer overflow when converting to nanosecond units for integer
        # dtypes smaller than np.int64 cast all integer and unsigned integer dtype
        # arrays to np.int64 (GH 2002, GH 6589).  Note this is safe even in the case
        # of np.uint64 values, because any np.uint64 value that would lead to
        # overflow when converting to np.int64 would not be representable with a
        # timedelta64 value, and therefore would raise an error in the lines above.
        if flat_num_dates.dtype.kind in "iu":
            flat_num_dates = flat_num_dates.astype(np.int64)
    
        # Cast input ordinals to integers of nanoseconds because pd.to_timedelta
        # works much faster when dealing with integers (GH 1399).
        flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
            np.int64
        )
    
        # Use pd.to_timedelta to safely cast integer values to timedeltas,
        # and add those to a Timestamp to safely produce a DatetimeIndex.  This
        # ensures that we do not encounter integer overflow at any point in the
        # process without raising OutOfBoundsDatetime.
>       return (pd.to_timedelta(flat_num_dates_ns_int, "ns") + ref_date).values

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:245: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00')

    @wraps(method)
    def new_method(self, other):
    
        if is_cmp and isinstance(self, ABCIndex) and isinstance(other, ABCSeries):
            # For comparison ops, Index does *not* defer to Series
            pass
        else:
            for cls in [ABCDataFrame, ABCSeries, ABCIndex]:
                if isinstance(self, cls):
                    break
                if isinstance(other, cls):
                    return NotImplemented
    
        other = item_from_zerodim(other)
    
>       return method(self, other)

/usr/lib64/python3.10/site-packages/pandas/core/ops/common.py:70: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00')

    @unpack_zerodim_and_defer("__add__")
    def __add__(self, other):
>       return self._arith_method(other, operator.add)

/usr/lib64/python3.10/site-packages/pandas/core/arraylike.py:100: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00'), op = <built-in function add>

    def _arith_method(self, other, op):
        if (
            isinstance(other, Index)
            and is_object_dtype(other.dtype)
            and type(other) is not Index
        ):
            # We return NotImplemented for object-dtype index *subclasses* so they have
            # a chance to implement ops before we unwrap them.
            # See https://github.com/pandas-dev/pandas/issues/31109
            return NotImplemented
    
>       return super()._arith_method(other, op)

/usr/lib64/python3.10/site-packages/pandas/core/indexes/base.py:6734: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00'), op = <built-in function add>

    def _arith_method(self, other, op):
        res_name = ops.get_op_result_name(self, other)
    
        lvalues = self._values
        rvalues = extract_array(other, extract_numpy=True, extract_range=True)
        rvalues = ops.maybe_prepare_scalar_for_op(rvalues, lvalues.shape)
        rvalues = ensure_wrapped_if_datetimelike(rvalues)
    
        with np.errstate(all="ignore"):
>           result = ops.arithmetic_op(lvalues, rvalues, op)

/usr/lib64/python3.10/site-packages/pandas/core/base.py:1295: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

left = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
right = Timestamp('2000-01-01 00:00:00'), op = <built-in function add>

    def arithmetic_op(left: ArrayLike, right: Any, op):
        """
        Evaluate an arithmetic operation `+`, `-`, `*`, `/`, `//`, `%`, `**`, ...
    
        Note: the caller is responsible for ensuring that numpy warnings are
        suppressed (with np.errstate(all="ignore")) if needed.
    
        Parameters
        ----------
        left : np.ndarray or ExtensionArray
        right : object
            Cannot be a DataFrame or Index.  Series is *not* excluded.
        op : {operator.add, operator.sub, ...}
            Or one of the reversed variants from roperator.
    
        Returns
        -------
        ndarray or ExtensionArray
            Or a 2-tuple of these in the case of divmod or rdivmod.
        """
        # NB: We assume that extract_array and ensure_wrapped_if_datetimelike
        #  have already been called on `left` and `right`,
        #  and `maybe_prepare_scalar_for_op` has already been called on `right`
        # We need to special-case datetime64/timedelta64 dtypes (e.g. because numpy
        # casts integer dtypes to timedelta64 when operating with timedelta64 - GH#22390)
    
        if (
            should_extension_dispatch(left, right)
            or isinstance(right, (Timedelta, BaseOffset, Timestamp))
            or right is NaT
        ):
            # Timedelta/Timestamp and other custom scalars are included in the check
            # because numexpr will fail on it, see GH#31457
>           res_values = op(left, right)

/usr/lib64/python3.10/site-packages/pandas/core/ops/array_ops.py:216: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
other = Timestamp('2000-01-01 00:00:00')

    @wraps(method)
    def new_method(self, other):
    
        if is_cmp and isinstance(self, ABCIndex) and isinstance(other, ABCSeries):
            # For comparison ops, Index does *not* defer to Series
            pass
        else:
            for cls in [ABCDataFrame, ABCSeries, ABCIndex]:
                if isinstance(self, cls):
                    break
                if isinstance(other, cls):
                    return NotImplemented
    
        other = item_from_zerodim(other)
    
>       return method(self, other)

/usr/lib64/python3.10/site-packages/pandas/core/ops/common.py:70: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
other = Timestamp('2000-01-01 00:00:00')

    @unpack_zerodim_and_defer("__add__")
    def __add__(self, other):
        other_dtype = getattr(other, "dtype", None)
    
        # scalar others
        if other is NaT:
            result = self._add_nat()
        elif isinstance(other, (Tick, timedelta, np.timedelta64)):
            result = self._add_timedeltalike_scalar(other)
        elif isinstance(other, BaseOffset):
            # specifically _not_ a Tick
            result = self._add_offset(other)
        elif isinstance(other, (datetime, np.datetime64)):
>           result = self._add_datetimelike_scalar(other)

/usr/lib64/python3.10/site-packages/pandas/core/arrays/datetimelike.py:1264: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
other = Timestamp('2000-01-01 00:00:00')

    def _add_datetimelike_scalar(self, other) -> DatetimeArray:
        # adding a timedeltaindex to a datetimelike
        from pandas.core.arrays import DatetimeArray
    
        assert other is not NaT
        other = Timestamp(other)
        if other is NaT:
            # In this case we specifically interpret NaT as a datetime, not
            # the timedelta interpretation we would get by returning self + NaT
            result = self.asi8.view("m8[ms]") + NaT.to_datetime64()
            return DatetimeArray(result)
    
        i8 = self.asi8
>       result = checked_add_with_arr(i8, other.value, arr_mask=self._isnan)

/usr/lib64/python3.10/site-packages/pandas/core/arrays/timedeltas.py:482: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arr = array([                  0, 9223372036854775807]), b = 946684800000000000
arr_mask = array([False, False]), b_mask = None

    def checked_add_with_arr(
        arr: np.ndarray,
        b,
        arr_mask: npt.NDArray[np.bool_] | None = None,
        b_mask: npt.NDArray[np.bool_] | None = None,
    ) -> np.ndarray:
        """
        Perform array addition that checks for underflow and overflow.
    
        Performs the addition of an int64 array and an int64 integer (or array)
        but checks that they do not result in overflow first. For elements that
        are indicated to be NaN, whether or not there is overflow for that element
        is automatically ignored.
    
        Parameters
        ----------
        arr : array addend.
        b : array or scalar addend.
        arr_mask : np.ndarray[bool] or None, default None
            array indicating which elements to exclude from checking
        b_mask : np.ndarray[bool] or None, default None
            array or scalar indicating which element(s) to exclude from checking
    
        Returns
        -------
        sum : An array for elements x + b for each element x in arr if b is
              a scalar or an array for elements x + y for each element pair
              (x, y) in (arr, b).
    
        Raises
        ------
        OverflowError if any x + y exceeds the maximum or minimum int64 value.
        """
        # For performance reasons, we broadcast 'b' to the new array 'b2'
        # so that it has the same size as 'arr'.
        b2 = np.broadcast_to(b, arr.shape)
        if b_mask is not None:
            # We do the same broadcasting for b_mask as well.
            b2_mask = np.broadcast_to(b_mask, arr.shape)
        else:
            b2_mask = None
    
        # For elements that are NaN, regardless of their value, we should
        # ignore whether they overflow or not when doing the checked add.
        if arr_mask is not None and b2_mask is not None:
            not_nan = np.logical_not(arr_mask | b2_mask)
        elif arr_mask is not None:
            not_nan = np.logical_not(arr_mask)
        elif b_mask is not None:
            # Argument 1 to "__call__" of "_UFunc_Nin1_Nout1" has incompatible type
            # "Optional[ndarray[Any, dtype[bool_]]]"; expected
            # "Union[_SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[An
            # y]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool,
            # int, float, complex, str, bytes]]]"  [arg-type]
            not_nan = np.logical_not(b2_mask)  # type: ignore[arg-type]
        else:
            not_nan = np.empty(arr.shape, dtype=bool)
            not_nan.fill(True)
    
        # gh-14324: For each element in 'arr' and its corresponding element
        # in 'b2', we check the sign of the element in 'b2'. If it is positive,
        # we then check whether its sum with the element in 'arr' exceeds
        # np.iinfo(np.int64).max. If so, we have an overflow error. If it
        # it is negative, we then check whether its sum with the element in
        # 'arr' exceeds np.iinfo(np.int64).min. If so, we have an overflow
        # error as well.
        i8max = lib.i8max
        i8min = iNaT
    
        mask1 = b2 > 0
        mask2 = b2 < 0
    
        if not mask1.any():
            to_raise = ((i8min - b2 > arr) & not_nan).any()
        elif not mask2.any():
            to_raise = ((i8max - b2 < arr) & not_nan).any()
        else:
            to_raise = ((i8max - b2[mask1] < arr[mask1]) & not_nan[mask1]).any() or (
                (i8min - b2[mask2] > arr[mask2]) & not_nan[mask2]
            ).any()
    
        if to_raise:
>           raise OverflowError("Overflow in int64 addition")
E           OverflowError: Overflow in int64 addition

/usr/lib64/python3.10/site-packages/pandas/core/algorithms.py:1114: OverflowError

During handling of the above exception, another exception occurred:

data = <xarray.backends.scipy_.ScipyArrayWrapper object at 0x40238999c0>
units = 'days since 2000-01-01 00:00:00', calendar = 'proleptic_gregorian'
use_cftime = None

    def _decode_cf_datetime_dtype(data, units, calendar, use_cftime):
        # Verify that at least the first and last date can be decoded
        # successfully. Otherwise, tracebacks end up swallowed by
        # Dataset.__repr__ when users try to view their lazily decoded array.
        values = indexing.ImplicitToExplicitIndexingAdapter(indexing.as_indexable(data))
        example_value = np.concatenate(
            [first_n_items(values, 1) or [0], last_item(values) or [0]]
        )
    
        try:
>           result = decode_cf_datetime(example_value, units, calendar, use_cftime)

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:180: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian', use_cftime = None

    def decode_cf_datetime(num_dates, units, calendar=None, use_cftime=None):
        """Given an array of numeric dates in netCDF format, convert it into a
        numpy array of date time objects.
    
        For standard (Gregorian) calendars, this function uses vectorized
        operations, which makes it much faster than cftime.num2date. In such a
        case, the returned array will be of type np.datetime64.
    
        Note that time unit in `units` must not be smaller than microseconds and
        not larger than days.
    
        See Also
        --------
        cftime.num2date
        """
        num_dates = np.asarray(num_dates)
        flat_num_dates = num_dates.ravel()
        if calendar is None:
            calendar = "standard"
    
        if use_cftime is None:
            try:
                dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)
            except (KeyError, OutOfBoundsDatetime, OutOfBoundsTimedelta, OverflowError):
>               dates = _decode_datetime_with_cftime(
                    flat_num_dates.astype(float), units, calendar
                )

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:272: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian'

    def _decode_datetime_with_cftime(num_dates, units, calendar):
        if cftime is None:
>           raise ModuleNotFoundError("No module named 'cftime'")
E           ModuleNotFoundError: No module named 'cftime'

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:199: ModuleNotFoundError

During handling of the above exception, another exception occurred:

self = <xarray.tests.test_backends.TestScipyInMemoryData object at 0x4010bfceb0>

    @arm_xfail
    def test_roundtrip_numpy_datetime_data(self):
        times = pd.to_datetime(["2000-01-01", "2000-01-02", "NaT"])
        expected = Dataset({"t": ("t", times), "t0": times[0]})
        kwargs = {"encoding": {"t0": {"units": "days since 1950-01-01"}}}
>       with self.roundtrip(expected, save_kwargs=kwargs) as actual:

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/tests/test_backends.py:510: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.10/contextlib.py:135: in __enter__
    return next(self.gen)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/tests/test_backends.py:286: in roundtrip
    with self.open(path, **open_kwargs) as ds:
/usr/lib64/python3.10/contextlib.py:135: in __enter__
    return next(self.gen)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/tests/test_backends.py:312: in open
    with open_dataset(path, engine=self.engine, **kwargs) as ds:
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/backends/api.py:531: in open_dataset
    backend_ds = backend.open_dataset(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/backends/scipy_.py:285: in open_dataset
    ds = store_entrypoint.open_dataset(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/backends/store.py:29: in open_dataset
    vars, attrs, coord_names = conventions.decode_cf_variables(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/conventions.py:521: in decode_cf_variables
    new_vars[k] = decode_cf_variable(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/conventions.py:369: in decode_cf_variable
    var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:682: in decode
    dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

data = <xarray.backends.scipy_.ScipyArrayWrapper object at 0x40238999c0>
units = 'days since 2000-01-01 00:00:00', calendar = 'proleptic_gregorian'
use_cftime = None

    def _decode_cf_datetime_dtype(data, units, calendar, use_cftime):
        # Verify that at least the first and last date can be decoded
        # successfully. Otherwise, tracebacks end up swallowed by
        # Dataset.__repr__ when users try to view their lazily decoded array.
        values = indexing.ImplicitToExplicitIndexingAdapter(indexing.as_indexable(data))
        example_value = np.concatenate(
            [first_n_items(values, 1) or [0], last_item(values) or [0]]
        )
    
        try:
            result = decode_cf_datetime(example_value, units, calendar, use_cftime)
        except Exception:
            calendar_msg = (
                "the default calendar" if calendar is None else f"calendar {calendar!r}"
            )
            msg = (
                f"unable to decode time units {units!r} with {calendar_msg!r}. Try "
                "opening your dataset with decode_times=False or installing cftime "
                "if it is not installed."
            )
>           raise ValueError(msg)
E           ValueError: unable to decode time units 'days since 2000-01-01 00:00:00' with "calendar 'proleptic_gregorian'". Try opening your dataset with decode_times=False or installing cftime if it is not installed.

/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:190: ValueError
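
The three chained tracebacks above follow from nested fallbacks: the pandas path fails, the cftime fallback fails because cftime is not installed, and the outer check turns whatever went wrong into a generic ValueError. A minimal sketch of that control flow, using hypothetical stand-in functions rather than xarray's real ones, shows how the original OverflowError ends up two levels down the exception context chain:

```python
def decode_with_pandas(values):
    # Stand-in for the fast path; fails as in the log above.
    raise OverflowError("Overflow in int64 addition")


def decode_with_cftime(values):
    # Stand-in for the fallback when cftime is not installed.
    raise ModuleNotFoundError("No module named 'cftime'")


def decode(values):
    try:
        return decode_with_pandas(values)
    except OverflowError:
        return decode_with_cftime(values)


def check_decodable(values):
    try:
        return decode(values)
    except Exception:
        # No "from": Python still records the active exception as __context__.
        raise ValueError("unable to decode time units")


def context_chain(exc):
    """Collect exception type names from the outermost error inwards."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__context__
    return names


try:
    check_decodable([0.0, float("nan")])
except ValueError as err:
    chain = context_chain(err)

print(chain)  # ['ValueError', 'ModuleNotFoundError', 'OverflowError']
```

Reading the log from the bottom up therefore hides the root cause: the suggestion to install cftime is only a side effect of the overflow in the pandas path.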

Anything else we need to know?

https://build.opensuse.org/package/live_build_log/openSUSE:Factory:RISCV/python-xarray/standard/riscv64

>>> import xarray as xr
>>> import numpy as np
>>> import pandas as pd
>>> num_dates = np.asarray([0., np.nan])
>>> flat_num_dates = num_dates.ravel()
>>> flat_num_dates_ns_int = (flat_num_dates * (int(1e9) * 60 * 60 * 24)).astype(np.int64)
>>> flat_num_dates_ns_int
array([ 0, 9223372036854775807])
>>> pd.to_timedelta(flat_num_dates_ns_int, "ns")
TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
>>> pd.to_timedelta(flat_num_dates, "ns")
TimedeltaIndex(['0 days', NaT], dtype='timedelta64[ns]', freq=None)

Environment

/usr/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.10.7 (main, Sep 11 2022, 08:41:56) [GCC]
python-bits: 64
OS: Linux
OS-release: 5.19.10-1-default
machine: riscv64
processor: riscv64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2022.6.0
pandas: 1.4.4
numpy: 1.21.6
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: None
distributed: None
matplotlib: 3.5.3
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.2.0
pip: 22.0.4
conda: None
pytest: 7.1.2
IPython: 8.5.0
sphinx: None

@andreas-schwab added the "bug" and "needs triage" (issue that has not been reviewed by an xarray team member) labels on Sep 28, 2022
@max-sixty
Collaborator

It looks like many of these occur in pandas code; do the pandas tests pass?

@andreas-schwab
Author

andreas-schwab commented Sep 28, 2022 via email

@max-sixty
Collaborator

What are the bogus values?

Please could you answer the question on whether pandas tests pass?

@andreas-schwab
Author

andreas-schwab commented Sep 28, 2022 via email

@max-sixty
Collaborator

I'm not sure what that has to do with xarray though? Does this give the same result?

>>> import numpy as np
>>> import pandas as pd
>>> num_dates = np.asarray([0., np.nan])
>>> flat_num_dates = num_dates.ravel()
>>> flat_num_dates_ns_int = (flat_num_dates * (int(1e9) * 60 * 60 * 24)).astype(np.int64)
>>> flat_num_dates_ns_int
array([ 0, 9223372036854775807])

Please could you answer the question on whether pandas tests pass?

We're here helping as volunteers; we can only engage on issues if you reciprocate our good faith. Please could you answer this?

@max-sixty added the "needs mcve" label (https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) and removed the "bug" and "needs triage" labels on Sep 29, 2022
@max-sixty
Collaborator

Closing but please feel free to reopen

@felixonmars

Hi, we are getting similar failures when building xarray for Arch Linux riscv64.

> I'm not sure what that has to do with xarray though? Does this give the same result?
>
> import numpy as np
> import pandas as pd
> num_dates = np.asarray([0., np.nan])
> flat_num_dates = num_dates.ravel()
> flat_num_dates_ns_int = (flat_num_dates * (int(1e9) * 60 * 60 * 24)).astype(np.int64)
> flat_num_dates_ns_int
> array([ 0, 9223372036854775807])

I got the same result on riscv64. My guess is that the sign bit of NaN is not preserved during conversions. More details can be found at: https://sourceware.org/pipermail/libc-alpha/2022-September/142011.html

Repeating the same steps results in array([0, -9223372036854775808]) on x86_64 and array([0, 0]) on aarch64.
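
The three results differ because converting NaN to a signed integer is not defined consistently across architectures. Only the x86_64 result coincides with the int64 minimum, which pandas uses as its integer representation of NaT. The sketch below avoids the platform-dependent cast by building the int64 values directly and shows how pandas interprets each reported value. The helper name `as_timedelta` is made up for illustration.

```python
import numpy as np
import pandas as pd

I8_MIN = np.iinfo(np.int64).min  # -9223372036854775808, read by pandas as NaT
I8_MAX = np.iinfo(np.int64).max  #  9223372036854775807, an ordinary (huge) value


def as_timedelta(ns_value):
    """Interpret a raw int64 nanosecond count via pd.to_timedelta."""
    return pd.to_timedelta(np.array([ns_value], dtype=np.int64), "ns")


# Values that NaN turned into on each architecture, as reported in this thread.
reported = {"x86_64": I8_MIN, "aarch64": 0, "riscv64": I8_MAX}

is_nat = {arch: bool(as_timedelta(value).isna()[0]) for arch, value in reported.items()}
print(is_nat)  # {'x86_64': True, 'aarch64': False, 'riscv64': False}
```

So x86_64 yields NaT only by coincidence. On aarch64 the NaN would silently decode to the reference date, and on riscv64 it becomes the largest representable timedelta, which overflows once the reference date is added.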

> Please could you answer the question on whether pandas tests pass?

I have tried pandas' tests and got many failures like:

E       AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
E
E       Attribute "dtype" are different
E       [left]:  float64
E       [right]: datetime64[ns]

or

E           AssertionError: numpy array are different
E
E           numpy array values are different (50.0 %)
E           [index]: [0, 1]
E           [left]:  [1036713600000, -9223372036854775808]
E           [right]: [1036713600000000000, -9223372036854775808]

Quite a few of the failing tests involve NaN as well. So you are probably right that the problem lies in pandas or numpy.

@max-sixty
Collaborator

> I got the same result in riscv64. One thing I could guess is that the sign bit of NaN is not kept during conversions. Some more details could be found at

Thanks for trying that. Notably, that code doesn't involve xarray. So I'm keen to be part of the solution, but this doesn't look like a problem with xarray code specifically. Let me know if that makes sense.

@dcherian
Contributor

dcherian commented Oct 3, 2022

As in #7098

I think the real solution here is to explicitly handle NaNs during the decoding step. We do want these to be NaT in the output.
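
As a rough illustration of what explicit NaN handling could look like (a sketch only, not necessarily how #7827 implements it), the NaNs can be masked out before the float-to-int64 cast and then set to the NaT sentinel by hand. The function name `decode_days_nan_safe` and the fixed "days" unit are assumptions made for this example.

```python
import numpy as np
import pandas as pd

NS_PER_DAY = 24 * 60 * 60 * 10**9
I8_MIN = np.iinfo(np.int64).min  # pandas' integer representation of NaT


def decode_days_nan_safe(num_dates, ref_date):
    """Decode 'days since ref_date' floats, mapping NaN to NaT explicitly.

    Hypothetical helper for illustration; not part of xarray.
    """
    values = np.asarray(num_dates, dtype=np.float64).ravel()
    nan_mask = np.isnan(values)
    # Replace NaN with 0 before the cast so no NaN ever reaches astype(int64).
    ns = (np.where(nan_mask, 0.0, values) * NS_PER_DAY).astype(np.int64)
    # Mark the masked positions with the NaT sentinel on every platform.
    ns[nan_mask] = I8_MIN
    return (pd.to_timedelta(ns, "ns") + pd.Timestamp(ref_date)).values


decoded = decode_days_nan_safe([0.0, np.nan], "2000-01-01 00:00:00")
print(decoded)
```

With this approach the result no longer depends on how the hardware converts NaN to an integer. The sketch handles only the NaN case; out-of-range values and other time units would still need the checks that already exist in the decoding code.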

@dcherian reopened this on Oct 3, 2022
@dcherian changed the title from "Testsuite failures on riscv64" to "Handle NaNs when decoding times (failures on riscv64)" on Oct 3, 2022
@dcherian added the "bug" label and removed the "needs mcve" label on Oct 3, 2022
@kmuehlbauer
Contributor

@felixonmars If you are still working on this, I'd appreciate it if you could test against #7827. Thanks.

@felixonmars

@kmuehlbauer Sure. I have verified that the tests are passing on #7827 and failing on the current main branch.
