Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coordinates passed to interp have nan values #3924

Closed
wants to merge 1 commit into from

Conversation

zxdawn
Copy link

@zxdawn zxdawn commented Apr 1, 2020

Problem

Keyerror when the coordinates passed to interp have nan value.

Example

import numpy as np
import xarray as xr

da = xr.DataArray(np.sin(0.3 * np.arange(20).reshape(5, 4)),
                  [('x', np.arange(5)),
                   ('y', [0.1, 0.2, 0.3, 0.4])])

x = xr.DataArray([[0.5, np.nan], [1.5, 2.5]], dims=['z1', 'z2'])
y = xr.DataArray([[0.15, 0.2], [np.nan, 0.35]], dims=['z1', 'z2'])
da_nan = da.interp(x=x, y=y)

Output

E:\miniconda3\envs\satpy\lib\site-packages\pandas\core\indexes\base.py:2826: RuntimeWarning: invalid value encountered in less
  op(left_distances, right_distances) | (right_indexer == -1),
Traceback (most recent call last):
  File "C:\Users\Xin\Desktop\test_none.py", line 10, in <module>
    da_nan = da.interp(x=x, y=y)
  File "E:\miniconda3\envs\satpy\lib\site-packages\xarray\core\dataarray.py", line 1365, in interp
    **coords_kwargs,
  File "E:\miniconda3\envs\satpy\lib\site-packages\xarray\core\dataset.py", line 2610, in interp
    variables[name] = missing.interp(var, var_indexers, method, **kwargs)
  File "E:\miniconda3\envs\satpy\lib\site-packages\xarray\core\missing.py", line 611, in interp
    var, indexes_coords = _localize(var, indexes_coords)
  File "E:\miniconda3\envs\satpy\lib\site-packages\xarray\core\missing.py", line 552, in _localize
    imin = index.get_loc(np.min(new_x.values), method="nearest")
  File "E:\miniconda3\envs\satpy\lib\site-packages\pandas\core\indexes\base.py", line 2654, in get_loc
    raise KeyError(key)
KeyError: nan

Solution

Use np.nanmin and np.nanmax in index.get_loc()

Test

Code

import numpy as np
import xarray as xr

da = xr.DataArray(np.sin(0.3 * np.arange(20).reshape(5, 4)),
                  [('x', np.arange(5)),
                   ('y', [0.1, 0.2, 0.3, 0.4])])

x = xr.DataArray([[0.5, np.nan], [1.5, 2.5]], dims=['z1', 'z2'])
y = xr.DataArray([[0.15, 0.2], [np.nan, 0.35]], dims=['z1', 'z2'])
da_nan = da.interp(x=x, y=y)

x = xr.DataArray([[0.5, 1], [1.5, 2.5]], dims=['z1', 'z2'])
y = xr.DataArray([[0.15, 0.2], [0.25, 0.35]], dims=['z1', 'z2'])
da_nonan = da.interp(x=x, y=y)

print('da_nan \n', da_nan)
print('da_nonan \n', da_nonan)

Output

E:\miniconda3\envs\satpy\lib\site-packages\scipy\interpolate\interpolate.py:2539: RuntimeWarning: invalid value encountered in less
  out_of_bounds += x < grid[0]
E:\miniconda3\envs\satpy\lib\site-packages\scipy\interpolate\interpolate.py:2540: RuntimeWarning: invalid value encountered in greater
  out_of_bounds += x > grid[-1]
da_nan 
 <xarray.DataArray (z1: 2, z2: 2)>
array([[ 0.55626357,         nan],
       [        nan, -0.46643289]])
Coordinates:
    x        (z1, z2) float64 0.5 nan 1.5 2.5
    y        (z1, z2) float64 0.15 0.2 nan 0.35
Dimensions without coordinates: z1, z2
da_nonan 
 <xarray.DataArray (z1: 2, z2: 2)>
array([[ 0.55626357,  0.99749499],
       [ 0.63496063, -0.46643289]])
Coordinates:
    x        (z1, z2) float64 0.5 1.0 1.5 2.5
    y        (z1, z2) float64 0.15 0.2 0.25 0.35
Dimensions without coordinates: z1, z2

@zxdawn zxdawn changed the title Deal with nan in new coordinate Coordinates passed to interp have nan values Apr 1, 2020
@max-sixty
Copy link
Collaborator

Hi @zxdawn thanks for the PR, appreciate you making a first contribution.

I don't know this code that well, but your examples look reasonable. Any thoughts from those that know this better, @spencerkclark @huard @dcherian ?

If we do go ahead and merge this solution, we'd need tests @zxdawn , would you be up for writing those?

@zxdawn
Copy link
Author

zxdawn commented Apr 2, 2020

Hi @max-sixty, thanks.
If this looks well, I'm glad to add the test for it.

@spencerkclark
Copy link
Member

Indeed, thanks @zxdawn, this seems like a reasonable change to me too. I say go ahead with writing the tests.

@spencerkclark
Copy link
Member

Note that it looks like this causes problems for older NumPy versions when interpolating with a datetime coordinate, e.g. in the Linux py36-min-nep18 build:

2020-04-01T05:53:31.3406894Z ________________ TestDataArray.test_resample_drop_nondim_coords ________________
2020-04-01T05:53:31.3408485Z 
2020-04-01T05:53:31.3409839Z self = <xarray.tests.test_dataarray.TestDataArray object at 0x7f360ed41080>
2020-04-01T05:53:31.3411145Z 
2020-04-01T05:53:31.3413416Z     @requires_scipy
2020-04-01T05:53:31.3414202Z     def test_resample_drop_nondim_coords(self):
2020-04-01T05:53:31.3415025Z         xs = np.arange(6)
2020-04-01T05:53:31.3415870Z         ys = np.arange(3)
2020-04-01T05:53:31.3417272Z         times = pd.date_range("2000-01-01", freq="6H", periods=5)
2020-04-01T05:53:31.3418156Z         data = np.tile(np.arange(5), (6, 3, 1))
2020-04-01T05:53:31.3419186Z         xx, yy = np.meshgrid(xs * 5, ys * 2.5)
2020-04-01T05:53:31.3420539Z         tt = np.arange(len(times), dtype=int)
2020-04-01T05:53:31.3421488Z         array = DataArray(data, {"time": times, "x": xs, "y": ys}, ("x", "y", "time"))
2020-04-01T05:53:31.3422568Z         xcoord = DataArray(xx.T, {"x": xs, "y": ys}, ("x", "y"))
2020-04-01T05:53:31.3423540Z         ycoord = DataArray(yy.T, {"x": xs, "y": ys}, ("x", "y"))
2020-04-01T05:53:31.3424848Z         tcoord = DataArray(tt, {"time": times}, ("time",))
2020-04-01T05:53:31.3426753Z         ds = Dataset({"data": array, "xc": xcoord, "yc": ycoord, "tc": tcoord})
2020-04-01T05:53:31.3427254Z         ds = ds.set_coords(["xc", "yc", "tc"])
2020-04-01T05:53:31.3428134Z     
2020-04-01T05:53:31.3428545Z         # Select the data now, with the auxiliary coordinates in place
2020-04-01T05:53:31.3429410Z         array = ds["data"]
2020-04-01T05:53:31.3430268Z     
2020-04-01T05:53:31.3431425Z         # Re-sample
2020-04-01T05:53:31.3432662Z         actual = array.resample(time="12H", restore_coord_dims=True).mean("time")
2020-04-01T05:53:31.3434512Z         assert "tc" not in actual.coords
2020-04-01T05:53:31.3434892Z     
2020-04-01T05:53:31.3435441Z         # Up-sample - filling
2020-04-01T05:53:31.3435901Z         actual = array.resample(time="1H", restore_coord_dims=True).ffill()
2020-04-01T05:53:31.3437046Z         assert "tc" not in actual.coords
2020-04-01T05:53:31.3437939Z     
2020-04-01T05:53:31.3438514Z         # Up-sample - interpolation
2020-04-01T05:53:31.3439498Z         actual = array.resample(time="1H", restore_coord_dims=True).interpolate(
2020-04-01T05:53:31.3439962Z >           "linear"
2020-04-01T05:53:31.3440256Z         )
2020-04-01T05:53:31.3440495Z 
2020-04-01T05:53:31.3441338Z xarray/tests/test_dataarray.py:2941: 
2020-04-01T05:53:31.3441811Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2020-04-01T05:53:31.3442181Z xarray/core/resample.py:142: in interpolate
2020-04-01T05:53:31.3442484Z     return self._interpolate(kind=kind)
2020-04-01T05:53:31.3442807Z xarray/core/resample.py:156: in _interpolate
2020-04-01T05:53:31.3443109Z     **{self._dim: self._full_index},
2020-04-01T05:53:31.3443425Z xarray/core/dataarray.py:1381: in interp
2020-04-01T05:53:31.3443714Z     **coords_kwargs,
2020-04-01T05:53:31.3444238Z xarray/core/dataset.py:2637: in interp
2020-04-01T05:53:31.3444591Z     variables[name] = missing.interp(var, var_indexers, method, **kwargs)
2020-04-01T05:53:31.3444968Z xarray/core/missing.py:611: in interp
2020-04-01T05:53:31.3445281Z     var, indexes_coords = _localize(var, indexes_coords)
2020-04-01T05:53:31.3445618Z xarray/core/missing.py:552: in _localize
2020-04-01T05:53:31.3445952Z     imin = index.get_loc(np.nanmin(new_x.values), method="nearest")
2020-04-01T05:53:31.3446293Z <__array_function__ internals>:6: in nanmin
2020-04-01T05:53:31.3446575Z     ???
2020-04-01T05:53:31.3446883Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2020-04-01T05:53:31.3447158Z 
2020-04-01T05:53:31.3448180Z a = array(['2000-01-01T00:00:00.000000000', '2000-01-01T01:00:00.000000000',
2020-04-01T05:53:31.3450286Z        '2000-01-01T02:00:00.000000000', '2000...T22:00:00.000000000', '2000-01-01T23:00:00.000000000',
2020-04-01T05:53:31.3451512Z        '2000-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
2020-04-01T05:53:31.3452062Z axis = None, out = None, keepdims = <no value>
2020-04-01T05:53:31.3452523Z 
2020-04-01T05:53:31.3452893Z     @array_function_dispatch(_nanmin_dispatcher)
2020-04-01T05:53:31.3453666Z     def nanmin(a, axis=None, out=None, keepdims=np._NoValue):
2020-04-01T05:53:31.3453973Z         """
2020-04-01T05:53:31.3454324Z         Return minimum of an array or minimum along an axis, ignoring any NaNs.
2020-04-01T05:53:31.3455156Z         When all-NaN slices are encountered a ``RuntimeWarning`` is raised and
2020-04-01T05:53:31.3455561Z         Nan is returned for that slice.
2020-04-01T05:53:31.3455844Z     
2020-04-01T05:53:31.3456074Z         Parameters
2020-04-01T05:53:31.3456499Z         ----------
2020-04-01T05:53:31.3456799Z         a : array_like
2020-04-01T05:53:31.3457148Z             Array containing numbers whose minimum is desired. If `a` is not an
2020-04-01T05:53:31.3457706Z             array, a conversion is attempted.
2020-04-01T05:53:31.3458025Z         axis : {int, tuple of int, None}, optional
2020-04-01T05:53:31.3458400Z             Axis or axes along which the minimum is computed. The default is to compute
2020-04-01T05:53:31.3458791Z             the minimum of the flattened array.
2020-04-01T05:53:31.3459085Z         out : ndarray, optional
2020-04-01T05:53:31.3459438Z             Alternate output array in which to place the result.  The default
2020-04-01T05:53:31.3459821Z             is ``None``; if provided, it must have the same shape as the
2020-04-01T05:53:31.3460197Z             expected output, but the type will be cast if necessary.  See
2020-04-01T05:53:31.3460547Z             `doc.ufuncs` for details.
2020-04-01T05:53:31.3460784Z     
2020-04-01T05:53:31.3461033Z             .. versionadded:: 1.8.0
2020-04-01T05:53:31.3461366Z         keepdims : bool, optional
2020-04-01T05:53:31.3461697Z             If this is set to True, the axes which are reduced are left
2020-04-01T05:53:31.3462096Z             in the result as dimensions with size one. With this option,
2020-04-01T05:53:31.3462475Z             the result will broadcast correctly against the original `a`.
2020-04-01T05:53:31.3462764Z     
2020-04-01T05:53:31.3463111Z             If the value is anything but the default, then
2020-04-01T05:53:31.3463473Z             `keepdims` will be passed through to the `min` method
2020-04-01T05:53:31.3464040Z             of sub-classes of `ndarray`.  If the sub-classes methods
2020-04-01T05:53:31.3464479Z             does not implement `keepdims` any exceptions will be raised.
2020-04-01T05:53:31.3464772Z     
2020-04-01T05:53:31.3465021Z             .. versionadded:: 1.8.0
2020-04-01T05:53:31.3465267Z     
2020-04-01T05:53:31.3465501Z         Returns
2020-04-01T05:53:31.3465911Z         -------
2020-04-01T05:53:31.3466198Z         nanmin : ndarray
2020-04-01T05:53:31.3466530Z             An array with the same shape as `a`, with the specified axis
2020-04-01T05:53:31.3467115Z             removed.  If `a` is a 0-d array, or if axis is None, an ndarray
2020-04-01T05:53:31.3467548Z             scalar is returned.  The same dtype as `a` is returned.
2020-04-01T05:53:31.3467869Z     
2020-04-01T05:53:31.3468088Z         See Also
2020-04-01T05:53:31.3468465Z         --------
2020-04-01T05:53:31.3468732Z         nanmax :
2020-04-01T05:53:31.3469068Z             The maximum value of an array along a given axis, ignoring any NaNs.
2020-04-01T05:53:31.3469380Z         amin :
2020-04-01T05:53:31.3469713Z             The minimum value of an array along a given axis, propagating any NaNs.
2020-04-01T05:53:31.3470028Z         fmin :
2020-04-01T05:53:31.3470507Z             Element-wise minimum of two arrays, ignoring any NaNs.
2020-04-01T05:53:31.3470866Z         minimum :
2020-04-01T05:53:31.3471352Z             Element-wise minimum of two arrays, propagating any NaNs.
2020-04-01T05:53:31.3471708Z         isnan :
2020-04-01T05:53:31.3472115Z             Shows which elements are Not a Number (NaN).
2020-04-01T05:53:31.3472405Z         isfinite:
2020-04-01T05:53:31.3472702Z             Shows which elements are neither NaN nor infinity.
2020-04-01T05:53:31.3473089Z     
2020-04-01T05:53:31.3473323Z         amax, fmax, maximum
2020-04-01T05:53:31.3473572Z     
2020-04-01T05:53:31.3473786Z         Notes
2020-04-01T05:53:31.3474196Z         -----
2020-04-01T05:53:31.3474732Z         NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
2020-04-01T05:53:31.3476305Z         (IEEE 754). This means that Not a Number is not equivalent to infinity.
2020-04-01T05:53:31.3476608Z         Positive infinity is treated as a very large number and negative
2020-04-01T05:53:31.3478858Z         infinity is treated as a very small (i.e. negative) number.
2020-04-01T05:53:31.3479255Z     
2020-04-01T05:53:31.3479619Z         If the input has a integer type the function is equivalent to np.min.
2020-04-01T05:53:31.3479971Z     
2020-04-01T05:53:31.3480308Z         Examples
2020-04-01T05:53:31.3481117Z         --------
2020-04-01T05:53:31.3481526Z         >>> a = np.array([[1, 2], [3, np.nan]])
2020-04-01T05:53:31.3481875Z         >>> np.nanmin(a)
2020-04-01T05:53:31.3482212Z         1.0
2020-04-01T05:53:31.3482518Z         >>> np.nanmin(a, axis=0)
2020-04-01T05:53:31.3482838Z         array([1.,  2.])
2020-04-01T05:53:31.3483183Z         >>> np.nanmin(a, axis=1)
2020-04-01T05:53:31.3483506Z         array([1.,  3.])
2020-04-01T05:53:31.3483789Z     
2020-04-01T05:53:31.3484518Z         When positive infinity and negative infinity are present:
2020-04-01T05:53:31.3484869Z     
2020-04-01T05:53:31.3485207Z         >>> np.nanmin([1, 2, np.nan, np.inf])
2020-04-01T05:53:31.3485532Z         1.0
2020-04-01T05:53:31.3485855Z         >>> np.nanmin([1, 2, np.nan, np.NINF])
2020-04-01T05:53:31.3486387Z         -inf
2020-04-01T05:53:31.3486734Z     
2020-04-01T05:53:31.3487009Z         """
2020-04-01T05:53:31.3487318Z         kwargs = {}
2020-04-01T05:53:31.3487661Z         if keepdims is not np._NoValue:
2020-04-01T05:53:31.3488198Z             kwargs['keepdims'] = keepdims
2020-04-01T05:53:31.3488662Z         if type(a) is np.ndarray and a.dtype != np.object_:
2020-04-01T05:53:31.3489117Z             # Fast, but not safe for subclasses of ndarray, or object arrays,
2020-04-01T05:53:31.3489807Z             # which do not implement isnan (gh-9009), or fmin correctly (gh-8975)
2020-04-01T05:53:31.3490313Z             res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
2020-04-01T05:53:31.3490701Z >           if np.isnan(res).any():
2020-04-01T05:53:31.3491523Z E           TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 

It looks like support for nanmin/nanmax on datetime arrays is relatively new, numpy/numpy#14831, so we may need to find a way around that.

@zxdawn
Copy link
Author

zxdawn commented Apr 6, 2020

@spencerkclark Maybe converting the datetime into number? BTW, Why not forcing numpy >= 1.18.1?

Comment on lines +552 to +553
imin = index.get_loc(np.nanmin(new_x.values), method="nearest")
imax = index.get_loc(np.nanmax(new_x.values), method="nearest")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I was a little uneasy about your suggestion to require a newer version of NumPy, but after reading about the history of this issue a little more, I think something along those lines may actually be the simplest approach (perhaps you were ahead of me on this).

It turns out that np.min and np.max behave like np.nanmin and np.nanmax for time-like types in versions of NumPy prior to 1.18.1 (see numpy/numpy#14841 (comment)), so we can continue to use those for time-like types and earlier versions of NumPy and still obtain the desired behavior. I think something like the following would work:

Suggested change
imin = index.get_loc(np.nanmin(new_x.values), method="nearest")
imax = index.get_loc(np.nanmax(new_x.values), method="nearest")
if LooseVersion(np.__version__) < LooseVersion("1.18.1") and new_x.dtype.kind in "mM":
# In older versions of NumPy min and max (incorrectly) ignore NaT values
# for time-like types -- that is they behave like nanmin and nanmax
# should -- but nanmin and nanmax raise an error.
imin = index.get_loc(np.min(new_x.values), method="nearest")
imax = index.get_loc(np.max(new_x.values), method="nearest")
else:
imin = index.get_loc(np.nanmin(new_x.values), method="nearest")
imax = index.get_loc(np.nanmax(new_x.values), method="nearest")

@zxdawn
Copy link
Author

zxdawn commented Apr 12, 2020

It's my first time to write test for xarray. I tried this method: pytest test_interp.py::test_nans -v, but all tests are skipped:

=================================================== test session starts ====================================================
platform linux -- Python 3.7.6, pytest-5.4.1, py-1.8.1, pluggy-0.12.0 -- /yin_raid/xin/miniconda3/envs/xarray_dev/bin/python
cachedir: .pytest_cache
rootdir: /yin_raid/xin/github/xarray, inifile: setup.cfg
collected 2 items

test_interp.py::test_nans[True] SKIPPED                                                                              [ 50%]
test_interp.py::test_nans[False] SKIPPED                                                                             [100%]

==================================================== 2 skipped in 0.50s ====================================================

How to make the test actually run?

@dcherian
Copy link
Contributor

Do you have scipy installed?

@zxdawn
Copy link
Author

zxdawn commented Apr 12, 2020

@dcherian Oh, thanks! After scipy is installed, it works:

=============================================== test session starts ===============================================
platform linux -- Python 3.7.6, pytest-5.4.1, py-1.8.1, pluggy-0.12.0 -- /yin_raid/xin/miniconda3/envs/xarray_dev/bin/python
cachedir: .pytest_cache
rootdir: /yin_raid/xin/github/xarray, inifile: setup.cfg
collected 2 items

test_interp.py::test_nans[True] SKIPPED                                                                     [ 50%]
test_interp.py::test_nans[False] PASSED                                                                     [100%]

================================================ warnings summary =================================================
xarray/tests/test_interp.py::test_nans[False]
xarray/tests/test_interp.py::test_nans[False]
  /yin_raid/xin/miniconda3/envs/xarray_dev/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
    return f(*args, **kwds)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
==================================== 1 passed, 1 skipped, 2 warnings in 0.88s =====================================

I will update the test and pull it soon.

@dcherian dcherian mentioned this pull request May 5, 2020
23 tasks
@mathause
Copy link
Collaborator

Thank you for the PR - this now fixed in #4233

@mathause mathause closed this Aug 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants