DataArrayResample.interpolate coordinates out of bound. #2197

Closed
aurghs opened this issue May 30, 2018 · 2 comments · Fixed by #2640
@aurghs
Collaborator

aurghs commented May 30, 2018

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
import xarray as xr

time = np.arange('2007-02-01', '2007-03-02', dtype='datetime64').astype('datetime64[ns]')
arr = xr.DataArray(
    np.arange(time.size), coords=[time,], dims=('time',), name='data'
)
arr.resample(time='M').interpolate('linear')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-ff65c4d138e7> in <module>()
      7     np.arange(time.size), coords=[time,], dims=('time',), name='data'
      8 )
----> 9 arr.resample(time='M').interpolate('linear')

~/devel/c3s-cns/venv_op/lib/python3.6/site-packages/xarray/core/resample.py in interpolate(self, kind)
    108 
    109         """
--> 110         return self._interpolate(kind=kind)
    111 
    112     def _interpolate(self, kind='linear'):

~/devel/c3s-cns/venv_op/lib/python3.6/site-packages/xarray/core/resample.py in _interpolate(self, kind)
    218             elif self._dim not in v.dims:
    219                 coords[k] = v
--> 220         return DataArray(f(new_x), coords, dims, name=dummy.name,
    221                          attrs=dummy.attrs)
    222 

~/devel/c3s-cns/venv_op/lib/python3.6/site-packages/scipy/interpolate/polyint.py in __call__(self, x)
     77         """
     78         x, x_shape = self._prepare_x(x)
---> 79         y = self._evaluate(x)
     80         return self._finish_y(y, x_shape)
     81 

~/devel/c3s-cns/venv_op/lib/python3.6/site-packages/scipy/interpolate/interpolate.py in _evaluate(self, x_new)
    632         y_new = self._call(self, x_new)
    633         if not self._extrapolate:
--> 634             below_bounds, above_bounds = self._check_bounds(x_new)
    635             if len(y_new) > 0:
    636                 # Note fill_value must be broadcast up to the proper size

~/devel/c3s-cns/venv_op/lib/python3.6/site-packages/scipy/interpolate/interpolate.py in _check_bounds(self, x_new)
    664                              "range.")
    665         if self.bounds_error and above_bounds.any():
--> 666             raise ValueError("A value in x_new is above the interpolation "
    667                              "range.")
    668 

ValueError: A value in x_new is above the interpolation range.

Problem description

Interpolating raises an error when the resampled index extends past the end of the data. If the time range is exactly one month, it works:

time = np.arange('2007-02-01', '2007-03-01', dtype='datetime64').astype('datetime64[ns]')
arr = xr.DataArray(
    np.arange(time.size), coords=[time,], dims=('time',), name='data'
)
arr.resample(time='M').interpolate('linear')

<xarray.DataArray 'data' (time: 1)>
array([27.])
Coordinates:
  * time     (time) datetime64[ns] 2007-02-28

The problem seems to be that the resampler's index contains a value outside the bounds of the data ('2007-03-31'). That is fine for aggregations, but it does not work for interpolation.

resampler = arr.resample(time='M') 

resampler._full_index
DatetimeIndex(['2007-02-28', '2007-03-31'], dtype='datetime64[ns]', name='time', freq='M')
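
The out-of-bounds label can be reproduced with pandas alone. This is a minimal sketch, assuming that xarray builds its resampling index from the same month-end bin labels pandas produces; it passes `pd.offsets.MonthEnd()` instead of the `'M'` alias only to avoid depending on a particular pandas version, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Daily data from 2007-02-01 through 2007-03-01 inclusive (29 points),
# the same range as the failing example above.
time = pd.date_range("2007-02-01", "2007-03-01", freq="D")
series = pd.Series(np.arange(time.size), index=time)

# Month-end resampling labels each bin with its right edge, so the single
# March data point produces a label at the end of March.
labels = series.resample(pd.offsets.MonthEnd()).mean().index

# Labels later than the last data point cannot be interpolated to.
out_of_bounds = labels > time.max()

print(labels)
print(out_of_bounds)
```

The second label falls after the last data point, which is the value the interpolator rejects.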

Expected Output

<xarray.DataArray 'data' (time: 1)>
array([27.])
Coordinates:
  * time     (time) datetime64[ns] 2007-02-28

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-43-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

xarray: 0.10.3
pandas: 0.22.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.4
distributed: None
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: None
setuptools: 39.2.0
pip: 10.0.1
conda: None
pytest: 3.5.1
IPython: 6.4.0
sphinx: 1.7.4

@shoyer
Member

shoyer commented Jun 1, 2018

This is a good point -- the current code only really makes sense when interpolating to a higher frequency, e.g., daily -> hourly.

I do think going to higher frequencies is the main use-case for interpolation, but I don't think it needs to raise errors in this case. At the least, it would be OK to fill in the invalid values with NaN.

I think the fix might be as simple as setting bounds_error=False in these two places in xarray/core/resample.py:

f = interp1d(x, y, kind=kind, axis=axis, bounds_error=True,

axis=axis, bounds_error=True,

Any interest in putting together a pull request?
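
For reference, the effect of the suggested change can be seen with `scipy.interpolate.interp1d` on its own. This is only a sketch of the behaviour, not the actual patch: the day offsets below are hand-computed stand-ins for the datetime coordinates in the report (day 27 is 2007-02-28 and day 58 is 2007-03-31, counted from 2007-02-01), and it relies on `fill_value` defaulting to NaN.

```python
import numpy as np
from scipy.interpolate import interp1d

# Days since 2007-02-01 for the 29 daily samples (through 2007-03-01).
x = np.arange(29, dtype=float)
y = np.arange(29, dtype=float)

# Target points: 2007-02-28 (in range) and 2007-03-31 (out of range).
new_x = np.array([27.0, 58.0])

# bounds_error=True reproduces the reported failure.
strict = interp1d(x, y, kind="linear", bounds_error=True)
try:
    strict(new_x)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# bounds_error=False fills out-of-range points with fill_value (NaN by default).
lenient = interp1d(x, y, kind="linear", bounds_error=False)
result = lenient(new_x)
print(result)
```

With `bounds_error=False` the in-range label keeps its interpolated value (27.0) and the out-of-range label becomes NaN, so a caller could drop or mask it afterwards.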

@shoyer added the bug label Jun 1, 2018
@aurghs
Collaborator Author

aurghs commented Jun 1, 2018

The same problem occurs when upsampling (here 2007-02-02 02:00:00 is out of bounds):

import numpy as np
import pandas as pd
import xarray as xr

time = np.arange('2007-01-01 00:00:00', '2007-02-02 00:00:00', dtype='datetime64[ns]')
arr = xr.DataArray(
    np.arange(time.size), coords=[time,], dims=('time',), name='data'
)
resampler = arr.resample(time='3h', base=2, label='right')

resampler._full_index
DatetimeIndex(['2007-01-01 02:00:00', '2007-01-01 05:00:00',
               '2007-01-01 08:00:00', '2007-01-01 11:00:00',
               '2007-01-01 14:00:00', '2007-01-01 17:00:00',
               '2007-01-01 20:00:00', '2007-01-01 23:00:00',
               '2007-01-02 02:00:00', '2007-01-02 05:00:00',
               ...
               '2007-01-31 23:00:00', '2007-02-01 02:00:00',
               '2007-02-01 05:00:00', '2007-02-01 08:00:00',
               '2007-02-01 11:00:00', '2007-02-01 14:00:00',
               '2007-02-01 17:00:00', '2007-02-01 20:00:00',
               '2007-02-01 23:00:00', '2007-02-02 02:00:00'],
              dtype='datetime64[ns]', name='time', length=257, freq='3H')

The fix is really easy; I can try to put together a pull request.
