-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataArray.interp() : poor performance #2223
Comments
@gwin-zegal , thank you for using our new feature and reporting the issue. I will look inside later, whether problem is on our code or upstream (scipy.interpolate). A possible workaround for your code is to change arr.interp({'tension': new_tension, 'resistance': new_resistance}) to arr.interp({tension': new_tension}).interp('resistance': new_resistance}) but it does not solve all the problems. |
Thanks for your comment had a deeper look at Updated my comparison with the same behavior with scipy and sdf in a jupyter notebook on mybinder It seems everything's well with xarray. Extrapolation (linear first) would be a good feature too, I put an example at the end of the notebook about sdf interpolation/extrapolation possibilites (work for nd-arrays of dim 32 as with numpy) |
Thanks for your deeper analysis.
Happy to hear that.
Our new_tension = xr.DataArray(new_tension, dims='new_dim')
new_resistance = xr.DataArray(new_resistance, dims='new_dim')
arr.interp(tension=new_tension, resistance=new_resistance) which gives a 1d array with the new dimension |
I want to keep this issue open, as the performance can be increased for such a case. In the above example, arr.interp(tension=new_tension, resistance=new_resistance) and arr.interp(tension=new_tension).interp(resistance=new_resistance) gives the same result (for 'linear' and 'nearest' methods), but the latter runs much faster. |
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity If this issue remains relevant, please comment here or remove the |
Code Sample
Hello,
I performed a quick comparison of the newly introduced method ìnterp()` with an adapter (draft) to the sdf (scientific data format) library: https://gist.github.com/gwin-zegal/b955c3ef63f5ad51eec6329dd2e620be#file-array_sdf_interp-py
Code for a micro comparison (2D array) in python (include the above gist first):
Problem description
The time spent for
array.interp()
is growing exponentially... over two 2min (xarray internal interp) on my old machine compared to 9ms (C-SDF wrapper) for 10_000 interpolations.The C-SDF code is slow (a copy of the array is performed and algorithms not so optimized), but xarray implementation is not usable in daily life on my machine!
{'xarray': [<TimeitResult : 6.46 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 6.46 ms ± 88.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 6.99 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 8.91 ms ± 52 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 19.8 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 112 ms ± 638 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)>,
<TimeitResult : 584 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>,
<TimeitResult : 2.63 s ± 43.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>,
<TimeitResult : 15.5 s ± 147 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>,
<TimeitResult : 2min 23s ± 18.7 s per loop (mean ± std. dev. of 7 runs, 1 loop each)>],
'c_sdf': [<TimeitResult : 1.08 ms ± 7.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.09 ms ± 7.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.1 ms ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.13 ms ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.19 ms ± 7.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.32 ms ± 9.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 1.59 ms ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)>,
<TimeitResult : 2.19 ms ± 31.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 3.51 ms ± 34.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>,
<TimeitResult : 9.27 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>]}
Performance issue on my machine or is it confirmed by others?
Output of
xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
xarray: 0.10.7
pandas: 0.23.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.2.0
pip: 10.0.1
conda: 4.5.4
pytest: 3.4.1
IPython: 6.4.0
sphinx: 1.7.5
The text was updated successfully, but these errors were encountered: