Dropping of unaligned Data at assignment to Dataset #4507

Open
DerWeh opened this issue Oct 13, 2020 · 2 comments

DerWeh commented Oct 13, 2020

What happened:
I recently ran into trouble when I assigned data generated by an external program to a dataset: suddenly the dataset contained almost only NaN (see the example below). The problem was that the program rounded numbers to 10 digits, so the coordinates no longer matched. xarray silently ignores this.

What you expected to happen:
I would have expected an error, or at least a warning, when the coordinates don't match.
The current behavior can lead to bugs that are very hard to trace.

Minimal Complete Verifiable Example:

import numpy as np
import xarray as xr

x = np.linspace(0, 1)
dataset = xr.Dataset(coords={'x': x})
data = xr.DataArray(np.random.random(50), dims=['x'], coords={'x': np.around(x, decimals=10)})

dataset['data'] = data
print(dataset.data)
print(dataset.coords['x'])
print(data.coords['x'])

Output:

# print(dataset.data)
<xarray.DataArray 'data' (x: 50)>
array([0.20134419,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan, 0.98357925])
Coordinates:
  * x        (x) float64 0.0 0.02041 0.04082 0.06122 ... 0.9592 0.9796 1.0
# print(dataset.coords['x'])
<xarray.DataArray 'x' (x: 50)>
array([0.      , 0.020408, 0.040816, 0.061224, 0.081633, 0.102041, 0.122449,
       0.142857, 0.163265, 0.183673, 0.204082, 0.22449 , 0.244898, 0.265306,
       0.285714, 0.306122, 0.326531, 0.346939, 0.367347, 0.387755, 0.408163,
       0.428571, 0.44898 , 0.469388, 0.489796, 0.510204, 0.530612, 0.55102 ,
       0.571429, 0.591837, 0.612245, 0.632653, 0.653061, 0.673469, 0.693878,
       0.714286, 0.734694, 0.755102, 0.77551 , 0.795918, 0.816327, 0.836735,
       0.857143, 0.877551, 0.897959, 0.918367, 0.938776, 0.959184, 0.979592,
       1.      ])
Coordinates:
  * x        (x) float64 0.0 0.02041 0.04082 0.06122 ... 0.9592 0.9796 1.0
# print(data.coords['x'])
<xarray.DataArray 'x' (x: 50)>
array([0.      , 0.020408, 0.040816, 0.061224, 0.081633, 0.102041, 0.122449,
       0.142857, 0.163265, 0.183673, 0.204082, 0.22449 , 0.244898, 0.265306,
       0.285714, 0.306122, 0.326531, 0.346939, 0.367347, 0.387755, 0.408163,
       0.428571, 0.44898 , 0.469388, 0.489796, 0.510204, 0.530612, 0.55102 ,
       0.571429, 0.591837, 0.612245, 0.632653, 0.653061, 0.673469, 0.693878,
       0.714286, 0.734694, 0.755102, 0.77551 , 0.795918, 0.816327, 0.836735,
       0.857143, 0.877551, 0.897959, 0.918367, 0.938776, 0.959184, 0.979592,
       1.      ])
Coordinates:
  * x        (x) float64 0.0 0.02041 0.04082 0.06122 ... 0.9592 0.9796 1.0
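
Until assignment itself warns, one way to surface the mismatch is to request an exact alignment before assigning. The following is a minimal sketch, assuming the same `dataset` and `data` as in the example above; `xr.align(..., join="exact")` raises a `ValueError` when the indexes differ instead of reindexing silently. The variable names `aligned` and `n_mismatch` are only illustrative.

```python
import numpy as np
import xarray as xr

x = np.linspace(0, 1)
dataset = xr.Dataset(coords={"x": x})
data = xr.DataArray(
    np.random.random(50),
    dims=["x"],
    coords={"x": np.around(x, decimals=10)},
)

# join="exact" makes xarray raise instead of silently reindexing.
try:
    xr.align(dataset, data, join="exact")
    aligned = True
except ValueError:
    aligned = False

# Count the labels that differ between the two indexes (same length here).
n_mismatch = int((dataset.indexes["x"] != data.indexes["x"]).sum())
print(aligned, n_mismatch)
```

Only assigning when `aligned` is true would turn the silent NaN fill into an explicit failure at the call site.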

Anything else we need to know?:

Environment:

Output of xr.show_versions()
$ py -c "import xarray as xr; xr.show_versions()"
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-118-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.1

xarray: 0.16.1
pandas: 1.0.5
numpy: 1.18.5
scipy: 1.5.0
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.8.1
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.1.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.13.0
distributed: 2.13.0
matplotlib: 3.2.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 46.1.3
pip: 19.3.1
conda: 4.8.5
pytest: 5.1.2
IPython: 7.18.1
sphinx: 3.0.2
Collaborator

keewis commented Oct 13, 2020

This might be a duplicate of #2217.

If you're interested in this you might want to have a look at #4489 (and #4467?) which introduces a tolerance parameter to align.
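
Independent of those pull requests, a tolerance can already be applied by hand before assigning. This is only a sketch of a possible workaround, not the API proposed in #4489: it uses `DataArray.reindex` with `method="nearest"` and `tolerance` to snap the rounded labels onto the dataset's labels. The tolerance value `1e-9` is an assumption chosen to cover rounding to 10 decimals.

```python
import numpy as np
import xarray as xr

x = np.linspace(0, 1)
dataset = xr.Dataset(coords={"x": x})
data = xr.DataArray(
    np.arange(50.0),
    dims=["x"],
    coords={"x": np.around(x, decimals=10)},
)

# Snap the rounded labels onto the dataset's labels. Labels farther away
# than the tolerance are not matched and become NaN.
snapped = data.reindex(x=dataset["x"], method="nearest", tolerance=1e-9)

dataset["data"] = snapped
n_missing = int(dataset["data"].isnull().sum())
print(n_missing)
```

After the reindex, `snapped` carries exactly the dataset's `x` labels, so the assignment no longer has anything to drop.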

Author

DerWeh commented Oct 13, 2020

I agree that the given example problem is related to a tolerance.

In principle, I see the problem in the current practice of just dropping data that doesn't align. If I perform an assignment (=), I do not expect to lose any data.

Another example would be assigning:

dataset['data2'] = xr.DataArray(np.random.random(50), dims=['x'], coords={'x': np.linspace(2, 12)})

This line of code would effectively do nothing: I generate data, and upon assignment it is dropped.

But this might be a bit of a philosophical question about what the governing design principle is. Personally, I think an assignment should only be possible if the assigned coordinates are a subset of the dataset's coordinates.
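
The subset rule suggested here could be sketched in user code roughly as follows. `assign_strict` is a hypothetical helper, not part of xarray: it refuses the assignment if any label of the new data is absent from the dataset's index along a shared dimension, and otherwise falls back to the normal assignment.

```python
import numpy as np
import xarray as xr


def assign_strict(dataset, name, data):
    """Hypothetical helper: assign only if no label of `data` would be dropped."""
    for dim in data.dims:
        if dim not in dataset.indexes or dim not in data.indexes:
            continue  # no pair of indexes to compare along this dimension
        missing = ~data.indexes[dim].isin(dataset.indexes[dim])
        if missing.any():
            raise ValueError(
                f"{int(missing.sum())} label(s) along {dim!r} are not in the dataset"
            )
    dataset[name] = data
    return dataset


x = np.linspace(0, 1)
dataset = xr.Dataset(coords={"x": x})

# Coordinates are a subset of the dataset's: accepted, the rest is NaN.
ok = xr.DataArray(np.ones(10), dims=["x"], coords={"x": x[:10]})
assign_strict(dataset, "ok", ok)

# Coordinates lie outside the dataset's range: rejected instead of dropped.
bad = xr.DataArray(
    np.random.random(50), dims=["x"], coords={"x": np.linspace(2, 12)}
)
try:
    assign_strict(dataset, "data2", bad)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The check compares labels exactly, so it would also reject the rounded coordinates from the original example unless they are snapped to the dataset's labels first.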
