Sparse DataArray indexing gives incorrect results #4019

bnaul · 2020-05-02T00:02:28Z

Tested on both xarray 0.15.1 with sparse 0.9.1, and xarray 0.15.2.dev50+g3820fb77 with sparse 0.9.1+26.ga9a6de2:

import numpy as np
import pandas as pd
import xarray as xr  # sparse=True also requires the sparse package

# Random sparse sample data: 500 of the 10,000 (a, b) pairs, in shuffled order
idx = pd.MultiIndex.from_product([np.arange(100), np.arange(100)], names=['a', 'b'])
s = pd.Series(np.random.RandomState(0).random(len(idx)), index=idx).sample(n=500, random_state=0)

da_dense = xr.DataArray.from_series(s, sparse=False)
da_sparse = xr.DataArray.from_series(s, sparse=True)

# Compare dense and sparse results for each indexing method
key = 23
print("Total:", da_dense.sum().values, da_sparse.sum().values)
print("loc[key]:", da_dense.loc[key].sum().values, da_sparse.loc[key].sum().values)
print("[key]:", da_dense[key].sum().values, da_sparse[key].sum().values)
print(".isel(key):", da_dense.isel({'a': key}).sum().values, da_sparse.isel({'a': key}).sum().values)
print(".sel(key):", da_dense.sel({'a': key}).sum().values, da_sparse.sel({'a': key}).sum().values)

Output:

Total: 253.0721848728631 253.07218487286306
loc[key]: 3.5885153944770103 0.0
[key]: 3.5885153944770103 0.0
.isel(key): 3.5885153944770103 0.0
.sel(key): 3.5885153944770103 0.0

It does appear that the underlying sparse.COO has the correct values:

np.nansum(da_dense.data[23])
3.5885153944770103

da_sparse.data.data[da_sparse.data.coords[0] == 23].sum()
3.5885153944770103
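For an independent reference, the expected per-label sums can also be computed from the pandas Series alone, without going through xarray or sparse. This is only a minimal sketch that rebuilds the sample data from the report; the names `expected` and `expected_for_key` are illustrative and not part of the original report.

```python
import numpy as np
import pandas as pd

# Same random sample as in the report: 500 of the 10,000 (a, b) pairs,
# left in the shuffled order that .sample() returns.
idx = pd.MultiIndex.from_product([np.arange(100), np.arange(100)], names=["a", "b"])
s = pd.Series(np.random.RandomState(0).random(len(idx)), index=idx).sample(n=500, random_state=0)

# Reference sums per label of level "a", computed directly by pandas.
expected = s.groupby(level="a").sum()

key = 23
# .get() returns 0.0 if the label happens to be absent from the sample.
expected_for_key = expected.get(key, 0.0)
print("expected sum for a =", key, ":", expected_for_key)
```

Any indexing result from `da_dense` or `da_sparse` for a given label of `a` should agree with the corresponding entry of `expected`.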

Happy to try to delve in deeper but if anyone knows off the top of their head what the issue might be that would be very welcome 🙂

One other observation: the result isn't always 0, e.g.:

# key = 44
Total: 253.0721848728631 253.07218487286306
loc[key]: 2.868736626924726 1.1489982474345166
[key]: 2.868736626924726 1.1489982474345166
.isel(key): 2.868736626924726 1.1489982474345166
.sel(key): 2.868736626924726 1.1489982474345166


bnaul · 2020-05-02T00:15:01Z

Aha! I think I see the issue: adding .sort_index() after s = ... gives the right results.

Would still say this is a bug since label-based indexing should work regardless of the input order.

dcherian added the bug label May 4, 2020

This was referenced May 4, 2020

Bug in the conversion of Pandas DataFrame into Xarray Dataset . #4027

Closed

0.16.0 release #4031

Closed

dcherian mentioned this issue May 22, 2020

Fix conversion of multiindexed pandas objects to sparse xarray objects #4088

Merged


dcherian closed this as completed in #4088 May 26, 2020

dcherian mentioned this issue Jun 19, 2020

Incorrect results with differently ordered coords pydata/sparse#360

Closed

khaeru mentioned this issue Jun 20, 2020

Use sparse xarray for reporting.Quantity iiasa/ixmp#191

Closed
