Merge fails when sparse Dataset has overlapping dimension values #3445

Open
k-a-mendoza opened this issue Oct 24, 2019 · 3 comments
Labels
topic-combine combine/concat/merge

Comments

@k-a-mendoza

Sparse-backed arrays used in a merge operation fail under certain coordinate settings. For example, this works perfectly with dense NumPy data:

```python
import xarray as xr
import numpy as np

data = np.random.uniform(-1, 1, (1, 1, 100))
time = np.linspace(0, 1, num=100)

data_array1 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.1'],
                                   'receiver': ['X.2'],
                                   'time': time}).to_dataset()
data_array2 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.2'],
                                   'receiver': ['X.1'],
                                   'time': time}).to_dataset()

dataset1 = xr.merge([data_array1, data_array2])
```

But this raises `IndexError: Only indices with at most one iterable index are supported.` from the sparse package:

```python
import xarray as xr
import numpy as np
import sparse

data = sparse.COO.from_numpy(np.random.uniform(-1, 1, (1, 1, 100)))
time = np.linspace(0, 1, num=100)

data_array1 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.1'],
                                   'receiver': ['X.2'],
                                   'time': time}).to_dataset()
data_array2 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.2'],
                                   'receiver': ['X.1'],
                                   'time': time}).to_dataset()

dataset1 = xr.merge([data_array1, data_array2])
```

I have noticed this occurs whenever the merge would need to add coordinate positions filled with NaN values.
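To see why the NaN-filling step trips up sparse: when xarray aligns the two datasets on an outer join, it reindexes each one with an integer indexer per dimension, applying several array indexers at once and then masking placeholder positions with NaN. A minimal NumPy sketch of that pattern (the variable names here are illustrative, not xarray's internals):

```python
import numpy as np

# Each input dataset has shape (1, 1, 100); after an outer join on
# 'source' and 'receiver' it must be reindexed to (2, 2, 100).
data = np.random.uniform(-1, 1, (1, 1, 100))

# One integer indexer per dimension; -1 marks positions that do not
# exist in the original and will be masked to NaN afterwards.
src_idx = np.array([0, -1])
rcv_idx = np.array([-1, 0])

# Two array indexers applied at once -- this is the step that
# sparse.COO rejects with "Only indices with at most one iterable
# index are supported."
reindexed = data[src_idx[:, None], rcv_idx[None, :], :]

# Mask the placeholder positions with NaN.
mask = (src_idx[:, None] == -1) | (rcv_idx[None, :] == -1)
reindexed = np.where(mask[:, :, None], np.nan, reindexed)
```

With dense NumPy arrays this works because fancy indexing with multiple arrays is supported; with a `sparse.COO` backing array the same indexing call raises the `IndexError` above.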

@friedrichknuth

Note that `dataset1 = xr.concat([data_array1, data_array2], dim='source')` (or `dim='receiver'`) works; however, `concat` also fails if `time` is specified as the dimension.

@friedrichknuth

@El-minadero from the sparse API page I'm seeing two methods for combining data:

```python
import sparse
import numpy as np

A = sparse.COO.from_numpy(np.array([[1, 2], [3, 4]]))
B = sparse.COO.from_numpy(np.array([[5, 9], [6, 8]]))

sparse.stack([A, B]).todense()
# array([[[1, 2],
#         [3, 4]],
#        [[5, 9],
#         [6, 8]]])

sparse.concatenate([A, B]).todense()
# array([[1, 2],
#        [3, 4],
#        [5, 9],
#        [6, 8]])
```

Since this is an issue with sparse, and merging sparse data doesn't seem to be supported at this time, you might consider closing this issue here and raising it over at sparse.

@shoyer
Member

shoyer commented Nov 8, 2019

The missing operation here in sparse is indexing like `x[y, z]` where `y` and `z` are both arrays.
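For illustration, here is that operation with a dense NumPy array: pointwise advanced indexing with two array indexers, which `sparse.COO` did not support at the time of this issue:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
y = np.array([0, 2])
z = np.array([1, 3])

# Pointwise advanced indexing: selects x[0, 1] and x[2, 3].
result = x[y, z]
# array([ 1, 11])
```

The equivalent call on `sparse.COO.from_numpy(x)` is what raises `IndexError: Only indices with at most one iterable index are supported.`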

For reference, here's the traceback:

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-1-6547fa3d1500> in <module>()
     17                                   'time':time}).to_dataset()
     18 
---> 19 dataset1 = xr.merge([data_array1,data_array2])

9 frames
/usr/local/lib/python3.6/dist-packages/xarray/core/merge.py in merge(objects, compat, join, fill_value)
    780         dict_like_objects.append(obj)
    781 
--> 782     merge_result = merge_core(dict_like_objects, compat, join, fill_value=fill_value)
    783     merged = Dataset._construct_direct(**merge_result._asdict())
    784     return merged

/usr/local/lib/python3.6/dist-packages/xarray/core/merge.py in merge_core(objects, compat, join, priority_arg, explicit_coords, indexes, fill_value)
    537     coerced = coerce_pandas_values(objects)
    538     aligned = deep_align(
--> 539         coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
    540     )
    541     collected = collect_variables_and_indexes(aligned)

/usr/local/lib/python3.6/dist-packages/xarray/core/alignment.py in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value)
    403         indexes=indexes,
    404         exclude=exclude,
--> 405         fill_value=fill_value
    406     )
    407 

/usr/local/lib/python3.6/dist-packages/xarray/core/alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
    331             new_obj = obj.copy(deep=copy)
    332         else:
--> 333             new_obj = obj.reindex(copy=copy, fill_value=fill_value, **valid_indexers)
    334         new_obj.encoding = obj.encoding
    335         result.append(new_obj)

/usr/local/lib/python3.6/dist-packages/xarray/core/dataset.py in reindex(self, indexers, method, tolerance, copy, fill_value, **indexers_kwargs)
   2430             tolerance,
   2431             copy=copy,
-> 2432             fill_value=fill_value,
   2433         )
   2434         coord_names = set(self._coord_names)

/usr/local/lib/python3.6/dist-packages/xarray/core/alignment.py in reindex_variables(variables, sizes, indexes, indexers, method, tolerance, copy, fill_value)
    581 
    582             if needs_masking:
--> 583                 new_var = var._getitem_with_mask(key, fill_value=fill_value)
    584             elif all(is_full_slice(k) for k in key):
    585                 # no reindexing necessary

/usr/local/lib/python3.6/dist-packages/xarray/core/variable.py in _getitem_with_mask(self, key, fill_value)
    724                 actual_indexer = indexer
    725 
--> 726             data = as_indexable(self._data)[actual_indexer]
    727             mask = indexing.create_mask(indexer, self.shape, data)
    728             data = duck_array_ops.where(mask, fill_value, data)

/usr/local/lib/python3.6/dist-packages/xarray/core/indexing.py in __getitem__(self, key)
   1260     def __getitem__(self, key):
   1261         array, key = self._indexing_array_and_key(key)
-> 1262         return array[key]
   1263 
   1264     def __setitem__(self, key, value):

/usr/local/lib/python3.6/dist-packages/sparse/_coo/indexing.py in getitem(x, index)
     66 
     67     # Get the mask
---> 68     mask, adv_idx = _mask(x.coords, index, x.shape)
     69 
     70     # Get the length of the mask

/usr/local/lib/python3.6/dist-packages/sparse/_coo/indexing.py in _mask(coords, indices, shape)
    129     if len(adv_idx) != 0:
    130         if len(adv_idx) != 1:
--> 131             raise IndexError('Only indices with at most one iterable index are supported.')
    132 
    133         adv_idx = adv_idx[0]

IndexError: Only indices with at most one iterable index are supported.
```
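Until sparse supports multiple array indexers, one workaround is to densify sparse-backed variables before merging (at the cost of the memory savings). A sketch under that assumption; `densify` is a hypothetical helper, not an xarray or sparse API:

```python
import numpy as np
import xarray as xr

def densify(ds):
    # Hypothetical helper: convert any sparse-backed data variables to
    # dense NumPy arrays so xarray's reindexing can use fancy indexing.
    # Dense variables (no .todense()) pass through unchanged.
    return ds.map(
        lambda da: da.copy(data=np.asarray(da.data.todense()))
        if hasattr(da.data, "todense")
        else da
    )

# With the datasets from the report above:
# dataset1 = xr.merge([densify(data_array1), densify(data_array2)])
```

The merged result can then be converted back with `sparse.COO.from_numpy` if needed, though the NaN fill values will be stored explicitly unless a matching `fill_value` is used.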

@dcherian dcherian added the topic-combine combine/concat/merge label Jul 8, 2021
brendan-m-murphy added a commit to ACRG-Bristol/acrg that referenced this issue Jun 26, 2024
If the footprint and flux used in the inversion have differing
lat/lon coordinates, then the basis functions will be aligned to
the footprint (and "xsensitivity"), while the flux that is saved
will not be aligned to the footprint.

This causes a problem in some cases when trying to compute country
totals.

NOTE: we're using "override" because `inv_out.basis` is a sparse matrix
and currently you can't forward fill in more than one dimension when
using sparse matrices in xarray: pydata/xarray#3445