Multidimensional reindex #1553

Open
fujiisoup opened this issue Sep 4, 2017 · 2 comments
@fujiisoup
Member

From a discussion in a comment on #1473:

It would be convenient to have a multi-dimensional reindex method that takes the dimensions and coordinates of the indexers into account.
The outline proposed by @shoyer is:

  • Given reindex arguments of the form dim=array where array is a 1D unlabeled array/list, convert them into DataArray(array, [(dim, array)]).
  • Do multi-dimensional indexing with broadcasting like sel, but fill in NaN for missing values (we could allow for customizing this with a fill_value argument).
  • Join coordinates like for sel, but coordinates from the indexers take precedence over coordinates from the object being indexed.
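
As a rough illustration of the first two steps, here is a minimal sketch. `as_labeled_indexer` is a hypothetical helper name, not part of xarray; it only shows the proposed conversion of `dim=array` into `DataArray(array, [(dim, array)])`. The second half uses the existing `sel` with `DataArray` indexers sharing a made-up `points` dimension, which is the broadcasting behaviour the proposed reindex would mirror (minus the NaN filling, which `sel` does not do).

```python
import numpy as np
import xarray as xr


def as_labeled_indexer(dim, values):
    """Hypothetical helper: wrap a 1D unlabeled array as a labeled indexer."""
    if isinstance(values, xr.DataArray):
        # Already labeled: keep its own dimensions and coordinates.
        return values
    values = np.asarray(values)
    if values.ndim != 1:
        raise ValueError("only 1D unlabeled indexers are converted")
    # Equivalent to DataArray(array, [(dim, array)]) from the outline.
    return xr.DataArray(values, coords=[(dim, values)])


indexer = as_labeled_indexer("time", [0, 1, 2])

# Existing behaviour that the proposal builds on: sel broadcasts DataArray
# indexers against each other, here along a shared 'points' dimension.
da = xr.DataArray(
    np.arange(6).reshape((3, 2)),
    coords=[("time", [0, 2, 3]), ("sensor", ["A", "C"])],
)
picked = da.sel(
    time=xr.DataArray([0, 2], dims="points"),
    sensor=xr.DataArray(["A", "C"], dims="points"),
)
```

Here `picked` has the single dimension `points`, pairing (time=0, sensor='A') with (time=2, sensor='C'). The proposed reindex would behave the same way for labels that exist and fill in NaN for those that do not.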
@fujiisoup
Member Author

Suggestion about the coordinate dropping rule for reindex
(ported from shoyer's comment)

  • For reindex(), indexing coordinates take precedence in the result ([kwargs[k] for k in kwargs] for obj.reindex(**kwargs)). Conflicts with indexed coordinates on the indexed object are silently ignored.

We would use this together with the normal rules for dimension and non-dimension coordinates:

  • Conflicts between dimension coordinates (except for precedence) result in an error.
  • Conflicts between non-dimension coordinates result in silently dropping the conflicting variable.
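
One possible reading of these rules, written as a toy model over plain dicts rather than real xarray objects. The function name `resolve_reindex_coords` and its arguments are hypothetical and only serve to make the precedence order concrete; this is not how xarray implements coordinate merging.

```python
def resolve_reindex_coords(obj_coords, indexers, indexer_coords, dim_names):
    """Toy model of the proposed coordinate rules for obj.reindex(**indexers).

    obj_coords:     {name: values} on the object being reindexed
    indexers:       {name: values}, the kwargs passed to reindex
    indexer_coords: list of {name: values}, extra coords carried by indexers
    dim_names:      names that are dimension coordinates
    """
    result = {}

    # Rule 1: indexing coordinates take precedence in the result.
    for name, values in indexers.items():
        result[name] = list(values)

    # Collect every other coordinate from the object and from the indexers.
    candidates = {}
    for coords in [obj_coords, *indexer_coords]:
        for name, values in coords.items():
            if name in indexers:
                continue  # conflict with an indexing coordinate: ignored
            candidates.setdefault(name, []).append(list(values))

    for name, seen in candidates.items():
        if all(values == seen[0] for values in seen):
            result[name] = seen[0]  # no conflict: keep
        elif name in dim_names:
            # Rule 2: conflicting dimension coordinates are an error.
            raise ValueError(f"conflicting dimension coordinate {name!r}")
        # Rule 3: conflicting non-dimension coordinates are dropped.

    return result


resolved = resolve_reindex_coords(
    obj_coords={"time": [0, 2, 3], "label": ["a", "b", "c"], "x": [1, 2]},
    indexers={"time": [0, 1, 2]},
    indexer_coords=[{"label": ["p", "q", "r"]}],
    dim_names={"time", "x"},
)
```

In this example `time` comes from the indexer, `x` is kept because nothing conflicts with it, and the non-dimension coordinate `label` is dropped because the object and the indexer disagree.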

@batterseapower

For the case of a simple vectorized reindex you can work around the lack of a multi-dimensional DataArray.reindex by falling back on isel as follows:

import functools

import numpy as np
import xarray as xr

def reindex_vectorized(da, indexers, method=None, tolerance=None, dim=None, fill_value=None):
    # Reindex does not presently support vectorized lookups: https://github.com/pydata/xarray/issues/1553
    # Sel does (e.g. https://github.com/pydata/xarray/issues/4630) but can't handle missing keys
    
    if dim is None:
        dim = 'dim_0'

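    if fill_value is None:
        # Default to NaN for integer and float arrays only; for any other dtype
        # kind this lookup raises KeyError, so pass fill_value explicitly.
        fill_value = {'i': np.nan, 'f': np.nan}[da.dtype.kind]
    # Result dtype that can hold both the data and the fill value (e.g. int -> float for NaN)
    dtype = np.result_type(fill_value, da.dtype)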
    
    if method is None:
        method = {}
    elif not isinstance(method, dict):
        # Apply the same method to every dimension (d avoids shadowing the `dim` argument)
        method = {d: method for d in da.dims}
        
    if tolerance is None:
        tolerance = {}
    elif not isinstance(tolerance, dict):
        tolerance = {d: tolerance for d in da.dims}
    
    ixs = {}
    masks = []
    any_empty = False
    for index_dim, index in indexers.items():
        ix = da.indexes[index_dim].get_indexer(index, method=method.get(index_dim), tolerance=tolerance.get(index_dim))
        ixs[index_dim] = xr.DataArray(np.fmax(0, ix), dims=[dim])
        masks.append(ix >= 0)
        any_empty = any_empty or (len(da.indexes[index_dim]) == 0)
    
    mask = functools.reduce(lambda x, y: x & y, masks)
    
    if any_empty and len(mask):
        # Unfortunately can't just isel with `ixs` in this special case, because we'll go out of bounds accessing index 0
        new_coords = {
            name: coord
            for name, coord in da.coords.items()
            # XXX: to match the other case we should really include coords with name in ixs too, but it's fiddly
            if name not in ixs
        }
        new_dims = [name for name in da.dims if name not in ixs] + [dim]
        result = xr.DataArray(
            data=np.broadcast_to(
                fill_value,
                tuple(n for name, n in da.sizes.items() if name not in ixs) + (len(mask),)
            ),
            coords=new_coords, dims=new_dims,
            name=da.name, attrs=da.attrs
        )
    else:
        result = da[ixs]

        if not mask.all():
            result = result.astype(dtype, copy=False)
            result[{dim: ~mask}] = fill_value
    
    return result

Example:

sensor_data = xr.DataArray(np.arange(6).reshape((3, 2)), coords=[
    ('time', [0, 2, 3]),
    ('sensor', ['A', 'C']),
])

reindex_vectorized(sensor_data, {
    'sensor': ['A', 'A', 'A', 'B', 'C'],
    'time': [0, 1, 2, 0, 0],
}, method={'time': 'ffill'})
# [0, 0, 2, nan, 1]

reindex_vectorized(xr.DataArray(coords=[
    ('sensor', []),
    ('time', [0, 2])
]), {
    'sensor': ['A', 'A', 'A', 'B', 'C'],
    'time': [0, 1, 2, 0, 0],
}, method={'time': 'ffill'})
# [nan, nan, nan, nan, nan]
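
The core trick in `reindex_vectorized` is the combination of `get_indexer`, clamping with `np.fmax`, and a mask. A stripped-down sketch with a plain pandas index and a NumPy array (the names `index`, `labels`, and `data` are made up for illustration) shows the idea in isolation:

```python
import numpy as np
import pandas as pd

index = pd.Index([0, 2, 3])   # stands in for da.indexes['time']
labels = [0, 1, 2, 5]         # labels to look up; 1 and 5 are not in the index

# get_indexer returns integer positions, with -1 marking labels not found.
exact = index.get_indexer(labels)
# With method='ffill', a missing label matches the previous index value.
filled = index.get_indexer(labels, method="ffill")

# Remember which lookups succeeded, then clamp -1 to 0 so that positional
# indexing stays in bounds; the clamped entries are overwritten afterwards.
found = exact >= 0
safe_positions = np.fmax(0, exact)

data = np.array([10.0, 20.0, 30.0])
result = data[safe_positions]   # fancy indexing returns a copy
result[~found] = np.nan         # fill in the entries that were missing
```

Clamping to position 0 is also why the function needs its special case for an empty index: there is no position 0 to read from, so it builds the all-fill result directly instead.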
