Internal refactor: create a generic function for applying ufuncs-like functions to xarray objects #770

Closed
shoyer opened this issue Feb 19, 2016 · 4 comments

Comments

@shoyer
Member

shoyer commented Feb 19, 2016

It would be awesome to have a generic function for making functions that act like NumPy's generalized universal functions "xarray aware".

What would xarray.apply_ufunc(func, objs, join='inner', agg_dims=None, drop_dims=None, kwargs=None) do?

  1. If one or more of the provided objects are Dataset or GroupBy instances, dispatch to specialized loops that call the remainder of apply_ufunc repeatedly.
  2. Align all objects along shared labels using the indicated join (for some operations, e.g., where, a left join is more appropriate than an inner join).
  3. Broadcast all objects against each other to expand dimensionality along all dimensions except (optionally) those listed in agg_dims/drop_dims. drop_dims should be moved to the end, for consistency with gufunc signatures.
  4. Transform agg_dims (if provided) into an axis argument using get_axis_num and insert it into kwargs.
  5. Apply func to the data attribute of each array to calculate the result using the provided kwargs. The result is expected to have all the same dimensions as the provided arrays, except any listed in the agg_dims and drop_dims arguments.
  6. Merge all coordinate data together (i.e., with an n-ary version of the Coordinate.merge method) and add these to the result array.
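
As a rough illustration of steps 2 to 6, here is a minimal sketch built on xarray's public align and broadcast functions. It is not the proposed implementation: apply_ufunc_sketch is a hypothetical name, it only accepts DataArray inputs, it takes a single dimension name for agg_dims, it ignores drop_dims and the Dataset/GroupBy dispatch of step 1, and it broadcasts along every dimension rather than excluding the aggregated ones.

```python
import numpy as np
import xarray as xr


def apply_ufunc_sketch(func, objs, join="inner", agg_dims=None, kwargs=None):
    """Toy version of steps 2-6 for DataArray inputs only (hypothetical helper)."""
    kwargs = dict(kwargs or {})
    # Step 2: align along shared labels with the requested join.
    aligned = xr.align(*objs, join=join)
    # Step 3 (simplified): broadcast so all arrays share dims in the same order.
    arrays = xr.broadcast(*aligned)
    first = arrays[0]
    # Step 4: translate the dimension name into an integer axis for func.
    if agg_dims is not None:
        kwargs["axis"] = first.get_axis_num(agg_dims)
    # Step 5: call func on the underlying arrays.
    result = func(*[a.data for a in arrays], **kwargs)
    # Step 6 (simplified): keep coordinates whose dims survive the aggregation.
    out_dims = [d for d in first.dims if d != agg_dims]
    coords = {
        name: coord
        for name, coord in first.coords.items()
        if set(coord.dims) <= set(out_dims)
    }
    return xr.DataArray(result, dims=out_dims, coords=coords)


a = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray(
    [[10.0, 20.0], [30.0, 40.0]],
    dims=("x", "y"),
    coords={"x": [1, 2], "y": ["u", "v"]},
)

# Inner join keeps x = 1, 2; a is broadcast along y before np.add runs.
total = apply_ufunc_sketch(np.add, [a, b])

# Aggregating over "y" passes axis=1 to np.sum and drops the y coordinate.
summed = apply_ufunc_sketch(np.sum, [b], agg_dims="y")
```

The sketch takes coordinates from the first broadcast array instead of performing a real n-ary merge, which is enough here because the inputs were aligned first.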

If any of objs are not xarray objects (e.g., they're NumPy or dask arrays), they should be skipped in operations that don't apply to them. xarray.Variable objects, for example, don't align or have coordinates.

A concrete example of similar functionality in dask.array is atop. The closest things we currently have in xarray are the _unary_op and _binary_op staticmethods (e.g., on DataArray), but these only handle one or two arguments, don't handle aggregated dimensions and, most importantly, are difficult to apply to new operations.

Here are a few concrete examples of how this could work:

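def average(array, weights, dim=None):
    # still needs a bit of work to make a NaN- and dask.array-safe
    # version of np.average
    def _average(data, weights_data, **kwargs):
        # np.average's second positional argument is axis, so pass weights by keyword
        return np.average(data, weights=weights_data, **kwargs)
    return apply_ufunc(_average, [array, weights], agg_dims=dim)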

def where(cond, first, second=None):
    if second is None:
        # need to write where2, a function that looks at first.dtype
        # to infer the appropriate NA sentinel value
        return apply_ufunc(ops.where2, [cond, first])
    else:
        return apply_ufunc(ops.where, [cond, first, second])

def dot(self, other, dim=None):
    if dim is None:
        # sum over the dimensions that both arrays share
        dim = set(self.dims) & set(other.dims)
    return apply_ufunc(ops.tensordot, [self, other], agg_dims=dim)
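
One detail the dot example glosses over: np.tensordot takes an axes pair (one list of axes per input) rather than a single axis, so dimension names have to be looked up separately in each array. The NumPy-only sketch below shows that translation, assuming the contraction runs over the dimension names the two inputs share. dot_by_name and its arguments are hypothetical names for illustration, not part of xarray.

```python
import numpy as np


def dot_by_name(a, a_dims, b, b_dims, dim=None):
    """Contract two NumPy arrays over named dimensions (hypothetical helper)."""
    if dim is None:
        # Default: sum over every dimension name the two inputs share.
        dim = [d for d in a_dims if d in b_dims]
    # tensordot needs one list of integer axes per input array.
    a_axes = [a_dims.index(d) for d in dim]
    b_axes = [b_dims.index(d) for d in dim]
    result = np.tensordot(a, b, axes=(a_axes, b_axes))
    # tensordot keeps the remaining axes of a, followed by those of b.
    out_dims = tuple(d for d in a_dims if d not in dim) + tuple(
        d for d in b_dims if d not in dim
    )
    return result, out_dims


a = np.arange(6.0).reshape(2, 3)   # dims ("x", "y")
b = np.arange(12.0).reshape(3, 4)  # dims ("y", "z")

# Contracting over the shared "y" dimension is an ordinary matrix product.
result, out_dims = dot_by_name(a, ("x", "y"), b, ("y", "z"))
```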
@shoyer shoyer mentioned this issue Feb 19, 2016
@shoyer
Member Author

shoyer commented Aug 11, 2016

I've started working on this in master...shoyer:apply_ufunc

@max-sixty
Collaborator

I had a go implementing .where with an other argument.

I've really struggled to work out how the layers fit together. Between _func_slash_method_wrapper, _dask_or_eager_func and a lot of where functions, I can't actually work out where the restriction on the additional arg is coming from. I changed _binary_op to accept args & kwargs, but I still get a TypeError: func() takes 2 positional arguments but 3 were given (and working out what func is seems beyond me!).

Might it be worth waiting for the more generic function before implementing this? Or should I persevere?
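
For readers unfamiliar with this kind of error, the pure-Python sketch below reproduces it in miniature. It is a hypothetical reconstruction, not xarray's actual code: make_binary_op and make_nary_op are invented names standing in for a factory that returns an anonymous inner function called func. The point is that the arity limit lives in the inner closure's signature, so widening an outer layer does not help until the closure itself forwards extra arguments.

```python
def make_binary_op(f):
    """Wrap f as a two-operand function (hypothetical stand-in for a binary-op factory)."""
    def func(self, other):
        return f(self, other)
    return func


def make_nary_op(f):
    """Same idea, but the inner closure forwards any extra arguments."""
    def func(self, *args, **kwargs):
        return f(self, *args, **kwargs)
    return func


def where(cond, first, second):
    # Scalar toy version of where, only for demonstrating the call signature.
    return first if cond else second


binary_where = make_binary_op(where)
try:
    binary_where(True, "a", "b")  # three positional arguments into a two-argument closure
except TypeError as err:
    message = str(err)

nary_where = make_nary_op(where)
picked = nary_where(False, "a", "b")
```

Because the traceback names only the inner func, it gives no hint about which factory produced it, which matches the confusion described above.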

@shoyer
Member Author

shoyer commented Aug 12, 2016

@MaximilianR Take a look at my branch above. I'm entirely rewriting the guts of the computation code. Once this is in, writing where (with apply_ufunc) should be quite straightforward.

@shoyer
Member Author

shoyer commented Aug 12, 2016

But I totally agree -- the current code involves way too many layers of indirection and anonymous functions. _func_slash_method_wrapper is probably the worst hack :).
