argmin / argmax behavior doesn't match documentation #1388
Agreed, the current implementation of |
Well, The next question is, what happens if you start supplying coordinate/dimension optional arguments to Does that seem reasonable? |
I agree that The main downside of returning a tuple or dict from
|
I'm working on a fix for this and would like to settle some design decisions:
Edit:
xr.DataArray(np.random.randn(4, 3, 2), dims=['x', 'y', 'z']).argmin_indexes(dims=['x'])
Do we need special treatment for this (maybe in |
I am thinking again how (We discussed the name for this new method in #1469 but here I just use For example with a three dimensional array with
The above proposal for ii (and iii) is not quite clean, as if it is used as an argument of Any thoughts are welcome. |
I recently came across the various argmax/idxmax (and corresponding min) issues in a project I have been working on. In addition to agreeing that the docs should be updated where appropriate, here are my two or three cents:
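import numpy as np
import xarray as xr

size = (2, 2, 2, 2)
dims = list("wxyz")
data = np.random.rand(*size)
# Label each dimension's coordinates as "<dim>_<index>", e.g. 'w_0', 'w_1'.
coords = {dim: ["{0}_{1}".format(dim, i) for i in range(n)] for dim, n in zip(dims, size)}
da = xr.DataArray(data, dims=dims, coords=coords)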
>>> da
<xarray.DataArray (w: 2, x: 2, y: 2, z: 2)>
array([[[[ 0.149945, 0.230338],
[ 0.626969, 0.299918]],
[[ 0.351764, 0.286436],
[ 0.130604, 0.982152]]],
[[[ 0.262667, 0.950426],
[ 0.76655 , 0.681631]],
[[ 0.635468, 0.735071],
[ 0.901116, 0.601303]]]])
Coordinates:
* w (w) <U3 'w_0' 'w_1'
* x (x) <U3 'x_0' 'x_1'
* y (y) <U3 'y_0' 'y_1'
* z (z) <U3 'z_0' 'z_1'
I would like to get something like the following:
>>> argmax(da)
<xarray.DataArray '_argmax' (argmaxdim: 4)>
array(['w_0', 'x_1', 'y_1', 'z_1'],
dtype='<U3')
Coordinates:
* argmaxdim (argmaxdim) <U1 'w' 'x' 'y' 'z'
>>> argmax(da, dim=list("wy"))
<xarray.DataArray '_argmax' (x: 2, z: 2, argmaxdim: 2)>
array([[['w_1', 'y_1'],
['w_1', 'y_0']],
[['w_1', 'y_1'],
['w_0', 'y_1']]], dtype=object)
Coordinates:
* x (x) object 'x_0' 'x_1'
* z (z) object 'z_0' 'z_1'
* argmaxdim (argmaxdim) <U1 'w' 'y'
where the dims in the unreduced and argmax cases appear in the order shown above. For reference, in case these examples aren't enough to generalize, here is a horribly inefficient implementation of the above (assuming unique maximum indices):
def _argmaxstackeddim(dastacked, ind):
    # Select one combination of the kept dims and take the argmax of that slice.
    keepdims = dastacked.indexes['keepdims'].names
    values = dastacked.keepdims.values[ind]
    coords = {keepdim: [val] for keepdim, val in zip(keepdims, values)}
    result = dastacked.sel(keepdims=values)\
                      .pipe(argmax)\
                      .expand_dims(keepdims)\
                      .assign_coords(**coords)
    return result

def argmax(da, dim=None):
    daname = "" if da.name is None else da.name
    name = daname + "_argmax"
    if dim is None:
        # Global maximum: keep only the matching element and read off its labels.
        maxda = da.where(da == da.max(), drop=True)
        dims = list(maxda.dims)
        dimmaxvals = [maxda.coords[dim].values[0] for dim in dims]
        result = xr.DataArray(dimmaxvals,
                              dims='argmaxdim',
                              coords={'argmaxdim': dims},
                              name=name)
        return result
    else:
        if isinstance(dim, str):
            dim = [dim]
        # Stack the dims that are not reduced, then handle each stacked entry in turn.
        keepdims = [d for d in da.dims if d not in dim]
        dastacked = da.stack(keepdims=keepdims)
        slices = [_argmaxstackeddim(dastacked, i) for i in range(len(dastacked.keepdims))]
        return xr.merge(slices)[name] |
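As a point of comparison with the slice-by-slice approach above, here is a minimal NumPy-only sketch of the same idea: flatten the reduced axes, take a single argmax, and unravel the result. The helper name `argmax_over_axes` and its dict-of-index-arrays return value are assumptions made for illustration, not an xarray API, and it returns integer positions rather than coordinate labels.

```python
import numpy as np

def argmax_over_axes(data, axes):
    """Hypothetical helper: positions of the maximum taken jointly over `axes`.

    Returns a dict mapping each reduced axis to an integer index array whose
    shape is that of the remaining (kept) axes.
    """
    axes = tuple(axes)
    keep = tuple(ax for ax in range(data.ndim) if ax not in axes)
    # Move the reduced axes to the end so they can be flattened together.
    moved = np.transpose(data, keep + axes)
    keep_shape = moved.shape[:len(keep)]
    reduced_shape = moved.shape[len(keep):]
    flat = moved.reshape(keep_shape + (-1,))
    # One argmax over the flattened reduced axes...
    flat_idx = flat.argmax(axis=-1)
    # ...then convert flat positions back to one index array per reduced axis.
    return dict(zip(axes, np.unravel_index(flat_idx, reduced_shape)))

data = np.arange(24).reshape(2, 3, 4)
data[0, 1, 2] = 100          # plant a distinct maximum for the middle axis entry 1
result = argmax_over_axes(data, axes=(0, 2))
```

Mapping the integer positions back to labels would then be a plain lookup into each coordinate array.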
Sorry for my late response and thank you for the proposal. But aside from my previous proposal, I was thinking whether such aggregation methods (including Such specific rules may be confusing and bring additional complexity. If we adopt the above rule, I think the
In [1]: import xarray as xr
...: da = xr.DataArray([[0, 3, 2], [2, 1, 4]], dims=['x', 'y'],
...: coords={'x': [1, 2], 'y': ['a', 'b', 'c']})
...:
In [4]: da.argmin(dim='x')
Out[4]:
<xarray.DataArray (y: 3)>
array([0, 1, 0])
Coordinates:
* y (y) <U1 'a' 'b' 'c'
In [3]: da.isel(x=da.argmin(dim='x'))
Out[3]:
<xarray.DataArray (y: 3)>
array([0, 1, 2])
Coordinates:
x (y) int64 1 2 1
* y (y) <U1 'a' 'b' 'c'
I think your logic would be useful even if we do not track the coordinate. I would appreciate any feedback. |
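For readers more familiar with NumPy, the argmin-then-isel round trip in the session above can be mimicked roughly as follows. This is only a sketch under the assumption that the first axis plays the role of `x`; the variable names are illustrative.

```python
import numpy as np

values = np.array([[0, 3, 2],
                   [2, 1, 4]])       # axes: (x, y), same numbers as the session above
x_labels = np.array([1, 2])          # the 'x' coordinate values

# Integer position of the minimum along x, one entry per y.
idx = values.argmin(axis=0)

# Pointwise selection: pick values[idx[j], j] for every column j.
minima = np.take_along_axis(values, idx[np.newaxis, :], axis=0)[0]

# The 'x' label at which each minimum occurs.
x_at_min = x_labels[idx]
```

Here `minima` corresponds to the data of the `isel` result and `x_at_min` to the `x` coordinate it carries along.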
I think it would be fine to add a special case to preserve coordinates corresponding to min/max values with xarray's I agree that it does not make sense to preserve coordinates along aggregated dimensions for argmin/argmax, but we can preserve other coordinates. So I like @fujiisoup's example behavior above. I suppose we now have two candidate APIs for returning multiple indices from a method like argmin/argmax:
I think my favorite option is (2) with My concern with adding an additional dimension is that it is always a little surprising and error-prone when we invent new dimension names not supplied by the user (for example, this can lead to conflicting names). Also, consolidating indices will not work as well with Either way, I would like a separate dedicated method for returning multiple indexing arrays. It's convenient (and what users expect) for argmax to return a single array if taking the max only over one dimension. However, if we switch to add an |
@fujiisoup and @shoyer Really enlightening comments above. I think I am starting to get the dao of xarray a bit better :)
Agreed it would be nice to have a consistent and well reasoned rule for coordinate propagation in aggregation methods. I think a key point here, which gets brought up in your example is that it might make sense to have different subrules depending on the semantics of the operation. Functions like
Yeah, I felt a little dirty appending '_argmax'.
OK. I think I understand now why @fujiisoup proposed output a Dataset rather than an array. That's a natural syntax for getting the values from the indices.
+1 to adding more dedicated methods if needed, since I think even if it isn't needed the associated docs will need to make sure users are aware of the analogous |
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the |
I think this is fixed now thanks to @johnomotani |
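A brief sketch of what the resolved behavior looks like, assuming a reasonably recent xarray release (to my knowledge 0.16 or later) in which passing a list of dimensions to `argmin` returns a dict of index arrays that can be handed to `isel`, and in which `idxmin` returns coordinate labels. The data reuses the small example from earlier in the thread.

```python
import xarray as xr

da = xr.DataArray(
    [[0, 3, 2], [2, 1, 4]],
    dims=["x", "y"],
    coords={"x": [1, 2], "y": ["a", "b", "c"]},
)

# With a list of dims, argmin returns a dict: one index DataArray per dim.
indices = da.argmin(dim=["x", "y"])

# The dict can be passed straight to isel to recover the minimum itself.
picked = da.isel(indices)

# idxmin returns coordinate labels instead of integer positions.
labels = da.idxmin(dim="x")
```

If the installed version behaves differently, the release notes are the place to check; this is not meant as a statement of the exact version boundaries.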
The documentation reads
However, what happens is that the numpy argmin output (single index into the flattened array at which the maximum/minimum is found) is wrapped in a dummy DataArray (similar behavior for Datasets also):
I realize that maybe for compatibility reasons it is necessary to make Datasets/DataArrays do this, but it seems like the documented behavior would be nicer. At any rate, the documentation should match the behavior.
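To make the flattened-index behavior concrete, here is a small NumPy-only illustration (the array is made up for the example). Without an `axis` argument, `np.argmin` returns a single position in the flattened array, which `np.unravel_index` can turn back into per-axis indices.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [0, 5, 4]])

# No axis given: a single index into the flattened array.
flat_index = np.argmin(a)

# Convert the flat position back into (row, column) indices.
row, col = np.unravel_index(flat_index, a.shape)

# With an axis: one index per remaining position, as the docs describe.
per_column = np.argmin(a, axis=0)
```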
Specs:
python 2.7
xarray 0.9.1
numpy 1.11.3