Restored old xarray coord order inference behavior #2579
Looks good to me. Happy to merge once the tests are green. |
The behavior of Dataset.coords seems to have changed in xarray>=0.10.0 and earlier versions do not seem to be predictable. Have to think a bit more about what to do about that. There was a reason that I had made the change originally. |
Maybe have behavior conditional on the xarray version? Or is that too confusing? |
I think I was wrong about that, I've responded in the original issue and am going to close this PR for the time being. Unless someone convinces me otherwise I'm going to consider the new behavior more correct and leave it going forward. |
@marcbernot convinced me this is the correct thing to do. So I would like to see this merged for 1.10.1 |
Tests are green. Merging. |
@philippjfr This issue has just bitten me a second time. I know that this is quite an involved topic for me to wade into, but one suggestion by @jsignell is to use the order of This makes more intuitive sense to me so I'm just wondering why we aren't using that approach? Is it just for backwards compatibility, or something else? |
It's worth reading the linked issue: we had actually very briefly switched to dims ordering (by accident) in 1.10 and reverted the change in this PR. There are two main reasons for that, a) on an xr.Dataset the dims order can vary between the When loading a NetCDF dataset the coordinates are usually ordered something like
# holoviews
hv.Dataset(xr_obj).to(hv.Image)
# hvplot
xr_obj.hvplot.image()
and it will automatically deduce that you want a stack of images you can explore across height and/or time. The problem is that tiffs loaded using That all being said, it's not quite clear to me yet what we should do. The xarray data model does often give coordinates a semantic meaning, but if that is not generally followed it becomes less useful. My first instinct would be to try to figure out why rasterio loads some tiffs with a non-convention conforming coordinate order before deciding that holoviews should conform to the dim ordering instead. |
That does seem reasonable.
Agreed. The important thing is that the coordinate ordering is meaningful in the common case and unfortunately loading tiff data with rasterio is one of those common cases. My only other comment is that if you are visualizing a single |
I think that how rasterio loads the tiffs is definitely strange and worth looking into. But I think that the discussion of what holoviews should do is slightly different. There is no requirement that xarray objects even have coordinates.
>>> data = np.random.rand(4, 3)
>>> da = xr.DataArray(data)
>>> da
<xarray.DataArray (dim_0: 4, dim_1: 3)>
array([[0.169981, 0.950831, 0.825206],
[0.242455, 0.056433, 0.346963],
[0.052712, 0.359673, 0.710705],
[0.139174, 0.244938, 0.284301]])
Dimensions without coordinates: dim_0, dim_1
>>> da.dims
('dim_0', 'dim_1')
>>> da.coords
Coordinates:
*empty*
This works:
>>> hv.Image(da.data)
But this throws an error:
>>> hv.Image(da)
....
ValueError: kdims: list length must be between 2 and 2 (inclusive)
I think fundamentally we should be using dims but it is true that there is no global ordering for Datasets with dims. Each |
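To make the "no global ordering" point concrete, here is a minimal sketch (the variable names `temperature` and `elevation` are made up for illustration and do not come from this thread). It builds a Dataset whose two data variables share dimensions but declare them in different orders, so there is no single dims order to read off the Dataset itself.

```python
import numpy as np
import xarray as xr

# Two data variables that share "lat" and "lon" but declare them in
# different orders. Names and shapes are illustrative assumptions.
ds = xr.Dataset(
    {
        "temperature": (("time", "lat", "lon"), np.zeros((2, 3, 4))),
        "elevation": (("lon", "lat"), np.zeros((4, 3))),
    }
)

# Each variable carries its own dims tuple; the order is per variable,
# not a property of the Dataset as a whole.
per_variable_dims = {name: var.dims for name, var in ds.data_vars.items()}
for name, dims in per_variable_dims.items():
    print(name, dims)
```

Any policy that orders key dimensions by dims would therefore have to pick a variable (or some other rule) to take the order from.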
I'll make a PR. |
We will at least have to come up with a slightly more nuanced policy than simply always using the |
I would have thought that the current situation is that people have to end up specifying their kdims explicitly for it to behave in any reasonable way, and if they've done that then this change won't affect them. Is that true? |
In any case, maybe you could poll those users about this change? |
@philippjfr That is what I was suggesting above as one possible option. Of course, I would rather avoid special cases so if it is appropriate to always use dims, we should do that. @jsignell I didn't know that DataArrays didn't have to have coords, thanks for pointing that out. If dims must be there (but coords don't) then it would seem appropriate to me for holoviews to use dims in order to support the more general case. A PR would certainly help us think more clearly about this issue!
That is also what I was thinking/hoping! |
No, it's worth reading my summary above. In most cases
As I mention above the problem arises because for certain tiffs the xarray rasterio returns coordinates not following the convention which means you have to specify the key dimensions to orient the array correctly. In almost all other cases I have come across coordinates behave correctly and for multi-dimensional datasets in particular they result in preferable behavior which is why users complained when we switched to dims ordering. |
As for objects without dims I suspect we might currently have much more general issues handling those, but at least in the DataArray case we definitely should fix that. I'm not entirely sure what it would mean for there to be an xr.Dataset without coords or if that's even possible since the coordinates specify how the DataArrays contained in the Dataset line up. |
I guess my exposure is only to the tiff case which doesn't work well: I'll take your word that the default of using coords works well in most other cases. That said (and I may be misremembering!) doesn't the geoviews |
The response was pretty emphatic at the time, it's also a major backward incompatible change. I really don't think we can just switch the behavior entirely, I'd be much happier with a more narrow and targeted fix.
I implemented it as a separate method called |
Having looked at the xarray piece of holoviews, I think there is an easy fix for the no coords case. We can assign coordinates based on dim names when no coords are specified:
data.assign_coords(**{k: range(v) for k, v in data.dims.items()})
This will add an int index for each dim, same as what happens if you roundtrip an xarray object to pandas and back:
>>> data.to_dataframe().to_xarray()
<xarray.Dataset>
Dimensions: (dim_0: 7, dim_1: 9)
Coordinates:
* dim_0 (dim_0) int64 0 1 2 3 4 5 6
* dim_1 (dim_1) int64 0 1 2 3 4 5 6 7 8
Data variables:
z (dim_0, dim_1) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ... |
Great, thanks @jsignell, that makes sense to me, just make sure to make a shallow copy before assigning the coords (if assign_coords doesn't already do that). |
As for ordering of kdims based on dims, I don't think there is a good global way because the ordering of dims just isn't global. |
Right the original object isn't mutated |
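A minimal sketch of the no-coords fix discussed above, written against a DataArray. The helper name `with_default_coords` is hypothetical and this is not necessarily the code that ended up in holoviews. It uses `sizes` rather than `dims.items()` because `dims` is a plain tuple on a DataArray, and it checks that the input object is left untouched.

```python
import numpy as np
import xarray as xr


def with_default_coords(obj):
    """Hypothetical helper: add an integer index for every dimension
    that has no coordinate yet, returning a new object."""
    missing = {
        dim: np.arange(size)
        for dim, size in obj.sizes.items()
        if dim not in obj.coords
    }
    # assign_coords returns a new object instead of modifying obj in place.
    return obj.assign_coords(**missing)


da = xr.DataArray(np.random.rand(4, 3))  # dims default to dim_0, dim_1
filled = with_default_coords(da)

print(len(da.coords))       # the original still has no coordinates
print(sorted(filled.coords))
```

Since the result is a new object, no explicit copy is needed before assigning the coordinates.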
Done for the tiff shown in the screenshot above... |
In 1.10.0 the code that infers the coordinate order from an XArray Dataset changed, unintentionally changing the behavior. This PR restores the old behavior.