Flexible indexes: how to handle possible dimension vs. coordinate name conflicts? #5738
Comments
My initial thought was to opt for D because it is easier to maintain (fewer special cases, less complexity, internally we could get rid of

I was thinking about this kind of behavior (even though reverting previous changes is not ideal):

    import xarray as xr

    ds = xr.Dataset(coords={'dim0': ['a', 'b'], 'dim1': [0, 1]})
    ds = ds.stack(dim_stacked=['dim0', 'dim1'])
    # This works: dim0 is not a dimension in b
    ds['c'] = (('dim0',), [10, 11, 12, 13, 14, 15])
    # raise a nice error message here: conflicting sizes for dimension 'dim0'
    ds.unstack('dim_stacked')
    # raise a nice error message here: conflicting sizes for dimension 'dim0'
    ds.sel(dim1=0)

However, this may be confusing in the case of integer-based vs. label-based indexing:

    # label-based selection along `dim_stacked` dimension (length=4)
    ds.sel(dim0='a')
    # integer-based selection along `dim0` dimension (length=6)
    ds.isel(dim0=0)

So a more general rule like option C is not that silly after all? If a dimension name matches the name of a coordinate:
In the long term, I think we want to eliminate any requirements in Xarray's data model about which variable names are OK. In particular:
With this second change in particular, it should not be a problem to have a multi-index level with the same name as a dimension. It is true that this change will introduce an inconsistency between appropriate keys for use with
Both 1 and 2 are now supported with v2023.8.0, so I think we can close this.
Another thing that I've noticed while working on #5692.

Currently it is not possible to have a Dataset with the same name used for both a dimension and a multi-index level. I guess the reason is to prevent errors such as unmatched dimension sizes when the multi-index is eventually dropped and its dimension(s) are renamed according to the level names (e.g., with `sel` or `unstack`). See #2299.

I'm wondering how we should handle this in the context of flexible / custom indexes:

A. Keep the current behavior as a special case for (pandas) multi-indexes. This would avoid breaking changes, but how would we support custom indexes that could eventually be used like pandas multi-indexes in `sel` or `stack`?

B. Introduce some tag in `xarray.Index` so that we can identify a multi-coordinate index that behaves like a hierarchical index (i.e., levels may be dropped into a single index/coordinate with dimension renaming).

C. Do not allow any dimension name matching the name of a coordinate attached to a multi-coordinate index. This seems silly?

D. Eventually revert #2353 and let users take care of potential conflicts.
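
To make option B a bit more concrete, here is a minimal sketch in plain Python of what a tag-based check could look like. Everything in it is an assumption for illustration only: `SketchIndex`, its `hierarchical` flag and `find_name_conflicts` are hypothetical names, not part of the `xarray.Index` API, and the real check would live wherever Dataset dimensions are validated.

```python
from dataclasses import dataclass


@dataclass
class SketchIndex:
    """Hypothetical stand-in for an index object (not the real xarray.Index)."""
    dim: str                    # dimension the index is attached to
    coord_names: tuple          # names of the coordinates (levels) it covers
    hierarchical: bool = False  # hypothetical tag from option B


def find_name_conflicts(dim_sizes, indexes):
    """Return (index dimension, level name) pairs where a level of a tagged
    hierarchical index shares its name with another existing dimension."""
    conflicts = []
    for index in indexes:
        if not index.hierarchical:
            # Untagged indexes never drop levels into renamed dimensions,
            # so no restriction is applied to them in this sketch.
            continue
        for name in index.coord_names:
            if name in dim_sizes and name != index.dim:
                conflicts.append((index.dim, name))
    return conflicts


# Same situation as in the stacked example: 'dim0' is both a level of the
# stacked index and a separate dimension of length 6.
dim_sizes = {"dim_stacked": 4, "dim0": 6}
tagged = SketchIndex("dim_stacked", ("dim0", "dim1"), hierarchical=True)
untagged = SketchIndex("dim_stacked", ("dim0", "dim1"))

print(find_name_conflicts(dim_sizes, [tagged]))    # [('dim_stacked', 'dim0')]
print(find_name_conflicts(dim_sizes, [untagged]))  # []
```

With a tag like this, the restriction from option A would no longer be specific to pandas multi-indexes: any custom index that opts in gets the same check, and every other multi-coordinate index is left alone (unlike option C).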