Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: rename_vars to dimension coordinate does not create an index #6229

Closed
headtr1ck opened this issue Feb 1, 2022 · 6 comments · Fixed by #6999
Closed

[Bug]: rename_vars to dimension coordinate does not create an index #6229

headtr1ck opened this issue Feb 1, 2022 · 6 comments · Fixed by #6999
Labels

Comments

@headtr1ck
Copy link
Collaborator

headtr1ck commented Feb 1, 2022

What happened?

We used Data{set,Array}.rename{_vars}({coord: dim_coord}) to make a coordinate a dimension coordinate (instead of set_index).
This results in the coordinate correctly being displayed as a dimension coordinate (with the *) but it does not create an index, such that further operations like sel fail with a strange KeyError.

What did you expect to happen?

I expect one of two things to be true:

  1. rename{_vars} does not allow setting dimension coordinates (raises Error and tells you to use set_index)
  2. rename{_vars} checks for this occasion and sets the index correctly

Minimal Complete Verifiable Example

import xarray as xr

data = xr.DataArray([5, 6, 7], coords={"c": ("x", [1, 2, 3])}, dims="x")
# <xarray.DataArray (x: 3)>
# array([5, 6, 7])
# Coordinates:
#     c        (x) int64 1 2 3
# Dimensions without coordinates: x

data_renamed = data.rename({"c": "x"})
# <xarray.DataArray (x: 3)>
# array([5, 6, 7])
# Coordinates:
#   * x        (x) int64 1 2 3

data_renamed.indexes
# Empty
data_renamed.sel(x=2)
# KeyError: 'no index found for coordinate x'

# if we use set_index it works
data_indexed = data.set_index({"x": "c"})
# looks the same as data_renamed!
# <xarray.DataArray (x: 3)>
# array([1, 2, 3])
# Coordinates:
#   * x        (x) int64 1 2 3

data_indexed.indexes
# x: Int64Index([1, 2, 3], dtype='int64', name='x')

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.1 (default, Jan 13 2021, 15:21:08)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.49.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.20.2
pandas: 1.3.5
numpy: 1.21.5
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 49.2.1
pip: 22.0.2
conda: None
pytest: 6.2.5
IPython: 8.0.0
sphinx: None

@headtr1ck headtr1ck added bug needs triage Issue that has not been reviewed by xarray team member labels Feb 1, 2022
@benbovy
Copy link
Member

benbovy commented Feb 15, 2022

This has been discussed in #4825.

A third option for rename{_vars} would be to rename the coordinate and its index (if any), regardless of whether the old and new names correspond to existing dimensions. We plan to drop the concept of a "dimension coordinate" with an implicit index in favor of indexes explicitly part of Xarray's data model (see https://github.com/pydata/xarray/projects/1), so that it will be possible to set indexes for non-dimension coordinates and/or set dimension coordinates without indexes.

Re your example, in #5692 data.rename({"c": "x"}) does not implicitly create anymore an indexed coordinate (no *):

data_renamed
# <xarray.DataArray (x: 3)>
# array([5, 6, 7])
# Coordinates:
#     x        (x) int64 1 2 3

Instead, it should be possible to directly set an index for the c coordinate without the need to rename it, e.g.,

# API has still to be defined
data_indexed = data.set_index("c", index_cls=xr.PandasIndex)

data_indexed.sel(c=[1, 2])
# <xarray.DataArray (x: 2)>
# array([5, 6])
# Coordinates:
#   * c       (x) int64 1 2

@dcherian dcherian removed the needs triage Issue that has not been reviewed by xarray team member label Mar 16, 2022
@dcherian
Copy link
Contributor

dcherian commented Apr 7, 2022

data.rename({"c": "x"}) does not implicitly create anymore an indexed coordinate

I have code that relied on automatic index creation through rename and some downstream code broke.

I think we need to address this through a warning or error so that users can be alerted that behaviour has changed.

@benbovy
Copy link
Member

benbovy commented Sep 7, 2022

I think we need to address this through a warning or error so that users can be alerted that behaviour has changed.

This is done in #6999. Would it close this issue?

@headtr1ck
Copy link
Collaborator Author

A warning is fine by me.
I did not test it so just asking: how will the renamed coord now be displayed? Does it still have the asterix(*)?

@benbovy
Copy link
Member

benbovy commented Sep 7, 2022

Does it still have the asterix(*)?

Not anymore with v2022.6.0.

@headtr1ck
Copy link
Collaborator Author

headtr1ck commented Sep 7, 2022

Then feel free to close this issue with your PR :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants