Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy of custom index does not align with original #7162

Closed
dcherian opened this issue Oct 14, 2022 · 7 comments
Closed

copy of custom index does not align with original #7162

dcherian opened this issue Oct 14, 2022 · 7 comments

Comments

@dcherian
Copy link
Contributor

dcherian commented Oct 14, 2022

What happened?

MY prototype CRSIndex is broken on the release version: https://github.com/dcherian/crsindex/blob/main/crsindex.ipynb under heading "BROKEN: Successfully align with a copy of itself"

The cell's code is :

copy = newds.copy(deep=True)
xr.align(copy, newds)

which should always work.

@headtr1ck is #7140 to blame?

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:43:44) [Clang 13.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1

xarray: 2022.10.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.0
pydap: None
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.3
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.2
cfgrib: 0.9.10.2
iris: 3.3.1
bottleneck: 1.3.5
dask: 2022.9.2
distributed: 2022.9.2
matplotlib: 3.6.1
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.6.0
numpy_groupies: 0.9.19
setuptools: 65.5.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: None

@headtr1ck
Copy link
Collaborator

I don't think there is anything wrong with the deep copy implementation. Only before it was not deep copying the indexes and now it is.
Can you check if index1.equals(index2) works if they are created the same?

@headtr1ck
Copy link
Collaborator

Now that I think about it, Indexes.copy_indexes might also require some update that includes the memo argument.
But not sure if that will solve the issue here.

@benbovy
Copy link
Member

benbovy commented Oct 18, 2022

Indexes.copy_indexes might also require some update that includes the memo argument. But not sure if that will solve the issue here.

That's a possible cause. Alignment may fail early because .xindexes returns different mappings of coordinates vs. index objects. It's worth checking if after copying the dataset, copy.xindexes returns the same CRSIndex object for its "x", "y" and "spatial_ref" coordinates.

EDIT: checking copy.xindexes.group_by_index() is more convenient.

@benbovy
Copy link
Member

benbovy commented Oct 18, 2022

The refactored alignment logic could be improved (cf. #7002). The error raised in the method below is not very helpful.

def assert_no_index_conflict(self) -> None:
"""Check for uniqueness of both coordinate and dimension names across all sets
of matching indexes.
We need to make sure that all indexes used for re-indexing or alignment
are fully compatible and do not conflict each other.
Note: perhaps we could choose less restrictive constraints and instead
check for conflicts among the dimension (position) indexers returned by
`Index.reindex_like()` for each matching pair of object index / aligned
index?
(ref: https://github.com/pydata/xarray/issues/1603#issuecomment-442965602)
"""
matching_keys = set(self.all_indexes) | set(self.indexes)
coord_count: dict[Hashable, int] = defaultdict(int)
dim_count: dict[Hashable, int] = defaultdict(int)
for coord_names_dims, _ in matching_keys:
dims_set: set[Hashable] = set()
for name, dims in coord_names_dims:
coord_count[name] += 1
dims_set.update(dims)
for dim in dims_set:
dim_count[dim] += 1
for count, msg in [(coord_count, "coordinates"), (dim_count, "dimensions")]:
dup = {k: v for k, v in count.items() if v > 1}
if dup:
items_msg = ", ".join(
f"{k!r} ({v} conflicting indexes)" for k, v in dup.items()
)
raise ValueError(
"cannot re-index or align objects with conflicting indexes found for "
f"the following {msg}: {items_msg}\n"
"Conflicting indexes may occur when\n"
"- they relate to different sets of coordinate and/or dimension names\n"
"- they don't have the same type\n"
"- they may be used to reindex data along common dimensions"
)

@headtr1ck
Copy link
Collaborator

Could you create a MVCE without a custom index that I can use for debugging?

@scottyhq
Copy link
Contributor

scottyhq commented Mar 24, 2023

Reviving this exploration with xarray : 2023.3.0 and I'm no longer seeing the traceback linked above, so I think this can be closed

Original traceback from xr.align(copy, newds), which now succeeds:

`File ~/mambaforge/envs/xarray-release/lib/python3.10/site-packages/xarray/core/indexes.py:490, in PandasIndex.equals(self, other)
    488 if not isinstance(other, PandasIndex):
    489     return False
--> 490 return self.index.equals(other.index) and self.dim == other.dim

AttributeError: 'PandasIndex' object has no attribute 'index'

@dcherian
Copy link
Contributor Author

Thanks @scottyhq

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants