
concat along dim with mix of scalar coordinate and array coordinates is not right #6434

Closed
dcherian opened this issue Apr 1, 2022 · 3 comments · Fixed by #6443

dcherian commented Apr 1, 2022

What happened?

Really hard to describe in words =)

concat = xr.concat([da.isel(time=0), da.isel(time=[1])], dim="time")
xr.align(concat, da, join="exact")

fails when concat and da should be identical. This is causing failures in cf-xarray: xarray-contrib/cf-xarray#319

cc @benbovy

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr

time = xr.DataArray(
    np.array(
        ["2013-01-01T00:00:00.000000000", "2013-01-01T06:00:00.000000000"],
        dtype="datetime64[ns]",
    ),
    dims="time",
    name="time",
)

da = time
concat = xr.concat([da.isel(time=0), da.isel(time=[1])], dim="time")
xr.align(da, concat, join="exact")  # works

da = xr.DataArray(np.ones(time.shape), dims="time", coords={"time": time})
concat = xr.concat([da.isel(time=0), da.isel(time=[1])], dim="time")
xr.align(da, concat, join="exact")

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [27], in <module>
     17 da = xr.DataArray(np.ones(time.shape), dims="time", coords={"time": time})
     18 concat = xr.concat([da.isel(time=0), da.isel(time=[1])], dim="time")
---> 19 xr.align(da, concat, join="exact")

File ~/work/python/xarray/xarray/core/alignment.py:761, in align(join, copy, indexes, exclude, fill_value, *objects)
    566 """
    567 Given any number of Dataset and/or DataArray objects, returns new
    568 objects with aligned indexes and dimension sizes.
   (...)
    751 
    752 """
    753 aligner = Aligner(
    754     objects,
    755     join=join,
   (...)
    759     fill_value=fill_value,
    760 )
--> 761 aligner.align()
    762 return aligner.results

File ~/work/python/xarray/xarray/core/alignment.py:549, in Aligner.align(self)
    547 self.find_matching_unindexed_dims()
    548 self.assert_no_index_conflict()
--> 549 self.align_indexes()
    550 self.assert_unindexed_dim_sizes_equal()
    552 if self.join == "override":

File ~/work/python/xarray/xarray/core/alignment.py:395, in Aligner.align_indexes(self)
    393 if need_reindex:
    394     if self.join == "exact":
--> 395         raise ValueError(
    396             "cannot align objects with join='exact' where "
    397             "index/labels/sizes are not equal along "
    398             "these coordinates (dimensions): "
    399             + ", ".join(f"{name!r} {dims!r}" for name, dims in key[0])
    400         )
    401     joiner = self._get_index_joiner(index_cls)
    402     joined_index = joiner(matching_indexes)

ValueError: cannot align objects with join='exact' where index/labels/sizes are not equal along these coordinates (dimensions): 'time' ('time',)

Anything else we need to know?

No response

Environment

xarray main

@dcherian dcherian added the bug label Apr 1, 2022

benbovy commented Apr 2, 2022

The first example works because there's no index.

In the second example, a PandasIndex is created from the scalar value (wrapped in a sequence) so that concat works on the "time" dimension (required since the logic has moved to Index.concat). See #5692 (comment).

The problem is that, when creating a PandasIndex, we call pd.Index, which doesn't seem to create the right kind of index given the value type:

array = da.isel(time=0).values
value = array.item()
seq = np.array([value], dtype=array.dtype)
pd.Index(seq, dtype=array.dtype)
# Float64Index([1.0], dtype='float64')

So in the example above you end up with different index types, which xr.align doesn't like:

concat.indexes["time"]
# Index([1356998400000000000, 2013-01-01 06:00:00], dtype='object', name='time')

da.indexes["time"]
# DatetimeIndex(['2013-01-01 00:00:00', '2013-01-01 06:00:00'], dtype='datetime64[ns]', name='time', freq=None)

concat.indexes["time"].equals(da.indexes["time"])
# False
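The mismatch can be reproduced with pandas alone (a minimal sketch, assuming only that the object index mixes a raw integer with a Timestamp as shown above; this is not xarray's actual code path):

```python
import pandas as pd

# An object-dtype index mixing a raw integer (nanoseconds since the epoch)
# with a Timestamp never equals the proper DatetimeIndex, mirroring the
# failed comparison above.
dt = pd.DatetimeIndex(
    ["2013-01-01 00:00:00", "2013-01-01 06:00:00"], name="time"
)
mixed = pd.Index([dt[0].value, dt[1]], dtype=object, name="time")
print(mixed.equals(dt))  # False
```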

I'm not very satisfied with the current solution in concat but I'm not sure what we should do here:

  • Special case for datetime, and other value types?
  • Review the approach used to concatenate scalar coordinates (no-index) and indexed array coordinates?
  • Deprecate concatenating a mix of scalar coordinates and indexed coordinates?


dcherian commented Apr 3, 2022

which doesn't seem to create the right kind of index given the value type:

There's a typo in the first line: we need da.time. With that fixed, this does actually work:

array = da.isel(time=0).time.values
value = array.item()
seq = np.array([value], dtype=array.dtype)
pd.Index(seq, dtype=array.dtype)
# DatetimeIndex(['2013-01-01'], dtype='datetime64[ns]', freq=None)

The issue is that .item() converts datetime64 to int, and so safe_cast_to_index creates an Int64Index because we don't pass dtype to the pd.Index constructor (so that's one possible fix):

xarray/xarray/core/utils.py

Lines 130 to 133 in 3ead17e

kwargs = {}
if hasattr(array, "dtype") and array.dtype.kind == "O":
    kwargs["dtype"] = object
index = pd.Index(np.asarray(array), **kwargs)
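The dtype loss can be seen in isolation (a minimal sketch with NumPy and pandas only, outside xarray's code path):

```python
import numpy as np
import pandas as pd

# .item() on a nanosecond-resolution datetime64 scalar returns a plain
# Python int (datetime.datetime cannot represent nanoseconds), so an index
# rebuilt from it without the original dtype becomes an integer index.
original = np.array(["2013-01-01"], dtype="datetime64[ns]")
value = original[0].item()
print(type(value))  # <class 'int'>

lost = pd.Index([value])                                  # integer dtype
kept = pd.Index(np.array([value], dtype=original.dtype))  # datetime64 dtype
print(lost.dtype.kind, kept.dtype.kind)  # i M
```

Forwarding the original dtype when rebuilding the sequence, as in the last line, restores a DatetimeIndex.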

Alternatively, we could loop over datasets and call expand_dims if dim is present and is a scalar. We already do something similar here:

# case where concat dimension is a coordinate or data_var but not a dimension
if (dim in coord_names or dim in data_names) and dim not in dim_names:
    datasets = [ds.expand_dims(dim) for ds in datasets]
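The expand_dims alternative could look roughly like this against xarray's public API (a hedged sketch of the idea, not the actual concat internals):

```python
import numpy as np
import xarray as xr

# Sketch of the alternative fix: promote a scalar "time" coordinate to a
# length-1 dimension before concatenating, so every input carries a real
# index and concat/align see matching DatetimeIndex objects.
time = xr.DataArray(
    np.array(["2013-01-01T00", "2013-01-01T06"], dtype="datetime64[ns]"),
    dims="time",
    name="time",
)
da = xr.DataArray(np.ones(2), dims="time", coords={"time": time})

parts = [da.isel(time=0), da.isel(time=[1])]
# Promote any scalar-coordinate input to a proper length-1 dimension.
parts = [p if "time" in p.dims else p.expand_dims("time") for p in parts]
concat = xr.concat(parts, dim="time")

print(concat.indexes["time"].equals(da.indexes["time"]))  # True
```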


benbovy commented Apr 5, 2022

Alternatively, we could loop over datasets and call expand_dims if dim is present and is a scalar.

Ah yes 👍. Not sure why this case didn't fulfill the conditions for calling expand_dims on the input datasets.
