
Missing name in index of series created from DataArray.to_series() after reindex #5018

Closed
yhlam opened this issue Mar 10, 2021 · 3 comments

@yhlam
yhlam commented Mar 10, 2021

What happened:

If a DataArray is reindexed by calling reindex() with a list, the Series created by DataArray.to_series() has an unnamed index level for the reindexed dimension. In 0.16.2 this worked for arrays with two or more dimensions (where the Series has a MultiIndex rather than an Index), but it no longer works in 0.17.0. For one-dimensional arrays, it does not work as expected in either 0.16.2 or 0.17.0.

What you expected to happen:

I expect the Index.names of a Series created by DataArray.to_series() to contain the names of the DataArray's dimensions, regardless of whether reindex() was called on the DataArray beforehand.
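Until a fix is available, one possible workaround is to reassign the index names after calling to_series(). The sketch below defines a hypothetical helper, restore_index_names (it is not part of xarray or pandas), using pandas only. It assumes that the index levels produced by to_series() are in the same order as DataArray.dims, which is consistent with the passing tests here but is an assumption, not documented behavior.

```python
from typing import Hashable, Sequence

import pandas as pd


def restore_index_names(series: pd.Series, dims: Sequence[Hashable]) -> pd.Series:
    """Return a shallow copy of `series` whose index levels are named `dims`.

    Hypothetical workaround helper (not an xarray or pandas API). It assumes
    the index levels of `series` are in the same order as `dims`, e.g. the
    result of DataArray.to_series() paired with DataArray.dims.
    """
    if series.index.nlevels != len(dims):
        raise ValueError(
            f"index has {series.index.nlevels} level(s) "
            f"but {len(dims)} name(s) were given"
        )
    # Index.set_names returns a new Index, so the input index is never mutated.
    new_index = series.index.set_names(list(dims))
    # A shallow copy shares the values but gets its own index attribute.
    result = series.copy(deep=False)
    result.index = new_index
    return result
```

For example, `restore_index_names(array.reindex(bar=["x", "y"]).to_series(), array.dims)` would label both levels as "foo" and "bar". Building a new Index with set_names, rather than assigning to `series.index.names` in place, avoids mutating an Index object that may still be shared with the DataArray.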

Minimal Complete Verifiable Example:

import numpy as np
import pytest
import xarray as xr


def test_multiindex():
    array = xr.DataArray(
        np.arange(12).reshape((4, 3)),
        coords=[
            ("foo", ["a", "b", "c", "d"]),
            ("bar", ["x", "y", "z"]),
        ]
    )
    assert array.to_series().index.names == ["foo", "bar"]


def test_multiindex_reindex():
    array = xr.DataArray(
        np.arange(12).reshape((4, 3)),
        coords=[
            ("foo", ["a", "b", "c", "d"]),
            ("bar", ["x", "y", "z"]),
        ]
    )
    series = array.reindex(bar=["x", "y"]).to_series()
    assert series.index.names == ["foo", "bar"]


def test_index_str():
    array = xr.DataArray(np.arange(4), coords=[("foo", ["a", "b", "c", "d"])])
    assert array.to_series().index.names == ["foo"]


def test_index_str_reindex():
    array = xr.DataArray(np.arange(4), coords=[("foo", ["a", "b", "c", "d"])])
    series = array.reindex(foo=["a", "b", "c"]).to_series()
    assert series.index.names == ["foo"]


def test_index_int():
    array = xr.DataArray(np.arange(4), coords=[("foo", [1, 2, 3, 4])])
    assert array.to_series().index.names == ["foo"]


def test_index_int_reindex():
    array = xr.DataArray(np.arange(4), coords=[("foo", [1, 2, 3, 4])])
    series = array.reindex(foo=[1, 2, 3]).to_series()
    assert series.index.names == ["foo"]

Version 0.16.2:

It works as expected for the MultiIndex case. However, for a one-dimensional array the index name is missing after reindex.

================================================= test session starts =================================================
platform win32 -- Python 3.8.6, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
plugins: cov-2.11.1
collected 6 items

tests\test_to_series_index.py ...F.F                                                                             [100%]

====================================================== FAILURES =======================================================
_______________________________________________ test_index_str_reindex ________________________________________________

    def test_index_str_reindex():
        array = xr.DataArray(np.arange(4), coords=[("foo", ["a", "b", "c", "d"])])
        series = array.reindex(foo=["a", "b", "c"]).to_series()
>       assert series.index.names == ["foo"]
E       AssertionError: assert FrozenList([None]) == ['foo']
E         At index 0 diff: None != 'foo'
E         Use -v to get the full diff

tests\test_to_series_index.py:39: AssertionError
_______________________________________________ test_index_int_reindex ________________________________________________

    def test_index_int_reindex():
        array = xr.DataArray(np.arange(4), coords=[("foo", [1, 2, 3, 4])])
        series = array.reindex(foo=[1, 2, 3]).to_series()
>       assert series.index.names == ["foo"]
E       AssertionError: assert FrozenList([None]) == ['foo']
E         At index 0 diff: None != 'foo'
E         Use -v to get the full diff

tests\test_to_series_index.py:50: AssertionError
=============================================== short test summary info ===============================================
FAILED tests/test_to_series_index.py::test_index_str_reindex - AssertionError: assert FrozenList([None]) == ['foo']
FAILED tests/test_to_series_index.py::test_index_int_reindex - AssertionError: assert FrozenList([None]) == ['foo']
============================================= 2 failed, 4 passed in 0.61s =============================================

Version 0.17.0:

The index name is missing after reindex for both a single-level Index and a MultiIndex.

================================================= test session starts =================================================
platform win32 -- Python 3.8.6, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
plugins: cov-2.11.1
collected 6 items

tests\test_to_series_index.py .F.F.F                                                                             [100%]

====================================================== FAILURES =======================================================
_______________________________________________ test_multiindex_reindex _______________________________________________

    def test_multiindex_reindex():
        array = xr.DataArray(
            np.arange(12).reshape((4, 3)),
            coords=[
                ("foo", ["a", "b", "c", "d"]),
                ("bar", ["x", "y", "z"]),
            ]
        )
        series = array.reindex(bar=["x", "y"]).to_series()
>       assert series.index.names == ["foo", "bar"]
E       AssertionError: assert FrozenList(['foo', None]) == ['foo', 'bar']
E         At index 1 diff: None != 'bar'
E         Use -v to get the full diff

tests\test_to_series_index.py:28: AssertionError
_______________________________________________ test_index_str_reindex ________________________________________________

    def test_index_str_reindex():
        array = xr.DataArray(np.arange(4), coords=[("foo", ["a", "b", "c", "d"])])
        series = array.reindex(foo=["a", "b", "c"]).to_series()
>       assert series.index.names == ["foo"]
E       AssertionError: assert FrozenList([None]) == ['foo']
E         At index 0 diff: None != 'foo'
E         Use -v to get the full diff

tests\test_to_series_index.py:39: AssertionError
_______________________________________________ test_index_int_reindex ________________________________________________

    def test_index_int_reindex():
        array = xr.DataArray(np.arange(4), coords=[("foo", [1, 2, 3, 4])])
        series = array.reindex(foo=[1, 2, 3]).to_series()
>       assert series.index.names == ["foo"]
E       AssertionError: assert FrozenList([None]) == ['foo']
E         At index 0 diff: None != 'foo'
E         Use -v to get the full diff

tests\test_to_series_index.py:50: AssertionError
=============================================== short test summary info ===============================================
FAILED tests/test_to_series_index.py::test_multiindex_reindex - AssertionError: assert FrozenList(['foo', None]) == [...
FAILED tests/test_to_series_index.py::test_index_str_reindex - AssertionError: assert FrozenList([None]) == ['foo']
FAILED tests/test_to_series_index.py::test_index_int_reindex - AssertionError: assert FrozenList([None]) == ['foo']
============================================= 3 failed, 3 passed in 0.61s =============================================

Environment:

Output of xr.show_versions() for 0.16.2

commit: None
python: 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252
libhdf5: None
libnetcdf: None

xarray: 0.16.2
pandas: 1.2.3
numpy: 1.20.1
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 53.0.0
pip: 21.0.1
conda: None
pytest: 6.2.2
IPython: 7.21.0
sphinx: None

Output of xr.show_versions() for 0.17.0

commit: None
python: 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252
libhdf5: None
libnetcdf: None

xarray: 0.17.0
pandas: 1.2.3
numpy: 1.20.1
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 53.0.0
pip: 21.0.1
conda: None
pytest: 6.2.2
IPython: 7.21.0
sphinx: None

@max-sixty
Collaborator

Thanks for the excellent example, @yhlam.

We'd definitely take a PR to fix this. We're also working on some larger index changes, though they are still some way off.

@sjvrijn
Contributor

sjvrijn commented Feb 18, 2023

The example tests by @yhlam currently pass with xarray 2023.2.1.dev7+g21d86450. According to git bisect, this issue appears to have been fixed as part of the explicit indexes PR #5692. I guess that means this issue can be closed?

@dcherian
Contributor

Thanks, @sjvrijn.
