Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

groupby should not squeeze out dimensions #2157

Closed
aurghs opened this issue May 18, 2018 · 4 comments · Fixed by #9280
Closed

groupby should not squeeze out dimensions #2157

aurghs opened this issue May 18, 2018 · 4 comments · Fixed by #9280

Comments

@aurghs
Copy link
Collaborator

aurghs commented May 18, 2018

Code Sample

arr = xr.DataArray(
    np.ones(3), 
    dims=('x',), 
    coords={
        'x': ('x', np.array([1, 3, 6])),
    }
)
list(arr.groupby('x'))

[(1, <xarray.DataArray ()>
  array(1.)
  Coordinates:
      x        int64 1), 
(3, <xarray.DataArray ()>
  array(1.)
  Coordinates:
      x        int64 3), 
(6, <xarray.DataArray ()>
  array(1.)
  Coordinates:
      x        int64 6)]

Problem description

The dimension x disappear. I have done some tests and it seems that this problem raise only with strictly ascending coordinates.
For example in this case it works correctly:

arr = xr.DataArray(
    np.ones(3), 
    dims=('x',), 
    coords={
        'x': ('x', np.array([2, 1, 0])),
    }
)
list(arr.groupby('x'))

[(0, <xarray.DataArray (x: 1)>
  array([1.])
  Coordinates:
    * x        (x) int64 0), 
(1, <xarray.DataArray (x: 1)>
  array([1.])
  Coordinates:
    * x        (x) int64 1), 
(2, <xarray.DataArray (x: 1)>
  array([1.])
  Coordinates:
    * x        (x) int64 2)]

Expected Output

arr = xr.DataArray(
    np.ones(3), 
    dims=('x',), 
    coords={
        'x': ('x', np.array([1, 3, 6])),
    }
)
list(arr.groupby('x'))

[(1, <xarray.DataArray (x: 1)>
  ar1ay([1.])
  Coordinates:
    * x        (x) int64 1), 
(3, <xarray.DataArray (x: 1)>
  array([1.])
  Coordinates:
    * x        (x) int64 3),
 (6, <xarray.DataArray (x: 1)>
  array([1.])
  Coordinates:
    * x        (x) int64 6)]

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.0.final.0 python-bits: 64 OS: Linux OS-release: 4.13.0-41-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.3.1
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.4
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: None
setuptools: 38.4.1
pip: 10.0.1
conda: None
pytest: 3.5.1
IPython: 6.2.1
sphinx: 1.7.4

@shoyer
Copy link
Member

shoyer commented May 18, 2018

This was intentional behavior, but I agree that it is probably a mistake.

You can disable it by setting squeeze=False in groupby(), e.g.,

arr = xr.DataArray(
    np.ones(3), 
    dims=('x',), 
    coords={
        'x': ('x', np.array([1, 3, 6])),
    }
)
list(arr.groupby('x', squeeze=False))

[(1, <xarray.DataArray (x: 1)>
  array([1.])
  Coordinates:
    * x        (x) int64 1), (3, <xarray.DataArray (x: 1)>
  array([1.])
  Coordinates:
    * x        (x) int64 3), (6, <xarray.DataArray (x: 1)>
  array([1.])
  Coordinates:
    * x        (x) int64 6)]

This default behavior can be convenient sometimes because of fewer extra dimensions to worry about. In principle, squeezing makes sense if the grouped coordinate has all unique values, but here we aren't even doing that (I don't know why).

I have two suggested fixes:

  • squeeze=True should only rely on unique values. Otherwise this really is nearly impossible to predict.
  • We should deprecate squeeze=True as the default value, and change this default behavior in the next major release.

@stale stale bot added the stale label Sep 26, 2020
@shoyer shoyer removed the stale label Sep 26, 2020
@shoyer shoyer changed the title dimension disappears when gourping by strictly ascending dimensional coordinate groupby() should not squeeze out dimensions Sep 26, 2020
@shoyer shoyer changed the title groupby() should not squeeze out dimensions groupby should not squeeze out dimensions Sep 26, 2020
@pydata pydata deleted a comment from stale bot Sep 26, 2020
dcherian added a commit to dcherian/xarray that referenced this issue Dec 2, 2023
dcherian added a commit to dcherian/xarray that referenced this issue Dec 2, 2023
dcherian added a commit to dcherian/xarray that referenced this issue Jan 4, 2024
commit 0a0f800
Merge: 33c8033 41d33f5
Author: Deepak Cherian <dcherian@users.noreply.github.com>
Date:   Tue Jan 2 20:42:51 2024 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

commit 33c8033
Author: Deepak Cherian <deepak@cherian.net>
Date:   Tue Jan 2 20:40:42 2024 -0700

    Don't skip for resampling

commit d7be352
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Wed Jan 3 03:24:13 2024 +0000

    [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

commit d13fa0e
Author: Deepak Cherian <dcherian@users.noreply.github.com>
Date:   Tue Jan 2 20:23:43 2024 -0700

    Apply suggestions from code review

    Co-authored-by: Michael Niklas  <mick.niklas@gmail.com>

commit dd6ea53
Author: Deepak Cherian <deepak@cherian.net>
Date:   Thu Dec 21 19:29:40 2023 -0700

    Silence more warnings

commit 44e5a41
Author: Deepak Cherian <deepak@cherian.net>
Date:   Thu Dec 21 19:21:06 2023 -0700

    minimize test mods

commit 94c1c1f
Author: Deepak Cherian <deepak@cherian.net>
Date:   Thu Dec 21 18:55:46 2023 -0700

    Add tests for pydata#8263

commit 0ab4eb6
Author: Deepak Cherian <deepak@cherian.net>
Date:   Thu Dec 21 18:47:41 2023 -0700

    Fix typing

commit a064430
Merge: d6a3f2d 03ec3cb
Author: Deepak Cherian <deepak@cherian.net>
Date:   Thu Dec 21 18:47:04 2023 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

    * main:
      Fix mypy type ignore (pydata#8564)
      Support for the new compression arguments. (pydata#7551)
      FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552)
      Adapt map_blocks to use new Coordinates API (pydata#8560)
      add xeofs to ecosystem.rst (pydata#8561)
      Offer a fixture for unifying DataArray & Dataset tests (pydata#8533)
      Generalize cumulative reduction (scan) to non-dask types (pydata#8019)

commit d6a3f2d
Author: Deepak Cherian <deepak@cherian.net>
Date:   Thu Dec 21 18:46:50 2023 -0700

    Fix generator for aggregations

commit 97f1695
Author: Deepak Cherian <deepak@cherian.net>
Date:   Tue Dec 19 10:58:11 2023 -0700

    Fix docs

commit 5b33b98
Author: Deepak Cherian <deepak@cherian.net>
Date:   Sun Dec 17 20:35:53 2023 -0700

    fix whats-new

commit 80b2b36
Author: Deepak Cherian <deepak@cherian.net>
Date:   Sun Dec 17 20:26:17 2023 -0700

    Reduce more warnings

commit 5f6f4ea
Merge: a57d4ae 2971994
Author: Deepak Cherian <deepak@cherian.net>
Date:   Sat Dec 16 20:33:13 2023 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

    * main: (26 commits)
      Filter null values before plotting (pydata#8535)
      Update concat.py (pydata#8538)
      Add getitem to array protocol (pydata#8406)
      Added option to specify weights in xr.corr() and xr.cov() (pydata#8527)
      Filter out doctest warning (pydata#8539)
      Bump actions/setup-python from 4 to 5 (pydata#8540)
      Point users to where in their code they should make mods for Dataset.dims (pydata#8534)
      Add Cumulative aggregation (pydata#8512)
      dev whats-new
      Whats-new for 2023.12.0 (pydata#8532)
      explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526)
      Add `eval` method to Dataset (pydata#7163)
      Deprecate ds.dims returning dict (pydata#8500)
      test and fix empty xindexes repr (pydata#8521)
      Remove PR labeler bot (pydata#8525)
      Hypothesis strategy for generating Variable objects (pydata#8404)
      Use numbagg for `rolling` methods (pydata#8493)
      Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 (pydata#8514)
      fix RTD docs build (pydata#8519)
      Fix type of `.assign_coords` (pydata#8495)
      ...

commit a57d4ae
Author: Deepak Cherian <deepak@cherian.net>
Date:   Fri Dec 1 21:36:04 2023 -0700

    Test one more warning

commit bf8139d
Author: Deepak Cherian <dcherian@users.noreply.github.com>
Date:   Fri Dec 1 21:33:45 2023 -0700

    Update xarray/tests/test_groupby.py

commit 4e9a063
Author: Deepak Cherian <deepak@cherian.net>
Date:   Fri Dec 1 21:10:14 2023 -0700

    Set squeeze=None for Dataset too

commit c2e576e
Author: Deepak Cherian <deepak@cherian.net>
Date:   Fri Dec 1 20:54:17 2023 -0700

    Fix first, last

commit 6d8e822
Author: Deepak Cherian <deepak@cherian.net>
Date:   Fri Dec 1 20:46:21 2023 -0700

    better warning

commit 62c334b
Author: Deepak Cherian <deepak@cherian.net>
Date:   Fri Dec 1 20:45:17 2023 -0700

    silence warnings

commit b7805a8
Author: dcherian <deepak@cherian.net>
Date:   Tue Aug 15 10:54:25 2023 -0600

    Deprecate `squeeze` in GroupBy.

    Closes pydata#2157
dcherian added a commit to dcherian/xarray that referenced this issue Jul 26, 2024
dcherian added a commit to dcherian/xarray that referenced this issue Jul 26, 2024
dcherian added a commit to dcherian/xarray that referenced this issue Jul 26, 2024
dcherian added a commit to dcherian/xarray that referenced this issue Jul 26, 2024
dcherian added a commit that referenced this issue Jul 26, 2024
* Fully deprecate squeeze kwarg to groupby

Closes #9279
Closes #1460
Closes #2157

* Fix doctests

* Update xarray/core/groupby.py

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

* Fix whats-new

* Update doc/whats-new.rst

---------

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
@Sibgatulin
Copy link
Contributor

Not sure if this is a good issue for it. If I am not mistaken, squeeze is deprecated not only for DataArray but also for Dataset? If so, perhaps the documentation of the latter should be updated too. Besides not mentioning deprecation, it currently also states incorrect default value for squeeze.

@max-sixty
Copy link
Collaborator

Thanks @Sibgatulin . A PR to remove it from the docs would be very welcome (marking as deprecated is fine too, but often easier to just remove it from the docs once...)

Sibgatulin added a commit to Sibgatulin/xarray that referenced this issue Oct 15, 2024
As mentioned in pydata#2157, the docstring of `Dataset.groupby` does not
reflect deprecation of squeeze (as the docstring of `DataArray.groupby`
does) and states an incorrect default value.
@Sibgatulin
Copy link
Contributor

Thanks for quick reply @max-sixty. I opened a tiny PR to fix the docstring.
I hesitated dropping squeeze from the decstring as it would then diverge from the function signature. And changing the latter would be, perhaps, a fair bit more invasive.

max-sixty pushed a commit that referenced this issue Oct 15, 2024
As mentioned in #2157, the docstring of `Dataset.groupby` does not
reflect deprecation of squeeze (as the docstring of `DataArray.groupby`
does) and states an incorrect default value.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants