Merge branch 'main' into pre-commit-ci-update-config
* main:
  add backend intro and how-to diagram (#9175)
  Fix copybutton for multi line examples in double digit ipython cells (#9264)
  Update signature for _arrayfunction.__array__ (#9237)
  Add encode_cf_datetime benchmark (#9262)
  groupby, resample: Deprecate some positional args (#9236)
  Delete ``base`` and ``loffset`` parameters to resample (#9233)
  Update dropna docstring (#9257)
  Grouper, Resampler as public api (#8840)
  Fix mypy on main (#9252)
  fix fallback isdtype method (#9250)
  Enable pandas type checking (#9213)
  Per-variable specification of boolean parameters in open_dataset (#9218)
  test push
  Added a space to the documentation (#9247)
  Fix typing for test_plot.py (#9234)
dcherian committed Jul 22, 2024
2 parents 25bf152 + 95e67b6 commit c836fbd
Showing 53 changed files with 1,312 additions and 1,067 deletions.
18 changes: 18 additions & 0 deletions asv_bench/benchmarks/coding.py
@@ -0,0 +1,18 @@
import numpy as np

import xarray as xr

from . import parameterized


@parameterized(["calendar"], [("standard", "noleap")])
class EncodeCFDatetime:
    def setup(self, calendar):
        self.units = "days since 2000-01-01"
        self.dtype = np.dtype("int64")
        self.times = xr.date_range(
            "2000", freq="D", periods=10000, calendar=calendar
        ).values

    def time_encode_cf_datetime(self, calendar):
        xr.coding.times.encode_cf_datetime(self.times, self.units, calendar, self.dtype)
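For context, a rough self-contained sketch of the call this benchmark times (assuming ``encode_cf_datetime`` returns the encoded numbers together with the units and calendar; the values in the comment are an illustration, not captured output):

    import numpy as np
    import xarray as xr

    # Encode three daily timestamps the same way the benchmark above does.
    times = xr.date_range("2000", freq="D", periods=3, calendar="standard").values
    num, units, calendar = xr.coding.times.encode_cf_datetime(
        times, "days since 2000-01-01", "standard", np.dtype("int64")
    )
    print(num, units, calendar)  # expected roughly: [0 1 2] days since 2000-01-01 standard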
4 changes: 4 additions & 0 deletions doc/api-hidden.rst
@@ -693,3 +693,7 @@

coding.times.CFTimedeltaCoder
coding.times.CFDatetimeCoder

core.groupers.Grouper
core.groupers.Resampler
core.groupers.EncodedGroups
23 changes: 19 additions & 4 deletions doc/api.rst
@@ -803,6 +803,18 @@ DataArray
DataArrayGroupBy.dims
DataArrayGroupBy.groups

Grouper Objects
---------------

.. currentmodule:: xarray.core

.. autosummary::
:toctree: generated/

groupers.BinGrouper
groupers.UniqueGrouper
groupers.TimeResampler


Rolling objects
===============
@@ -1028,17 +1040,20 @@ DataArray
Accessors
=========

.. currentmodule:: xarray
.. currentmodule:: xarray.core

.. autosummary::
:toctree: generated/

core.accessor_dt.DatetimeAccessor
core.accessor_dt.TimedeltaAccessor
core.accessor_str.StringAccessor
accessor_dt.DatetimeAccessor
accessor_dt.TimedeltaAccessor
accessor_str.StringAccessor


Custom Indexes
==============
.. currentmodule:: xarray

.. autosummary::
:toctree: generated/

5 changes: 4 additions & 1 deletion doc/conf.py
@@ -97,7 +97,7 @@
}

# sphinx-copybutton configurations
copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "
copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.{3,}: | {5,8}: "
copybutton_prompt_is_regexp = True

# nbsphinx configurations
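To see why the ``\.{3,}`` change matters, here is a small standalone check (not part of the commit) comparing the old and new prompt patterns against the wider continuation prompt rendered for double-digit IPython cell numbers (e.g. ``....:``):

    import re

    new_prompt = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.{3,}: | {5,8}: "
    old_prompt = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "

    # Continuation line as rendered for a cell such as "In [12]:".
    line = "   ....: result = values.mean()"
    print(bool(re.match(new_prompt, line)))  # True  -> copybutton strips the prompt
    print(bool(re.match(old_prompt, line)))  # False -> prompt was left in the copied text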
@@ -158,6 +158,8 @@
"Variable": "~xarray.Variable",
"DatasetGroupBy": "~xarray.core.groupby.DatasetGroupBy",
"DataArrayGroupBy": "~xarray.core.groupby.DataArrayGroupBy",
"Grouper": "~xarray.core.groupers.Grouper",
"Resampler": "~xarray.core.groupers.Resampler",
# objects without namespace: numpy
"ndarray": "~numpy.ndarray",
"MaskedArray": "~numpy.ma.MaskedArray",
@@ -169,6 +171,7 @@
"CategoricalIndex": "~pandas.CategoricalIndex",
"TimedeltaIndex": "~pandas.TimedeltaIndex",
"DatetimeIndex": "~pandas.DatetimeIndex",
"IntervalIndex": "~pandas.IntervalIndex",
"Series": "~pandas.Series",
"DataFrame": "~pandas.DataFrame",
"Categorical": "~pandas.Categorical",
2 changes: 1 addition & 1 deletion doc/internals/how-to-add-new-backend.rst
@@ -4,7 +4,7 @@ How to add a new backend
------------------------

Adding a new backend for read support to Xarray does not require
to integrate any code in Xarray; all you need to do is:
one to integrate any code in Xarray; all you need to do is:

- Create a class that inherits from Xarray :py:class:`~xarray.backends.BackendEntrypoint`
and implements the method ``open_dataset`` see :ref:`RST backend_entrypoint`
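To make the ``BackendEntrypoint`` subclass described above concrete, a minimal hypothetical entrypoint might look like the sketch below; the plain-text format it reads and every name in it are invented for illustration:

    import numpy as np
    import xarray as xr
    from xarray.backends import BackendEntrypoint


    class TextBackendEntrypoint(BackendEntrypoint):
        """Hypothetical backend reading one number per line from a text file."""

        def open_dataset(self, filename_or_obj, *, drop_variables=None):
            values = np.atleast_1d(np.loadtxt(filename_or_obj))
            ds = xr.Dataset({"value": ("index", values)})
            if drop_variables:
                ds = ds.drop_vars(drop_variables)
            return ds


    # Once registered as an entrypoint (or passed directly), e.g.:
    # ds = xr.open_dataset("numbers.txt", engine=TextBackendEntrypoint)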
93 changes: 82 additions & 11 deletions doc/user-guide/groupby.rst
@@ -1,3 +1,5 @@
.. currentmodule:: xarray

.. _groupby:

GroupBy: Group and Bin Data
@@ -15,19 +17,20 @@ __ https://www.jstatsoft.org/v40/i01/paper
- Apply some function to each group.
- Combine your groups back into a single data object.

Group by operations work on both :py:class:`~xarray.Dataset` and
:py:class:`~xarray.DataArray` objects. Most of the examples focus on grouping by
Group by operations work on both :py:class:`Dataset` and
:py:class:`DataArray` objects. Most of the examples focus on grouping by
a single one-dimensional variable, although support for grouping
over a multi-dimensional variable has recently been implemented. Note that for
one-dimensional data, it is usually faster to rely on pandas' implementation of
the same pipeline.

.. tip::

To substantially improve the performance of GroupBy operations, particularly
with dask `install the flox package <https://flox.readthedocs.io>`_. flox
`Install the flox package <https://flox.readthedocs.io>`_ to substantially improve the performance
of GroupBy operations, particularly with dask. flox
`extends Xarray's in-built GroupBy capabilities <https://flox.readthedocs.io/en/latest/xarray.html>`_
by allowing grouping by multiple variables, and lazy grouping by dask arrays. If installed, Xarray will automatically use flox by default.
by allowing grouping by multiple variables, and lazy grouping by dask arrays.
If installed, Xarray will automatically use flox by default.

Split
~~~~~
@@ -87,7 +90,7 @@ Binning
Sometimes you don't want to use all the unique values to determine the groups
but instead want to "bin" the data into coarser groups. You could always create
a customized coordinate, but xarray facilitates this via the
:py:meth:`~xarray.Dataset.groupby_bins` method.
:py:meth:`Dataset.groupby_bins` method.

.. ipython:: python
@@ -110,7 +113,7 @@ Apply
~~~~~

To apply a function to each group, you can use the flexible
:py:meth:`~xarray.core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
:py:meth:`core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
concatenated back together along the group axis:

.. ipython:: python
Expand All @@ -121,8 +124,8 @@ concatenated back together along the group axis:
arr.groupby("letters").map(standardize)
GroupBy objects also have a :py:meth:`~xarray.core.groupby.DatasetGroupBy.reduce` method and
methods like :py:meth:`~xarray.core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
GroupBy objects also have a :py:meth:`core.groupby.DatasetGroupBy.reduce` method and
methods like :py:meth:`core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
aggregation function:

.. ipython:: python
@@ -183,7 +186,7 @@ Iterating and Squeezing
Previously, Xarray defaulted to squeezing out dimensions of size one when iterating over
a GroupBy object. This behaviour is being removed.
You can always squeeze explicitly later with the Dataset or DataArray
:py:meth:`~xarray.DataArray.squeeze` methods.
:py:meth:`DataArray.squeeze` methods.

.. ipython:: python
@@ -217,7 +220,7 @@ __ https://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dime
da.groupby("lon").map(lambda x: x - x.mean(), shortcut=False)
Because multidimensional groups have the ability to generate a very large
number of bins, coarse-binning via :py:meth:`~xarray.Dataset.groupby_bins`
number of bins, coarse-binning via :py:meth:`Dataset.groupby_bins`
may be desirable:

.. ipython:: python
@@ -232,3 +235,71 @@ applying your function, and then unstacking the result:
stacked = da.stack(gridcell=["ny", "nx"])
stacked.groupby("gridcell").sum(...).unstack("gridcell")
.. _groupby.groupers:

Grouper Objects
~~~~~~~~~~~~~~~

Both ``groupby_bins`` and ``resample`` are specializations of the core ``groupby`` operation for binning,
and time resampling. Many problems demand more complex GroupBy application: for example, grouping by multiple
variables with a combination of categorical grouping, binning, and resampling; or more specializations like
spatial resampling; or more complex time grouping like special handling of seasons, or the ability to specify
custom seasons. To handle these use-cases and more, Xarray is evolving to provide an
extension point using ``Grouper`` objects.

.. tip::

See the `grouper design`_ doc for more detail on the motivation and design ideas behind
Grouper objects.

.. _grouper design: https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md

For now Xarray provides three specialized Grouper objects:

1. :py:class:`groupers.UniqueGrouper` for categorical grouping
2. :py:class:`groupers.BinGrouper` for binned grouping
3. :py:class:`groupers.TimeResampler` for resampling along a datetime coordinate

These provide functionality identical to the existing ``groupby``, ``groupby_bins``, and ``resample`` methods.
That is,

.. code-block:: python
ds.groupby("x")
is identical to

.. code-block:: python
from xarray.groupers import UniqueGrouper
ds.groupby(x=UniqueGrouper())
; and

.. code-block:: python
ds.groupby_bins("x", bins=bins)
is identical to

.. code-block:: python
from xarray.groupers import BinGrouper
ds.groupby(x=BinGrouper(bins))
and

.. code-block:: python
ds.resample(time="ME")
is identical to

.. code-block:: python
from xarray.groupers import TimeResampler
ds.resample(time=TimeResampler("ME"))
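Putting the ``BinGrouper`` equivalence above into a runnable form (the dataset and bin edges below are made up, and the import path simply mirrors the snippets above):

    import numpy as np
    import xarray as xr
    from xarray.groupers import BinGrouper

    ds = xr.Dataset(
        {"temperature": ("x", np.random.randn(100))},
        coords={"x": np.linspace(0.0, 10.0, 100)},
    )
    bins = np.arange(0, 11, 2)

    # These two reductions should agree bin-for-bin.
    by_method = ds.groupby_bins("x", bins=bins).mean()
    by_grouper = ds.groupby(x=BinGrouper(bins=bins)).mean()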
75 changes: 75 additions & 0 deletions doc/user-guide/io.rst
@@ -19,6 +19,81 @@ format (recommended).
np.random.seed(123456)
You can `read different types of files <https://docs.xarray.dev/en/stable/user-guide/io.html>`_
in `xr.open_dataset` by specifying the engine to be used:

.. ipython:: python
:okexcept:
:suppress:
import xarray as xr
xr.open_dataset("my_file.grib", engine="cfgrib")
The "engine" provides a set of instructions that tells xarray how
to read the data and pack them into a `dataset` (or `dataarray`).
These instructions are stored in an underlying "backend".

Xarray comes with several backends that cover many common data formats.
Many more backends are available via external libraries, or you can `write your own <https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html>`_.
This diagram aims to help you determine - based on the format of the file you'd like to read -
which type of backend you're using and how to use it.

Text and boxes are clickable for more information.
Following the diagram is detailed information on many popular backends.
You can learn more about using and developing backends in the
`Xarray tutorial JupyterBook <https://tutorial.xarray.dev/advanced/backends/backends.html>`_.

.. mermaid::
:alt: Flowchart illustrating how to choose the right backend engine to read your data

flowchart LR
built-in-eng["""Is your data stored in one of these formats?
- netCDF4 (<code>netcdf4</code>)
- netCDF3 (<code>scipy</code>)
- Zarr (<code>zarr</code>)
- DODS/OPeNDAP (<code>pydap</code>)
- HDF5 (<code>h5netcdf</code>)
"""]

built-in("""You're in luck! Xarray bundles a backend for this format.
Open data using <code>xr.open_dataset()</code>. We recommend
always setting the engine you want to use.""")

installed-eng["""One of these formats?
- <a href='https://github.com/ecmwf/cfgrib'>GRIB (<code>cfgrib</code>)
- <a href='https://tiledb-inc.github.io/TileDB-CF-Py/documentation/index.html'>TileDB (<code>tiledb</code>)
- <a href='https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#rioxarray'>GeoTIFF, JPEG-2000, ESRI-hdf (<code>rioxarray</code>, via GDAL)
- <a href='https://www.bopen.eu/xarray-sentinel-open-source-library/'>Sentinel-1 SAFE (<code>xarray-sentinel</code>)
"""]

installed("""Install the package indicated in parentheses to your
Python environment. Restart the kernel and use
<code>xr.open_dataset(files, engine='rioxarray')</code>.""")

other("""Ask around to see if someone in your data community
has created an Xarray backend for your data type.
If not, you may need to create your own or consider
exporting your data to a more common format.""")

built-in-eng -->|Yes| built-in
built-in-eng -->|No| installed-eng

installed-eng -->|Yes| installed
installed-eng -->|No| other

click built-in-eng "https://docs.xarray.dev/en/stable/getting-started-guide/faq.html#how-do-i-open-format-x-file-as-an-xarray-dataset"
click other "https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html"

classDef quesNodefmt fill:#9DEEF4,stroke:#206C89,text-align:left
class built-in-eng,installed-eng quesNodefmt

classDef ansNodefmt fill:#FFAA05,stroke:#E37F17,text-align:left,white-space:nowrap
class built-in,installed,other ansNodefmt

linkStyle default font-size:20pt,color:#206C89


.. _io.netcdf:

netCDF
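As a quick illustration of choosing an engine explicitly, per the flowchart above (the file names are placeholders, and each engine requires its underlying package to be installed):

    import xarray as xr

    ds_nc = xr.open_dataset("data.nc", engine="netcdf4")          # built-in netCDF4 backend
    ds_zarr = xr.open_dataset("store.zarr", engine="zarr")        # built-in Zarr backend
    ds_grib = xr.open_dataset("forecast.grib", engine="cfgrib")   # external GRIB backend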
2 changes: 1 addition & 1 deletion doc/user-guide/terminology.rst
@@ -221,7 +221,7 @@ complete examples, please consult the relevant documentation.*
combined_ds
lazy
Lazily-evaluated operations do not load data into memory until necessary.Instead of doing calculations
Lazily-evaluated operations do not load data into memory until necessary. Instead of doing calculations
right away, xarray lets you plan what calculations you want to do, like finding the
average temperature in a dataset.This planning is called "lazy evaluation." Later, when
you're ready to see the final result, you tell xarray, "Okay, go ahead and do those calculations now!"
23 changes: 21 additions & 2 deletions doc/whats-new.rst
@@ -22,6 +22,15 @@ v2024.06.1 (unreleased)

New Features
~~~~~~~~~~~~
- Introduce new :py:class:`groupers.UniqueGrouper`, :py:class:`groupers.BinGrouper`, and
:py:class:`groupers.TimeResampler` objects as a step towards supporting grouping by
multiple variables. See the `docs <groupby.groupers_>` and the
`grouper design doc <https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md>`_ for more.
(:issue:`6610`, :pull:`8840`).
By `Deepak Cherian <https://github.com/dcherian>`_.
- Allow per-variable specification of ``mask_and_scale``, ``decode_times``, ``decode_timedelta``,
  ``use_cftime``, and ``concat_characters`` params in :py:func:`~xarray.open_dataset` (:pull:`9218`).
By `Mathijs Verhaegh <https://github.com/Ostheer>`_.
- Allow chunking for arrays with duplicated dimension names (:issue:`8759`, :pull:`9099`).
By `Martin Raspaud <https://github.com/mraspaud>`_.
- Extract the source url from fsspec objects (:issue:`9142`, :pull:`8923`).
@@ -33,6 +42,11 @@ New Features

Breaking changes
~~~~~~~~~~~~~~~~
- The ``base`` and ``loffset`` parameters to :py:meth:`Dataset.resample` and :py:meth:`DataArray.resample`
  are now removed. These parameters have been deprecated since v2023.03.0. Use the ``origin`` or
  ``offset`` parameters as a replacement for ``base``, and use time offset arithmetic as a
  replacement for ``loffset``.
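For anyone migrating, a rough sketch of the replacement (the dataset and the nine-hour/three-hour offsets are invented for illustration):

    import numpy as np
    import pandas as pd
    import xarray as xr

    ds = xr.Dataset(
        {"air": ("time", np.arange(96.0))},
        coords={"time": pd.date_range("2000-01-01", periods=96, freq="h")},
    )

    # Formerly: ds.resample(time="24h", base=9, loffset="3h").mean()
    resampled = ds.resample(time="24h", offset="9h").mean()               # "offset" replaces "base"
    resampled["time"] = resampled.get_index("time") + pd.Timedelta("3h")  # replaces "loffset"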


Deprecations
@@ -70,15 +84,20 @@ Bug fixes
Documentation
~~~~~~~~~~~~~

- Adds a flow-chart diagram to help users navigate help resources (`Discussion #8990 <https://github.com/pydata/xarray/discussions/8990>`_).
- Adds intro to backend section of docs, including a flow-chart to navigate types of backends (:pull:`9175`).
By `Jessica Scheick <https://github.com/jessicas11>`_.
- Adds a flow-chart diagram to help users navigate help resources (`Discussion #8990 <https://github.com/pydata/xarray/discussions/8990>`_, :pull:`9147`).
By `Jessica Scheick <https://github.com/jessicas11>`_.
- Improvements to Zarr & chunking docs (:pull:`9139`, :pull:`9140`, :pull:`9132`)
By `Maximilian Roos <https://github.com/max-sixty>`_.

- Fix copybutton for multi line examples and double digit ipython cell numbers (:pull:`9264`).
By `Moritz Schreiber <https://github.com/mosc9575>`_.

Internal Changes
~~~~~~~~~~~~~~~~

- Enable typing checks of pandas (:pull:`9213`).
By `Michael Niklas <https://github.com/headtr1ck>`_.

.. _whats-new.2024.06.0:

