forked from pydata/xarray

Merge branch 'main' into warn-nd-index-var
* main:
  Remove hue_style from plot1d docstring (pydata#7925)
  Add new what's new section (pydata#7986)
  Release summary for v2023.07.0 (pydata#7979)
  Improve explanation in example "Working with Multidimensional Coordinates" (pydata#7984)
  Fix typo in zarr.py (pydata#7983)
  Examples added to docstrings  (pydata#7936)
  [pre-commit.ci] pre-commit autoupdate (pydata#7973)
  Skip broken tests on python 3.11 and windows (pydata#7972)
  Use another repository for upstream testing (pydata#7970)
  Move absolute path finder from open_mfdataset to own function (pydata#7968)
  ensure no forward slashes in names for HDF5-based backends (pydata#7953)
  Chunked array docs (pydata#7951)
  [pre-commit.ci] pre-commit autoupdate (pydata#7959)
  manually unshallow the repository on RTD (pydata#7961)
  Update minimum version of typing extensions in pre-commit (pydata#7960)
  Docstring examples (pydata#7881)
dcherian committed Jul 16, 2023
2 parents f059e50 + a47ff4e commit 65c658b
Showing 22 changed files with 1,178 additions and 80 deletions.
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -14,9 +14,9 @@ repos:
- id: absolufy-imports
name: absolufy-imports
files: ^xarray/
- repo: https://github.com/charliermarsh/ruff-pre-commit
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: 'v0.0.275'
rev: 'v0.0.277'
hooks:
- id: ruff
args: ["--fix"]
@@ -47,7 +47,7 @@ repos:
types-pkg_resources,
types-PyYAML,
types-pytz,
typing-extensions==3.10.0.0,
typing-extensions>=4.1.0,
numpy,
]
- repo: https://github.com/citation-file-format/cff-converter-python
1 change: 1 addition & 0 deletions .readthedocs.yaml
@@ -7,6 +7,7 @@ build:
jobs:
post_checkout:
- (git --no-pager log --pretty="tformat:%s" -1 | grep -vqF "[skip-rtd]") || exit 183
- git fetch --unshallow || true
pre_install:
- git update-index --assume-unchanged doc/conf.py ci/requirements/doc.yml

2 changes: 1 addition & 1 deletion ci/install-upstream-wheels.sh
@@ -23,7 +23,7 @@ conda uninstall -y --force \
xarray
# to limit the runtime of Upstream CI
python -m pip install \
-i https://pypi.anaconda.org/scipy-wheels-nightly/simple \
-i https://pypi.anaconda.org/scientific-python-nightly-wheels/simple \
--no-deps \
--pre \
--upgrade \
1 change: 1 addition & 0 deletions doc/conf.py
@@ -323,6 +323,7 @@
"dask": ("https://docs.dask.org/en/latest", None),
"cftime": ("https://unidata.github.io/cftime", None),
"sparse": ("https://sparse.pydata.org/en/latest/", None),
"cubed": ("https://tom-e-white.com/cubed/", None),
}


2 changes: 1 addition & 1 deletion doc/examples/multidimensional-coords.ipynb
@@ -56,7 +56,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, the _logical coordinates_ are `x` and `y`, while the _physical coordinates_ are `xc` and `yc`, which represent the latitudes and longitude of the data."
"In this example, the _logical coordinates_ are `x` and `y`, while the _physical coordinates_ are `xc` and `yc`, which represent the longitudes and latitudes of the data."
]
},
{
102 changes: 102 additions & 0 deletions doc/internals/chunked-arrays.rst
@@ -0,0 +1,102 @@
.. currentmodule:: xarray

.. _internals.chunkedarrays:

Alternative chunked array types
===============================

.. warning::

    This is a *highly* experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker <https://github.com/pydata/xarray/issues>`_.
    In particular, see the discussion on `xarray issue #6807 <https://github.com/pydata/xarray/issues/6807>`_.

Xarray can wrap chunked dask arrays (see :ref:`dask`), but it can also wrap any other chunked array type that exposes the correct interface.
This allows us to support other frameworks for distributed and out-of-core processing, while user code is still written as xarray commands.
In particular, xarray also supports wrapping :py:class:`cubed.Array` objects
(see `Cubed's documentation <https://tom-e-white.com/cubed/>`_ and the `cubed-xarray package <https://github.com/xarray-contrib/cubed-xarray>`_).

The basic idea is that by wrapping an array that has an explicit notion of ``.chunks``, xarray can let users control
the chunking scheme via methods like :py:meth:`DataArray.chunk`, while the wrapped array itself
implements the processing of all the chunks.
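As a rough illustration of what ``.chunks`` encodes, the sketch below computes a chunk layout in the convention :py:class:`dask.array.Array` uses: one tuple of block lengths per axis. The function ``normalize_chunks_sketch`` is hypothetical and only handles a uniform chunk size per axis; real chunked array libraries ship their own, more general normalization.

```python
from typing import Tuple


def normalize_chunks_sketch(
    shape: Tuple[int, ...], chunk_size: Tuple[int, ...]
) -> Tuple[Tuple[int, ...], ...]:
    """Split each axis of ``shape`` into blocks of at most ``chunk_size`` elements.

    Hypothetical helper: returns one tuple of block lengths per axis.
    """
    if len(shape) != len(chunk_size):
        raise ValueError("shape and chunk_size must have the same number of dimensions")
    layout = []
    for n, c in zip(shape, chunk_size):
        if c <= 0:
            raise ValueError("chunk sizes must be positive")
        full, remainder = divmod(n, c)
        # Full-size blocks first, then a smaller trailing block if needed
        layout.append((c,) * full + ((remainder,) if remainder else ()))
    return tuple(layout)


print(normalize_chunks_sketch((10, 4), (4, 3)))  # ((4, 4, 2), (3, 1))
```

The wrapped array only needs to agree on this layout; how each block is stored and processed is entirely up to the chunked array library.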

Chunked array methods and "core operations"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A chunked array needs to meet all the :ref:`requirements for normal duck arrays <internals.duckarrays.requirements>`, but must also
implement additional features.

Chunked arrays have additional attributes and methods, such as ``.chunks`` and ``.rechunk``.
Furthermore, xarray dispatches chunk-aware computations across one or more chunked arrays using special functions known
as "core operations". Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``.
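To make the idea of a core operation concrete, here is a toy analogue of ``map_blocks`` over a 1-D array stored as a list of blocks: the function is applied to each block independently, and the result keeps the same chunk layout. ``ToyChunkedArray`` and ``toy_map_blocks`` are hypothetical names for illustration, not part of xarray, dask, or cubed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyChunkedArray:
    """Hypothetical 1-D 'chunked array' stored as a list of blocks."""

    blocks: List[List[float]]

    @property
    def chunks(self):
        # dask-style layout: one tuple of block lengths per axis
        return (tuple(len(b) for b in self.blocks),)

    def compute(self) -> List[float]:
        # Concatenate all blocks into a single in-memory list
        return [x for block in self.blocks for x in block]


def toy_map_blocks(
    func: Callable[[List[float]], List[float]], arr: ToyChunkedArray
) -> ToyChunkedArray:
    # Apply func to every block on its own; no block sees its neighbours
    return ToyChunkedArray([func(block) for block in arr.blocks])


arr = ToyChunkedArray([[1.0, 2.0], [3.0, 4.0], [5.0]])
doubled = toy_map_blocks(lambda b: [2 * x for x in b], arr)
print(doubled.chunks, doubled.compute())  # ((2, 2, 1),) [2.0, 4.0, 6.0, 8.0, 10.0]
```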

The core operations are generalizations of functions first implemented in :py:mod:`dask.array`.
The implementation of these functions is specific to the type of arrays passed to them. For example, when applying the
``map_blocks`` core operation, :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`,
whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`.

In order to use the correct implementation of a core operation for the array type encountered, xarray dispatches to the
corresponding subclass of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`,
also known as a "Chunk Manager". Therefore **a full list of the operations that need to be defined is set by the
API of the** :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` **abstract base class**. Note that chunked array
methods are also currently dispatched using this class.
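The dispatch pattern described above can be sketched with an abstract base class and a small registry. This is a simplified, hypothetical model: ``ToyChunkManager``, ``ListChunkManager``, ``BlockList`` and ``pick_chunkmanager`` are illustrative names, the real ``ChunkManagerEntrypoint`` defines many more methods, and raising ``TypeError`` when zero or several chunked array types are found is an assumption of this sketch.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class ToyChunkManager(ABC):
    """Hypothetical, minimal stand-in for a chunk manager base class."""

    @abstractmethod
    def is_chunked_array(self, data: Any) -> bool:
        """Return True if ``data`` is an array type this manager handles."""

    @abstractmethod
    def map_blocks(self, func: Callable, data: Any) -> Any:
        """Apply ``func`` to each chunk of ``data``."""


class BlockList:
    """Toy chunked array type: a list of blocks."""

    def __init__(self, blocks: List[list]):
        self.blocks = blocks


class ListChunkManager(ToyChunkManager):
    def is_chunked_array(self, data: Any) -> bool:
        return isinstance(data, BlockList)

    def map_blocks(self, func: Callable, data: BlockList) -> BlockList:
        return BlockList([func(block) for block in data.blocks])


# Registry keyed by name, as an entrypoint system might populate it
REGISTRY: Dict[str, ToyChunkManager] = {"lists": ListChunkManager()}


def pick_chunkmanager(*args: Any) -> ToyChunkManager:
    """Return the single registered manager that recognises the chunked arguments."""
    found = {
        name
        for arg in args
        for name, manager in REGISTRY.items()
        if manager.is_chunked_array(arg)
    }
    if len(found) != 1:
        raise TypeError(f"expected exactly one chunked array type, found {sorted(found)}")
    return REGISTRY[found.pop()]


data = BlockList([[1, 2], [3]])
manager = pick_chunkmanager(data, 5)  # the plain int 5 is simply ignored
result = manager.map_blocks(lambda b: [x + 1 for x in b], data)
print(result.blocks)  # [[2, 3], [4]]
```

Keeping each operation on the manager rather than on the array type means a new library only has to supply one subclass for xarray to route every core operation correctly.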

Chunked array creation is also handled by this class. As chunked array objects have a one-to-one correspondence with
in-memory numpy arrays, it should be possible to create a chunked array from a numpy array by passing the desired
chunking pattern to an implementation of :py:meth:`~xarray.core.parallelcompat.ChunkManagerEntrypoint.from_array`.
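Because a chunked array corresponds one-to-one with an in-memory array, creating one from memory amounts to cutting the data up according to a chunking pattern, and computing it reverses that. A minimal 1-D sketch, assuming plain Python lists stand in for the in-memory array and a single uniform chunk size is used; ``toy_from_array`` and ``toy_compute`` are hypothetical names, not the ``from_array`` signature of any real chunk manager.

```python
from typing import List, Sequence


def toy_from_array(data: Sequence[float], chunk_size: int) -> List[List[float]]:
    """Split ``data`` into consecutive blocks of at most ``chunk_size`` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(data[i : i + chunk_size]) for i in range(0, len(data), chunk_size)]


def toy_compute(blocks: List[List[float]]) -> List[float]:
    """Reassemble the blocks into a single in-memory list."""
    return [x for block in blocks for x in block]


blocks = toy_from_array([0, 1, 2, 3, 4, 5, 6], chunk_size=3)
print(blocks)               # [[0, 1, 2], [3, 4, 5], [6]]
print(toy_compute(blocks))  # [0, 1, 2, 3, 4, 5, 6]
```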

.. note::

    The :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class mostly acts as a
    namespace containing the chunk-aware function primitives. Ideally, in the future we would have an API standard
    for chunked array types that codified this structure, making the entrypoint system unnecessary.

.. currentmodule:: xarray.core.parallelcompat

.. autoclass:: xarray.core.parallelcompat.ChunkManagerEntrypoint
   :members:

Registering a new ChunkManagerEntrypoint subclass
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Rather than hard-coding various chunk managers to deal with specific chunked array implementations, xarray uses an
entrypoint system to allow developers of new chunked array implementations to register their corresponding subclass of
:py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`.


To register a new entrypoint you need to add an entry to the ``setup.cfg`` like this::

    [options.entry_points]
    xarray.chunkmanagers =
        dask = xarray.core.daskmanager:DaskManager

See also `cubed-xarray <https://github.com/xarray-contrib/cubed-xarray>`_ for another example.

To check that the entrypoint has worked correctly, you may find it useful to display the available chunkmanagers using
the internal function :py:func:`~xarray.core.parallelcompat.list_chunkmanagers`.
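Entrypoints declared this way are discovered at runtime through installed package metadata. The sketch below lists whatever is registered under the ``xarray.chunkmanagers`` group using only the standard library's ``importlib.metadata``. It approximates the discovery step a helper like ``list_chunkmanagers`` has to perform, but it is not xarray's implementation: the name ``find_chunkmanager_entrypoints`` is hypothetical, and the sketch only reads metadata without importing or instantiating any class.

```python
from importlib.metadata import entry_points
from typing import Dict


def find_chunkmanager_entrypoints(group: str = "xarray.chunkmanagers") -> Dict[str, str]:
    """Map entrypoint names to their 'module:attr' targets for one group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports .select(group=...)
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict of group -> entrypoints
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


print(find_chunkmanager_entrypoints())
```

If the new subclass's name does not show up here after installing the package, the entrypoint declaration (or the installation) is the first thing to check.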

.. autofunction:: list_chunkmanagers


User interface
~~~~~~~~~~~~~~

Once the chunkmanager subclass has been registered, xarray objects wrapping the desired array type can be created in 3 ways:

#. By manually passing the array type to the :py:class:`~xarray.DataArray` constructor (see the examples for :ref:`numpy-like arrays <userguide.duckarrays>`),

#. By calling :py:meth:`~xarray.DataArray.chunk`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``,

#. By calling :py:func:`~xarray.open_dataset`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``.

The latter two methods ultimately call the chunkmanager's implementation of ``.from_array``, to which they pass the ``from_array_kwargs`` dict.
The ``chunked_array_type`` kwarg selects which registered chunkmanager subclass to dispatch to. It defaults to ``'dask'``
if Dask is installed; otherwise, if exactly one chunkmanager is registered, that one is used.
If multiple chunkmanagers are registered (and Dask is not installed), an error is raised by default.
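The default-selection rules above can be restated as a small function. This is a hypothetical illustration of the documented behaviour, not xarray's internal API: the name ``choose_chunkmanager``, its parameters, the use of ``ValueError``, and modelling "Dask is installed" as "a manager named ``dask`` is registered" are all assumptions.

```python
from typing import Iterable, Optional


def choose_chunkmanager(
    registered: Iterable[str], chunked_array_type: Optional[str] = None
) -> str:
    """Return the name of the chunk manager to dispatch to (hypothetical helper)."""
    names = sorted(registered)
    if chunked_array_type is not None:
        # An explicit choice wins, but it must name a registered manager
        if chunked_array_type not in names:
            raise ValueError(
                f"unrecognized chunk manager {chunked_array_type!r}; registered: {names}"
            )
        return chunked_array_type
    if "dask" in names:
        # Assumption: "Dask is installed" is modelled as "dask is registered"
        return "dask"
    if len(names) == 1:
        # Exactly one manager registered: use it
        return names[0]
    # Zero or several candidates and no explicit choice: refuse to guess
    raise ValueError(
        f"cannot choose a chunk manager automatically from {names}; pass chunked_array_type"
    )


print(choose_chunkmanager(["cubed", "dask"]))          # dask
print(choose_chunkmanager(["cubed"]))                  # cubed
print(choose_chunkmanager(["cubed", "dask"], "cubed"))  # cubed
```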

Parallel processing without chunks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To use a parallel array type that does not explicitly expose a concept of chunks, none of the information on this page
is required in principle. Such an array type (e.g. `Ramba <https://github.com/Python-for-HPC/ramba>`_ or
`Arkouda <https://github.com/Bears-R-Us/arkouda>`_) could be wrapped using xarray's existing support for
:ref:`numpy-like "duck" arrays <userguide.duckarrays>`.
2 changes: 2 additions & 0 deletions doc/internals/duck-arrays-integration.rst
@@ -11,6 +11,8 @@ Integrating with duck arrays
Xarray can wrap custom numpy-like arrays (":term:`duck array`\s") - see the :ref:`user guide documentation <userguide.duckarrays>`.
This page is intended for developers who are interested in wrapping a new custom array type with xarray.

.. _internals.duckarrays.requirements:

Duck array requirements
~~~~~~~~~~~~~~~~~~~~~~~

1 change: 1 addition & 0 deletions doc/internals/index.rst
@@ -21,6 +21,7 @@ The pages in this section are intended for:

variable-objects
duck-arrays-integration
chunked-arrays
extending-xarray
zarr-encoding-spec
how-to-add-new-backend
2 changes: 1 addition & 1 deletion doc/user-guide/duckarrays.rst
@@ -27,7 +27,7 @@ Some numpy-like array types that xarray already has some support for:

For information on wrapping dask arrays see :ref:`dask`. Whilst xarray wraps dask arrays in a similar way to that
described on this page, chunked array types like :py:class:`dask.array.Array` implement additional methods that require
slightly different user code (e.g. calling ``.chunk`` or ``.compute``).
slightly different user code (e.g. calling ``.chunk`` or ``.compute``). See the docs on :ref:`wrapping chunked arrays <internals.chunkedarrays>`.

Why "duck"?
-----------
32 changes: 30 additions & 2 deletions doc/whats-new.rst
@@ -14,9 +14,9 @@ What's New
np.random.seed(123456)
.. _whats-new.2023.06.1:
.. _whats-new.2023.07.1:

v2023.06.1 (unreleased)
v2023.07.1 (unreleased)
-----------------------

New Features
@@ -29,17 +29,45 @@ Breaking changes

Deprecations
~~~~~~~~~~~~
- ``hue_style`` is being deprecated for scatter plots. (:issue:`7907`, :pull:`7925`).
  By `Jimmy Westling <https://github.com/illviljan>`_.

Bug fixes
~~~~~~~~~


Documentation
~~~~~~~~~~~~~


Internal Changes
~~~~~~~~~~~~~~~~


v2023.07.0 (July 11, 2023)
--------------------------

This release brings improvements to the documentation on wrapping numpy-like arrays, improved docstrings, and bug fixes.

Bug fixes
~~~~~~~~~

- Ensure no forward slashes in variable and dimension names for HDF5-based engines.
  (:issue:`7943`, :pull:`7953`) By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.

Documentation
~~~~~~~~~~~~~

- Added examples to docstrings of :py:meth:`Dataset.tail`, :py:meth:`Dataset.head`, :py:meth:`Dataset.dropna`,
  :py:meth:`Dataset.ffill`, :py:meth:`Dataset.bfill`, :py:meth:`Dataset.set_coords`, :py:meth:`Dataset.reset_coords`
  (:issue:`6793`, :pull:`7936`) By `Harshitha <https://github.com/harshitha1201>`_.
- Added a page on wrapping chunked numpy-like arrays as alternatives to dask arrays.
  (:pull:`7951`) By `Tom Nicholas <https://github.com/TomNicholas>`_.
- Expanded the page on wrapping numpy-like "duck" arrays.
  (:pull:`7911`) By `Tom Nicholas <https://github.com/TomNicholas>`_.
- Added examples to docstrings of :py:meth:`Dataset.isel`, :py:meth:`Dataset.reduce`, :py:meth:`Dataset.argmin`,
  :py:meth:`Dataset.argmax` (:issue:`6793`, :pull:`7881`)
  By `Harshitha <https://github.com/harshitha1201>`_.

Internal Changes
~~~~~~~~~~~~~~~~
40 changes: 7 additions & 33 deletions xarray/backends/api.py
@@ -3,7 +3,6 @@
import os
from collections.abc import Hashable, Iterable, Mapping, MutableMapping, Sequence
from functools import partial
from glob import glob
from io import BytesIO
from numbers import Number
from typing import (
@@ -21,7 +20,12 @@

from xarray import backends, conventions
from xarray.backends import plugins
from xarray.backends.common import AbstractDataStore, ArrayWriter, _normalize_path
from xarray.backends.common import (
    AbstractDataStore,
    ArrayWriter,
    _find_absolute_paths,
    _normalize_path,
)
from xarray.backends.locks import _get_scheduler
from xarray.core import indexing
from xarray.core.combine import (
@@ -967,37 +971,7 @@ def open_mfdataset(
    .. [1] https://docs.xarray.dev/en/stable/dask.html
    .. [2] https://docs.xarray.dev/en/stable/dask.html#chunking-and-performance
    """
    if isinstance(paths, str):
        if is_remote_uri(paths) and engine == "zarr":
            try:
                from fsspec.core import get_fs_token_paths
            except ImportError as e:
                raise ImportError(
                    "The use of remote URLs for opening zarr requires the package fsspec"
                ) from e

            fs, _, _ = get_fs_token_paths(
                paths,
                mode="rb",
                storage_options=kwargs.get("backend_kwargs", {}).get(
                    "storage_options", {}
                ),
                expand=False,
            )
            tmp_paths = fs.glob(fs._strip_protocol(paths))  # finds directories
            paths = [fs.get_mapper(path) for path in tmp_paths]
        elif is_remote_uri(paths):
            raise ValueError(
                "cannot do wild-card matching for paths that are remote URLs "
                f"unless engine='zarr' is specified. Got paths: {paths}. "
                "Instead, supply paths as an explicit list of strings."
            )
        else:
            paths = sorted(glob(_normalize_path(paths)))
    elif isinstance(paths, os.PathLike):
        paths = [os.fspath(paths)]
    else:
        paths = [os.fspath(p) if isinstance(p, os.PathLike) else p for p in paths]
    paths = _find_absolute_paths(paths, engine=engine, **kwargs)

    if not paths:
        raise OSError("no files to open")
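For readers following the refactor: the local-filesystem behaviour of the removed block (and so, presumably, of the new ``_find_absolute_paths``) boils down to the stdlib-only sketch below. It is a simplification under stated assumptions: it omits the remote-URL and ``engine="zarr"`` (fsspec) branches and the ``_normalize_path`` step, and the name ``find_local_paths`` is hypothetical.

```python
import os
import pathlib
import tempfile
from glob import glob
from typing import Iterable, List, Union

PathLike = Union[str, os.PathLike]


def find_local_paths(paths: Union[PathLike, Iterable[PathLike]]) -> List[str]:
    """Resolve a glob string, a single path-like, or an iterable of paths to a list."""
    if isinstance(paths, str):
        # A string is treated as a glob pattern; results are sorted for determinism
        return sorted(glob(paths))
    if isinstance(paths, os.PathLike):
        # A single pathlib.Path (or similar) becomes a one-element list
        return [os.fspath(paths)]
    # Otherwise assume an iterable of strings and/or path-like objects, order preserved
    return [os.fspath(p) if isinstance(p, os.PathLike) else p for p in paths]


with tempfile.TemporaryDirectory() as d:
    for name in ("b.nc", "a.nc", "c.txt"):
        pathlib.Path(d, name).touch()
    found = find_local_paths(os.path.join(d, "*.nc"))
    print([os.path.basename(p) for p in found])  # ['a.nc', 'b.nc']
```

Note that only the glob-string branch sorts; an explicit list keeps the caller's order, which matters for how ``open_mfdataset`` combines files.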