Rename combine functions (#3043)
* Renamed combine functions in code

* Renamed combine functions in docs

* pep8 fixes

* Fixed mistake in docstring

* Removed trailing whitespace in error messages
TomNicholas authored and shoyer committed Jun 26, 2019
1 parent 8c73852 commit 17d18ce
Showing 10 changed files with 162 additions and 158 deletions.
4 changes: 2 additions & 2 deletions doc/api.rst
@@ -20,8 +20,8 @@ Top-level functions
concat
merge
auto_combine
-combine_auto
-combine_manual
+combine_by_coords
+combine_nested
where
set_options
full_like
28 changes: 14 additions & 14 deletions doc/combining.rst
@@ -247,23 +247,23 @@ Combining along multiple dimensions
.. note::

There are currently three combining functions with similar names:
-:py:func:`~xarray.auto_combine`, :py:func:`~xarray.combine_auto`, and
-:py:func:`~xarray.combine_manual`. This is because
+:py:func:`~xarray.auto_combine`, :py:func:`~xarray.combine_by_coords`, and
+:py:func:`~xarray.combine_nested`. This is because
``auto_combine`` is in the process of being deprecated in favour of the other
two functions, which are more general. If your code currently relies on
``auto_combine``, then you will be able to get similar functionality by using
-``combine_manual``.
+``combine_nested``.

For combining many objects along multiple dimensions xarray provides
-:py:func:`~xarray.combine_manual`` and :py:func:`~xarray.combine_auto`. These
+:py:func:`~xarray.combine_nested`` and :py:func:`~xarray.combine_by_coords`. These
functions use a combination of ``concat`` and ``merge`` across different
variables to combine many objects into one.

-:py:func:`~xarray.combine_manual`` requires specifying the order in which the
-objects should be combined, while :py:func:`~xarray.combine_auto` attempts to
+:py:func:`~xarray.combine_nested`` requires specifying the order in which the
+objects should be combined, while :py:func:`~xarray.combine_by_coords` attempts to
infer this ordering automatically from the coordinates in the data.

-:py:func:`~xarray.combine_manual` is useful when you know the spatial
+:py:func:`~xarray.combine_nested` is useful when you know the spatial
relationship between each object in advance. The datasets must be provided in
the form of a nested list, which specifies their relative position and
ordering. A common task is collecting data from a parallelized simulation where
@@ -276,9 +276,9 @@ datasets into a doubly-nested list, e.g:
arr = xr.DataArray(name='temperature', data=np.random.randint(5, size=(2, 2)), dims=['x', 'y'])
arr
ds_grid = [[arr, arr], [arr, arr]]
-xr.combine_manual(ds_grid, concat_dim=['x', 'y'])
+xr.combine_nested(ds_grid, concat_dim=['x', 'y'])
-:py:func:`~xarray.combine_manual` can also be used to explicitly merge datasets
+:py:func:`~xarray.combine_nested` can also be used to explicitly merge datasets
with different variables. For example if we have 4 datasets, which are divided
along two times, and contain two different variables, we can pass ``None``
to ``'concat_dim'`` to specify the dimension of the nested list over which
@@ -289,25 +289,25 @@ we wish to use ``merge`` instead of ``concat``:
temp = xr.DataArray(name='temperature', data=np.random.randn(2), dims=['t'])
precip = xr.DataArray(name='precipitation', data=np.random.randn(2), dims=['t'])
ds_grid = [[temp, precip], [temp, precip]]
-xr.combine_manual(ds_grid, concat_dim=['t', None])
+xr.combine_nested(ds_grid, concat_dim=['t', None])
-:py:func:`~xarray.combine_auto` is for combining objects which have dimension
+:py:func:`~xarray.combine_by_coords` is for combining objects which have dimension
coordinates which specify their relationship to and order relative to one
another, for example a linearly-increasing 'time' dimension coordinate.

Here we combine two datasets using their common dimension coordinates. Notice
they are concatenated in order based on the values in their dimension
-coordinates, not on their position in the list passed to ``combine_auto``.
+coordinates, not on their position in the list passed to ``combine_by_coords``.

.. ipython:: python
:okwarning:
x1 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [0, 1, 2])])
x2 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [3, 4, 5])])
-xr.combine_auto([x2, x1])
+xr.combine_by_coords([x2, x1])
These functions can be used by :py:func:`~xarray.open_mfdataset` to open many
files as one dataset. The particular function used is specified by setting the
-argument ``'combine'`` to ``'auto'`` or ``'manual'``. This is useful for
+argument ``'combine'`` to ``'by_coords'`` or ``'nested'``. This is useful for
situations where your data is split across many files in multiple locations,
which have some known relationship between one another.
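
As a hedged illustration of the distinction the documentation above draws (this is not part of the commit), the sketch below combines the same two datasets both ways. The dataset names and values are invented for the example, and it assumes an xarray release that ships the renamed ``combine_nested`` and ``combine_by_coords`` functions.

```python
import xarray as xr

# Two pieces that share an 'x' dimension coordinate (illustrative values).
left = xr.Dataset({"foo": ("x", [10.0, 11.0, 12.0])}, coords={"x": [0, 1, 2]})
right = xr.Dataset({"foo": ("x", [13.0, 14.0, 15.0])}, coords={"x": [3, 4, 5]})

# combine_nested trusts the order of the list it is given.
in_list_order = xr.combine_nested([right, left], concat_dim="x")

# combine_by_coords ignores list order and orders by the coordinate values.
in_coord_order = xr.combine_by_coords([right, left])

print(in_list_order["x"].values.tolist())
print(in_coord_order["x"].values.tolist())
```

With the pieces deliberately passed out of order, the nested result keeps ``x`` as ``[3, 4, 5, 0, 1, 2]``, while the by-coordinates result is sorted to ``[0, 1, 2, 3, 4, 5]``.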
2 changes: 1 addition & 1 deletion doc/computation.rst
@@ -160,7 +160,7 @@ object:
Aggregation results are assigned the coordinate at the end of each window by
default, but can be centered by passing ``center=True`` when constructing the
-``Rolling`` object:
+``Rolling`` object:

.. ipython:: python
4 changes: 2 additions & 2 deletions doc/io.rst
@@ -767,8 +767,8 @@ Combining multiple files
NetCDF files are often encountered in collections, e.g., with different files
corresponding to different model runs. xarray can straightforwardly combine such
files into a single Dataset by making use of :py:func:`~xarray.concat`,
-:py:func:`~xarray.merge`, :py:func:`~xarray.combine_manual` and
-:py:func:`~xarray.combine_auto`. For details on the difference between these
+:py:func:`~xarray.merge`, :py:func:`~xarray.combine_nested` and
+:py:func:`~xarray.combine_by_coords`. For details on the difference between these
functions see :ref:`combining data`.

.. note::
10 changes: 5 additions & 5 deletions doc/whats-new.rst
@@ -60,17 +60,17 @@ Enhancements
Datasets can now be combined along any number of dimensions,
instead of just a one-dimensional list of datasets.

-The new ``combine_manual`` will accept the datasets as a a nested
+The new ``combine_nested`` will accept the datasets as a a nested
list-of-lists, and combine by applying a series of concat and merge
-operations. The new ``combine_auto`` will instead use the dimension
+operations. The new ``combine_by_coords`` will instead use the dimension
coordinates of the datasets to order them.

-``open_mfdataset`` can use either ``combine_manual`` or ``combine_auto`` to
+``open_mfdataset`` can use either ``combine_nested`` or ``combine_by_coords`` to
combine datasets along multiple dimensions, by specifying the argument
-`combine='manual'` or `combine='auto'`.
+`combine='nested'` or `combine='by_coords'`.

This means that the original function ``auto_combine`` is being deprecated.
-To avoid FutureWarnings switch to using `combine_manual` or `combine_auto`,
+To avoid FutureWarnings switch to using `combine_nested` or `combine_by_coords`,
(or set the `combine` argument in `open_mfdataset`). (:issue:`2159`)
By `Tom Nicholas <http://github.com/TomNicholas>`_.
- Better warning message when supplying invalid objects to ``xr.merge``
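
For readers migrating away from ``auto_combine``, the following is a minimal sketch of what the changelog entry above suggests, not an excerpt from the commit. It assumes the simplest case: a flat list of datasets without coordinates, concatenated along one dimension. The variable names and values are made up.

```python
import numpy as np
import xarray as xr

# Two chunks of the same variable with no 't' coordinate to order them by,
# so the order of the list is the only ordering information available.
first = xr.Dataset({"temperature": ("t", np.array([1.0, 2.0]))})
second = xr.Dataset({"temperature": ("t", np.array([3.0, 4.0]))})

# Previously (deprecated): xr.auto_combine([first, second], concat_dim="t")
combined = xr.combine_nested([first, second], concat_dim="t")

print(combined["temperature"].values.tolist())
```

When the datasets do carry dimension coordinates, ``combine_by_coords`` is the alternative the entry points to, and it takes no ``concat_dim`` argument.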
2 changes: 1 addition & 1 deletion xarray/__init__.py
@@ -8,7 +8,7 @@
from .core.alignment import align, broadcast, broadcast_arrays
from .core.common import full_like, zeros_like, ones_like
from .core.concat import concat
-from .core.combine import combine_auto, combine_manual, auto_combine
+from .core.combine import combine_by_coords, combine_nested, auto_combine
from .core.computation import apply_ufunc, dot, where
from .core.extensions import (register_dataarray_accessor,
register_dataset_accessor)
51 changes: 26 additions & 25 deletions xarray/backends/api.py
@@ -10,7 +10,7 @@
from .. import Dataset, DataArray, backends, conventions
from ..core import indexing
from .. import auto_combine
-from ..core.combine import (combine_auto, _manual_combine,
+from ..core.combine import (combine_by_coords, _nested_combine,
_infer_concat_order_from_positions)
from ..core.utils import close_on_error, is_grib_path, is_remote_uri
from .common import ArrayWriter
@@ -599,15 +599,16 @@ def open_mfdataset(paths, chunks=None, concat_dim='_not_supplied',
**kwargs):
"""Open multiple files as a single dataset.
-If combine='auto' then the function `combine_auto` is used to combine the
-datasets into one before returning the result, and if combine='manual' then
-`combine_manual` is used. The filepaths must be structured according to
-which combining function is used, the details of which are given in the
-documentation for ``combine_auto`` and ``combine_manual``.
-By default the old (now deprecated) ``auto_combine`` will be used, please
-specify either ``combine='auto'`` or ``combine='manual'`` in future.
-Requires dask to be installed. See documentation for details on dask [1].
-Attributes from the first dataset file are used for the combined dataset.
+If combine='by_coords' then the function ``combine_by_coords`` is used to
+combine the datasets into one before returning the result, and if
+combine='nested' then ``combine_nested`` is used. The filepaths must be
+structured according to which combining function is used, the details of
+which are given in the documentation for ``combine_by_coords`` and
+``combine_nested``. By default the old (now deprecated) ``auto_combine``
+will be used, please specify either ``combine='by_coords'`` or
+``combine='nested'`` in future. Requires dask to be installed. See
+documentation for details on dask [1]. Attributes from the first dataset
+file are used for the combined dataset.
Parameters
----------
@@ -631,11 +632,11 @@ def open_mfdataset(paths, chunks=None, concat_dim='_not_supplied',
if you want to stack a collection of 2D arrays along a third dimension.
Set ``concat_dim=[..., None, ...]`` explicitly to
disable concatenation along a particular dimension.
-combine : {'auto', 'manual'}, optional
-Whether ``xarray.auto_combine`` or ``xarray.manual_combine`` is used to
-combine all the data. If this argument is not provided,
+combine : {'by_coords', 'nested'}, optional
+Whether ``xarray.combine_by_coords`` or ``xarray.combine_nested`` is
+used to combine all the data. If this argument is not provided,
`xarray.auto_combine` is used, but in the future this behavior will
-switch to use `xarray.combine_auto`.
+switch to use `xarray.combine_by_coords` by default.
compat : {'identical', 'equals', 'broadcast_equals',
'no_conflicts'}, optional
String indicating how to compare variables of the same name for
@@ -706,8 +707,8 @@ def open_mfdataset(paths, chunks=None, concat_dim='_not_supplied',
See Also
--------
-combine_auto
-combine_manual
+combine_by_coords
+combine_nested
auto_combine
open_dataset
@@ -730,13 +731,13 @@ def open_mfdataset(paths, chunks=None, concat_dim='_not_supplied',
if not paths:
raise IOError('no files to open')

-# If combine='auto' then this is unnecessary, but quick.
-# If combine='manual' then this creates a flat list which is easier to
+# If combine='by_coords' then this is unnecessary, but quick.
+# If combine='nested' then this creates a flat list which is easier to
# iterate over, while saving the originally-supplied structure as "ids"
-if combine == 'manual':
+if combine == 'nested':
if str(concat_dim) == '_not_supplied':
raise ValueError("Must supply concat_dim when using "
-"combine='manual'")
+"combine='nested'")
else:
if isinstance(concat_dim, (str, DataArray)) or concat_dim is None:
concat_dim = [concat_dim]
@@ -776,17 +777,17 @@ def open_mfdataset(paths, chunks=None, concat_dim='_not_supplied',
combined = auto_combine(datasets, concat_dim=concat_dim,
compat=compat, data_vars=data_vars,
coords=coords)
-elif combine == 'manual':
+elif combine == 'nested':
# Combined nested list by successive concat and merge operations
# along each dimension, using structure given by "ids"
-combined = _manual_combine(datasets, concat_dims=concat_dim,
+combined = _nested_combine(datasets, concat_dims=concat_dim,
compat=compat, data_vars=data_vars,
coords=coords, ids=ids)
-elif combine == 'auto':
+elif combine == 'by_coords':
# Redo ordering from coordinates, ignoring how they were ordered
# previously
-combined = combine_auto(datasets, compat=compat,
-data_vars=data_vars, coords=coords)
+combined = combine_by_coords(datasets, compat=compat,
+data_vars=data_vars, coords=coords)
else:
raise ValueError("{} is an invalid option for the keyword argument"
" ``combine``".format(combine))
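
The argument handling visible in these ``open_mfdataset`` hunks can be sketched in plain Python. This is an illustration under stated assumptions, not xarray's implementation: ``flatten_with_ids`` and ``plan_combine`` are hypothetical helpers written for this note, ``DataArray``-valued ``concat_dim`` is left out, and no files are opened.

```python
_NOT_SUPPLIED = "_not_supplied"  # sentinel string, as in the signature above


def flatten_with_ids(nested, prefix=()):
    """Yield (tile_id, item) pairs; tile_id is the item's position in the nesting."""
    if isinstance(nested, list):
        for index, entry in enumerate(nested):
            yield from flatten_with_ids(entry, prefix + (index,))
    else:
        yield prefix, nested


def plan_combine(paths, combine, concat_dim=_NOT_SUPPLIED):
    """Hypothetical helper: validate arguments roughly as the diff describes."""
    if combine == "nested":
        if concat_dim == _NOT_SUPPLIED:
            raise ValueError("Must supply concat_dim when using combine='nested'")
        # A single dimension name (or None) is wrapped into a one-element list.
        if isinstance(concat_dim, str) or concat_dim is None:
            concat_dim = [concat_dim]
        ids, flat = zip(*flatten_with_ids(paths))
        return {"combine": combine, "concat_dims": concat_dim,
                "ids": list(ids), "paths": list(flat)}
    if combine == "by_coords":
        # Ordering is re-derived from coordinates, so positions are not kept.
        return {"combine": combine,
                "paths": [path for _, path in flatten_with_ids(paths)]}
    raise ValueError("{} is an invalid option for the keyword argument"
                     " ``combine``".format(combine))


plan = plan_combine([["a.nc", "b.nc"], ["c.nc", "d.nc"]], "nested",
                    concat_dim=["x", "y"])
print(plan["ids"])
```

The point of keeping the ids in the nested case is that the flat list is easier to iterate over while the original grid positions stay recoverable; in the by-coordinates case they are discarded because the ordering is inferred again from the data.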
51 changes: 26 additions & 25 deletions xarray/core/combine.py
@@ -216,8 +216,8 @@ def _combine_1d(datasets, concat_dim, compat='no_conflicts', data_vars='all',
except ValueError as err:
if "encountered unexpected variable" in str(err):
raise ValueError("These objects cannot be combined using only "
-"xarray.combine_manual, instead either use "
-"xarray.combine_auto, or do it manually "
+"xarray.combine_nested, instead either use "
+"xarray.combine_by_coords, or do it manually "
"with xarray.concat, xarray.merge and "
"xarray.align")
else:
@@ -233,7 +233,7 @@ def _new_tile_id(single_id_ds_pair):
return tile_id[1:]


-def _manual_combine(datasets, concat_dims, compat, data_vars, coords, ids,
+def _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids,
fill_value=dtypes.NA):

if len(datasets) == 0:
@@ -259,7 +259,7 @@ def _manual_combine(datasets, concat_dims, compat, data_vars, coords, ids,
return combined


-def combine_manual(datasets, concat_dim, compat='no_conflicts',
+def combine_nested(datasets, concat_dim, compat='no_conflicts',
data_vars='all', coords='different', fill_value=dtypes.NA):
"""
Explicitly combine an N-dimensional grid of datasets into one by using a
@@ -335,7 +335,7 @@ def combine_manual(datasets, concat_dim, compat='no_conflicts',
precipitation (x, y) float64 5.904 2.453 3.404 ...
>>> ds_grid = [[x1y1, x1y2], [x2y1, x2y2]]
->>> combined = xr.combine_manual(ds_grid, concat_dim=['x', 'y'])
+>>> combined = xr.combine_nested(ds_grid, concat_dim=['x', 'y'])
<xarray.Dataset>
Dimensions: (x: 4, y: 4)
Dimensions without coordinates: x, y
@@ -364,7 +364,7 @@ def combine_manual(datasets, concat_dim, compat='no_conflicts',
precipitation (t) float64 5.904 2.453 3.404 ...
>>> ds_grid = [[t1temp, t1precip], [t2temp, t2precip]]
->>> combined = xr.combine_manual(ds_grid, concat_dim=['t', None])
+>>> combined = xr.combine_nested(ds_grid, concat_dim=['t', None])
<xarray.Dataset>
Dimensions: (t: 10)
Dimensions without coordinates: t
@@ -382,7 +382,7 @@ def combine_manual(datasets, concat_dim, compat='no_conflicts',
concat_dim = [concat_dim]

# The IDs argument tells _manual_combine that datasets aren't yet sorted
-return _manual_combine(datasets, concat_dims=concat_dim, compat=compat,
+return _nested_combine(datasets, concat_dims=concat_dim, compat=compat,
data_vars=data_vars, coords=coords, ids=False,
fill_value=fill_value)

@@ -391,8 +391,8 @@ def vars_as_keys(ds):
return tuple(sorted(ds))


-def combine_auto(datasets, compat='no_conflicts', data_vars='all',
-coords='different', fill_value=dtypes.NA):
+def combine_by_coords(datasets, compat='no_conflicts', data_vars='all',
+coords='different', fill_value=dtypes.NA):
"""
Attempt to auto-magically combine the given datasets into one by using
dimension coordinates.
@@ -449,14 +449,14 @@ def combine_auto(datasets, compat='no_conflicts', data_vars='all',
--------
concat
merge
-combine_manual
+combine_nested
Examples
--------
Combining two datasets using their common dimension coordinates. Notice
they are concatenated based on the values in their dimension coordinates,
-not on their position in the list passed to `combine_auto`.
+not on their position in the list passed to `combine_by_coords`.
>>> x1
<xarray.Dataset>
@@ -474,7 +474,7 @@ def combine_auto(datasets, compat='no_conflicts', data_vars='all',
Data variables:
temperature (x) float64 6.97 8.13 7.42 ...
->>> combined = xr.combine_auto([x2, x1])
+>>> combined = xr.combine_by_coords([x2, x1])
<xarray.Dataset>
Dimensions: (x: 6)
Coords:
@@ -528,8 +528,8 @@ def auto_combine(datasets, concat_dim='_not_supplied', compat='no_conflicts',
"""
Attempt to auto-magically combine the given datasets into one.
-This entire function is deprecated in favour of ``combine_manual`` and
-``combine_auto``.
+This entire function is deprecated in favour of ``combine_nested`` and
+``combine_by_coords``.
This method attempts to combine a list of datasets into a single entity by
inspecting metadata and using a combination of concat and merge.
@@ -593,34 +593,35 @@ def auto_combine(datasets, concat_dim='_not_supplied', compat='no_conflicts',
message = dedent("""\
Also `open_mfdataset` will no longer accept a `concat_dim` argument.
To get equivalent behaviour from now on please use the new
-`combine_manual` function instead (or the `combine='manual'` option to
+`combine_nested` function instead (or the `combine='nested'` option to
`open_mfdataset`).""")

if _dimension_coords_exist(datasets):
message += dedent("""\
The datasets supplied have global dimension coordinates. You may want
-to use the new `combine_auto` function (or the `combine='auto'` option
-to `open_mfdataset` to order the datasets before concatenation.
-Alternatively, to continue concatenating based on the order the
-datasets are supplied in in future, please use the new `combine_manual`
-function (or the `combine='manual'` option to open_mfdataset).""")
+to use the new `combine_by_coords` function (or the
+`combine='by_coords'` option to `open_mfdataset` to order the datasets
+before concatenation. Alternatively, to continue concatenating based
+on the order the datasets are supplied in in future, please use the
+new `combine_nested` function (or the `combine='nested'` option to
+open_mfdataset).""")
else:
message += dedent("""\
The datasets supplied do not have global dimension coordinates. In
future, to continue concatenating without supplying dimension
-coordinates, please use the new `combine_manual` function (or the
-`combine='manual'` option to open_mfdataset.""")
+coordinates, please use the new `combine_nested` function (or the
+`combine='nested'` option to open_mfdataset.""")

if _requires_concat_and_merge(datasets):
manual_dims = [concat_dim].append(None)
message += dedent("""\
The datasets supplied require both concatenation and merging. From
xarray version 0.14 this will operation will require either using the
-new `combine_manual` function (or the `combine='manual'` option to
+new `combine_nested` function (or the `combine='nested'` option to
open_mfdataset), with a nested list structure such that you can combine
along the dimensions {}. Alternatively if your datasets have global
-dimension coordinates then you can use the new `combine_auto` function.
-""".format(manual_dims))
+dimension coordinates then you can use the new `combine_by_coords`
+function.""".format(manual_dims))

warnings.warn(message, FutureWarning, stacklevel=2)
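
A side note on a context line in this hunk that the commit leaves untouched: ``manual_dims = [concat_dim].append(None)``. In Python, ``list.append`` mutates the list in place and returns ``None``, so the name is bound to ``None`` rather than to a list. The sketch below demonstrates only that language behaviour; the intended value is a guess based on the surrounding message, and the names are illustrative.

```python
concat_dim = "t"  # illustrative value

# list.append mutates in place and returns None, so this binds None.
as_written = [concat_dim].append(None)

# A list literal yields what the warning text presumably intends (assumption).
probably_intended = [concat_dim, None]

template = "combine along the dimensions {}."
print(template.format(as_written))         # combine along the dimensions None.
print(template.format(probably_intended))  # combine along the dimensions ['t', None].
```

If that reading is right, the warning would report ``None`` in place of the dimension list, which is a candidate for a separate follow-up fix rather than part of this rename.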
