Import datatree in xarray? #7418

Closed
wants to merge 28 commits into from
Changes from 3 commits
7c6fa70
list datatree in public API
TomNicholas Jan 4, 2023
5ef43be
attempt to import datatree API on xarray import
TomNicholas Jan 4, 2023
d184764
incorporate datatree links into io docs on groups
TomNicholas Jan 4, 2023
d986df3
Merge branch 'main' into import_datatree
TomNicholas Jan 4, 2023
d2e8ec3
add Dataset.to_datatree() method
TomNicholas Jan 12, 2023
08ff5c4
Merge branch 'import_datatree' of https://github.com/TomNicholas/xarr…
TomNicholas Jan 12, 2023
1401ca5
Merge branch 'main' into import_datatree
TomNicholas Jan 25, 2023
b153152
Merge branch 'main' into import_datatree
TomNicholas Jan 27, 2023
c5b8d10
add test that DataTree class can be imported
TomNicholas Jan 31, 2023
62b5e27
add to every CI environment that also has flox
TomNicholas Jan 31, 2023
ffa53c4
also check we can import accessor
TomNicholas Feb 1, 2023
a8f752d
whatsnew
TomNicholas Feb 1, 2023
eed3a71
Merge branch 'import_datatree' of https://github.com/TomNicholas/xarr…
TomNicholas Feb 1, 2023
3d3c29f
Update to_node docstring
TomNicholas Feb 1, 2023
74fea3a
Merge branch 'main' into import_datatree
TomNicholas Feb 1, 2023
95d76e6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 1, 2023
caafe90
test .to_datatree method
TomNicholas Feb 1, 2023
462e0b3
Merge branch 'import_datatree' of https://github.com/TomNicholas/xarr…
TomNicholas Feb 1, 2023
91c6ee1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 1, 2023
bc6a538
fix datatree import
TomNicholas Feb 1, 2023
3baf79e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 1, 2023
667d5cd
protect my import from the exacting ruff linter
TomNicholas Feb 1, 2023
dfe763b
Merge branch 'import_datatree' of https://github.com/TomNicholas/xarr…
TomNicholas Feb 1, 2023
d231055
try installing datatree from main
TomNicholas Feb 1, 2023
ae07dfd
Update xarray/__init__.py
TomNicholas Feb 1, 2023
6343104
also import accessor and open_datatree in top-level init
TomNicholas Feb 1, 2023
395a3ae
importorskip whole test file
TomNicholas Feb 1, 2023
7cf1d55
correct package name in wheels
TomNicholas Feb 1, 2023
14 changes: 14 additions & 0 deletions doc/api.rst
@@ -1133,6 +1133,20 @@ used filetypes in the xarray universe.
backends.StoreBackendEntrypoint
backends.ZarrBackendEntrypoint

DataTree
========

Experimental API for handling nested groups of data.
Requires the `xarray-datatree package <https://github.com/xarray-contrib/datatree>`_ to be installed.
See the `datatree documentation <https://xarray-datatree.readthedocs.io/en/latest/>`_ for details.

.. autosummary::
   :toctree: generated/

   DataTree
   open_datatree
   register_datatree_accessor

Deprecated / Pending Deprecation
================================

48 changes: 45 additions & 3 deletions doc/user-guide/io.rst
@@ -156,6 +156,9 @@ to the original netCDF file, regardless if they exist in the original dataset.
Groups
~~~~~~

Single groups as datasets
.........................

NetCDF groups are not supported as part of the :py:class:`Dataset` data model.
Instead, groups can be loaded individually as Dataset objects.
To do so, pass a ``group`` keyword argument to the
@@ -228,10 +231,34 @@ Either of these groups can be loaded from the file as an independent :py:class:`
Data variables:
b int64 ...

.. _io.netcdf_datatree_groups:

Multiple Groups as a DataTree
.............................

For native handling of multiple groups with xarray, including I/O, you might be interested in the experimental
`xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package.
If installed, this package's API can be imported directly from xarray, i.e. ``from xarray import DataTree``.

Whilst netCDF groups can only be loaded individually as Dataset objects, a whole file of many nested groups can be loaded
as a single :py:class:`DataTree` object.
To open a whole netCDF file as a tree of groups use the :py:func:`open_datatree()` function.
To save a DataTree object as a netCDF file containing many groups, use the :py:meth:`DataTree.to_netcdf()` method.

.. _netcdf.group.warning:

.. warning::
``DataTree`` objects do not follow the exact same data model as netCDF files, which means that perfect round-tripping
Review comment (Collaborator):
Is that intentionally preformatted, or would it make sense to convert it to a link? (that's really minor, though)

is not always possible.

In particular, in the netCDF data model, dimensions are entities that can exist regardless of whether any variable possesses them.
This is in contrast to `xarray's data model <https://docs.xarray.dev/en/stable/user-guide/data-structures.html>`_
(and hence `datatree's data model <https://xarray-datatree.readthedocs.io/en/latest/data-structures.html>`_) in which the dimensions of a (Dataset/Tree)
object are simply the set of dimensions present across all variables in that dataset.

This means that if a netCDF file contains dimensions but no variables which possess those dimensions,
these dimensions will not be present when that file is opened as a DataTree object.
Saving this DataTree object to file will therefore not preserve these "unused" dimensions.
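
The warning above can be illustrated with a toy model. This is only a sketch under stated assumptions: plain dictionaries stand in for a netCDF group and for an xarray-style object, and the helper name `xarray_style_dims` and the sample dimension and variable names are hypothetical, not part of xarray or datatree.

```python
# Toy model only: plain dicts stand in for a netCDF group and an xarray-style object.
from typing import Dict, Tuple


def xarray_style_dims(
    file_dims: Dict[str, int], variables: Dict[str, Tuple[str, ...]]
) -> Dict[str, int]:
    """Keep only the dimensions that at least one variable uses."""
    used = {dim for var_dims in variables.values() for dim in var_dims}
    return {name: size for name, size in file_dims.items() if name in used}


# A hypothetical netCDF group that declares three dimensions, one of them unused.
file_dims = {"time": 10, "x": 4, "unused": 7}
variables = {"temperature": ("time", "x"), "elevation": ("x",)}

# What an xarray-style data model would retain after opening the group.
dims_after_open = xarray_style_dims(file_dims, variables)

# Dimensions that a subsequent save could no longer write back.
lost = sorted(set(file_dims) - set(dims_after_open))

print(dims_after_open)  # {'time': 10, 'x': 4}
print(lost)  # ['unused']
```

In this model the dimension `unused` exists in the file but is attached to no variable, so it does not survive the open step and cannot be restored on save.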


.. _io.encoding:
@@ -633,6 +660,21 @@ To read back a zarr dataset that has been created this way, we use the
ds_zarr = xr.open_zarr("path/to/directory.zarr")
ds_zarr

Groups
~~~~~~

As with netCDF, zarr groups can be opened as individual :py:class:`Dataset` objects using the ``group`` keyword argument to :py:func:`open_dataset`.
Alternatively, the nested groups in a zarr store can be represented by loading the store as a :py:class:`DataTree` object.
(The latter option requires that you have the `xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package installed.)

To open a whole zarr store as a tree of groups use the :py:func:`open_datatree()` function.
To save a DataTree object as a zarr store containing many groups, use the :py:meth:`DataTree.to_zarr()` method.

.. note::
   Perfect round-tripping should always be possible with a zarr store (:ref:`unlike for netCDF files <netcdf.group.warning>`),
   as zarr does not support "unused" dimensions.


Cloud Storage Buckets
~~~~~~~~~~~~~~~~~~~~~

6 changes: 6 additions & 0 deletions xarray/__init__.py
@@ -52,6 +52,12 @@
# Disable minimum version checks on downstream libraries.
__version__ = "999"

try:
    from datatree import DataTree, open_datatree, register_datatree_accessor
except ImportError:
    ...
Review comment (Member Author):
There is almost certainly a better way to make `from xarray import DataTree` work than this.

Review comment (Member):
I think this is fine as a temporary solution.

TomNicholas marked this conversation as resolved.


# A hardcoded __all__ variable is necessary to appease
# `mypy --strict` running in projects that import xarray.
__all__ = (
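
The optional-import pattern discussed in the review comments on this file can be sketched in isolation. This is a minimal illustration, not the code merged in this PR: the helper names `optional_import` and `optional_public_names` and the deliberately missing package name are hypothetical, and the standard-library `json` module stands in for an installed optional dependency.

```python
import importlib
from types import ModuleType
from typing import Any, Dict, Optional, Tuple


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is covered here.
        return None


def optional_public_names(module_name: str, names: Tuple[str, ...]) -> Dict[str, Any]:
    """Collect the requested attributes from an optional module, if available."""
    module = optional_import(module_name)
    if module is None:
        return {}
    return {name: getattr(module, name) for name in names if hasattr(module, name)}


# Hypothetical package name, chosen so that the import fails.
missing = optional_public_names("package_that_is_not_installed_xyz", ("DataTree",))

# A standard-library module stands in for an installed optional dependency.
present = optional_public_names("json", ("dumps", "loads"))

print(missing)  # {}
print(sorted(present))  # ['dumps', 'loads']
```

A package's `__init__.py` could pass such a dictionary to `globals().update(...)` to re-export the names only when the dependency is present. Whether that is preferable to the plain `try`/`except ImportError` block shown in the diff is a design judgement this sketch does not settle.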