Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datatree alignment docs #9501

Merged
merged 41 commits into from
Oct 13, 2024
Merged

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Sep 15, 2024

Adds a dedicated section on datatree alignment and coordinate inheritance to the Hierarchical Data page. Intended to complement what's already been added to the Data Structures page by @flamingbear, in a more narrative form. I've also tried to separate out the concept of alignment from coordinate inheritance.

This shouldn't really be merged until a few other things are fixed, particularly #9499.

cc @shoyer @eni-awowale @owenlittlejohns

@TomNicholas TomNicholas added topic-documentation topic-DataTree Related to the implementation of a DataTree class labels Sep 15, 2024
ds_weekly.sizes
ds_monthly.sizes

We cannot store these non-alignable variables on a single :py:class:`~xarray.Dataset` object, because they do not exactly align:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would be more correct to say that we cannot store them unchanged.

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As someone who doesn't use DataTree yet, this is v nice & clear!

doc/user-guide/hierarchical-data.rst Outdated Show resolved Hide resolved
Comment on lines +657 to +658
Data Alignment
~~~~~~~~~~~~~~
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: add comment about open_groups being useful if your data doesn't align

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the only note I have on open_groups, it probably deserves more. https://github.com/pydata/xarray/blob/main/doc/getting-started-guide/quick-overview.rst?plain=1#L284

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gonna prioritize merging this and improving documentation for open_groups later

Copy link
Contributor

@owenlittlejohns owenlittlejohns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's here looks good. I think my main question is about the things that are left as TODOs. Do they need to be done in this PR, or could/should they be tracked under a new issue, so this PR can be merged?

doc/user-guide/hierarchical-data.rst Outdated Show resolved Hide resolved
doc/user-guide/data-structures.rst Outdated Show resolved Hide resolved

.. note::
If you were a previous user of the prototype `xarray-contrib/datatree <https://github.com/xarray-contrib/datatree>`_ package, this is different from what you're used to!
In that package the data model was that nodes actually were completely unrelated. The data model is now slightly stricter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible nit (feel free to ignore): Would it be clearer to say the information (or specifically Dataset object) contained on each node was unrelated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a great point, it would definitely be both more clear and more accurate to say that instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarified in 9b8fc9b

.. note::
This requirement of aligned dimensions is similar to netCDF's concept of `inherited dimensions <https://www.unidata.ucar.edu/software/netcdf/workshops/2007/groups-types/Introduction.html>`_, as in netCDF-4 files dimensions are `visible to all child groups <https://docs.unidata.ucar.edu/netcdf-c/current/groups.html>`_.

This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`~xarray.align` command succeeds:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before getting to this statement, I had added a comment saying we should make it clear that the alignment check ensures alignment with all ancestors, not just the immediate parent. But this covers it nicely!

Coordinate Inheritance
~~~~~~~~~~~~~~~~~~~~~~

Notice that in the trees we constructed above (LINK OR DISPLAY AGAIN?) there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LINK OR DISPLAY AGAIN

I'm tempted to say display it again after this paragraph.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in d2918bb

Comment on lines 802 to 808
We can override inherited coordinates with newly-defined ones, as long as those newly-defined coordinates also align with the parent nodes.

EXAMPLE OF THIS? WOULD IT MAKE MORE SENSE TO USE DIFFERENT DATA TO DEMONSTRATE THIS?

EXAMPLE OF INHERITING FROM A GRANDPARENT?

EXPLAIN DEDUPLICATION?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to include these points in this PR, or merge what is here (maybe with this commented out) and then add more content later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to do it in this PR, but given that everyone seems to be happy with what's here already, and this is a natural break point, perhaps I will just merge this for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A follow-up issue could be to "document the subtleties of coordinate inheritance"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I had hoped to add these bits before you reviewed it @owenlittlejohns )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed that content for use in a future PR in 6cab6f8

@TomNicholas TomNicholas added the plan to merge Final call for comments label Oct 13, 2024
@TomNicholas TomNicholas merged commit 9e3d3bd into pydata:main Oct 13, 2024
29 checks passed
@TomNicholas TomNicholas deleted the datatree_alignment_docs branch October 13, 2024 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-DataTree Related to the implementation of a DataTree class topic-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants