WIP: revise top-level package description #2430

Merged
merged 9 commits into from
Jan 6, 2019
16 changes: 10 additions & 6 deletions doc/index.rst
@@ -2,12 +2,16 @@ xarray: N-D labeled arrays and datasets in Python
=================================================

**xarray** (formerly **xray**) is an open source project and Python package
Collaborator

@shoyer can we drop the reference to xray? The set of people that know the old xray and don't know the new xarray name is probably next to empty.

Contributor Author

Sadly, just today in the twitter thread under discussion, someone referenced xray and linked to the v0.2 documentation. 🤦‍♂️

that aims to bring the labeled data power of pandas_ to the physical sciences,
by providing N-dimensional variants of the core pandas data structures.

Our goal is to provide a pandas-like and pandas-compatible toolkit for
analytics on multi-dimensional arrays, rather than the tabular data for which
pandas excels. Our approach adopts the `Common Data Model`_ for self-
that aims to make working with labelled multi-dimensional arrays simple,
max-sixty marked this conversation as resolved.
efficient, and fun!

Labelled multi-dimensional (a.k.a. N-dimensional) arrays are encountered in
rabernat marked this conversation as resolved.
many fields, especially physical sciences, engineering, and finance.
But multi-dimensional data doesn't fit neatly into pandas_, python's most
Member

If we are going to contrast directly with Pandas, I think we need to say what Pandas is first. Maybe also provide an example of what Pandas does (tabular data structures).

Member

@TomNicholas TomNicholas Oct 31, 2018

I agree that we can't assume that readers know what Pandas is - I certainly didn't. I think that users coming from a more data science background will have used Pandas but those coming from a more low-level array-based numpy/MATLAB/Fortran/C++ point-of-view won't have (e.g. all the physicists I work with).

I also think including an explicit example of a labelled data structure in this explanation would go a long way; the printable representation of an xarray Dataset gives a good idea of how it labels the data it contains.
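
A minimal sketch of the kind of example suggested here, assuming a toy temperature dataset (the variable name, coordinate values, and attribute are hypothetical and not taken from the PR). It builds a small `xarray.Dataset`, prints its representation, and selects a value by label rather than by position:

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 2 years x 3 locations.
temperature = np.array([[11.2, 12.5, 9.8], [10.9, 13.1, 9.5]])

ds = xr.Dataset(
    data_vars={"temperature": (("year", "location"), temperature)},
    coords={"year": [2017, 2018], "location": ["A", "B", "C"]},
    attrs={"title": "toy example"},
)

# The printed representation lists dimensions, coordinates,
# data variables, and attributes by name.
print(ds)

# Select by label instead of by integer position.
value = ds["temperature"].sel(year=2018, location="B").item()

# Reduce over a named dimension rather than an axis number.
yearly_mean = ds["temperature"].mean(dim="location")
```

The printed output shows how the dimension names and coordinate labels travel with the array values, which is the point a reader unfamiliar with pandas would need to see.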

rabernat marked this conversation as resolved.
popular data analysis package focused on labelled tabular data.
rabernat marked this conversation as resolved.
Xarray provides a pandas-like and pandas-compatible toolkit for
analytics on multi-dimensional arrays.
Our approach adopts the `Common Data Model`_ for self-
describing scientific data in widespread use in the Earth sciences:
``xarray.Dataset`` is an in-memory representation of a netCDF file.
Collaborator

This is not completely accurate: an xarray.Dataset represents a netCDF-3 or netCDF-4 classic file, but only one of the Groups in a netCDF-4 file that uses the new netCDF-4 Data Model https://www.unidata.ucar.edu/software/netcdf/workshops/2011/datamodels/Nc4-uml.html (which is compatible but not identical with the cited Common Data Model). This may sound pedantic at this level, but I found the subtleties of the netCDF 3/4 data models very hard to grasp once I had the mental map between an xarray.Dataset and a netCDF-4 File.

IMHO the best option is to keep the reference to the Unidata Common Data Model, as xarray uses the extended type system, and to add a quick reference to the CDM concept of a Group.
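
One way to picture the mapping described in this comment is that a netCDF-4 file with groups corresponds to several Datasets, one per group. The sketch below only illustrates that idea with a plain dict; the group paths and variables are hypothetical, and no file is read or written:

```python
import numpy as np
import xarray as xr

# Hypothetical group paths; a real netCDF-4 file would define its own.
groups = {
    "/": xr.Dataset(attrs={"title": "root group"}),
    "/model": xr.Dataset({"temperature": (("x",), np.array([1.0, 2.0, 3.0]))}),
    "/obs": xr.Dataset({"temperature": (("station",), np.array([1.5, 2.5]))}),
}

# Both non-root groups define a variable named "temperature" on different
# dimensions, so they are kept as separate Datasets rather than one.
for path, ds in groups.items():
    print(path, dict(ds.sizes))
```

When reading from disk, `xarray.open_dataset` takes a `group` argument to choose which group of a netCDF-4 file is loaded into the resulting Dataset.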

