Multi-index levels as coordinates #947

Merged
merged 26 commits into from
Sep 14, 2016
26 commits
f31a278
make multi-index levels visible as coordinates
Aug 5, 2016
5e8a677
make levels also visible for Dataset
Aug 5, 2016
19ec381
fix unnamed levels
Aug 5, 2016
1566938
allow providing multi-index levels in .sel
Aug 5, 2016
9f4e4e3
refactored _get_valid_indexers to get_dim_indexers
Aug 5, 2016
2679318
fix broken tests
Aug 6, 2016
723c99a
refactored accessibility and repr of index levels
Aug 11, 2016
6afcb4a
do not allow providing both level and dim indexers in .sel
Aug 11, 2016
76c937e
cosmetic changes
Aug 11, 2016
5009ba8
change signature of Coordinate.__init__
Aug 24, 2016
4c78ea9
check for uniqueness of multi-index level names
Aug 30, 2016
d28e829
no need to check for uniqueness of level names in _level_coords
Aug 30, 2016
810b4f9
rewritten checking uniqueness of multi-index level names
Aug 31, 2016
7738059
fix adding coords/vars with the same name than a multi-index level
Sep 1, 2016
62b46f2
check for level/var name conflicts in one place
Sep 1, 2016
936ec55
cosmetic changes
Sep 2, 2016
1d6a96f
fix Coordinate -> IndexVariable
Sep 2, 2016
ec67bbd
fix col width when formatting multi-index levels
Sep 2, 2016
f80d7a8
add tests for IndexVariable new methods and indexing
Sep 2, 2016
861c78b
fix bug in assert_unique_multiindex_level_names
Sep 2, 2016
37a0796
add tests for Dataset
Sep 2, 2016
fdbf4aa
fix appveyor tests
Sep 2, 2016
d237022
add tests for DataArray
Sep 2, 2016
949fb46
add docs
Sep 2, 2016
bdaad9b
review changes
Sep 3, 2016
a447767
remove name argument of IndexVariable
Sep 13, 2016
39 changes: 35 additions & 4 deletions doc/data-structures.rst
@@ -115,10 +115,6 @@ If you create a ``DataArray`` by supplying a pandas
     df
     xr.DataArray(df)

-Xarray supports labeling coordinate values with a :py:class:`pandas.MultiIndex`.
-While it handles multi-indexes with unnamed levels, it is recommended that you
-explicitly set the names of the levels.
-
 DataArray properties
 ~~~~~~~~~~~~~~~~~~~~

@@ -532,6 +528,41 @@ dimension and whose the values are ``Index`` objects:

     ds.indexes

+MultiIndex coordinates
+~~~~~~~~~~~~~~~~~~~~~~
+
+Xarray supports labeling coordinate values with a :py:class:`pandas.MultiIndex`:
+
+.. ipython:: python
+
+    midx = pd.MultiIndex.from_arrays([['R', 'R', 'V', 'V'], [.1, .2, .7, .9]],
+                                     names=('band', 'wn'))
+    mda = xr.DataArray(np.random.rand(4), coords={'spec': midx}, dims='spec')
+    mda
+
+For convenience, multi-index levels are directly accessible as "virtual" or
+"derived" coordinates (marked by ``-`` when printing a dataset or data array):
+
+.. ipython:: python
+
+    mda['band']
+    mda.wn
+
+Indexing with multi-index levels is also possible using the ``sel`` method
+(see :ref:`multi-level indexing`).
+
+Unlike other coordinates, "virtual" level coordinates are not stored in
+the ``coords`` attribute of ``DataArray`` and ``Dataset`` objects
+(although they are shown when printing the ``coords`` attribute).
+Consequently, most of the coordinate-related methods don't apply to them.
+The ``coords`` attribute also can't be used to replace one particular level.
+
+Because in a ``DataArray`` or ``Dataset`` object each multi-index level is
+accessible as a "virtual" coordinate, its name must not conflict with the names
+of the other levels, coordinates and data variables of the same object.
+Even though Xarray sets default names for multi-indexes with unnamed levels,
+it is recommended that you explicitly set the names of the levels.
+
 .. [1] Latitude and longitude are 2D arrays because the dataset uses
    `projected coordinates`__. ``reference_time`` refers to the reference time
    at which the forecast was made, rather than ``time`` which is the valid time
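Editor's sketch (not part of the PR): the naming rule stated in the new docs, that a level name must not clash with another level, coordinate, or data variable of the same object, can be illustrated in plain Python. The helper `find_level_name_conflicts` and its input format are assumptions made for this illustration. The PR's own check is `assert_unique_multiindex_level_names`, whose body is not shown in this diff and may work differently.

```python
from collections import defaultdict


def find_level_name_conflicts(level_names_by_dim, variable_names):
    """Return a dict of conflicting level names -> sorted list of owners.

    level_names_by_dim: dict mapping a dimension name to the level names of
        its multi-index (hypothetical input format).
    variable_names: names of ordinary coordinates and data variables.
    """
    owners = defaultdict(list)
    # Record every multi-index that claims a given level name.
    for dim, level_names in level_names_by_dim.items():
        for level in level_names:
            owners[level].append("level of %r" % dim)
    # A plain variable sharing a level's name is also a conflict.
    for name in variable_names:
        if name in owners:
            owners[name].append("variable %r" % name)
    return {name: sorted(who) for name, who in owners.items() if len(who) > 1}


# No clash: 'band' and 'wn' are used only once.
ok = find_level_name_conflicts({"spec": ["band", "wn"]}, ["spec", "temperature"])

# Two clashes: 'band' is a level of two indexes, 'wn' is also a variable.
clash = find_level_name_conflicts(
    {"spec": ["band", "wn"], "obs": ["band", "station"]}, ["wn"])

print(ok)
print(sorted(clash))
```

Reporting every owner of a clashing name, rather than stopping at the first one, is a choice made here for readability of the resulting error message.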
20 changes: 17 additions & 3 deletions doc/indexing.rst
@@ -325,11 +325,25 @@ Additionally, xarray supports dictionaries:
 .. ipython:: python

     mda.sel(x={'one': 'a', 'two': 0})
-    mda.loc[{'one': 'a'}, ...]

+For convenience, ``sel`` also accepts multi-index levels directly
+as keyword arguments:
+
+.. ipython:: python
+
+    mda.sel(one='a', two=0)
+
+Note that using ``sel`` it is not possible to mix a dimension
+indexer with level indexers for that dimension
+(e.g., ``mda.sel(x={'one': 'a'}, two=0)`` will raise a ``ValueError``).
+
 Like pandas, xarray handles partial selection on multi-index (level drop).
-As shown in the last example above, it also renames the dimension / coordinate
-when the multi-index is reduced to a single index.
+As shown below, it also renames the dimension / coordinate when the
+multi-index is reduced to a single index.
+
+.. ipython:: python
+
+    mda.loc[{'one': 'a'}, ...]

 Unlike pandas, xarray does not guess whether you provide index levels or
 dimensions when using ``loc`` in some ambiguous cases. For example, for
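Editor's sketch (not part of the PR): the `sel` rule documented above, where level keywords are accepted but may not be mixed with an indexer for the dimension that owns them, can be modeled in plain Python. The function name `group_level_indexers`, its arguments, and the error text are assumptions for illustration; the PR's own helper (`get_dim_indexers`, per the commit list) is not shown in this diff and may differ.

```python
def group_level_indexers(indexers, level_to_dim):
    """Fold level indexers into one dict-valued indexer per dimension.

    indexers: mapping of dimension or level name -> label, as passed to sel.
    level_to_dim: mapping of level name -> dimension owning that level.
    """
    dim_indexers = {}
    level_indexers = {}
    for key, label in indexers.items():
        if key in level_to_dim:
            # Collect level labels under the dimension they belong to.
            level_indexers.setdefault(level_to_dim[key], {})[key] = label
        else:
            dim_indexers[key] = label

    for dim, levels in level_indexers.items():
        if dim in dim_indexers:
            # Mixing sel(x=...) with sel(one=...) for the same dimension.
            raise ValueError(
                "cannot combine multi-index level indexers with an "
                "indexer for dimension %r" % dim)
        dim_indexers[dim] = levels
    return dim_indexers


level_to_dim = {"one": "x", "two": "x"}

# sel(one='a', two=0) becomes the equivalent of sel(x={'one': 'a', 'two': 0}).
grouped = group_level_indexers({"one": "a", "two": 0}, level_to_dim)
print(grouped)

# sel(x={'one': 'a'}, two=0) mixes both styles and is rejected.
mixed_error = None
try:
    group_level_indexers({"x": {"one": "a"}, "two": 0}, level_to_dim)
except ValueError as err:
    mixed_error = str(err)
print(mixed_error)
```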
7 changes: 7 additions & 0 deletions doc/whats-new.rst
@@ -40,6 +40,13 @@ Deprecations
 Enhancements
 ~~~~~~~~~~~~

+- Multi-index levels are now accessible as "virtual" coordinate variables,
+  e.g., ``ds['time']`` can pull out the ``'time'`` level of a multi-index
+  (see :ref:`coordinates`). ``sel`` also accepts providing multi-index levels
+  as keyword arguments, e.g., ``ds.sel(time='2000-01')``
+  (see :ref:`multi-level indexing`).
+  By `Benoit Bovy <https://github.com/benbovy>`_.
+
 Bug fixes
 ~~~~~~~~~
21 changes: 21 additions & 0 deletions xarray/core/coordinates.py
@@ -215,6 +215,27 @@ def __delitem__(self, key):
         del self._data._coords[key]


+class DataArrayLevelCoordinates(AbstractCoordinates):
+    """Dictionary like container for DataArray MultiIndex level coordinates.

Review comment (Member): Probably good to clarify "Used for attribute style lookup. Not returned directly by any public methods."

Reply (Member Author): Added this sentence on the line below...

+
+    Used for attribute style lookup. Not returned directly by any
+    public methods.
+    """
+    def __init__(self, dataarray):
+        self._data = dataarray
+
+    @property
+    def _names(self):
+        return set(self._data._level_coords)
+
+    @property
+    def variables(self):
+        level_coords = OrderedDict(
+            (k, self._data[v].variable.get_level_variable(k))
+            for k, v in self._data._level_coords.items())
+        return Frozen(level_coords)
+
+
 class Indexes(Mapping, formatting.ReprMixin):
     """Ordered Mapping[str, pandas.Index] for xarray objects.
     """
27 changes: 23 additions & 4 deletions xarray/core/dataarray.py
@@ -14,11 +14,13 @@
 from . import utils
 from .alignment import align
 from .common import AbstractArray, BaseDataObject, squeeze
-from .coordinates import DataArrayCoordinates, Indexes
+from .coordinates import (DataArrayCoordinates, DataArrayLevelCoordinates,
+                          Indexes)
 from .dataset import Dataset
 from .pycompat import iteritems, basestring, OrderedDict, zip
 from .variable import (as_variable, Variable, as_compatible_data, IndexVariable,
-                       default_index_coordinate)
+                       default_index_coordinate,
+                       assert_unique_multiindex_level_names)
 from .formatting import format_item


@@ -82,6 +84,8 @@ def _infer_coords_and_dims(shape, coords, dims):
                                  'length %s on the data but length %s on '
                                  'coordinate %r' % (d, sizes[d], s, k))

+    assert_unique_multiindex_level_names(new_coords)
+
     return new_coords, dims


@@ -417,14 +421,29 @@ def _item_key_to_dict(self, key):
             key = indexing.expanded_indexer(key, self.ndim)
             return dict(zip(self.dims, key))

+    @property
+    def _level_coords(self):
+        """Return a mapping of all MultiIndex levels and their corresponding
+        coordinate name.
+        """
+        level_coords = OrderedDict()
+        for cname, var in self._coords.items():
+            if var.ndim == 1:
+                level_names = var.to_index_variable().level_names
+                if level_names is not None:
+                    dim, = var.dims
+                    level_coords.update({lname: dim for lname in level_names})
+        return level_coords
+
     def __getitem__(self, key):
         if isinstance(key, basestring):
             from .dataset import _get_virtual_variable

             try:
                 var = self._coords[key]
             except KeyError:
-                _, key, var = _get_virtual_variable(self._coords, key)
+                _, key, var = _get_virtual_variable(
+                    self._coords, key, self._level_coords)

             return self._replace_maybe_drop_dims(var, name=key)
         else:
@@ -444,7 +463,7 @@ def __delitem__(self, key):
     @property
     def _attr_sources(self):
         """List of places to look-up items for attribute-style access"""
-        return [self.coords, self.attrs]
+        return [self.coords, DataArrayLevelCoordinates(self), self.attrs]

     def __contains__(self, key):
         return key in self._coords
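Editor's sketch (not part of the PR): the `_level_coords` property added above builds a mapping from each multi-index level name to the dimension that carries it. The plain-Python model below mirrors that loop under stated assumptions: `Coord` is a hypothetical stand-in for a coordinate variable, with `level_names` playing the role of `var.to_index_variable().level_names` in the diff.

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-in for a coordinate variable: `dims` is a tuple of
# dimension names and `level_names` is None unless it wraps a multi-index.
Coord = namedtuple("Coord", ["dims", "level_names"])


def level_coords(coords):
    """Map every multi-index level name to the dimension that owns it."""
    mapping = OrderedDict()
    for var in coords.values():
        # Only 1-D coordinates can wrap a multi-index.
        if len(var.dims) == 1 and var.level_names is not None:
            (dim,) = var.dims
            mapping.update((lname, dim) for lname in var.level_names)
    return mapping


coords = OrderedDict([
    ("spec", Coord(dims=("spec",), level_names=["band", "wn"])),
    ("time", Coord(dims=("time",), level_names=None)),
])
mapping = level_coords(coords)
print(dict(mapping))
```

With such a mapping in hand, a lookup like `mda['band']` only needs to find the owning dimension (`'spec'` here) and extract the level from its index, which is what the extra `level_vars` argument to `_get_virtual_variable` is used for in this PR.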
71 changes: 51 additions & 20 deletions xarray/core/dataset.py
@@ -33,34 +33,48 @@
                              'quarter']


-def _get_virtual_variable(variables, key):
-    """Get a virtual variable (e.g., 'time.year') from a dict of
-    xarray.Variable objects (if possible)
+def _get_virtual_variable(variables, key, level_vars={}):
+    """Get a virtual variable (e.g., 'time.year' or a MultiIndex level)
+    from a dict of xarray.Variable objects (if possible)
     """
     if not isinstance(key, basestring):
         raise KeyError(key)

     split_key = key.split('.', 1)
-    if len(split_key) != 2:
+    if len(split_key) == 2:
+        ref_name, var_name = split_key
+    elif len(split_key) == 1:
+        ref_name, var_name = key, None
+    else:
         raise KeyError(key)

-    ref_name, var_name = split_key
-    ref_var = variables[ref_name]
-    if ref_var.ndim == 1:
-        date = ref_var.to_index()
-    elif ref_var.ndim == 0:
-        date = pd.Timestamp(ref_var.values)
+    if ref_name in level_vars:
+        dim_var = variables[level_vars[ref_name]]
+        ref_var = dim_var.to_index_variable().get_level_variable(ref_name)
     else:
-        raise KeyError(key)
+        ref_var = variables[ref_name]

-    if var_name == 'season':
-        # TODO: move 'season' into pandas itself
-        seasons = np.array(['DJF', 'MAM', 'JJA', 'SON'])
-        month = date.month
-        data = seasons[(month // 3) % 4]
+    if var_name is None:
+        virtual_var = ref_var
+        var_name = key
     else:
-        data = getattr(date, var_name)
-    return ref_name, var_name, Variable(ref_var.dims, data)
+        if ref_var.ndim == 1:
+            date = ref_var.to_index()
+        elif ref_var.ndim == 0:
+            date = pd.Timestamp(ref_var.values)
+        else:
+            raise KeyError(key)
+
+        if var_name == 'season':
+            # TODO: move 'season' into pandas itself
+            seasons = np.array(['DJF', 'MAM', 'JJA', 'SON'])
+            month = date.month
+            data = seasons[(month // 3) % 4]
+        else:
+            data = getattr(date, var_name)
+        virtual_var = Variable(ref_var.dims, data)
+
+    return ref_name, var_name, virtual_var


 def calculate_dimensions(variables):
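Editor's sketch (not part of the PR): two small pieces of `_get_virtual_variable` can be checked in isolation with plain Python, namely how a key such as `'time.year'` or `'band'` is split, and how the `(month // 3) % 4` formula maps months to seasons. The helper names `split_virtual_key` and `season_of` are hypothetical and exist only for this illustration.

```python
SEASONS = ("DJF", "MAM", "JJA", "SON")


def split_virtual_key(key):
    """Split 'ref.attr' into (ref, attr); a key without a dot has attr None."""
    parts = key.split(".", 1)  # maxsplit=1 yields one or two parts
    if len(parts) == 2:
        return parts[0], parts[1]
    return key, None


def season_of(month):
    """Map a month number (1-12) to a season using the diff's formula."""
    return SEASONS[(month // 3) % 4]


print(split_virtual_key("time.year"))   # datetime component of 'time'
print(split_virtual_key("band"))        # plain name, e.g. a multi-index level
print([season_of(m) for m in range(1, 13)])
```

Since `str.split('.', 1)` always returns a list of one or two items, the final `else: raise KeyError(key)` branch after the `elif len(split_key) == 1` test in the diff appears to be unreachable for string keys.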
@@ -424,6 +438,21 @@ def _subset_with_all_valid_coords(self, variables, coord_names, attrs):

         return self._construct_direct(variables, coord_names, dims, attrs)

+    @property
+    def _level_coords(self):
+        """Return a mapping of all MultiIndex levels and their corresponding
+        coordinate name.
+        """
+        level_coords = OrderedDict()
+        for cname in self._coord_names:
+            var = self.variables[cname]
+            if var.ndim == 1:
+                level_names = var.to_index_variable().level_names
+                if level_names is not None:
+                    dim, = var.dims
+                    level_coords.update({lname: dim for lname in level_names})
+        return level_coords

Review comment (Member): Am I missing something here? Wouldn't this also work without the continue statement?

    for name in self._coord_names:
        var = self.variables[name]
        if name == var.dims[0]:
            level_coords.update(var.to_coord().get_level_coords())
    return level_coords

+
     def _copy_listed(self, names):
         """Create a new Dataset with the listed variables from this dataset and
         the all relevant coordinates. Skips all validation.
@@ -436,7 +465,7 @@ def _copy_listed(self, names):
                 variables[name] = self._variables[name]
             except KeyError:
                 ref_name, var_name, var = _get_virtual_variable(
-                    self._variables, name)
+                    self._variables, name, self._level_coords)
                 variables[var_name] = var
                 if ref_name in self._coord_names:
                     coord_names.add(var_name)
@@ -452,7 +481,8 @@ def _construct_dataarray(self, name):
         try:
             variable = self._variables[name]
         except KeyError:
-            _, name, variable = _get_virtual_variable(self._variables, name)
+            _, name, variable = _get_virtual_variable(
+                self._variables, name, self._level_coords)

         coords = OrderedDict()
         needed_dims = set(variable.dims)
@@ -521,6 +551,7 @@ def __setitem__(self, key, value):
         if utils.is_dict_like(key):
             raise NotImplementedError('cannot yet use a dictionary as a key '
                                       'to set Dataset values')
+
         self.update({key: value})

     def __delitem__(self, key):