Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize reporting.Quantity in preparation for sparse xarray #317

Merged
merged 22 commits into from
Jun 19, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
2efd34a
Sort output in 'ixmp report' CLI command
khaeru Apr 25, 2020
263ccbd
Split reporting.quantity and reporting.attrseries
khaeru Apr 25, 2020
4b72489
Add reporting.sparsedataarray
khaeru Apr 25, 2020
e7fe571
Use SparseDataArray in tests
khaeru Apr 25, 2020
0e3a2fe
Parametrize tests for both kinds of reporting.Quantity
khaeru Apr 25, 2020
c8252f8
Move test_report_size to TestQuantity.test_size
khaeru Apr 25, 2020
21acfed
Appease Stickler
khaeru Apr 26, 2020
c3723f4
Improve coverage to 100% in reporting.attrseries and .sparsedataframe
khaeru Apr 26, 2020
21e872f
Catch more exceptions in reporting.utils.parse_units
khaeru May 10, 2020
38a03e7
Bump sparse requirement, adjust test_sda_accessor()
khaeru Jun 2, 2020
38c4271
Lint with flake8
khaeru Jun 11, 2020
a1b8156
Add reporting.testing.random_qty
khaeru Jun 19, 2020
f25835b
Add reporting.quantity.assert_quantity()
khaeru Jun 19, 2020
269e450
Add reporting.computations.add (from message_ix)
khaeru Jun 19, 2020
da0b78b
Expand tests of computations.product
khaeru Jun 19, 2020
8f7105a
Expand arguments accepted by Quantity() constructor
khaeru Jun 19, 2020
3e21ac1
Add tests of reporting.computations.add
khaeru Jun 19, 2020
176c36b
Remove commented diagnostic code in computations.data_for_quantity
khaeru Jun 19, 2020
6b51340
Improve AttrSeries.squeeze()
khaeru Jun 19, 2020
a317fe4
Move reporting assertions to reporting.testing
khaeru Jun 19, 2020
d33e2e9
Update reporting.quantity documentation
khaeru Jun 19, 2020
28d59ae
Add #317 to release notes
khaeru Jun 19, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions RELEASE_NOTES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ Next release
All changes
-----------

- :pull:`317`: Make :class:`reporting.Quantity` classes interchangeable.
- :pull:`330`: Use GitHub Actions for continuous testing and integration.


Expand Down
1 change: 1 addition & 0 deletions doc/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,7 @@
'pandas': ('https://pandas.pydata.org/pandas-docs/stable/', None),
'pint': ('https://pint.readthedocs.io/en/stable/', None),
'python': ('https://docs.python.org/3/', None),
'sparse': ('https://sparse.pydata.org/en/stable/', None),
'xarray': ('https://xarray.pydata.org/en/stable/', None),
}

Expand Down
51 changes: 43 additions & 8 deletions doc/source/reporting.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
.. currentmodule:: ixmp.reporting

Reporting
=========
*********

Top-level methods and classes:

Expand All @@ -11,7 +11,6 @@ Top-level methods and classes:
Reporter
Key
Quantity
as_quantity

Others:

Expand Down Expand Up @@ -184,12 +183,30 @@ Others:
>>> foo('a b c')
foo:a-b-c

.. automodule:: ixmp.reporting
:members: Quantity, as_quantity
.. autodata:: ixmp.reporting.Quantity(data, *args, **kwargs)
:annotation:

The :data:`.Quantity` constructor converts its arguments to an internal, :class:`xarray.DataArray`-like data format:

.. code-block:: python

# Existing data
data = pd.Series(...)

# Convert to a Quantity for use in reporting calculations
qty = Quantity(data, name="Quantity name", units="kg")
rep.add("new_qty", qty)

Common :mod:`ixmp.reporting` usage, e.g. in :mod:`message_ix`, creates large, sparse data frames (billions of possible elements, but <1% populated); :class:`~xarray.DataArray`'s default, 'dense' storage format would be too large for available memory.

- Currently, Quantity is :class:`.AttrSeries`, a wrapped :class:`pandas.Series` that behaves like a :class:`~xarray.DataArray`.
- In the future, :mod:`ixmp.reporting` will use :class:`.SparseDataArray`, and eventually :class:`~xarray.DataArray` backed by sparse data, directly.

The goal is that reporting code, including built-in and user computations, can treat quantity arguments as if they were :class:`~xarray.DataArray`.


Computations
------------
============

.. automodule:: ixmp.reporting.computations
:members:
Expand All @@ -201,6 +218,7 @@ Computations
Calculations:

.. autosummary::
add
aggregate
apply_units
disaggregate_shares
Expand All @@ -221,10 +239,27 @@ Computations
concat


Utilities
---------
Internal format for reporting quantities
========================================

.. autoclass:: ixmp.reporting.quantity.AttrSeries
.. currentmodule:: ixmp.reporting.quantity

.. automodule:: ixmp.reporting.quantity
:members: assert_quantity

.. currentmodule:: ixmp.reporting.attrseries

.. automodule:: ixmp.reporting.attrseries
:members:

.. currentmodule:: ixmp.reporting.sparsedataarray

.. automodule:: ixmp.reporting.sparsedataarray
:members: SparseDataArray, SparseAccessor


Utilities
=========

.. automodule:: ixmp.reporting.utils
:members:
2 changes: 1 addition & 1 deletion ixmp/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ def report(context, config, key):
r.configure(config)

# Print the target
print(r.get(key))
print(r.get(key).to_series().sort_index())


@main.command('show-versions')
Expand Down
3 changes: 1 addition & 2 deletions ixmp/reporting/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@
from .describe import describe_recursive
from .exceptions import ComputationError
from .key import Key
from .quantity import Quantity, as_quantity
from .quantity import Quantity
from .utils import (
REPLACE_UNITS,
RENAME_DIMS,
Expand All @@ -56,7 +56,6 @@
'Key',
'Quantity',
'Reporter',
'as_quantity',
'configure',
]

Expand Down
197 changes: 197 additions & 0 deletions ixmp/reporting/attrseries.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,197 @@
import pandas as pd
import pandas.core.indexes.base as ibase
import xarray as xr


class AttrSeries(pd.Series):
""":class:`pandas.Series` subclass imitating :class:`xarray.DataArray`.

The AttrSeries class provides similar methods and behaviour to
:class:`xarray.DataArray`, so that :mod:`ixmp.reporting.computations`
methods can use xarray-like syntax.

Parameters
----------
units : str or pint.Unit, optional
Set the units attribute. The value is converted to :class:`pint.Unit`
and added to `attrs`.
attrs : :class:`~collections.abc.Mapping`, optional
Set the :attr:`~pandas.Series.attrs` of the AttrSeries. This attribute
was added in `pandas 1.0
<https://pandas.pydata.org/docs/whatsnew/v1.0.0.html>`_, but is not
currently supported by the Series constructor.
"""

# See https://pandas.pydata.org/docs/development/extending.html
@property
def _constructor(self):
return AttrSeries

def __init__(self, data=None, *args, name=None, attrs=None, **kwargs):
attrs = attrs or dict()

if hasattr(data, 'attrs'):
# Use attrs from an existing object
new_attrs = data.attrs.copy()

# Overwrite with explicit attrs argument
new_attrs.update(attrs)
attrs = new_attrs

if isinstance(data, (AttrSeries, xr.DataArray)):
# Extract name from existing object or use the argument
name = ibase.maybe_extract_name(name, data, type(self))

# Pre-convert to pd.Series from xr.DataArray to preserve names and
# labels. For AttrSeries, this is a no-op (see below).
data = data.to_series()

# Don't pass attrs to pd.Series constructor; it currently does not
# accept them
super().__init__(data, *args, name=name, **kwargs)

# Update the attrs after initialization
self.attrs.update(attrs)

@classmethod
def from_series(cls, series, sparse=None):
"""Like :meth:`xarray.DataArray.from_series`."""
return cls(series)

def assign_coords(self, **kwargs):
"""Like :meth:`xarray.DataArray.assign_coords`."""
return pd.concat([self], keys=kwargs.values(), names=kwargs.keys())

@property
def coords(self):
"""Like :attr:`xarray.DataArray.coords`. Read-only."""
result = dict()
for name, levels in zip(self.index.names, self.index.levels):
result[name] = xr.Dataset(None, coords={name: levels})[name]
return result

@property
def dims(self):
"""Like :attr:`xarray.DataArray.dims`."""
return tuple(self.index.names)

def drop(self, label):
"""Like :meth:`xarray.DataArray.drop`."""
return self.droplevel(label)

def item(self, *args):
"""Like :meth:`xarray.DataArray.item`."""
if len(args) and args != (None,):
raise NotImplementedError
elif self.size != 1:
raise ValueError
return self.iloc[0]

def rename(self, new_name_or_name_dict):
"""Like :meth:`xarray.DataArray.rename`."""
if isinstance(new_name_or_name_dict, dict):
return self.rename_axis(index=new_name_or_name_dict)
else:
return super().rename(new_name_or_name_dict)

def sel(self, indexers=None, drop=False, **indexers_kwargs):
"""Like :meth:`xarray.DataArray.sel`."""
indexers = indexers or {}
indexers.update(indexers_kwargs)
if len(indexers) == 1:
level, key = list(indexers.items())[0]
if isinstance(key, str) and not drop:
if isinstance(self.index, pd.MultiIndex):
# When using .loc[] to select 1 label on 1 level, pandas
# drops the level. Use .xs() to avoid this behaviour unless
# drop=True
return AttrSeries(self.xs(key, level=level,
drop_level=False))
else:
# No MultiIndex; use .loc with a slice to avoid returning
# scalar
return self.loc[slice(key, key)]

idx = tuple(indexers.get(n, slice(None)) for n in self.index.names)
return AttrSeries(self.loc[idx])

def sum(self, *args, **kwargs):
"""Like :meth:`xarray.DataArray.sum`."""
obj = super(AttrSeries, self)
attrs = None

try:
dim = kwargs.pop('dim')
except KeyError:
dim = list(args)
args = tuple()

if len(dim) == len(self.index.names):
bad_dims = set(dim) - set(self.index.names)
if bad_dims:
raise ValueError(f'{bad_dims} not found in array dimensions '
f'{self.index.names}')
# Simple sum
kwargs = {}
else:
# pivot and sum across columns
obj = self.unstack(dim)
kwargs['axis'] = 1
# Result will be DataFrame; re-attach attrs when converted to
# AttrSeries
attrs = self.attrs

return AttrSeries(obj.sum(*args, **kwargs), attrs=attrs)

def squeeze(self, dim=None, *args, **kwargs):
"""Like :meth:`xarray.DataArray.squeeze`."""
assert kwargs.pop("drop", True)

try:
idx = self.index.remove_unused_levels()
except AttributeError:
return self

to_drop = []
for i, name in enumerate(idx.names):
if dim and name != dim:
continue
elif len(idx.levels[i]) > 1:
if dim is None:
continue
else:
raise ValueError(
"cannot select a dimension to squeeze out which has "
"length greater than one"
)

to_drop.append(name)

if dim and not to_drop:
# Specified dimension does not exist
raise KeyError(dim)

return self.droplevel(to_drop)

def transpose(self, *dims):
"""Like :meth:`xarray.DataArray.transpose`."""
return self.reorder_levels(dims)

def to_dataframe(self):
"""Like :meth:`xarray.DataArray.to_dataframe`."""
return self.to_frame()

def to_series(self):
"""Like :meth:`xarray.DataArray.to_series`."""
return self

def align_levels(self, other):
"""Work around https://github.com/pandas-dev/pandas/issues/25760.

Return a copy of *obj* with common levels in the same order as *ref*.
"""
if not isinstance(self.index, pd.MultiIndex):
return self
common = [n for n in other.index.names if n in self.index.names]
unique = [n for n in self.index.names if n not in common]
return self.reorder_levels(common + unique)
Loading