Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Options to binary ops kwargs #1065

Merged
merged 10 commits into from
Nov 12, 2016
7 changes: 5 additions & 2 deletions xarray/core/dataarray.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
assert_unique_multiindex_level_names)
from .formatting import format_item
from .utils import decode_numpy_dict_values, ensure_us_time_resolution
from .options import OPTIONS


def _infer_coords_and_dims(shape, coords, dims):
Expand Down Expand Up @@ -1377,13 +1378,15 @@ def func(self, *args, **kwargs):
return func

@staticmethod
def _binary_op(f, reflexive=False, join='inner', **ignored_kwargs):
def _binary_op(f, reflexive=False, join=None, **ignored_kwargs):
@functools.wraps(f)
def func(self, other):
if isinstance(other, (Dataset, groupby.GroupBy)):
return NotImplemented
if hasattr(other, 'indexes'):
self, other = align(self, other, join=join, copy=False)
# if user does not specify join, default to OPTIONS['join']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment just states what the code does, so it would be better to leave it out

how_to_join = join or OPTIONS['join']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use if join is None instead of implicitly testing join.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would trigger an UnboundLocalError:
if join is None: join = OPTIONS['join']
because join is not defined in the local scope prior to that line.

So the choice is either
1.) if join is None:
how_to_join = OPTIONS['join']
else:
how_to_join = join
or
2.) how_to_join = join or OPTIONS['join']

I opted for (2) for simplicity. Is the style of (1) generally preferred here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do the if/else inline, if you prefer:
how_to_join = OPTIONS['join'] if join is None else join

But yes, I do prefer explicitly checking against None for default values rather than relying on implicit falseness of None.

Copy link
Contributor Author

@chunweiyuan chunweiyuan Oct 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny, your suggestion is the same as my first edit to the code, but then my teammates suggested I should use the other options.

Nomenclature is not my forte. I'll make the default
OPTIONS = {'display_width': 80, 'align_type': "inner"}
and change your suggestion to the following

align_type = OPTIONS['align_type'] if join is None else join
self, other = align(self, other, join=align_type, copy=False)

if there's no objection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also align in other operations, e.g., merge, for which the default is to do an outer join. So the option name should be specific to computation/arithmetic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binary_op_align_type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, I'm using arithmetic_join....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think arithmetic_join or compute_join is probably the way to go. We will probably also use this when/if we ever add ternary operations, too.

self, other = align(self, other, join=how_to_join, copy=False)
other_variable = getattr(other, 'variable', other)
other_coords = getattr(other, 'coords', None)

Expand Down
7 changes: 5 additions & 2 deletions xarray/core/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
from .pycompat import (iteritems, basestring, OrderedDict,
dask_array_type)
from .combine import concat
from .options import OPTIONS


# list of attributes of pd.DatetimeIndex that are ndarrays of time info
Expand Down Expand Up @@ -2018,13 +2019,15 @@ def func(self, *args, **kwargs):
return func

@staticmethod
def _binary_op(f, reflexive=False, join='inner', fillna=False):
def _binary_op(f, reflexive=False, join=None, fillna=False):
@functools.wraps(f)
def func(self, other):
if isinstance(other, groupby.GroupBy):
return NotImplemented
if hasattr(other, 'indexes'):
self, other = align(self, other, join=join, copy=False)
# if user does not specify join, default to OPTIONS['join']
how_to_join = join or OPTIONS['join']
self, other = align(self, other, join=how_to_join, copy=False)
g = f if not reflexive else lambda x, y: f(y, x)
ds = self._calculate_binary_op(g, other, fillna=fillna)
return ds
Expand Down
3 changes: 2 additions & 1 deletion xarray/core/options.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
OPTIONS = {'display_width': 80}
OPTIONS = {'display_width': 80,
'join': "inner"}


class set_options(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the docstring for this class?

Expand Down
32 changes: 32 additions & 0 deletions xarray/test/test_dataarray.py
Original file line number Diff line number Diff line change
Expand Up @@ -2268,3 +2268,35 @@ def test_dot(self):
da.dot(dm.values)
with self.assertRaisesRegexp(ValueError, 'no shared dimensions'):
da.dot(DataArray(1))

def test_binary_op_join_setting(self):
"""
A test method to verify the ability to set binary operation join kwarg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually we don't put docstrings on test methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Will remove.

("inner", "outer", "left", "right") via xr.set_options().
"""
# First we set up a data array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These sort of comments (repeating the code) just add noise

xdim, ydim, zdim = 'x', 'y', 'z'
xcoords, ycoords, zcoords = ['a', 'b', 'c'], [-2, 0, 2], [0, 1, 2]
total_size = len(xcoords) * len(ycoords) * len(zcoords)
# create a 3-by-3-by-3 data array
arr = xr.DataArray(np.arange(total_size).\
reshape(len(xcoords),len(ycoords),len(zcoords)),
[(xdim, xcoords), (ydim, ycoords),(zdim, zcoords)])
# now create a data array with the last x slice missing
missing_last_x = arr[0:-1,:,:].copy()
# create another data array with the last z slice missing
missing_last_z = arr[:,:,0:-1].copy()
# because the default in OPTIONS is join="inner", we test "outer" first
xr.set_options(join="outer")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the context manager for tests instead (e.g, with xr.set_options(join='outer'):. It cleans itself up if an exception is raised in the middle of the test, which makes it more hygenic. Otherwise, if the next assert fails it would also trigger a bunch of other failing tests when the option doesn't get reset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

result = missing_last_x + missing_last_z
self.assertTrue(result.shape == arr.shape)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking specific properties, construct the full expected result (e.g., by calling align with join='outer' on the inputs) and use assertDataArrayEqual to verify that expect result is the result you get.

self.assertTrue(result[-1,:,:].isnull().all())
self.assertTrue(result[:,:,-1].isnull().all())
# now revert back to join="inner"
xr.set_options(join="inner")
result = missing_last_x + missing_last_z
self.assertTrue(result.shape == \
(len(xcoords)-1, len(ycoords), len(zcoords)-1))
self.assertTrue(result.notnull().all())
self.assertFalse('c' in list(result['x']))
self.assertFalse(2 in list(result['z']))
39 changes: 39 additions & 0 deletions xarray/test/test_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

import numpy as np
import pandas as pd
import xarray as xr
import pytest

from xarray import (align, broadcast, concat, merge, conventions, backends,
Expand Down Expand Up @@ -2916,6 +2917,44 @@ def test_filter_by_attrs(self):
for var in new_ds.data_vars:
self.assertEqual(new_ds[var].height, '10 m')

def test_binary_op_join_setting(self):
"""
A test method to verify the ability to set binary operation join kwarg
("inner", "outer", "left", "right") via xr.set_options().
"""
# First we set up a data array
xdim, ydim, zdim = 'x', 'y', 'z'
xcoords, ycoords, zcoords = ['a', 'b', 'c'], [-2, 0, 2], [0, 1, 2]
total_size = len(xcoords) * len(ycoords) * len(zcoords)
# create a 3-by-3-by-3 data array
arr = DataArray(np.arange(total_size).\
reshape(len(xcoords),len(ycoords),len(zcoords)),
[(xdim, xcoords), (ydim, ycoords),(zdim, zcoords)])
# now create a data array with the last x slice missing
arr1 = arr[0:-1,:,:].copy()
missing_last_x = arr1.to_dataset(name='foo')
# create another data array with the last z slice missing
arr2 = arr[:,:,0:-1].copy()
missing_last_z = arr2.to_dataset(name='foo') # needs to be name='foo' as well
# because the default in OPTIONS is join="inner", we test "outer" first
xr.set_options(join="outer")
result = missing_last_x + missing_last_z
self.assertTrue(result.foo.shape == arr.shape)
self.assertTrue(result.foo[-1,:,:].isnull().all())
self.assertTrue(result.foo[:,:,-1].isnull().all())
# now revert back to join="inner"
xr.set_options(join="inner")
result = missing_last_x + missing_last_z
self.assertTrue(result.foo.shape == \
(len(xcoords)-1, len(ycoords), len(zcoords)-1))
self.assertTrue(result.foo.notnull().all())
self.assertFalse('c' in list(result.foo['x']))
self.assertFalse(2 in list(result.foo['z']))
# just for kicks, what happens when the dataarrays have different names?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have tests for similar behavior -- see test_dataset_math_auto_align. This might belong there (more tests are great!), but it doesn't belong here.

It does raise the interesting question of what the behavior should be when join='outer' for Dataset objects with different data variables. Should the result now have the union of data variable names? If so, this will take a bit more retooling. (This would be easier after #964 is merged)

misnomer = arr1.to_dataset(name='bar')
result = missing_last_x + misnomer
self.assertTrue(len(result.data_vars)==0) # empty dataset

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too much white space here -- by convention, leave only 2 blank lines


### Py.test tests

Expand Down