Dataset.reduce methods #137

jhamman · 2014-05-20T01:53:30Z

A first attempt at implementing Dataset reduction methods.
#131

shoyer · 2014-05-20T02:09:26Z

xray/dataset.py

+        else:
+            dims = set(dimension)
+
+        variables = {}


Let's make this OrderedDict() instead of an unordered dictionary, just so the result will be less surprising (that is, with variables in the same order as the original).
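A minimal sketch of the ordering point, using plain Python lists as stand-ins for the dataset's variables. The helper name `reduce_all` and the sample data are hypothetical and not part of the PR.

```python
from collections import OrderedDict


def reduce_all(variables, func):
    """Apply func to every variable, keeping the original variable order.

    Hypothetical helper for illustration; not the xray implementation.
    """
    reduced = OrderedDict()
    for name, values in variables.items():
        reduced[name] = func(values)
    return reduced


# Toy stand-in for Dataset.variables: name -> list of numbers.
source = OrderedDict([
    ("temperature", [1.0, 3.0]),
    ("pressure", [10.0, 30.0]),
])
result = reduce_all(source, max)
print(list(result))
```

On Python 3.7 and later a plain dict also preserves insertion order, but at the time of this review an explicit OrderedDict was the way to get that guarantee.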

shoyer · 2014-05-20T04:27:14Z

Very nice start! Please also add a try/except TypeError block to skip variables where the reduction isn't well defined.
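The try/except idea could be sketched as follows, again with plain lists standing in for variables. The names `mean` and `reduce_skipping_undefined` are assumptions made for this example; the real method operates on xray variables.

```python
from collections import OrderedDict


def mean(values):
    """Arithmetic mean; raises TypeError for non-numeric values."""
    return sum(values) / len(values)


def reduce_skipping_undefined(variables, func):
    """Reduce each variable, silently dropping those func cannot handle.

    Hypothetical helper for illustration; not the xray implementation.
    """
    reduced = OrderedDict()
    for name, values in variables.items():
        try:
            reduced[name] = func(values)
        except TypeError:
            # The reduction is not defined for this variable, so skip it.
            continue
    return reduced


data = OrderedDict([
    ("temperature", [1.0, 3.0]),
    ("station", ["a", "b"]),  # the mean of strings is not well defined
])
result = reduce_skipping_undefined(data, mean)
print(list(result))
```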

shoyer · 2014-05-20T04:29:58Z

test/test_dataset.py

+
+        self.assertDatasetEqual(data.min(dimension=['dim1']),
+                                data.min(dimension='dim1'))
+


Please add a test for dimension=[]:
self.assertDatasetEqual(data.mean(dimension=[]), data)

shoyer · 2014-05-20T17:46:32Z

test/test_dataset.py

+            actual = data.min(dimension=reduct).dimensions
+            self.assertItemsEqual(actual, expected)
+
+        data.__delitem__('time')  # removes unused time dim/var that is dropped


This suggests another reason why it might make more sense to loop over variables (and handle coordinates explicitly) instead of only looping over noncoordinates: it's kind of weird to lose a dimension that wasn't summed over.
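One way to picture the suggestion is to track only dimension names. The sketch below loops over all variables and treats coordinates explicitly, so a coordinate such as `time` survives when it was not reduced over. The function `reduced_dims` and the data layout (name mapped to a tuple of dimension names) are assumptions for illustration only.

```python
from collections import OrderedDict


def reduced_dims(variables, coordinates, reduce_over):
    """Return name -> remaining dimensions after reducing over reduce_over.

    Hypothetical bookkeeping helper; it models dimensions only, not data.
    """
    reduce_over = set(reduce_over)
    result = OrderedDict()
    for name, dims in variables.items():
        if name in coordinates:
            # Keep a coordinate unless its own dimension is reduced away.
            if name not in reduce_over:
                result[name] = dims
        else:
            # Data variables lose exactly the reduced dimensions.
            result[name] = tuple(d for d in dims if d not in reduce_over)
    return result


variables = OrderedDict([
    ("time", ("time",)),  # coordinate not used by any data variable
    ("dim1", ("dim1",)),
    ("var1", ("dim1",)),
])
out = reduced_dims(variables, {"time", "dim1"}, ["dim1"])
print(out)
```

With this approach the unused `time` coordinate is retained, so the test would not need to delete it by hand.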

shoyer · 2014-05-21T04:13:04Z

xray/dataset.py

+            dims = set(dimension)
+
+        if any([True for dim in dims if dim not in self.coordinates]):
+            bad_dims = [dim for dim in dims if dim not in self.coordinates]


You could simply make this:

bad_dims = [dim for dim in dims if dim not in self.coordinates]
if bad_dims:
    raise ValueError
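As a standalone, runnable sketch of that check: the function name `check_dimensions` and the error message wording are hypothetical, since the diff excerpt only shows that the message is formatted with `bad_dims`.

```python
def check_dimensions(dims, coordinates):
    """Raise ValueError if any requested dimension is not a coordinate.

    Hypothetical helper; the message text is an assumption.
    """
    bad_dims = [dim for dim in dims if dim not in coordinates]
    if bad_dims:
        raise ValueError('unknown dimensions: {0}'.format(bad_dims))


coordinates = {"time", "dim1"}
check_dimensions(["dim1"], coordinates)  # valid, so nothing is raised

try:
    check_dimensions(["dim1", "bad_dim"], coordinates)
except ValueError as err:
    message = str(err)
    print(message)
```

Building the list once and testing its truthiness avoids scanning `dims` twice, which is the point of the simplification.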

shoyer · 2014-05-21T04:25:10Z

This is getting pretty close, but I would like more comprehensive tests to be confident that it is working properly:

Verify an exception is raised if a bad dimension is supplied.
Verify that arrays with non-numeric data types are skipped if doing an inappropriate reduction (e.g., mean of a string variable).

jhamman · 2014-05-21T06:37:11Z

@shoyer - re. the two tests you requested.

See test_reduce_bad_dimension in test_dataset.py.
See test_reduce_non_numeric in test_dataset.py.

shoyer · 2014-05-21T17:13:08Z

xray/dataset.py

+                             '{0}'.format(bad_dims))
+
+        variables = OrderedDict()
+        for name, da in iteritems(self.variables):


To make this slightly less dissonant, let's rename da to something like var, which doesn't suggest this is a DataArray variable.

shoyer · 2014-05-21T17:17:55Z

xray/dataset.py

+            `f(x, axis=axis, **kwargs)` to return the result of reducing an
+            np.ndarray over an integer valued axis.
+        dimension : str or sequence of str, optional Dimension(s) over which
+            to apply `func`.  If `dimension`(default=None) `func` is applied


"If dimension(default=None)" doesn't quite make sense to me. How about replacing that just with "By default"?

shoyer · 2014-05-21T19:59:28Z

OK, this looks good to me now! Since the history is a little messy (given the number of revisions), could you please squash this into a single commit so that I can merge?

jhamman · 2014-05-21T20:13:33Z

Alright, the rebase/squash is done.

Dataset.reduce methods

shoyer · 2014-05-21T20:23:42Z

Thanks!

shoyer reviewed May 20, 2014

shoyer added this to the 0.2 milestone May 20, 2014

shoyer added the enhancement label May 20, 2014

shoyer reviewed May 20, 2014

jhamman mentioned this pull request May 21, 2014

keep attrs when reducing xray objects #138

Closed

shoyer reviewed May 21, 2014

shoyer mentioned this pull request May 21, 2014

Enable keep attrs #139

Closed

shoyer mentioned this pull request May 21, 2014

Dataset.apply method #140

Closed

shoyer reviewed May 21, 2014

addition of Dataset.reduce methods

b5d82a0

shoyer added a commit that referenced this pull request May 21, 2014

Merge pull request #137 from jhamman/dataset_reductions

fd5268f

Dataset.reduce methods

shoyer merged commit fd5268f into pydata:master May 21, 2014

jhamman deleted the dataset_reductions branch May 22, 2014 00:35

keewis pushed a commit to keewis/xarray that referenced this pull request Jan 17, 2024

remove anytree license file (pydata#137)

95568c3
