Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataset.reduce methods #137

Merged
merged 1 commit into from
May 21, 2014
Merged

Dataset.reduce methods #137

merged 1 commit into from
May 21, 2014

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented May 20, 2014

A first attempt at implementing Dataset reduction methods.
#131

else:
dims = set(dimension)

variables = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this OrderedDict() instead of an unordered dictionary, just so the result will be less surprising (that is, with variables in the same order as the original).

@shoyer
Copy link
Member

shoyer commented May 20, 2014

Very nice start! Please also add a try/except TypeError block to skip variables where the reduction isn't well defined.


self.assertDatasetEqual(data.min(dimension=['dim1']),
data.min(dimension='dim1'))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for dimension=[]:
self.assertDatasetEqual(data.mean(dimension=[]), data)

@shoyer shoyer added this to the 0.2 milestone May 20, 2014
actual = data.min(dimension=reduct).dimensions
self.assertItemsEqual(actual, expected)

data.__delitem__('time') # removes unused time dim/var that is dropped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggests another reason why it might make more sense to loop over variables (and handle coordinates explicitly) instead of only looping over noncoordinates: it's kind of weird to lose a dimension that wasn't summed over.

dims = set(dimension)

if any([True for dim in dims if dim not in self.coordinates]):
bad_dims = [dim for dim in dims if dim not in self.coordinates]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simply make this:

bad_dims = [dim for dim in dims if dim not in self.coordinates]
if bad_dims:
    raise ValueError

@shoyer
Copy link
Member

shoyer commented May 21, 2014

This getting pretty close but I would like more comprehensive tests to be confident that it is working properly:

  • Verify an exception is raised if a bad dimension is supplied.
  • Verify that arrays with non-numeric data types are skipped if doing an inappropriate reduction (e.g., mean of a string variable).

@shoyer shoyer mentioned this pull request May 21, 2014
@jhamman
Copy link
Member Author

jhamman commented May 21, 2014

@shoyer - re. the two tests you requested.

  • See test_reduce_bad_dimension in test_dataset.py.
  • see test_reduce_non_numeric in test_dataset.py.

@shoyer shoyer mentioned this pull request May 21, 2014
'{0}'.format(bad_dims))

variables = OrderedDict()
for name, da in iteritems(self.variables):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this slightly less dissonant, let's rename da to something like var which doesn't suggest this is DataArray variable.

`f(x, axis=axis, **kwargs)` to return the result of reducing an
np.ndarray over an integer valued axis.
dimension : str or sequence of str, optional Dimension(s) over which
to apply `func`. If `dimension`(default=None) `func` is applied
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If dimension(default=None)" doesn't quite make sense to me. How about replacing that just with "By default"?

@shoyer
Copy link
Member

shoyer commented May 21, 2014

OK, this looks good to me now! Since the history is a little messy (given the number of revisions) could you please squash this into a single commit when I can merge?

@jhamman
Copy link
Member Author

jhamman commented May 21, 2014

Alright, the rebase/squash is done.

shoyer added a commit that referenced this pull request May 21, 2014
@shoyer shoyer merged commit fd5268f into pydata:master May 21, 2014
@shoyer
Copy link
Member

shoyer commented May 21, 2014

Thanks!

@jhamman jhamman deleted the dataset_reductions branch May 22, 2014 00:35
keewis pushed a commit to keewis/xarray that referenced this pull request Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants