
Tests for module-level functions with units #3493

Merged

Conversation

@keewis (Collaborator) commented Nov 7, 2019

This PR adds tests that cover the module-level functions of the public API, similar to #3238 and #3447.

  • Tests added
  • Passes black . && mypy . && flake8
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

As a reference for myself, these are the functions listed by the docs:

  • apply_ufunc
  • align
  • broadcast
  • concat
  • merge
  • combine_by_coords
  • combine_nested
  • auto_combine (deprecated)
  • masking / selecting: where
  • replication: full_like, ones_like, zeros_like
  • dot
  • map_blocks

Functions not covered by this PR:

  • auto_combine (deprecated)
  • map_blocks (dask-specific; without dask it should behave the same as apply_ufunc)
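For illustration (a sketch, not code from this PR), a units test for one of these module-level functions could look roughly like this, assuming a pint version with NEP-18 support:

```python
import numpy as np
import pint
import xarray as xr

unit_registry = pint.UnitRegistry()


def test_concat_preserves_units():
    # two quantified arrays sharing a dimension
    a = xr.DataArray(np.linspace(0, 1, 5) * unit_registry.m, dims="x")
    b = xr.DataArray(np.linspace(1, 2, 5) * unit_registry.m, dims="x")

    result = xr.concat([a, b], dim="x")

    # the result should still wrap a pint.Quantity carrying the original units
    assert isinstance(result.data, unit_registry.Quantity)
    assert result.data.units == a.data.units
```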

@keewis mentioned this pull request Nov 8, 2019
@keewis changed the title from "WIP: Tests for module-level functions with units" to "Tests for module-level functions with units" on Nov 10, 2019
@keewis (Collaborator, Author) commented Nov 10, 2019

this should be ready for review and merge, unless anyone wants tests for auto_combine or map_blocks? I skipped auto_combine because it is listed as deprecated, and map_blocks because so far I have avoided testing pint together with dask (if I understand it correctly, map_blocks without dask is equivalent to apply_ufunc?).

I will add tests for Dataset.broadcast_like and DataArray.broadcast_like in a new PR, and then go over all tests written so far to fix any style inconsistencies and obvious errors. After that, I guess we have to wait on pint?

Edit: I think the failing distributed test on MacOSX and the docs build timeout are unrelated.

@max-sixty merged commit 4358762 into pydata:master Nov 14, 2019
@max-sixty (Collaborator)

Thanks a lot as ever @keewis !

@max-sixty (Collaborator)

While it won't cover all the use cases, check out https://github.com/pydata/xarray/blob/master/xarray/tests/test_variable.py#L1819 when you get a chance; it's possible that inheriting from that test class with a pint array might give you some tests for free.

@keewis deleted the tests-for-toplevel-functions-with-units branch on November 14, 2019 at 15:18
@keewis (Collaborator, Author) commented Nov 14, 2019

thanks for reviewing, @max-sixty

I did not plan to add tests for Variable, but that might be an oversight on my part: I have been writing tests for the functions / methods in api.rst, where Variable is listed as part of the Advanced API, which means I ignored it. Also, if my understanding of xarray's internals is correct, Variable is used in both DataArray and Dataset operations, so the tests on these should implicitly also test Variable. Considering that implicit coverage is usually not a good idea, should I add tests for Variable?

I don't think inheritance will help much, but that example could definitely be used as a reference / inspiration.

@max-sixty (Collaborator)

I was predominantly suggesting that as a way of saving your time & code on the margin (test_units.py is 4330 LOC!), and it seems like there's some overlap in code that tests whether functions work at all, before testing whether the units are handled correctly (though I agree there's a Variable / DataArray distinction).

Aside from any time or code savings, I think it's not strictly necessary to test Variable separately from Dataset & DataArray (it is implicit, but it's also part of the external API). What do others think?

@keewis (Collaborator, Author) commented Nov 14, 2019

hmm... well, I certainly agree the tests are often quite verbose (maybe too verbose) and sometimes also test pint's own functionality (e.g. when incompatible, compatible and identical units are tried). I didn't check, but I don't remember any overlap with tests from test_dataarray.py or test_dataset.py (if that's what you meant).

To reduce the code, it might be worth testing only compatible units. We could also try to use helper functions for data creation, but while that reduces the amount of code, it also makes the tests a little harder to understand.
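A sketch of the kind of data-creation helper mentioned above (the name and signature are made up for illustration):

```python
import numpy as np
import pint
import xarray as xr

unit_registry = pint.UnitRegistry()


def create_quantified_dataarray(units, n=10, dims="x"):
    """Return a DataArray whose data is a pint.Quantity with the given units."""
    data = np.linspace(0, 1, n) * unit_registry(units)
    return xr.DataArray(data, dims=dims)


# a test then only needs to state the units it cares about
array = create_quantified_dataarray("m")
```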

If Variable is part of the external API it definitely needs tests.

Reusing tests from test_dataarray.py / test_dataset.py / test_variable.py is tempting, but I don't think it is possible unless we rewrite them. Am I missing something?

@max-sixty (Collaborator)

Overall the tests are great, and the breadth of coverage is impressive. That's more important than their form!

The way I was thinking about leveraging existing tests is that there are
a) some tests that check whether existing functions at least run on pint-backed arrays, and
b) some tests that check whether the units work correctly when used in xarray.

Any opportunities to use existing code would be on (a). In the Variable tests linked above, we re-run all the tests for a dask-backed Variable by inheriting from the test class, and xfail those that don't work (though it sounds like you think that wouldn't work in this case?).
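A simplified, self-contained sketch of that pattern (the class and test names here are illustrative, not the actual ones in test_variable.py): a base class writes its tests against a cls factory, and a subclass swaps in a dask-backed constructor and xfails whatever that backend does not support.

```python
import numpy as np
import pytest

from xarray import Variable


class VariableTests:
    # backend-agnostic tests, written against the `cls` factory
    cls = staticmethod(Variable)

    def test_mean(self):
        v = self.cls("x", np.arange(10.0))
        assert v.mean() == 4.5

    def test_setitem(self):
        v = self.cls("x", np.arange(10.0))
        v[0] = -1.0
        assert v[0] == -1.0


class TestNumpyVariable(VariableTests):
    pass


class TestDaskVariable(VariableTests):
    # re-run every inherited test against a dask-backed Variable ...
    cls = staticmethod(lambda *args: Variable(*args).chunk())

    # ... and xfail the ones this backend does not support
    @pytest.mark.xfail(reason="dask-backed Variables do not support item assignment")
    def test_setitem(self):
        super().test_setitem()
```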

> We could also try to use helper functions for data creation, but while that reduces the amount of code, it also makes the tests a little harder to understand.

Yes, there's some repetition. Did we go back & forth before about putting some of the duplicated setup in fixtures? That could cut down some boilerplate if there's a lot of overlap (though if there's only partial overlap, it could also increase complication, as you point out).
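In case it helps, a sketch of what a shared fixture for the duplicated setup could look like (hypothetical names, parametrized over a few units):

```python
import numpy as np
import pint
import pytest
import xarray as xr

unit_registry = pint.UnitRegistry()


@pytest.fixture(params=["m", "s", "dimensionless"])
def quantified_dataarray(request):
    """A pint-backed DataArray, parametrized over a few units."""
    data = np.linspace(0, 1, 10) * unit_registry(request.param)
    return xr.DataArray(data, dims="x")


def test_addition_keeps_units(quantified_dataarray):
    result = quantified_dataarray + quantified_dataarray
    assert result.data.units == quantified_dataarray.data.units
```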

@keewis (Collaborator, Author) commented Nov 15, 2019

ahh, now this

cls = staticmethod(lambda *args: Variable(*args).chunk())

makes a lot more sense; I didn't understand the purpose of VariableSubclassobjects before. Yes, that may potentially save us some code. I'll try it out and submit a PR for Variable.
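For reference, a hypothetical pint-backed analogue of that line could look like the sketch below (the class name and the choice of wrapping everything in metres are made up, and many inherited tests would presumably need xfail-ing):

```python
import pint

from xarray import Variable
from xarray.tests.test_variable import VariableSubclassobjects

unit_registry = pint.UnitRegistry()


def _with_units(dims, data, *args, **kwargs):
    # wrap the raw data in a pint.Quantity before constructing the Variable
    return Variable(dims, unit_registry.Quantity(data, "m"), *args, **kwargs)


class TestVariableWithUnits(VariableSubclassobjects):
    cls = staticmethod(_with_units)
```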

We didn't go back & forth re fixtures yet. I'll also investigate the overlap, but I think it's only partial.

dcherian added a commit to dcherian/xarray that referenced this pull request Nov 17, 2019
* upstream/master:
  Added fill_value for unstack (pydata#3541)
  Add DatasetGroupBy.quantile (pydata#3527)
  ensure rename does not change index type (pydata#3532)
  Leave empty slot when not using accessors
  interpolate_na: Add max_gap support. (pydata#3302)
  units & deprecation merge (pydata#3530)
  Fix set_index when an existing dimension becomes a level (pydata#3520)
  add Variable._replace (pydata#3528)
  Tests for module-level functions with units (pydata#3493)
  Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502)
  FUNDING.yml (pydata#3523)
  Allow appending datetime & boolean variables to zarr stores (pydata#3504)
  warn if dim is passed to rolling operations. (pydata#3513)
  Deprecate allow_lazy (pydata#3435)
  Recursive tokenization (pydata#3515)
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 17, 2019
* upstream/master: (22 commits)
  Added fill_value for unstack (pydata#3541)
  Add DatasetGroupBy.quantile (pydata#3527)
  ensure rename does not change index type (pydata#3532)
  Leave empty slot when not using accessors
  interpolate_na: Add max_gap support. (pydata#3302)
  units & deprecation merge (pydata#3530)
  Fix set_index when an existing dimension becomes a level (pydata#3520)
  add Variable._replace (pydata#3528)
  Tests for module-level functions with units (pydata#3493)
  Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502)
  FUNDING.yml (pydata#3523)
  Allow appending datetime & boolean variables to zarr stores (pydata#3504)
  warn if dim is passed to rolling operations. (pydata#3513)
  Deprecate allow_lazy (pydata#3435)
  Recursive tokenization (pydata#3515)
  format indexing.rst code with black (pydata#3511)
  add missing pint integration tests (pydata#3508)
  DOC: update bottleneck repo url (pydata#3507)
  add drop_sel, drop_vars, map to api.rst (pydata#3506)
  remove syntax warning (pydata#3505)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 22, 2019
* master: (24 commits)
  Tweaks to release instructions (pydata#3555)
  Clarify conda environments for new contributors (pydata#3551)
  Revert to dev version
  0.14.1 whatsnew (pydata#3547)
  sparse option to reindex and unstack (pydata#3542)
  Silence sphinx warnings (pydata#3516)
  Numpy 1.18 support (pydata#3537)
  tweak whats-new. (pydata#3540)
  small simplification of rename from pydata#3532 (pydata#3539)
  Added fill_value for unstack (pydata#3541)
  Add DatasetGroupBy.quantile (pydata#3527)
  ensure rename does not change index type (pydata#3532)
  Leave empty slot when not using accessors
  interpolate_na: Add max_gap support. (pydata#3302)
  units & deprecation merge (pydata#3530)
  Fix set_index when an existing dimension becomes a level (pydata#3520)
  add Variable._replace (pydata#3528)
  Tests for module-level functions with units (pydata#3493)
  Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502)
  FUNDING.yml (pydata#3523)
  ...
@keewis mentioned this pull request Dec 6, 2019
@keewis mentioned this pull request Dec 28, 2019