Hypothesis strategies in xarray.testing.strategies #6908
Changes from 98 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,278 @@ | ||
.. _testing: | ||
|
||
Testing your code | ||
================= | ||
|
||
.. ipython:: python | ||
:suppress: | ||
|
||
import numpy as np | ||
import pandas as pd | ||
import xarray as xr | ||
|
||
np.random.seed(123456) | ||
|
||
.. _hypothesis: | ||
|
||
Hypothesis testing | ||
------------------ | ||
|
||
.. note:: | ||
|
||
Testing with hypothesis is a fairly advanced topic. Before reading this section, we recommend that you read | ||
our guide to xarray's :ref:`data structures`, are familiar with conventional unit testing in | ||
`pytest <https://docs.pytest.org/>`_, and have seen the | ||
`hypothesis library documentation <https://hypothesis.readthedocs.io/>`_. | ||
|
||
`The hypothesis library <https://hypothesis.readthedocs.io/>`_ is a powerful tool for property-based testing. | ||
Instead of writing tests for one example at a time, it allows you to write tests parameterized by a source of many | ||
dynamically generated examples. For example you might have written a test which you wish to be parameterized by the set | ||
of all possible integers via :py:func:`hypothesis.strategies.integers()`. | ||
|
||
Property-based testing is extremely powerful, because (unlike more conventional example-based testing) it can find bugs | ||
that you did not even think to look for! | ||
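As a minimal sketch of what "parameterized by the set of all possible integers" looks like in practice, the test below uses only the core hypothesis API. The property being checked (a round trip through ``str``) is a hypothetical example chosen for illustration and is not taken from xarray.

```python
from hypothesis import given, strategies as st


@given(st.integers())
def test_int_str_roundtrip(n):
    # Hypothesis calls this body many times, each time with a different integer.
    assert int(str(n)) == n
```

Under pytest this is collected like any other test; calling ``test_int_str_roundtrip()`` directly also runs the whole batch of generated examples.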
|
||
Strategies | ||
~~~~~~~~~~ | ||
|
||
Each source of examples is called a "strategy", and xarray provides a range of custom strategies which produce xarray | ||
data structures containing arbitrary data. You can use these to efficiently test downstream code, | ||
quickly ensuring that your code can handle xarray objects of all possible structures and contents. | ||
|
||
These strategies are accessible in the :py:mod:`xarray.testing.strategies` module, which provides | ||
|
||
.. currentmodule:: xarray | ||
|
||
.. autosummary:: | ||
|
||
testing.strategies.numeric_dtypes | ||
testing.strategies.np_arrays | ||
testing.strategies.names | ||
testing.strategies.dimension_names | ||
testing.strategies.dimension_sizes | ||
testing.strategies.attrs | ||
testing.strategies.variables | ||
testing.strategies.coordinate_variables | ||
testing.strategies.dataarrays | ||
testing.strategies.data_variables | ||
testing.strategies.datasets | ||
|
||
These build upon the numpy strategies offered in :py:mod:`hypothesis.extra.numpy`: | ||
|
||
.. ipython:: python | ||
|
||
import hypothesis.extra.numpy as npst | ||
|
||
Generating Examples | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
To see an example of what each of these strategies might produce, you can call one followed by the ``.example()`` method, | ||
which is a general hypothesis method valid for all strategies. | ||
|
||
.. ipython:: python | ||
|
||
import xarray.testing.strategies as xrst | ||
|
||
xrst.dataarrays().example() | ||
xrst.dataarrays().example() | ||
xrst.dataarrays().example() | ||
|
||
You can see that calling ``.example()`` multiple times will generate different examples, giving you an idea of the wide | ||
range of data that the xarray strategies can generate. | ||
|
||
In your tests, however, you should not use ``.example()``; instead, parameterize your tests with the | ||
:py:func:`hypothesis.given` decorator: | ||
|
||
.. ipython:: python | ||
|
||
from hypothesis import given | ||
|
||
.. ipython:: python | ||
|
||
@given(xrst.dataarrays()) | ||
def test_function_that_acts_on_dataarrays(da): | ||
    assert func(da) == ... | ||
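The placeholder test above can be made concrete. The sketch below is an assumption-laden illustration: it builds the ``DataArray`` by hand from ``hypothesis.extra.numpy`` so that it does not depend on the strategies under review in this PR, and the property tested (transposing twice returns the original object) is a hypothetical choice.

```python
import hypothesis.extra.numpy as npst
import numpy as np
import xarray as xr
from hypothesis import given, settings

# Strategy for small two-dimensional int32 arrays of arbitrary shape.
two_d_ints = npst.arrays(
    dtype=np.int32,
    shape=npst.array_shapes(min_dims=2, max_dims=2, max_side=4),
)


@settings(deadline=None)  # avoid spurious deadline failures on slow first calls
@given(two_d_ints)
def test_double_transpose_is_identity(arr):
    da = xr.DataArray(arr, dims=["x", "y"])
    roundtripped = da.transpose("y", "x").transpose("x", "y")
    xr.testing.assert_identical(roundtripped, da)
```

With the strategies described here, the hand-built ``DataArray`` would presumably be replaced by a single ``@given(xrst.dataarrays())`` argument.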
|
||
|
||
Chaining Strategies | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
Xarray's strategies can accept other strategies as arguments, allowing you to customise the contents of the generated | ||
examples. | ||
|
||
.. ipython:: python | ||
|
||
# generate a DataArray with shape (3, 4), but all other details still arbitrary | ||
xrst.dataarrays( | ||
data=xrst.np_arrays(shape=(3, 4), dtype=np.dtype("int32")) | ||
).example() | ||
|
||
This also works with custom strategies, or strategies defined in other packages. | ||
For example you could create a ``chunks`` strategy to specify particular chunking patterns for a dask-backed array. | ||
|
||
.. warning:: | ||
When passing multiple different strategies to the same constructor the drawn examples must be mutually compatible. | ||
|
||
In order to construct a valid xarray object to return, our strategies must check that the | ||
variables / dimensions / coordinates are mutually compatible. If you pass multiple custom strategies to a strategy | ||
constructor which are not compatible in all cases, an error will be raised, *even if they are still compatible in | ||
other cases*. For example | ||
|
||
.. code-block:: | ||
|
||
@given(st.data()) | ||
def test_something_else_inefficiently(data): | ||
    # generates arrays of any shape | ||
    arrs = npst.arrays(dtype=xrst.numeric_dtypes(), shape=npst.array_shapes()) | ||
    dims = xrst.dimension_names()  # generates lists of any number of dimensions | ||
|
||
    # Drawing examples from this strategy will raise a hypothesis.errors.InvalidArgument error. | ||
    var = data.draw(xrst.variables(data=arrs, dims=dims)) | ||
|
||
    assert ... | ||
|
||
Here we have passed custom strategies which won't often be compatible: only rarely will the array's ``ndim`` | ||
correspond to the number of dimensions drawn. We forbid arguments that are only *sometimes* compatible in order to | ||
avoid extremely poor example generation performance (as generating invalid examples and rejecting them is | ||
potentially unboundedly inefficient). | ||
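To make the cost of rejection concrete, here is a small worked calculation using only the standard library. The ranges are hypothetical: it assumes the array's number of dimensions and the number of dimension names are each drawn independently and uniformly from 0 to 5, which is not necessarily what the real strategies do.

```python
from fractions import Fraction
from itertools import product

# Hypothetical ranges, chosen only for illustration: both quantities are drawn
# independently from 0..5.
ndim_choices = range(6)
n_names_choices = range(6)

pairs = list(product(ndim_choices, n_names_choices))
# A draw is only usable when the array's ndim equals the number of names.
compatible = [(ndim, n_names) for ndim, n_names in pairs if ndim == n_names]

acceptance = Fraction(len(compatible), len(pairs))
print(f"{len(compatible)} of {len(pairs)} combinations are compatible ({acceptance})")
```

Under these assumptions five out of every six draws would be thrown away, and the fraction shrinks further as the allowed range grows or as additional constraints (such as matching dimension lengths) are added.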
|
||
|
||
Fixing Arguments | ||
~~~~~~~~~~~~~~~~ | ||
|
||
If you want to fix one aspect of the data structure, whilst allowing variation in the generated examples | ||
over all other aspects, then use :py:func:`hypothesis.strategies.just()`. | ||
|
||
.. ipython:: python | ||
|
||
import hypothesis.strategies as st | ||
|
||
# Generates only dataarrays with dimensions ["x", "y"] | ||
xrst.dataarrays(dims=st.just(["x", "y"])).example() | ||
|
||
(This is technically another example of chaining strategies - :py:func:`hypothesis.strategies.just()` is simply a | ||
special strategy that just contains a single example.) | ||
|
||
To fix the length of dimensions you can instead pass ``dims`` as a mapping of dimension names to lengths | ||
(i.e. following xarray objects' ``.sizes`` property), e.g. | ||
|
||
.. ipython:: python | ||
|
||
# Generates only dataarrays with dimensions ["x", "y"], of lengths 2 & 3 respectively | ||
xrst.dataarrays(dims=st.just({"x": 2, "y": 3})).example() | ||
|
||
|
||
You can also use this to specify that you want examples which are missing some part of the data structure, for instance | ||
|
||
.. ipython:: python | ||
|
||
# Generates only datasets with no data variables | ||
xrst.datasets(data_vars=st.just({})).example() | ||
|
||
Through a combination of chaining strategies and fixing arguments, you can specify quite complicated requirements on the | ||
objects your chained strategy will generate. | ||
|
||
.. ipython:: python | ||
|
||
fixed_x_variable_y_maybe_z = st.fixed_dictionaries( | ||
{"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)} | ||
) | ||
|
||
fixed_x_variable_y_maybe_z.example() | ||
|
||
special_dataarrays = xrst.dataarrays(dims=fixed_x_variable_y_maybe_z) | ||
|
||
special_dataarrays.example() | ||
special_dataarrays.example() | ||
|
||
Here we have used one of hypothesis' built-in strategies :py:func:`hypothesis.strategies.fixed_dictionaries` to create a | ||
strategy which generates mappings of dimension names to lengths (i.e. the ``sizes`` of the xarray object we want). | ||
This particular strategy will always generate an ``x`` dimension of length 2, and a ``y`` dimension of | ||
length either 3 or 4, and will sometimes also generate a ``z`` dimension of length 2. | ||
By feeding this strategy for dictionaries into the ``dims`` argument of xarray's ``dataarrays`` strategy, we can generate | ||
arbitrary ``DataArray`` objects whose dimensions will always match these specifications. | ||
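The claims about this sizes strategy can be checked directly with hypothesis itself. The following is a small sketch that tests only the dictionary strategy, not the xarray strategies built on top of it; the test name is hypothetical.

```python
import hypothesis.strategies as st
from hypothesis import given

sizes_strategy = st.fixed_dictionaries(
    {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)}
)


@given(sizes_strategy)
def test_sizes_match_specification(sizes):
    assert sizes["x"] == 2  # always present, always length 2
    assert sizes["y"] in (3, 4)  # always present, length 3 or 4
    assert set(sizes) <= {"x", "y", "z"}  # no unexpected dimensions
    if "z" in sizes:  # optional key, fixed length when present
        assert sizes["z"] == 2
```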
|
||
|
||
Creating Duck-type Arrays | ||
~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Xarray objects don't have to wrap numpy arrays; in fact, they can wrap any array type which presents the same API as a | ||
numpy array (so-called "duck array wrapping", see :ref:`internals.duck_arrays`). | ||
|
||
Imagine we want to write a strategy which generates arbitrary ``DataArray`` objects, each of which wraps a | ||
:py:class:`sparse.COO` array instead of a ``numpy.ndarray``. How could we do that? There are two ways: | ||
|
||
1. Create an xarray object with numpy data and use the strategy's ``.map()`` method to convert the underlying array to a | ||
different type: | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
import sparse | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
def convert_to_sparse(da): | ||
    if da.ndim == 0: | ||
        return da | ||
    else: | ||
        da.data = sparse.COO.from_numpy(da.values) | ||
        return da | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
sparse_dataarrays = xrst.dataarrays().map(convert_to_sparse) | ||
|
||
sparse_dataarrays.example() | ||
sparse_dataarrays.example() | ||
|
||
2. Pass a strategy which generates the duck-typed arrays directly to the ``data`` argument of the xarray | ||
strategies: | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
@st.composite | ||
def sparse_arrays(draw) -> st.SearchStrategy[sparse.COO]: | ||
    """Strategy which generates random sparse.COO arrays""" | ||
    shape = draw(npst.array_shapes()) | ||
    # density is the fraction of non-zero elements, anywhere between 0 and 1 | ||
    density = draw(st.floats(min_value=0, max_value=1)) | ||
    return sparse.random(shape, density=density) | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
sparse_dataarrays = xrst.dataarrays(data=sparse_arrays()) | ||
|
||
sparse_dataarrays.example() | ||
sparse_dataarrays.example() | ||
|
||
Either approach is fine, but one may be more convenient than the other depending on the type of the duck array which you | ||
want to wrap. | ||
|
||
Creating datasets can be a little more involved. Using method (1) is simple: | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
def convert_ds_to_sparse(ds): | ||
    return ds.map(convert_to_sparse) | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
sparse_datasets = xrst.datasets().map(convert_ds_to_sparse) | ||
|
||
sparse_datasets.example() | ||
|
||
but building a dataset from scratch (i.e. method (2)) requires building the dataset object in such a way that all of | ||
the data variables have compatible dimensions. You can build up a dictionary of the form ``{var_name: data_variable}`` | ||
yourself, or you can use the ``data_vars`` argument to the ``data_variables`` strategy (TODO): | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
||
sparse_data_vars = xrst.data_variables(data=sparse_arrays()) | ||
sparse_datasets = xrst.datasets(data_vars=sparse_data_vars) | ||
|
||
sparse_datasets.example() | ||
@keewis do you have any thoughts on this section? Given that half the point of this PR is to facilitate testing the duck array wrapping. I'm worried that currently whilst it's easy to generate a [...] The issue is that you can't just pass [...]

Passing a callable is what I tried with the [...]

    @st.composite
    def pint_arrays(draw, *, shape=None, dtype=None, units=None):
        if shape is None:
            shape = shapes()
        if dtype is None:
            dtype = dtypes()
        if units is None:
            units = units()
        arrays = npst.arrays(shape, dtype)
        return pint.Quantity(draw(arrays), draw(units))

we would be able to "pin" the [...]

    strategy = pint_arrays()
    dim_sizes = ...
    specialized_strategy = strategy.pin(shape=dim_sizes)

In other words, instead of calling [...]

@Zac-HD, what do you think? Does that make sense to you, or would you recommend to solve this problem differently?

I'd be strongly in favor of accepting a callable taking [...] Then internally, you pass in the shape, dtype, and maybe elements arguments, and then [...] The [...]

I had intended to push [...] Putting the code into the definition of the composite strategy is much better than what I had before (constructing the examples using [...]) Do you know if it is possible to use [...]

For Pint, I'd write something like the following:

    def pint_arrays(draw, *, shape, dtype, units=units(), array_strategy_fn=npst.arrays):
        return st.builds(pint.Quantity, array_strategy_fn(shape=shape, dtype=dtype), units=units)

and then, as you say, use [...] Where possible, I prefer [...]
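The callable-based idea discussed in this thread can be sketched roughly as follows. Everything here is an assumption made for illustration: ``sized_arrays`` and ``zeros_strategy_fn`` are hypothetical names, the fixed ``x``/``y`` dimensions are arbitrary, and this is not the implementation in the PR. The point is only that the outer strategy decides the shape and dtype, and the user-supplied callable returns a strategy that honours them.

```python
import hypothesis.extra.numpy as npst
import hypothesis.strategies as st
import numpy as np
from hypothesis import given, settings


@st.composite
def sized_arrays(draw, *, array_strategy_fn=npst.arrays):
    """Hypothetical helper: pick sizes and dtype first, then build a matching array."""
    sizes = draw(
        st.fixed_dictionaries({"x": st.integers(1, 3), "y": st.integers(1, 3)})
    )
    dtype = draw(st.sampled_from([np.dtype("int32"), np.dtype("float64")]))
    # The user-supplied callable only has to honour the requested shape and dtype.
    arr = draw(array_strategy_fn(shape=tuple(sizes.values()), dtype=dtype))
    return sizes, arr


def zeros_strategy_fn(*, shape, dtype):
    """Example of a custom callable: always produce an all-zeros array."""
    return st.just(np.zeros(shape, dtype=dtype))


@settings(deadline=None)
@given(sized_arrays(), sized_arrays(array_strategy_fn=zeros_strategy_fn))
def test_arrays_match_requested_sizes(default_pair, zeros_pair):
    for sizes, arr in (default_pair, zeros_pair):
        assert arr.shape == tuple(sizes.values())
    assert not zeros_pair[1].any()
```

Because the shape is chosen before the array is drawn, no generated example ever has to be rejected for having incompatible dimensions.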
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
from .testing import ( # noqa: F401 | ||
_assert_dataarray_invariants, | ||
_assert_dataset_invariants, | ||
_assert_indexes_invariants_checks, | ||
_assert_internal_invariants, | ||
_assert_variable_invariants, | ||
_data_allclose_or_equiv, | ||
assert_allclose, | ||
assert_chunks_equal, | ||
assert_duckarray_allclose, | ||
assert_duckarray_equal, | ||
assert_equal, | ||
assert_identical, | ||
) | ||
|
||
__all__ = [ | ||
"assert_allclose", | ||
"assert_chunks_equal", | ||
"assert_duckarray_equal", | ||
"assert_duckarray_allclose", | ||
"assert_equal", | ||
"assert_identical", | ||
] |
All `np_arrays` does is wrap around `hypothesis.extra.numpy.arrays`, so it's probably better not to expose this as public API?

The purpose of that is because xarray only accepts certain dtypes right? I don't have strong feelings about this though, except that users should have all the tools they need to build their own valid xarray strategies.

I think `xarray` would be fine with almost every dtype (except maybe the structured dtypes), but `sparse` in particular is very restricted.

I'd suggest leaving out `np_arrays` and `valid_dtypes`, in favor of documenting how to use `hypothesis.extra.numpy` strategies for Xarray. Users will need to do that anyway for nontrivial tests, and IMO the benefits of a consistent API outweigh the convenience factor for beginning users.