Skip to content

Commit

Permalink
Raise warnings as errors in the test suite (#12468)
Browse files Browse the repository at this point in the history
This PR fixes a final few warnings introduced in PRs merged after #12406 or with changes that weren't merged into #12406. More importantly, this PR updates the testing suite to always raise uncaught warnings as errors. This is a substantial change to the testing suite that will allow us to proactively react to deprecations in our dependencies, identify areas where pandas/pyarrow/some other dependency handles input data in different ways than us, etc. I've opted for the strictest possible setting, but am open to reviewer requests to be a little more lax with certain classes of warnings.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #12468
  • Loading branch information
vyasr authored Jan 5, 2023
1 parent cf26425 commit 390f817
Show file tree
Hide file tree
Showing 5 changed files with 54 additions and 5 deletions.
19 changes: 19 additions & 0 deletions docs/cudf/source/developer_guide/contributing_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -136,3 +136,22 @@ Standard C++ natively supports various [exception types](https://en.cppreference
which Cython maps to [these Python exception types](https://docs.cython.org/en/latest/src/userguide/wrapping_CPlusPlus.html#exceptions).
In the future, libcudf may employ custom C++ exception types.
If that occurs, this section will be updated to reflect how these may be mapped to desired Python exception types.

### Raising warnings

Where appropriate, cuDF should throw the same warnings as pandas.
For instance, API deprecations in cuDF should track pandas as closely as possible.
However, not all pandas warnings are appropriate for cuDF.
Common exceptional cases include
[implementation-dependent performance warnings](https://pandas.pydata.org/docs/reference/api/pandas.errors.PerformanceWarning.html)
that do not apply to cuDF's internals.
The decision of whether or not to match pandas must be made on a case-by-case
basis and is left to the best judgment of developers and PR reviewers.

### Catching warnings from dependencies

cuDF developers should avoid using deprecated APIs from package dependencies.
However, there may be cases where such uses cannot be avoided, at least in the short term.
If such cases arise, developers should
[catch the warnings](https://docs.python.org/3/library/warnings.html#warnings.catch_warnings)
within cuDF so that they are not visible to the user.
23 changes: 23 additions & 0 deletions docs/cudf/source/developer_guide/testing.md
Original file line number Diff line number Diff line change
Expand Up @@ -218,6 +218,29 @@ This way, when the bug is fixed, the test suite will fail at this
point (and we will remember to update the test).


### Testing code that throws warnings

Some code may be expected to throw warnings.
A common example is when a cudf API is deprecated for future removal, but many other possibilities exist as well.
The cudf testing suite [surfaces all warnings as errors](https://docs.pytest.org/en/latest/how-to/capture-warnings.html#controlling-warnings).
This includes warnings raised from non-cudf code, such as calls to pandas or pyarrow.
This setting forces developers to proactively deal with deprecations from other libraries,
as well as preventing the internal use of deprecated cudf APIs in other parts of the library.
Just as importantly, it can help catch real errors like integer overflow or division by zero.

When testing code that is expected to throw a warnings, developers should use the
[`pytest.warns`](https://docs.pytest.org/en/latest/how-to/capture-warnings.html#assertwarnings) context to catch the warning.
For parametrized tests that raise warnings under specific conditions, use the `testing._utils.expect_warning_if` decorator instead of `pytest.warns`.

```{warning}
[`warnings.catch_warnings`](https://docs.python.org/3/library/warnings.html#warnings.catch_warnings)
is a tempting alternative to `pytest.warns`.
**Do not use this context manager in tests.**
Unlike `pytest.warns`, which _requires_ that the expected warning be raised,
`warnings.catch_warnings` simply catches warnings that appear without requiring them.
The cudf testing suite should avoid such ambiguities.
```

### Testing utility functions

The `cudf.testing` subpackage provides a handful of utilities for testing the equality of objects.
Expand Down
4 changes: 2 additions & 2 deletions python/cudf/cudf/io/parquet.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright (c) 2019-2022, NVIDIA CORPORATION.
# Copyright (c) 2019-2023, NVIDIA CORPORATION.

import math
import shutil
Expand Down Expand Up @@ -288,7 +288,7 @@ def _process_dataset(

# Convert filters to ds.Expression
if filters is not None:
filters = pq._filters_to_expression(filters)
filters = pq.filters_to_expression(filters)

# Initialize ds.FilesystemDataset
# TODO: Remove the if len(paths) workaround after following bug is fixed:
Expand Down
4 changes: 4 additions & 0 deletions python/cudf/cudf/tests/pytest.ini
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,7 @@
markers =
spilling: mark benchmark a good candidate to run with `CUDF_SPILL=ON`
xfail_strict = true
filterwarnings =
error
ignore:::.*xdist.*
ignore:::.*pytest.*
9 changes: 6 additions & 3 deletions python/cudf/cudf/tests/test_reductions.py
Original file line number Diff line number Diff line change
Expand Up @@ -122,8 +122,10 @@ def test_sum_of_squares(dtype, nelem):
sr = Series(data)
df = cudf.DataFrame(sr)

got = sr.sum_of_squares()
got_df = df.sum_of_squares()
with pytest.warns(FutureWarning):
got = sr.sum_of_squares()
with pytest.warns(FutureWarning):
got_df = df.sum_of_squares()
expect = (data**2).sum()

if cudf.dtype(dtype).kind in {"u", "i"}:
Expand Down Expand Up @@ -156,7 +158,8 @@ def test_sum_of_squares_decimal(dtype):
data = [str(x) for x in gen_rand("int8", 3) / 10]

expected = pd.Series([Decimal(x) for x in data]).pow(2).sum()
got = cudf.Series(data).astype(dtype).sum_of_squares()
with pytest.warns(FutureWarning):
got = cudf.Series(data).astype(dtype).sum_of_squares()

assert_eq(expected, got)

Expand Down

0 comments on commit 390f817

Please sign in to comment.