Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: insert 'match' to bare pytest raises in pandas/tests/internals/ #30998

Merged
merged 17 commits into from
Jan 20, 2020

Conversation

ShaharNaveh
Copy link
Member

@ShaharNaveh ShaharNaveh commented Jan 14, 2020

@ShaharNaveh ShaharNaveh added Clean Testing pandas testing functions or related to the test suite labels Jan 14, 2020
@ShaharNaveh ShaharNaveh added this to the 1.1 milestone Jan 14, 2020
@datapythonista
Copy link
Member

Looks like this is breaking a test:

E               AssertionError: Pattern "cannot perform __pow\\_\\_ with this index type: DatetimeArray|cannot perform __mod\\_\\_ with this index type: DatetimeArray|cannot perform __truediv\\_\\_ with this index type: DatetimeArray|cannot perform __mul\\_\\_ with this index type: DatetimeArray|cannot perform __pow\\_\\_ with this index type: TimedeltaArray|ufunc 'multiply' cannot use operands with types dtype\\('<m8\\[ns\\]'\\) and dtype\\('<m8\\[ns\\]'\\)|cannot add DatetimeArray and Timestamp" not found in "ufunc multiply cannot use operands with types dtype('<m8[ns]') and dtype('<m8[ns]')"

@simonjayhawkins
Copy link
Member

i think something like this would help identify what needs to be done to make the error messages consistent...

diff --git a/pandas/tests/internals/test_internals.py b/pandas/tests/internals/test_internals.py
index 3f23486cd..9efc95fdb 100644
--- a/pandas/tests/internals/test_internals.py
+++ b/pandas/tests/internals/test_internals.py
@@ -818,14 +818,11 @@ class TestBlockManager:
         invalid_values = [1, "True", [1, 2, 3], 5.0]
         bm1 = create_mgr("a,b,c: i8-1; d,e,f: i8-2")
 
-        msg = (
-            r'For argument "inplace" expected type bool, received type int.|'
-            r'For argument "inplace" expected type bool, received type str.|'
-            r'For argument "inplace" expected type bool, received type list.|'
-            r'For argument "inplace" expected type bool, received type float.'
-        )
-
         for value in invalid_values:
+            msg = (
+                'For argument "inplace" expected type bool,'
+                f" received type {type(value).__name__}."
+            )
             with pytest.raises(ValueError, match=msg):
                 bm1.replace_list([1], [2], inplace=value)
 
@@ -1238,16 +1235,22 @@ class TestCanHoldElement:
         }
 
         if (op, dtype) in invalid:
-            msg = (
-                r"cannot perform __pow\_\_ with this index type: DatetimeArray|"
-                r"cannot perform __mod\_\_ with this index type: DatetimeArray|"
-                r"cannot perform __truediv\_\_ with this index type: DatetimeArray|"
-                r"cannot perform __mul\_\_ with this index type: DatetimeArray|"
-                r"cannot perform __pow\_\_ with this index type: TimedeltaArray|"
-                "ufunc 'multiply' cannot use operands with types dtype"
-                r"\('<m8\[ns\]'\) and dtype\('<m8\[ns\]'\)|"
-                "cannot add DatetimeArray and Timestamp"
-            )
+            if op is operator.add:
+                msg = f"cannot add DatetimeArray and Timestamp"
+            elif op is operator.mul:
+                msg = (
+                    re.escape(
+                        "ufunc 'multiply' cannot use operands with types"
+                        " dtype('<m8[ns]') and dtype('<m8[ns]')"
+                    )
+                    + "|cannot perform __mul__ with this index type: DatetimeArray"
+                )
+            else:
+                msg = (
+                    f"cannot perform __{op.__name__}__ with this index type:"
+                    r" (DatetimeArray|TimedeltaArray)"
+                )
+
             with pytest.raises(TypeError, match=msg):
                 op(s, e.value)
         else:

@@ -350,7 +358,10 @@ def test_duplicate_ref_loc_failure(self):
blocks[1].mgr_locs = np.array([0])

# test trying to create block manager with overlapping ref locs
with pytest.raises(AssertionError):

msg = "Gaps in blk ref_locs"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should test this, this is an internal error message

(and it's also not a very good error message)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and it's also not a very good error message)

Isn't that exactly the reason why we should test them in the first place? Puts more light on the quality of the error messages that we raise, internal or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that exactly the reason why we should test them in the first place

If we, in one go, improve the error message. Yes, that is fine, but otherwise I don't think it is worth the effort.

Copy link
Member

@gfyoung gfyoung Jan 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is fine, but otherwise I don't think it is worth the effort.

Debatable. If you're someone who has just started working on this codebase, I wouldn't want them to also modify the error message if they don't fully understand what is going. You can easily spend another PR figuring out what the proper error message is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jorisvandenbossche here I don't think worth testing against error messages for AssertionError, since that is purely internal

Copy link
Member

@gfyoung gfyoung Jan 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since that is purely internal

I still don't find that to be valid justification. Internal or not, we use error messages for a reason: because they provide some utility in development. If these error messages (not the Exception class itself) had no use and won't see the light of day simply because they're internal, we should just get rid of them. To borrow from @jorisvandenbossche, they aren't worth the effort then.

Error messages can also serve to differentiate between which error is raised. The managers.py file is littered with AssertionError lines. It would be good to know that the correct one is being raised.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually ok with testing these asserts; they are the current behavior. If things change then this test should fail. That said I wouldn; spend too much effort on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because they provide some utility in development.

Yes, they are useful for that. But also because they are only used in development, they can also easily change when refactoring some internals (without that there are user-facing consequences).

But every test (and certainly a test with a match) takes also effort to write, review and maintain (eg update if you update something internally). It's a trade-off. So I am not saying those internal error messages are not useful at all, but I still think they are not worth the effort to properly test with full coverage.

r'For argument "inplace" expected type bool, received type int.|'
r'For argument "inplace" expected type bool, received type str.|'
r'For argument "inplace" expected type bool, received type list.|'
r'For argument "inplace" expected type bool, received type float.'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe leave out the last word to have this less repetitive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a second look, I think I will just have msg to be equal to:

msg = "For argument "inplace" expected type bool, received type (int|str|list|float)."

Will this be better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should work. Give it a shot.

pandas/tests/internals/test_internals.py Outdated Show resolved Hide resolved
@pep8speaks
Copy link

pep8speaks commented Jan 15, 2020

Hello @MomIsBestFriend! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-17 16:00:13 UTC

)
+ "(DatetimeArray|TimedeltaArray)"
)
with pytest.raises(TypeError, match=msg):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in light of the discussion in #30999. Personally, I find that the above code change makes the test much more complicated and more difficult to read and interpret.

There is always a trade-off between testing every exact details vs practicality/readabiity/effort to do this, and personally for a case like the above, this is not worth it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks hard to read because it's so monolithic.

This test could benefit from a TON of refactoring, including pytest parameterization.

Copy link
Member Author

@ShaharNaveh ShaharNaveh Jan 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test could benefit from a TON of refactoring, including pytest parameterization.

@gfyoung I am looking into refactoring this test from the ground up, what would you put as the pytest parameterization(s) ?

I hope I can build the test from the ground up, depends on the given parameterization(s).

Copy link
Member

@gfyoung gfyoung Jan 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am looking into refactoring this test from the ground up, what would you put as the pytest parameterization(s) ?

I might have jumped the gun there in saying we should do pytest parameterization. If there is a way to parameterize so we can remove the logic for the skip tests (and put it as part of the decorator), that would be good.

A similar point could be made about the combinations we expect to fail because I'm uncertain as to feasability. If we could somehow parameterize properly, we could easily break that part out as a separate test.

@ShaharNaveh
Copy link
Member Author

I have reverted the failing test, I will open a separate PR.

@@ -297,7 +297,8 @@ def test_delete(self):
assert (newb.values[1] == 1).all()

newb = self.fblock.copy()
with pytest.raises(Exception):

with pytest.raises(IndexError, match=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the None here for? Can this not be made more specific?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be more specific as soon as #31130 gets merged.

@@ -350,7 +358,10 @@ def test_duplicate_ref_loc_failure(self):
blocks[1].mgr_locs = np.array([0])

# test trying to create block manager with overlapping ref locs
with pytest.raises(AssertionError):

msg = "Gaps in blk ref_locs"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jorisvandenbossche here I don't think worth testing against error messages for AssertionError, since that is purely internal

@jreback jreback merged commit f4de727 into pandas-dev:master Jan 20, 2020
@jreback
Copy link
Contributor

jreback commented Jan 20, 2020

thanks @MomIsBestFriend

I think we are agreed generally that testing for internal Asserts is not worth here. This is adding a single test for this, which is ok. But in general we don't want to add more (and hopefully refactor the internal error messages).

@ShaharNaveh ShaharNaveh deleted the TST-bare-pytest-raises-1 branch January 24, 2020 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants