TST: GH30999 Add match=msg to all pytest.raises in tests/indexes #38697
Conversation
…ctivate half a test in pandas/tests/indexes/interval/test_astype.py
         idx = pd.MultiIndex.from_product([[1, pd.Timestamp("2000"), 2], ["a", "b"]])
         other = pd.MultiIndex.from_product([[3, pd.Timestamp("2000"), 4], ["c", "d"]])

-        with pytest.raises(TypeError):
+        msg = "The 'sort' keyword only takes the values of None or False; True was passed."
+        with pytest.raises(ValueError, match=msg):
This test was marked as failing, but when I switched it to checking that a ValueError was raised, it passed. I think this works as intended now.
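For context, a minimal sketch of the behaviour the updated assertion relies on. The call under test is omitted from the diff excerpt above, so the union call here is an assumed reconstruction, not a verbatim excerpt from the test:

import pandas as pd
import pytest

idx = pd.MultiIndex.from_product([[1, pd.Timestamp("2000"), 2], ["a", "b"]])
other = pd.MultiIndex.from_product([[3, pd.Timestamp("2000"), 4], ["c", "d"]])

# Passing sort=True is rejected up front by keyword validation, so the
# operation raises ValueError rather than a TypeError from comparing
# incomparable values.
msg = "The 'sort' keyword only takes the values of None or False; True was passed."
with pytest.raises(ValueError, match=msg):
    idx.union(other, sort=True)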
This is fine; a lot of things have been cleaned up recently, so I guess this was missed. cc @jbrockmendel
"Cannot convert interval[float64] to interval[uint64]; subtypes are " | ||
"incompatible" | ||
) | ||
with pytest.raises(TypeError, match=msg): | ||
index.astype(dtype) |
This is a test that was marked as failing, with the reason being the closed issue #15832. I put a bit of discussion about this over in that issue. I removed the xfail, and I also removed the last two pytest.raises. The result of the coercion is IntervalIndex([(0, 0], (0, 0], (0, 0], (0, 1], (1, 1] ... (8, 9], (9, 9], (9, 9], (9, 9], (9, 10]], closed='right', dtype='interval[int64]'), which doesn't look particularly useful but isn't exactly wrong either. If I understand the discussion at #15832 correctly, the intended behaviour is for it not to raise the error.
Looks like the test for
index = interval_range(0.0, 10.0, freq=0.25)
dtype = IntervalDtype("int64")
is lost here?
Yes, I removed it deliberately, as well as the equivalent one for uint64. I tried to explain that in my comment above, but I guess I wasn't clear.
This test was marked as failing before. Now it is not. The test, as written before, was saying that
index = interval_range(0.0, 10.0, freq=0.25)
dtype = IntervalDtype("int64")
should raise a ValueError. It does not raise any error. The question here is: should casting an interval range with non-integer float boundaries raise an error? This test said it should, but it was marked as failing because right now it does not raise an error. If I understood the discussion at #15832 correctly, and it is possible that I did not, then I think it should not raise an error, so this test is not required.
The question is, as a user of pandas, if you had an interval range such as [(0.0, 0.25], (0.25, 0.5], (0.5, 0.75], (0.75, 1.0], (1.0, 1.25], (1.25, 1.5], (1.5, 1.75], (1.75, 2.0], (2.0, 2.25], (2.25, 2.5]] and you cast it to int, what would you want or expect? The old (failing) version of the test said you expect a ValueError. I am saying you expect [(0, 0], (0, 0], (0, 0], (0, 1], (1, 1], (1, 1], (1, 1], (1, 2], (2, 2], (2, 2]], which, as I said, does not look that useful as an IntervalIndex, but is not wrong either.
Would love to hear your thoughts.
I could also add another test for the expected behaviour of casting from float to int intervals if that's what we want.
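To make this concrete, here is a minimal sketch of the cast being discussed; the printed result is the truncated-endpoint IntervalIndex described above, and the behaviour may differ in later pandas versions:

import pandas as pd

# Same inputs as the removed assertions used.
index = pd.interval_range(0.0, 10.0, freq=0.25)
dtype = pd.IntervalDtype("int64")

# At the time of this discussion this did not raise; the float endpoints are
# truncated, giving intervals like (0, 0], (0, 0], (0, 1], ... (9, 10].
result = index.astype(dtype)
print(result)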
Thanks for explaining, and sorry, I should've read your comment on that linked issue. Your logic makes sense to me, but I'd defer to someone with more experience on how we handle IntervalIndex.
If it's being removed from this test, we definitely want to make sure this specific casting behavior is tested somewhere. To keep this PR simple and easier to merge, I'd suggest reverting the modification of this test, then making a follow-up PR to clean up this test and add tests for any untested casting behavior.
Ok, but I am not quite sure how to handle it. I want to leave the code in a good state even if I plan to do another PR soon, because it's possible that one will take a while, or never be accepted, or whatever. The way I see it, these are our options:
- Revert it to exactly what it was before I modified it. That would make the file fail the new linting rule we want to add as part of [Good first issue] TST: Disallow bare pytest.raises #30999, so we couldn't add the linting rule and would have to leave that issue open. I'm not particularly happy with this option because I have gotten the number of bare pytest.raises down from 250ish to 64 in just a few days, and I was hoping to finish in a few more days. And the longer we leave the rule out of the pre-commit script, the more likely people are to add new bare pytest.raises, and the issue will stay open forever.
- Revert it, and simultaneously add to the linting script a check that the bare pytest.raises is not in a test decorated with pytest xfail (a rough sketch of what such a check could look like follows this list). Generally, addressing [Good first issue] TST: Disallow bare pytest.raises #30999 has been a pretty mechanical process, except when it comes to tests marked as failures. Then it takes a lot of digging through the code and historical issues to understand what the correct failure message to add would be. It's the one thing that maybe makes [Good first issue] TST: Disallow bare pytest.raises #30999 not as good a "good first issue" as it appears. I expect to run into more similar cases as I try to finish all the modules, because there are 26 files that have only 1 or 2 bare pytest.raises, and I suspect that's because they are the hard ones where people didn't know what message to put in. Adding the check to the script seems like kind of a cool project to get to know the ast module better than I currently do, but probably just that, and not really worth our time, unless we really think we want the linter to do that.
- Come up with, out of whole cloth, the error message that I think should be included in such a ValueError, if in the future pandas were to raise one. This is the strategy I took in previous pull requests (see TST: GH30999 Add match=msg to all "with pytest.raises" in pandas/tests/io/pytables/test_store.py #38609 for example, where I added a completely invented error message to test_multiple_open_close). This is also what I was trying to do that led me to read the code and the past issues and decide that I didn't think it deserved an error message at all, and therefore that the test didn't make sense and I would remove it.
- Put in an obviously bogus error message, so that future developers know it's not a real one, and whoever decides to address that test later, either by making it pass or by removing it, can write a real message then.
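As mentioned in the second option, a rough sketch of what such a check could look like. This is not part of the PR or of the existing pre-commit script; the function names are made up and the xfail detection is deliberately simplistic:

import ast
import sys


def has_xfail_decorator(func):
    # Rough check for @pytest.mark.xfail or @pytest.mark.xfail(...) decorators.
    for dec in func.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if isinstance(target, ast.Attribute) and target.attr == "xfail":
            return True
    return False


def bare_raises_outside_xfail(source):
    # Yield line numbers of pytest.raises(...) calls that have no match=
    # keyword and are not inside an xfail-decorated test function.
    tree = ast.parse(source)
    for func in (n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)):
        if has_xfail_decorator(func):
            continue
        for node in ast.walk(func):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "raises"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "pytest"
                and not any(kw.arg == "match" for kw in node.keywords)
            ):
                yield node.lineno


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            for lineno in bare_raises_outside_xfail(f.read()):
                print(f"{path}:{lineno}: bare pytest.raises without match=")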
Anyway, after thinking it through and writing it out, I think I will try for strategy 3, and you can decide if you like whatever message I come up with, but let me know if there's a different strategy that you prefer.
Apologies for the wall of text; I am new to contributing to pandas and obviously spent a lot of time thinking about this.
ok done
xref #38718
New PR #38719
If that's been done in past PRs, probably fine here too. Appreciate the thoughtfulness of your approach here!
yeah generally addressing on a new PR is the correct way, will look at that soon.
r"loop of ufunc does not support argument 0 of type .* which " | ||
f"has no callable {func.__name__} method" | ||
) | ||
with pytest.raises(TypeError, match=msg): |
Changes in this file look to have caused some failures (https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=50879&view=logs&j=b516280a-8c75-541b-8ba0-ae13561718a7&t=88a240e0-0f0f-5fd5-6f10-d82e025ed5d5).
Looks like AttributeError is possible too. Since these errors are raised externally, from numpy, it might be more robust to just use external_error_raised and not match on the message, in case the error message changes.
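For reference, tm.external_error_raised asserts only the exception type, without matching the message. A toy sketch of the pattern (the object-dtype array here is illustrative, not the actual index fixture from the test):

import numpy as np
import pandas._testing as tm

arr = np.array(["a", "b", "c"], dtype=object)

# The TypeError is raised inside numpy, so we deliberately do not pin its
# message; only the exception type is asserted.
with tm.external_error_raised(TypeError):
    np.isfinite(arr)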
Ok, it wasn't clear to me whether the error was raised in pandas or numpy. Is that also true of the other error in the same test, i.e. should I change both to tm.external_error_raised?
Yep exactly, that's also an external error. I think both should use tm.external_error_raised.
ok done. let's hope I fixed the tests.
I am confused by the failing tests, none of which I touched. Any idea what is going on here?
Ah, it's not just this PR. Several recent PRs have those same failures.
Jedi update causing problems with the CI, raised in #38703.
f"ufunc '{func.__name__}' not supported for the input types, and the " | ||
"inputs could not be safely coerced to any supported types according " | ||
"to the casting rule ''safe''" | ||
) |
This one's also an external error from numpy. Even though this works, probably better to be consistent and use tm.external_error_raised.
done
…to tm.external_error_raised in test_numpy_ufuncs_other
Looks good to me, thanks @moink!
@@ -96,13 +94,11 @@ def test_numpy_ufuncs_other(index, func, request):
         result = func(index)
         assert isinstance(result, np.ndarray)
     else:
-        # raise TypeError or ValueError (PeriodIndex)
-        with pytest.raises(Exception):
+        with tm.external_error_raised(TypeError):
guess these are older comments, cc @jbrockmendel
I think so, yeah.
"Cannot convert interval[float64] to interval[uint64]; subtypes are " | ||
"incompatible" | ||
) | ||
with pytest.raises(TypeError, match=msg): | ||
index.astype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah generally addressing on a new PR is the correct way, will look at that soon.
idx = pd.MultiIndex.from_product([[1, pd.Timestamp("2000"), 2], ["a", "b"]]) | ||
other = pd.MultiIndex.from_product([[3, pd.Timestamp("2000"), 4], ["c", "d"]]) | ||
|
||
with pytest.raises(TypeError): | ||
msg = "The 'sort' keyword only takes the values of None or False; True was passed." | ||
with pytest.raises(ValueError, match=msg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine, a lot of things have been cleaned up recently I guess this was missed. cc @jbrockmendel
cc @jbrockmendel if any comments, go ahead and merge if good
Hi @moink - can you fetch and merge upstream/master please and resolve the conflict?
# Conflicts: # pandas/tests/indexes/interval/test_astype.py
@MarcoGorelli ok done and green except Travis
I am not sure why I closed this before - must have been accidental.
thanks @moink
This pull request partially addresses xref #30999 to remove bare pytest.raises by adding match=msg. It doesn't close that issue, as I have only addressed modules in pandas/tests/indexes.
I made a few of the raised errors more specific types. There are two other changes that are a bit more complex but I will put specific comments on the code lines to explain those.
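For anyone skimming, the change pattern is mostly mechanical; a toy before/after, not taken from the pandas test suite:

import pytest


def add_one(x):
    return x + 1


# Before: a bare pytest.raises passes as long as *any* TypeError is raised.
with pytest.raises(TypeError):
    add_one("a")

# After: match= pins part of the expected message (interpreted as a regex),
# so an unrelated TypeError no longer satisfies the assertion.
with pytest.raises(TypeError, match="can only concatenate str"):
    add_one("a")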
- black pandas
- git diff upstream/master -u -- "*.py" | flake8 --diff