
TST: tests for maybe_promote (precursor to #23982) #25637

Merged
merged 26 commits into from
Jun 21, 2019

Conversation

h-vetinari
Contributor

First step towards #23833, resp. precursor to #23982.

TL;DR: maybe_promote is quite broken. #23982 tries to come up with tests that it should pass, and #25425 tries to fix the implementation. However, #23982 is quite big, so @jreback asked for a smaller version that (mostly) just tests existing behaviour.

@pep8speaks

pep8speaks commented Mar 10, 2019

Hello @h-vetinari! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-21 06:15:45 UTC

@codecov

codecov bot commented Mar 10, 2019

Codecov Report

Merging #25637 into master will increase coverage by <.01%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #25637      +/-   ##
==========================================
+ Coverage   91.26%   91.27%   +<.01%     
==========================================
  Files         173      173              
  Lines       52968    52968              
==========================================
+ Hits        48339    48344       +5     
+ Misses       4629     4624       -5
Flag                         Coverage Δ
#multiple                    89.84% <ø> (ø) ⬆️
#single                      41.71% <ø> (ø) ⬆️

Impacted Files               Coverage Δ
pandas/core/dtypes/cast.py   89% <0%> (+0.83%) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 26cfa28...ca8f96b. Read the comment docs.

@codecov

codecov bot commented Mar 10, 2019

Codecov Report

Merging #25637 into master will decrease coverage by <.01%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #25637      +/-   ##
==========================================
- Coverage   91.97%   91.97%   -0.01%     
==========================================
  Files         180      180              
  Lines       50756    50756              
==========================================
- Hits        46685    46683       -2     
- Misses       4071     4073       +2
Flag                         Coverage Δ
#multiple                    90.57% <ø> (ø) ⬆️
#single                      41.84% <ø> (-0.07%) ⬇️

Impacted Files               Coverage Δ
pandas/io/gbq.py             88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py         96.89% <0%> (-0.12%) ⬇️
pandas/util/testing.py       90.94% <0%> (+0.1%) ⬆️
pandas/core/dtypes/cast.py   90.69% <0%> (+0.16%) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b9b081d...636b1f1. Read the comment docs.

Contributor

@jreback jreback left a comment


just started glancing at this. I find the box & box_dtype parametrization args very confusing. This is testing a scalar and an array, but when / where can this actually be an object dtype of a scalar? I would like to keep the scope of this way, way down. This means: don't over-parametrize.


# try/except as numpy dtypes (i.e. if result_dtype is np.object_) do not
# know some expected dtypes like DatetimeTZDtype, and hence raise TypeError
try:
Contributor

I find this very error-prone and would rather make this explicit

Contributor Author

I don't understand what you mean here. PandasDtypes (which will not be known by a numpy dtype) can come in on either side of the comparison. If you have a way to write this better, let me know.

Member

@gfyoung gfyoung May 13, 2019

By more explicit, I think you need to check the type of result_dtype and expected_dtype. That way, you know which way to compare.
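To illustrate the "check the type first" suggestion, here is a hedged, pandas-free sketch; the helper name `dtypes_equal` is hypothetical, not pandas API, and stands in for the comparison that the try/except currently guards:

```python
import numpy as np

def dtypes_equal(result_dtype, expected_dtype):
    # Hypothetical helper: instead of wrapping == in try/except, check the
    # types first. np.dtype.__eq__ can raise TypeError when the other
    # operand is something numpy cannot coerce to a dtype (e.g. a pandas
    # ExtensionDtype such as DatetimeTZDtype), so only use == once we know
    # which way the comparison goes.
    result_is_np = isinstance(result_dtype, np.dtype)
    expected_is_np = isinstance(expected_dtype, np.dtype)
    if result_is_np and expected_is_np:
        return result_dtype == expected_dtype
    if result_is_np != expected_is_np:
        # one side is a numpy dtype, the other is not: in this sketch we
        # treat them as unequal rather than letting numpy raise
        return False
    return result_dtype == expected_dtype
```

This keeps the comparison explicit about which branch handles numpy dtypes and which handles extension dtypes, rather than catching whatever TypeError happens to surface.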

pandas/tests/dtypes/cast/test_promote.py
@pytest.mark.parametrize('boxed, box_dtype', [
(True, None), # fill_value wrapped in array with auto-dtype
(True, object), # fill_value wrapped in array with object dtype
(False, None) # fill_value directly
Contributor

what does box_dtype of None even mean?

Contributor Author

I'm trying to explain just that with the comment next to it.

Contributor Author

Side note here: None takes no extra meaning, but is passed directly to np.array(some_value, dtype=None), which corresponds to numpy's default behaviour.
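A quick numpy-only illustration of that default behaviour (no pandas involved):

```python
import numpy as np

# box_dtype=None is not special-cased by the tests: it is handed straight
# to np.array(fill_value, dtype=None), i.e. numpy's own dtype inference,
# while box_dtype=object forces an object array.
auto_boxed = np.array(1, dtype=None)   # numpy infers a native integer dtype
obj_boxed = np.array(1, dtype=object)  # element stays a Python int

assert np.issubdtype(auto_boxed.dtype, np.integer)
assert obj_boxed.dtype == np.dtype(object)
```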

Member

Incorporate this side-note as a comment. That will help to explain the parameterized values better.

Contributor Author

@h-vetinari h-vetinari left a comment

@jreback: just started glancing at this. I find the box & box_dtype parametrization args very confusing. This is testing a scalar and an array, but when / where can this actually be an object dtype of a scalar? I would like to keep the scope of this way, way down. This means: don't over-parametrize.

Thanks for starting to take a look at this. I intentionally kept these tests together, as the actual tests are the same in 95% of the cases, and it would lead to a huge amount of duplication to separate testing the scalar case vs. the array case.

I've tried to annotate at every step what boxed and box_dtype mean, if we can find a better way to write this I'm all ears, but I don't think breaking up the tests is a good approach.



@gfyoung gfyoung added Testing pandas testing functions or related to the test suite Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Mar 16, 2019
@h-vetinari h-vetinari mentioned this pull request Mar 17, 2019
3 tasks
@jreback
Contributor

jreback commented May 12, 2019

though a good idea, closing as stale

@jreback jreback closed this May 12, 2019
@h-vetinari
Contributor Author

@jreback: though a good idea, closing as stale

I've spent an inordinate amount of time on #25425, and although I prepared two subsets of it (#23982 and this one), it never got any review. I understand the realities of how scarce reviewing resources are, but it's still disheartening after putting in so much work.

Long story short, all three of those PRs (#23982, #25425, #25637) should be reopened, please. I'm ready to keep working on them.

@gfyoung
Member

gfyoung commented May 13, 2019

@h-vetinari : I'll re-open this one first. I agree with @jreback that the scope of what you're trying to do is massive. Thus, it would be great if you focused on this PR alone.

We can re-open other ones as the work progresses.

@gfyoung gfyoung reopened this May 13, 2019
@gfyoung
Member

gfyoung commented May 13, 2019

it would lead to huge amount of duplication to separate testing the scalar case vs. the array case.

In a separate commit, separate these out as requested. The best way to make the point here is to show the code side-by-side. Are you sure you can't abstract out setup methods for some of the functionality that would be de-duplicated by any chance?

Also, if you could address the merge conflict.

@h-vetinari
Contributor Author

@gfyoung
Thanks for reopening.

In a separate commit, separate these out as requested.

I can do this, but essentially every test would be duplicated. I'm testing the same logic in both cases, and actually more than that: that the same promotion logic applies in both scalar and array case.

If I start splitting this, it would make most sense to have one test for scalars and one for arrays, but from that point on, it would be very easy for those tests to diverge. Does this sound like a reasonable trade-off to you? While trivial, it will be a fair bit of work to actually duplicate these, so I'd rather be sure you understand the consequences beforehand.

Are you sure you can't abstract out setup methods for some of the functionality that would be de-duplicated by any chance?

That's exactly what I was aiming for with _check_promote. All the "differences" between the array and scalar case are taken care of by the wrapper, so that each test only has to specify the promotion logic that's being tested.

@gfyoung
Member

gfyoung commented May 13, 2019

If I start splitting this, it would make most sense to have one test for scalars and one for arrays, but from that point on

Why don't you try duplicating for one test only. Then we can evaluate before and after on that one test and see whether it makes sense for you to continue forward with the duplication.

That's exactly what I was aiming for with _check_promote.

Hmm... I see. It sounds, though (from the comments, IIUC), like that might have impacted readability because there were so many different cases to check. Is it possible to maybe have two or three different check-promote functions that each handle a subset of cases?

@h-vetinari
Contributor Author

@gfyoung

Finally got around to coming back to this:

That's exactly what I was aiming for with _check_promote.

Hmm...I see. It sounds though (from the comments IIUC) that that might have impacted readability because there were so many different cases to check. Is it possible to maybe have two or three different check promote functions that each handle a subset of cases?

I added a docstring and some comments to make it clearer. The function essentially only contains one branch depending on whether fill_value should be passed as scalar or as an array. And that's exactly the part I want to abstract away from the tests further on.

All of those tests have a very simple format:

def test_maybe_promote_XXX_with_YYY(fixtures..., boxed, box_dtype):
    dtype = some_dtype(XXX)
    fill_dtype = some_dtype(YYY)

    if some_condition:
        pytest.xfail('reason')

    # define fill_value (usually implicitly through fill_dtype)
    fill_value = some_scalar(fill_dtype)

    # define expected values
    expected_dtype = some_dtype
    exp_val_for_scalar = some_scalar_value
    exp_val_for_array = some_missing_value_marker

    _check_promote(dtype, fill_value, boxed, box_dtype, expected_dtype,
                   exp_val_for_scalar, exp_val_for_array)

This is about as simple as I can imagine it being (while unifying the treatment of the scalar and array cases). Short of a new idea (and the docstring), I don't know how to make this clearer.
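For a concrete, pandas-free instance of the contract such a test encodes (names like `maybe_promote_sketch` are illustrative, not the actual pandas implementation, which handles far more cases), consider integer data filled with a float:

```python
import numpy as np

def maybe_promote_sketch(dtype, fill_value):
    # Rough sketch of one promotion rule under test: a float fill_value
    # cannot be held losslessly in an integer array, so the dtype must be
    # upcast to float64; otherwise the dtype is kept as-is.
    if np.issubdtype(dtype, np.integer) and isinstance(fill_value, float):
        return np.dtype("float64"), fill_value
    return dtype, fill_value

expected_dtype, exp_val_for_scalar = maybe_promote_sketch(np.dtype("int64"), 0.5)
assert expected_dtype == np.dtype("float64")
assert exp_val_for_scalar == 0.5
```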

(speaking of new ideas: I incorporated your typecheck instead of the try/except for PandasExtensionDtype)

DATETIME_DTYPES = ['datetime64[ns]', 'M8[ns]']
TIMEDELTA_DTYPES = ['timedelta64[ns]', 'm8[ns]']
DATETIME64_DTYPES = ['datetime64[ns]', 'M8[ns]']
TIMEDELTA64_DTYPES = ['timedelta64[ns]', 'm8[ns]']
Member

Rationale for renames?

Is this necessary given how massive the diff is already?

Contributor Author

I can split off the conftest-related things, of course.

Regarding your specific point: The rename is really important IMO, because the DATETIME_DTYPES (as-is) have nothing to do with datetime.datetime or other such things.

However, when reading a test that is being fed by a datetime_dtype fixture, one could easily think that this tests datetime.datetime.

The current state in conftest specifically contains only DATETIME64_DTYPES, and I believe this distinction is used in many parts of the code base as well.
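The naming point is easy to verify with numpy alone: 'M8[ns]' and 'm8[ns]' are mere spelling variants of the 64-bit numpy dtypes, unrelated to datetime.datetime:

```python
import numpy as np

# 'M8[ns]' / 'm8[ns]' are numpy shorthand for 'datetime64[ns]' /
# 'timedelta64[ns]'; neither has anything to do with datetime.datetime,
# which is what motivates the DATETIME64_* / TIMEDELTA64_* names.
assert np.dtype("M8[ns]") == np.dtype("datetime64[ns]")
assert np.dtype("m8[ns]") == np.dtype("timedelta64[ns]")
assert np.dtype("M8[ns]").kind == "M" and np.dtype("m8[ns]").kind == "m"
```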

Member

@gfyoung gfyoung May 31, 2019

Got it. I'm thinking that we should separate this out. The more we can condense this PR, the better.

But hold off on making that PR for the moment (just one more question re: some of the other things you've changed for conftest.py).

Contributor Author

I can change the DATETIME_DTYPES -> DATETIME64_DTYPES in a precursor of course, but the actual change that precipitates the necessity is the introduction of datetime64_dtype, which wouldn't be used before this PR. Would you want me to introduce the fixtures with the conftest-precursor, or in this PR?

h-vetinari added a commit to h-vetinari/pandas that referenced this pull request May 31, 2019
@h-vetinari
Contributor Author

h-vetinari commented Jun 1, 2019

@gfyoung
This is now slightly reduced in size after #26596. Also, could you please restart the azure CI?

Edit: Nevermind, I had some linting to commit anyway.

@gfyoung
Member

gfyoung commented Jun 1, 2019

@jreback : Could you have another look at this

@jreback
Contributor

jreback commented Jun 1, 2019

at some point

@jreback
Contributor

jreback commented Jun 6, 2019

so why are you having every int type against a timedelta64, for example; this is just way overkill. These should all react the same, but having this expanse of tests is just too confusing. Please limit this.

Obviously, the same holds true for datetime, for example.

XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint8-timedelta64[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint8-timedelta64[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint8-m8[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint8-m8[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint16-timedelta64[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint16-timedelta64[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint16-m8[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint16-m8[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint32-timedelta64[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint32-timedelta64[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint32-m8[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint32-m8[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint64-timedelta64[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint64-timedelta64[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint64-m8[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[uint64-m8[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[int-timedelta64[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[int-timedelta64[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[int-m8[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[int-m8[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[int8-timedelta64[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[int8-timedelta64[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[int8-m8[ns]-box0-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[int8-m8[ns]-box2-np.timedelta64]
  reason: does not upcast correctly
XFAIL pandas/tests/dtypes/cast/test_promote.py::test_maybe_promote_any_with_timedelta64[int16-timedelta64[ns]-box0-np.timedelta64]
  reason: does not upcast correctly

@h-vetinari
Contributor Author

@jreback: so why are you having every int type against a timedelta64, for example; this is just way overkill. These should all react the same, but having this expanse of tests is just too confusing. Please limit this.

The simple answer is: fixtures - they do exactly what I wanted here: test every type against (inputs of) every other type, as exhaustively as possible. I get the argument that some tests can be overkill, but I can't see why it would be confusing...?

I'm guessing you're opposed specifically to the number of xfails? Otherwise, once one starts introducing special cases in this module, it becomes less understandable, more fragile, and more complex.

@jreback
Contributor

jreback commented Jun 8, 2019

The simple answer is: fixtures - they do exactly what I wanted here: exhaustively (as possible) test every type against (inputs of) every other type. I get the argument that some tests can be overkill, but I can't see why it would be confusing...?

@h-vetinari this adds quite a bit of duplicative testing for many types. Please use more selective fixtures. If it's obvious that something is doing the same exact cast vs. int, int8, int16, ..., then a single int is enough for a fixture, with a nice comment.
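A hedged sketch of what such a "more selective" fixture could look like (assuming pytest; the fixture name and the chosen dtypes are hypothetical, not the ones eventually merged):

```python
import pytest

# One representative width per numeric kind: int8/int16/int32 etc. hit the
# same promotion branch, so a single int dtype per kind keeps the coverage
# while cutting the parametrization explosion jreback objects to.
REDUCED_NUMERIC_DTYPES = ["int64", "uint64", "float64", "complex128"]

@pytest.fixture(params=REDUCED_NUMERIC_DTYPES)
def any_reduced_numeric_dtype(request):
    # tests receive one dtype string per parametrized run
    return request.param
```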

@h-vetinari
Contributor Author

@jreback
What do you think of the reduced fixture I introduced following your review?

@jreback
Contributor

jreback commented Jun 21, 2019

ok @h-vetinari, what's 3000+ new tests, lol.

merge master once again and ping on green.

@jreback jreback added this to the 0.25.0 milestone Jun 21, 2019
@h-vetinari
Contributor Author

merge master once again and ping on green.

@jreback, this is green. :)

@jreback
Contributor

jreback commented Jun 21, 2019

k thanks @h-vetinari

@jreback jreback merged commit f2aea09 into pandas-dev:master Jun 21, 2019
@h-vetinari
Contributor Author

@jreback @gfyoung
Thanks for reviewing this big hunk. :)

We can re-open other ones as the work progresses.

Can you please reopen #25425?

I'd also suggest that we reopen #23982 anyway - I don't think it has a chance of getting merged (because it basically just adds a bunch of failing tests), but once rebased it would provide good insight into the value of the refactor in #25425 - namely, all the things that are currently broken and would be fixed by that PR.

@h-vetinari h-vetinari deleted the tst_maybe_promote_precursor branch June 21, 2019 16:34
@h-vetinari
Contributor Author

@jreback @gfyoung

@h-vetinari: Can you please reopen #25425?

Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Testing pandas testing functions or related to the test suite