REF: Fix maybe_promote #25425

h-vetinari · 2019-02-24T02:54:29Z

closes BUG/Internals: maybe_promote #23833
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff

This PR is the culmination of ongoing work since the start of November, and is therefore a bit on the bigger side, with several notes to make.

Things started out with me wanting to unify .update for Series/DF (#22358), resp. aiming towards a beefed-up update/combine_first/coalesce (#22812). While tackling the former (#23192), I encountered some problems with df.update upcasting stuff unnecessarily (#23606), and while trying to fix it, I ran into problems with maybe_upcast_putmask (#23823), which were directly caused by the utterly broken (and completely untested) maybe_promote (#23833).

I started with writing some tests (#23982), which turned out to be not so trivial, because there's a lot of complexity, and the correct behaviour wasn't always immediately obvious (I also encountered some fun numpy bugs in the process, e.g. numpy/numpy#12525, numpy/numpy#12550).

I set out to write out a PR to fix those tests then, with the obvious goal of getting the test suite to pass - already that required a full rewrite of the method. I cracked my own tests after a while, but the test suite eluded me. As it turns out, maybe_promote mixes two very different behaviours - scalar values get cast to the new dtype, whereas arrays return their missing value marker. I tried kludging around this for a while, and decided it wasn't possible without creating a franken-solution.

The next step was to separate these two different behaviours into different functions, maybe_promote_with_scalar and maybe_promote_with_array, where maybe_promote is then just a thin wrapper that switches between the two. Actually also maybe_promote_with_scalar is just a fairly thin wrapper around maybe_promote_with_array, so that the actual many-cased promotion logic does not have to be implemented twice.

Often, the call-sites in the code just need the one or the other, and this could later be broken up correspondingly.
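To illustrate the layering described above, here is a minimal structural sketch. It is not the code from this PR: the function names are shortened stand-ins, and the promotion rule is a plain np.result_type call (an assumption made to keep the example short, and only meaningful for numeric dtypes), whereas the PR promotes by value.

```python
import numpy as np


def promote_with_array(dtype, fill_value):
    """Array case: return the promoted dtype and its missing-value marker."""
    # Stand-in promotion rule (assumption); the PR promotes by value instead.
    new_dtype = np.result_type(dtype, fill_value.dtype)
    # Floats and complex numbers can hold NaN; integers and bools cannot.
    na_marker = np.nan if new_dtype.kind in "fc" else None
    return new_dtype, na_marker


def promote_with_scalar(dtype, fill_value):
    """Scalar case: reuse the array logic, then cast the scalar itself."""
    new_dtype, _ = promote_with_array(dtype, np.array([fill_value]))
    return new_dtype, new_dtype.type(fill_value)


def promote(dtype, fill_value=np.nan):
    """Thin wrapper that only switches between the scalar and array case."""
    if np.ndim(fill_value) == 0:
        return promote_with_scalar(dtype, fill_value)
    return promote_with_array(dtype, np.asarray(fill_value))
```

The point of the layout is that the many-cased logic lives in one place (the array function), while the scalar function only adds the final cast of fill_value.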

I updated the tests in #23982 (taking care to fully capture all the xfails there) and based this PR on that. This should give already an overview of what changed. In many cases, the current behaviour is broken, but I did make a few design decisions worth noting:

maybe_promote_with_array consistently returns the missing value marker for the updated dtype. Since integer dtypes (plus bools and bytes) cannot hold np.nan, these cases now return None.
all promotion logic is as conservative as possible, also within subtypes. For arrays, promotion always goes by value, and never by dtype. That means that, for example:

    >>> maybe_promote(np.dtype('uint8'), fill_value=np.iinfo('uint8').max + 1)
    (dtype('uint16'), 256)
    >>> maybe_promote(np.dtype('uint8'), fill_value=np.array([-1], dtype='int64'))
    (dtype('int16'), None)

all promotion logic is as type-safe as possible, which means that [x] only stays [x] if the fill_value is of type [x] as well, where x is one of (datetime, timedelta, bool, bytes). Datetimetz must additionally match the timezone.
all scalar fill_values now truly get cast to the updated dtype (before there were lots of ambiguities around int/float/complex/datetime/timedelta subtypes)
I have changed the behavior that strings get interpreted for datetimes/timedeltas. Since this is an untested private method, and the test suite still passes just fine, I think this is actually a good thing, because it's too much in one method. String to datetime/timedelta should need an explicit cast, IMO.

    >>> # master
    >>> maybe_promote(np.dtype('datetime64[ns]'), '2018-01-01')
    (dtype('<M8[ns]'), 1514764800000000000)
    >>> # PR
    >>> maybe_promote(np.dtype('datetime64[ns]'), '2018-01-01')
    (dtype('O'), '2018-01-01')
    >>> # master
    >>> maybe_promote(np.dtype('timedelta64[ns]'), '1 day')
    (dtype('<m8[ns]'), 86400000000000)
    >>> # PR
    >>> maybe_promote(np.dtype('timedelta64[ns]'), '1 day')
    (dtype('O'), '1 day')

iNaT is considered a missing value from the POV of maybe_promote_with_array in all situations. This takes one single integer out of the usable int64-range, but I think this is much cleaner.
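As a rough sketch of what "promotion by value" means for the integer examples above: the helper below is hypothetical (its name and candidate ordering are assumptions, not the PR's implementation). It picks the smallest integer dtype that can hold both the full range of the existing dtype and the actual values being filled in.

```python
import numpy as np

# Candidates ordered by size; unsigned comes first within each size.
_INT_CANDIDATES = ["uint8", "int8", "uint16", "int16",
                   "uint32", "int32", "uint64", "int64"]


def smallest_int_dtype(dtype, values):
    """Hypothetical helper: smallest integer dtype that holds both the
    range of `dtype` and every element of `values`."""
    info = np.iinfo(dtype)
    values = np.asarray(values)
    lo = min(int(values.min()), int(info.min))
    hi = max(int(values.max()), int(info.max))
    for name in _INT_CANDIDATES:
        cand = np.iinfo(name)
        if cand.min <= lo and hi <= cand.max:
            return np.dtype(name)
    # Nothing fits (e.g. negative values combined with uint64 range).
    return np.dtype(object)
```

With this rule, uint8 combined with 256 gives uint16, and uint8 combined with an int64 array containing only -1 gives int16, because the int64 dtype of the array is ignored and only its values matter.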

There's still a few issues with lib.infer_dtype (e.g. #23554, of which I already fixed the complex case #25382), most notably that it cannot infer datetime64tz yet. Actually, through this PR, I'm learning how broken that method is as well, but fixing that will have to wait for some other time. Among other things, it currently faceplants for PeriodArray / IntervalArray (#23553). I haven't added tests for these types here, but ~9000 tests is already better than nothing, I hope. ;-)

Another point that could/should be considered is how EAs should deal with this (#24246).

jreback · 2019-02-24T03:20:39Z

@h-vetinari pls make it bite-sized and piecemeal. These giant PRs very likely won't be merged as they take too much review time.

jreback

just a brief glance shows this as way too complicated. This must be split up function wise. So as I said on a prior PR this would need to be a separate module with supporting functions.

h-vetinari · 2019-02-24T11:04:57Z

@jreback: @h-vetinari pls make it bite-sized and piecemeal. These giant PRs very likely won't be merged as they take too much review time.

I split off the tests into #23982 already. This PR is just refactoring the method (~200LoC).

just a brief glance shows this as way too complicated. This must be split up function wise. So as I said on a prior PR this would need to be a separate module with supporting functions.

I'm not sure you were addressing me with that (or on which PR) - don't know which modularisation you mean...? In any case, I already made an attempt at modularising things, by splitting the scalar and array case into separate methods.

The method itself is not very complicated, it just has lots of branches (in steps 2 & 4 below) to deal with all the possible inputs:

1. determine scalar/array case
2. check if array is empty or all-na
3. infer dtype
4. handle promotion logic
5. (scalar case only) handle casting of fill_value

codecov · 2019-02-24T11:53:32Z

Codecov Report

Merging #25425 into master will decrease coverage by 50.04%.
The diff coverage is 47.65%.

@@             Coverage Diff             @@
##           master   #25425       +/-   ##
===========================================
- Coverage   91.73%   41.69%   -50.05%     
===========================================
  Files         173      173               
  Lines       52856    52932       +76     
===========================================
- Hits        48490    22072    -26418     
- Misses       4366    30860    +26494

Flag	Coverage Δ
#multiple	`?`
#single	`41.69% <47.65%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/dtypes/cast.py	`48.22% <47.65%> (-39.95%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/core/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.35%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-95.46%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.17%)`	⬇️
pandas/io/sas/sas_xport.py	`0% <0%> (-90.15%)`	⬇️
... and 131 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3855a27...6792a54.

codecov · 2019-02-24T11:53:32Z

Codecov Report

Merging #25425 into master will increase coverage by 0.13%.
The diff coverage is 93.7%.

@@            Coverage Diff             @@
##           master   #25425      +/-   ##
==========================================
+ Coverage   91.85%   91.99%   +0.13%     
==========================================
  Files         180      180              
  Lines       50765    50850      +85     
==========================================
+ Hits        46631    46777     +146     
+ Misses       4134     4073      -61

Flag	Coverage Δ
#multiple	`90.63% <93.7%> (+0.14%)`	⬆️
#single	`41.83% <47.24%> (-0.12%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/dtypes/cast.py	`90.88% <93.7%> (-0.18%)`	⬇️
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/core/arrays/integer.py	`96.3% <0%> (-1.32%)`	⬇️
pandas/core/internals/construction.py	`95.95% <0%> (-0.8%)`	⬇️
pandas/core/internals/blocks.py	`94.38% <0%> (-0.72%)`	⬇️
pandas/core/dtypes/concat.py	`96.58% <0%> (-0.46%)`	⬇️
pandas/core/internals/concat.py	`96.48% <0%> (-0.37%)`	⬇️
pandas/core/arrays/sparse.py	`94.19% <0%> (-0.31%)`	⬇️
pandas/core/internals/managers.py	`96% <0%> (-0.22%)`	⬇️
... and 30 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e9f9ca1...321f08d.

h-vetinari · 2019-03-07T07:32:13Z

@TomAugspurger @jbrockmendel
Care to take a look here or in #23982 please?

jbrockmendel · 2019-03-07T15:50:53Z

@h-vetinari I'll take a look at this. My schedule is unusually hectic between now and Tuesday, so it might take a few days.

h-vetinari · 2019-03-08T10:31:04Z

@jbrockmendel: @h-vetinari I'll take a look at this.

Thanks!

jbrockmendel · 2019-03-08T23:59:02Z

pandas/core/dtypes/cast.py

+    else:
+        fill_type = type(fill_value)
+        raise ValueError('fill_value must either be scalar, or a Series / '
+                         'Index / np.ndarray; received {}'.format(fill_type))


are/should EAs be supported?

h-vetinari · 2019-03-10T12:42:04Z

That's a design decision, but IMO yes. I've suggested #24246, and would then dispatch in maybe_promote_with_array

jbrockmendel · 2019-03-09T00:06:34Z

pandas/core/dtypes/cast.py

+            # ndarray, but too high-dimensional
+            fill_value = fill_value.ravel()
+    elif not isinstance(fill_value, (ABCSeries, ABCIndexClass)):
+        fill_type = type(fill_value)


usually we use type(foo).__name__. Any particular reason to not use the .__name__ here?

h-vetinari · 2019-03-10T12:43:02Z

No, will adapt.
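For reference, the difference between the two spellings in an error message is small but visible. The snippet below is a standalone illustration with a made-up fill_value, not code from the diff:

```python
fill_value = {"a": 1}  # an unsupported input, chosen only for illustration

# Formatting the type object itself yields its full repr.
verbose = "received {}".format(type(fill_value))

# Using __name__ yields just the bare class name.
concise = "received {}".format(type(fill_value).__name__)

print(verbose)  # received <class 'dict'>
print(concise)  # received dict
```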

h-vetinari

@jbrockmendel
Thanks for the review. A few small changes incoming.

jbrockmendel · 2019-09-21T16:23:18Z

@h-vetinari it looks like maybe_promote is only used in 8-10 places outside of the tests. is the ndarray fill_value case even needed? I find that makes the function much harder to reason about.

h-vetinari · 2019-09-22T08:56:46Z

@h-vetinari it looks like maybe_promote is only used in 8-10 places outside of the tests. is the ndarray fill_value case even needed? I find that makes the function much harder to reason about.

The low number of call-sites is what made me think that this can be fixed at all. I'm quite sure that the array-case was necessary, otherwise all those gymnastics could have been avoided. Should be simple enough to test - I'll just take out the array-branch from maybe_promote and see if the CI passes.

h-vetinari · 2019-09-22T10:31:41Z

@jbrockmendel
So, at first glance a failing CI would show the necessity of the array-case, but on second glance, all the failures come from #25431 resp. #23823.

Taking a step back, in #23823 I said

In the context of #23192 (and #23604 / #23606), I want to use pandas.core.dtypes.cast.maybe_upcast_putmask, because it solves exactly the problem I need it to solve.
Unfortunately, it does not work as advertised (and I already found the culprit).
The docstring says:

def maybe_upcast_putmask(result, mask, other):
    """
    A safe version of putmask that potentially upcasts the result
    [...]

The culprit at the time was the array-codepath in maybe_promote, but with the last commit above, it seems that the code-base does not rely anymore on the fact that maybe_promote can consume arrays. As such, it would be possible to change the implementation of maybe_upcast_putmask to use something else (e.g. the array-code I have already, or something else entirely) or just temporarily skip/xfail the tests from #25431, and then have a much simpler maybe_promote replacement that only needs to handle the scalar case.

The downside to that is that it would be harder to keep the various places where promotion logic is defined in sync. Ultimately, I think there's an even larger clean-up necessary, involving maybe_promote, maybe_upcast_putmask, lib.infer_dtype, maybe_convert_objects, etc., which (IMO/AFAICT) should all share similar promotion logic, based for example on what already exists with the Seen-objects used e.g. in

pandas/pandas/_libs/lib.pyx

Line 1952 in 74cba56

def maybe_convert_objects(ndarray[object] objects, bint try_float=0,

Don't know if or when I ever get around to that, my more immediate goals had been #23192 and #22812 (or rather, not so immediate anymore after over a year, haha ;-)).

jbrockmendel · 2019-09-22T14:58:03Z

Thanks for tracking down the history behind the ndarray support; I'll read up on that.

You've seen #28561 and #28564; let's try to find other parts of this that you can break off into similarly sized/scoped pieces, e.g. carving out something like maybe_promote_scalar. Anything else come to mind?

Based on a local branch, taking the ndarray cases out of the existing maybe_promote tests simplifies them a ton. Whether to re-implement the ndarray cases separately or just get rid of them depends on whether we can drop the ndarray case completely.

Many of the failing cases can be fixed by changing

fill_value = Timedelta(fill_value).value

to

try:
    fv = Timedelta(fill_value)
except (TypeError, ValueError):
    dtype = np.object_
else:
    fill_value = fv.value

Same for the Timestamp case.

That said, I think that instead of .value these should be to_timedelta64() and to_datetime64(), respectively (except for the NaT case, which is another hassle). Try to get away from using iNaT, which is ambiguous. This change should be relatively late in the process.
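Putting the two suggestions above together, a minimal sketch could look like the following. The helper name timedelta_fill is hypothetical, the NaT case is deliberately left out (as noted, it needs separate handling), and the fixed timedelta64[ns] dtype is an assumption for the example.

```python
import numpy as np
import pandas as pd


def timedelta_fill(fill_value):
    """Hypothetical helper: return (dtype, fill_value) for a timedelta column."""
    try:
        fv = pd.Timedelta(fill_value)
    except (TypeError, ValueError):
        # Not interpretable as a timedelta: fall back to object dtype
        # and leave the fill value untouched.
        return np.dtype(object), fill_value
    # Keep the value typed (np.timedelta64) instead of unwrapping it
    # to a bare integer via .value.
    return np.dtype("timedelta64[ns]"), fv.to_timedelta64()
```

Note that pd.Timedelta itself parses strings such as "1 day", so this sketch keeps the string-interpreting behaviour of master rather than the stricter behaviour proposed in the PR description.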

Any other ideas? I can keep pushing at this if you're not interested, but I'm hoping you'll keep going just in smaller bits.

jbrockmendel · 2019-09-22T15:00:08Z

pandas/core/dtypes/cast.py

+        elif fill_value.ndim > 1:
+            # ndarray, but too high-dimensional
+            fill_value = fill_value.ravel()
+    elif not isinstance(fill_value, (ABCSeries, ABCIndexClass)):


do we need Series/Index? ATM we just have ndarray right?

I see no harm in permitting them (if arrays are permitted at all), as they fit into the code without extra effort (and later uses of maybe_upcast_putmask might well plop a Series in there).

Until we have a compelling use case, let's restrict the inputs to 1D, non-empty ndarray

h-vetinari · 2019-09-22T20:16:23Z

Based on a local branch, taking the ndarray cases out of the existing maybe_promote tests simplifies them a ton. Whether to re-implement the ndarray cases separately or just get rid of them depends on whether we can drop the ndarray case completely.

Yeah, this is what I meant above. If we keep everything about maybe_promote in the scalar case, then both the code and the tests could be substantially reduced, but would need refactoring. I'm thinking this should be an entirely separate PR, that might serve as a different goal to aim for (and to cut individual chunks off).

That said, I think that instead of .value these should be to_timedelta64() and to_datetime64(), respectively (except for the NaT case, which is another hassle). Try to get away from using iNaT, which is ambiguous. This change should be relatively late in the process.

I tried to stay as close as possible to existing behaviour, but since we're (potentially) rewriting the method, it's completely fair to question the tests I've written. One choice I made is that I think strings should not be cast to TD/DT automatically. I've tried to comment such decisions in the implementation resp. the test module as well.

iNaT is used quite ubiquitously, so not sure how easy it is to get rid of it, I just considered it in the same category as the other NA-values.
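The ambiguity of iNaT mentioned above can be shown with numpy alone. iNaT is the smallest int64 value, so the same bit pattern is a perfectly ordinary integer in an int64 array and a missing value once the array is viewed as datetime64:

```python
import numpy as np

iNaT = np.iinfo(np.int64).min  # -9223372036854775808

ints = np.array([iNaT, 0], dtype="int64")

# Reinterpret the same memory as datetime64[ns]; no values are changed.
as_datetimes = ints.view("datetime64[ns]")

# The first element is now NaT, the second is the epoch.
is_missing = np.isnat(as_datetimes)
print(is_missing)  # [ True False]
```

Whether a bare integer equal to iNaT should count as missing therefore depends entirely on context, which is the trade-off discussed in this thread.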

This reverts commit d5aa77b.

jbrockmendel · 2019-10-01T13:44:50Z

@h-vetinari can you rebase? Any plans to do small-pieces PRs for this in the near future? If not, I'm going to keep trying to chip away at this.

h-vetinari · 2019-10-04T14:42:09Z

Sorry for the delay, will try to get to merging soon (should be latest on Sunday)

jbrockmendel · 2019-10-21T01:25:51Z

@h-vetinari are you planning to run with this? It's fine if not, but I think you have a better idea of what's needed here than I do.

h-vetinari · 2019-10-21T06:21:46Z

@jbrockmendel
The main question I had (see this comment) was whether it's desired to support the array-case in maybe_promote at all (since that seemed to be up in the air).

I overlooked your response there - sorry. I'm a bit swamped at the moment, but I'll try to carve out a PR from this one that adds maybe_promote_ndarray. Hopefully should have time on the weekend.

@jbrockmendel: Do you have fixes in mind for the remaining xfailed cases? bite-sized PRs for those would be welcome.

I'll have to have a look at how the current maybe_promote works (I haven't kept up with all your PRs). The array case will probably need to be handled separately anyway. One thing I wanted to avoid initially is to duplicate the promotion logic in two places (since looping over the scalar maybe_promote for the array-case is not an option performance-wise). But I think I'll have to start like that rather than replacing the whole method in one go.

@jbrockmendel: AFAICT the ndarray use cases of maybe_promote are a) pathological object-dtype cases where an ndarray is being treated like a scalar, and b) future use cases you describe. Am I missing any ways a user could get there at the moment?

Right now, the only case in the testing suite I'm aware of are the tests for #23823. Not sure if some other code is using the array-path...

jbrockmendel · 2019-10-28T18:12:33Z

@h-vetinari can you rebase

Merge remote-tracking branch 'upstream/master' into fix_maybe_promote

h-vetinari · 2019-10-28T23:14:28Z

@jbrockmendel: @h-vetinari can you rebase

Added a separate function maybe_promote_with_array that takes care of the array-path (and as such, the corresponding testing directly).

h-vetinari

Some comments

h-vetinari · 2019-10-28T23:15:05Z

pandas/core/dtypes/cast.py

+
+    >>> maybe_promote_with_array(np.dtype('datetime64[ns]'),
+    ...                          fill_value=np.array([None]))
+    (dtype('<M8[ns]'), -9223372036854775808)


TODO: fix this

h-vetinari · 2019-10-28T23:16:20Z

pandas/tests/dtypes/cast/test_promote.py

@@ -654,15 +591,16 @@ def test_maybe_promote_any_with_datetime64(
    )


-# override parametrization due to to many xfails; see GH 23982 / 25425
-@pytest.mark.parametrize("box", [(True, object)])


@jbrockmendel
here something fell through the cracks in #23982 - this test never ran with box=False.

h-vetinari · 2019-10-28T23:16:31Z

pandas/tests/dtypes/cast/test_promote.py

@@ -682,8 +620,6 @@ def test_maybe_promote_datetimetz_with_any_numpy_dtype(
    )


-# override parametrization due to to many xfails; see GH 23982 / 25425
-@pytest.mark.parametrize("box", [(True, None), (True, object)])


@jbrockmendel
here something fell through the cracks in #23982 - this test never ran with box=False.

it sounds like some of the changes in this test file are valid independent of the changes in the other file. is that correct?

h-vetinari · 2019-10-30T07:19:28Z

Nope, all removed xfails are due to diverting the array-path through maybe_promote_with_array instead of maybe_promote. For test_maybe_promote_datetimetz_with_any_numpy_dtype and test_maybe_promote_datetimetz_with_datetimetz however, I needed to add xfails for the box=False case because those are not working within maybe_promote yet.

h-vetinari · 2019-10-29T22:09:02Z

@jbrockmendel
This is updated and green. It externalises essentially all array-paths of maybe_promote to a new function (which would allow to rip it out of maybe_promote and replace it with the array-version where necessary).

The good part about having the unified testing module (box fixture and all) is that the code can still check uniformity of the results, even though the methods have completely separate implementations.

jreback · 2019-10-29T22:24:51Z

pandas/core/dtypes/cast.py

@@ -462,6 +501,281 @@ def maybe_promote(dtype, fill_value=np.nan):
    return dtype, fill_value


+def maybe_promote_with_array(dtype, fill_value=np.nan):


this is a huge additional to the technical debt. I am -1 on adding this at all. It is not at all clear whether this logic is correct and/or tested. more to the point, what is the purpose of all of this?

@jreback you can pretty much ignore this PR; I'm asking @h-vetinari to keep it rebased for reference as we identify parts that are worthwhile to break off into bite-size pieces.

more to the point, what is the purpose of all of this?

There are a handful of places where we call maybe_promote where we could have fill_value that is an ndarray. Part of the plan for this is to identify in which of those cases we can rule out ndarray.

i that’s fine

happy to pick off good changes

h-vetinari · 2019-10-30T07:14:39Z

@jreback: this is a huge additional to the technical debt. I am -1 on adding this at all. It is not at all clear whether this logic is correct and/or tested. more to the point, what is the purpose of all of this?

Please read this comment, not just skim over it.

The array-path in maybe_promote is broken and both you and @jbrockmendel were excited to rip it out. At the same time, there's several potential or future use-cases for the array-case, and so I asked twice how this should be handled.

Having a separate method is IMO the least invasive change, and would eventually still allow to rip out the array-path from maybe_promote. And more importantly, the logic is tested with the same promotion tests, which was the whole point of the tests/dtypes/cast/test_promote.py-module. Lastly, since it's a private method, there's no technical debt.

Lastly, since it's a private method, there's no technical debt.

@h-vetinari i don’t even know what to say anymore

Sorry for the misunderstanding, I meant as in API debt.

The technical debt is already there, in the array-path of maybe_promote. I'm trying to fix it. Feel free to address any of the comments or questions I've raised about this. But if you come into an ancient PR and - without regard for any of the existing context - assert that it must be garbage ("It is not at all clear whether this logic is correct and/or tested [it is]. more to the point, what's the purpose of all of this?"), then I'm gonna respond in kind.

@h-vetinari pls don't respond like this. it is not helpful to anyone.

I can and will come into every PR and make comments. My purpose is to avoid cluttering pandas with technical debt. This PR just adds to it.

jbrockmendel · 2019-10-29T23:19:07Z

pandas/core/dtypes/cast.py

+
+            # comparison mechanics are broken above _int64_max;
+            # use greater equal instead of equal
+            if fill_max >= _int64_max + 1 or fill_min <= _int64_min - 1:


can you use the can_cast machinery currently in the scalar function? or even just dispatch to the scalar function in some cases?

h-vetinari · 2019-10-30T07:16:36Z

Dispatching to the scalar case is IMO out of the question for performance reasons until this whole code is cythonized (or the logic somehow unified with lib.maybe_convert_object).
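The diff comment about comparisons being "broken above _int64_max" is not explained further in the thread. One plausible reading (an assumption, not something confirmed here) is float rounding at the int64 boundary, which can be reproduced like this:

```python
import numpy as np

int64_max = np.iinfo(np.int64).max  # 2**63 - 1, a Python int

# float64 has a 53-bit significand, so int64_max is not representable
# and rounds up to exactly 2**63 on conversion.
as_float = float(int64_max)

# After the round trip, the in-range maximum and the first out-of-range
# value compare equal, so an equality test cannot identify the bound.
indistinguishable = as_float == float(int64_max + 1)

# A ">=" check against int64_max + 1 still flags anything at or past 2**63.
flagged = as_float >= int64_max + 1
```

Under that reading, a greater-or-equal check errs on the side of treating borderline float values as out of range, which is the conservative choice for promotion.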

jbrockmendel · 2019-10-29T23:20:57Z

pandas/core/dtypes/cast.py

+    See Also
+    --------
+    maybe_promote_with_array : underlying method for array case
+    """


A PR with just (most of) this docstring would be a good start

h-vetinari · 2019-10-30T07:20:04Z

Will try to do that.

h-vetinari · 2019-10-30T07:17:30Z

pandas/tests/dtypes/cast/test_promote.py

@@ -134,7 +134,7 @@ def _check_promote(
        # box_dtype; the expected value returned from maybe_promote is the
        # missing value marker for the returned dtype.
        fill_array = np.array([fill_value], dtype=box_dtype)
-        result_dtype, result_fill_value = maybe_promote(dtype, fill_array)
+        result_dtype, result_fill_value = maybe_promote_with_array(dtype, fill_array)


@jreback @jbrockmendel
This is the point which diverts the testing of all array-paths in this module to maybe_promote_with_array.

jreback · 2019-10-30T12:02:29Z

I am happy for @jbrockmendel to pick off parts of this. But this PR will not be merged in any way like this. closing.

h-vetinari added 2 commits February 24, 2019 03:49

TST: add test coverage for maybe_promote

bf05e4c

REF: refactor and fix maybe_promote

60889ea

h-vetinari changed the title ~~Fix maybe promote~~ REF: Fix maybe_promote Feb 24, 2019

h-vetinari mentioned this pull request Feb 24, 2019

TST: add test coverage for maybe_promote #23982

Merged

4 tasks

jreback requested changes Feb 24, 2019

Fix remaining failures

6792a54

makbigc mentioned this pull request Feb 24, 2019

[BUG] maybe_upcast_putmast also handle ndarray #25431

Merged

h-vetinari added 4 commits February 24, 2019 17:41

Merge remote-tracking branch 'upstream/master' into fix_maybe_promote

c6043cb

Forgot to flake

8d9a3b7

Another try at int2int...

b903d2e

Last one?

91e6673

gfyoung added Bug Dtype Conversions Unexpected or buggy dtype conversions Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Feb 25, 2019

jbrockmendel reviewed Mar 8, 2019 (pandas/core/dtypes/cast.py, outdated)

jbrockmendel reviewed Mar 8, 2019 (pandas/conftest.py, outdated)

jbrockmendel reviewed Mar 8, 2019 (pandas/core/dtypes/cast.py, outdated)

jbrockmendel reviewed Mar 9, 2019 (pandas/core/dtypes/cast.py, outdated)

h-vetinari commented Mar 10, 2019

h-vetinari added 2 commits March 10, 2019 13:44

Merge remote-tracking branch 'upstream/master' into fix_maybe_promote

1a9d6a1

Review (jbrockmendel)

c0a3a4e

minor comment improvements/cleanups

2a8691a

exploration: check if array-case is required

d5aa77b

jbrockmendel reviewed Sep 22, 2019

Revert "exploration: check if array-case is required"

fa347b6

This reverts commit d5aa77b.

h-vetinari added 2 commits October 28, 2019 20:42

A painful merge

8aab981

Merge remote-tracking branch 'upstream/master' into fix_maybe_promote

adapt array-path to new test behaviour

82ec973

h-vetinari commented Oct 28, 2019

h-vetinari added 4 commits October 29, 2019 09:33

Merge remote-tracking branch 'upstream/master' into fix_maybe_promote

3c5c3e0

lint: isort

b5eb1c4

catch irrelevant warning

3976220

fix outdated iNaT-documentation

b8cd4f0

jreback requested changes Oct 29, 2019

jbrockmendel reviewed Oct 29, 2019

h-vetinari commented Oct 30, 2019

jreback closed this Oct 30, 2019

pandas-dev locked and limited conversation to collaborators Oct 30, 2019
