Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] maybe_upcast_putmast also handle ndarray #25431

Merged
merged 5 commits into from
Mar 22, 2019

Conversation

makbigc
Copy link
Contributor

@makbigc makbigc commented Feb 24, 2019

Follow h-vetinari's footstep. maybe_upcast_putmask was left untouched in #25425. Try to fix the bug so that maybe_upcast_putmask can also handle ndarray as well as Series.

@codecov
Copy link

codecov bot commented Feb 24, 2019

Codecov Report

Merging #25431 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25431      +/-   ##
==========================================
+ Coverage   91.74%   91.77%   +0.03%     
==========================================
  Files         173      173              
  Lines       52923    52925       +2     
==========================================
+ Hits        48554    48572      +18     
+ Misses       4369     4353      -16
Flag Coverage Δ
#multiple 90.33% <100%> (+0.02%) ⬆️
#single 41.72% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/dtypes/cast.py 90.69% <100%> (+2.53%) ⬆️
pandas/util/testing.py 87.66% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 85572de...bf23643. Read the comment docs.

@codecov
Copy link

codecov bot commented Feb 24, 2019

Codecov Report

Merging #25431 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25431      +/-   ##
==========================================
+ Coverage   91.27%    91.3%   +0.03%     
==========================================
  Files         173      173              
  Lines       53002    53004       +2     
==========================================
+ Hits        48375    48396      +21     
+ Misses       4627     4608      -19
Flag Coverage Δ
#multiple 89.87% <100%> (+0.03%) ⬆️
#single 41.77% <33.33%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/dtypes/cast.py 91.36% <100%> (+3.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cdfdd77...6b7c02d. Read the comment docs.

@gfyoung gfyoung added Bug Dtype Conversions Unexpected or buggy dtype conversions Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Feb 25, 2019
expected = np.array([10, 61, 12])

result, _ = maybe_upcast_putmask(result, mask, other)
tm.assert_numpy_array_equal(result, expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect these tests already have a home in another, existing test file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, reference the issue number as a comment under each test definition.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what problem is this solving?

from pandas.util import testing as tm


def test_upcast_series():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parameterize these (both tests) as I suspect a myriad of cases (just a couple for now, e.g. int -> float you can start with)

new_result = result.values.copy()
if isinstance(result, np.ndarray):
new_result = result.copy()
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what other types are here? a Series?

this should be converted at a higher level (at the beginning of the function)

@@ -105,7 +105,7 @@ Performance Improvements
Bug Fixes
~~~~~~~~~

-
- :func:`maybe_upcast_putmask` does not replace the element of the ``ndarray`` correctly (:issue:`23823`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't warrant a whatsnew as its an internal impl detail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove

@makbigc
Copy link
Contributor Author

makbigc commented Mar 9, 2019

I investigated further and found that I misunderstood the usage of maybe_upcast_putmask.

The result array should be replaced with the element of other array starting from the beginning, not in the position of mask. This behavior is due to np.place

changeit isn't called. The 1st element is picked as expected.

In [4]: arr1 = np.array([1,2,3])                                                

In [5]: other = np.array([61,62,63])                                            

In [6]: mask = np.array([False, True, False])                                   

In [7]: result, _ = maybe_upcast_putmask(arr1, mask, other)                     

In [8]: result                                                                  
Out[8]: array([ 1, 61,  3])

changeit is called. The 1st element is picked as expected.

In [24]: arr2 = np.array([1,2,3])                                                                                                           

In [25]: other2 = np.array([61.1, 62.2, 63.3])                                                                                              

In [26]: mask = np.array([False, True, False])                                                                                              

In [27]: result, _ = maybe_upcast_putmask(arr2, mask, other2)                                                                               

In [28]: result                                                                                                                             
Out[28]: array([ 1. , 61.1,  3. ])

changeit isn't called. The 1st element is picked as expected.

In [15]: arr2 = np.arange('2019-01-01', '2019-01-04', dtype='datetime64[D]')    

In [16]: other2 = np.arange('2018-01-01', '2018-01-04', dtype='datetime64[D]')  

In [17]: mask = np.array([False, True, False])                                  

In [18]: result, _ = maybe_upcast_putmask(arr2, mask, other2)                   

In [19]: result                                                                 
Out[19]: array(['2019-01-01', '2018-01-01', '2019-01-03'], dtype='datetime64[D]')

So the behavior for Series as an input is not correct. And maybe_upcast_putmask does not expect a series input.

changeit is called. The second element is picked.

>>> s = pd.Series([10, 11, 12])
>>> t = pd.Series([np.nan, 61, np.nan])
>>> from pandas.core.dtypes.cast import maybe_upcast_putmask
>>> result, _ = maybe_upcast_putmask(s, np.array([False, True, False]), t)
>>> result  # correct
0    10
1    61
2    12
dtype: int64

Examples can be added into the function docstring to elucidate the usage. Should I do it in this PR?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update with your new tests. also you can do an assert on the input of the function itself (e.g. assert its an ndarray)

@@ -105,7 +105,7 @@ Performance Improvements
Bug Fixes
~~~~~~~~~

-
- :func:`maybe_upcast_putmask` does not replace the element of the ``ndarray`` correctly (:issue:`23823`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove

@makbigc
Copy link
Contributor Author

makbigc commented Mar 17, 2019

When adding the tests, the following bug is found and fixed.

In [4]: arr = np.arange(10, 15)                                                 

In [5]: other = np.array([61, 62])                                              

In [6]: mask = np.array([False, True, False, True, True])                       

In [7]: result, changed = maybe_upcast_putmask(arr, mask, other)                
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-3ec9bc405a70> in <module>
----> 1 result, changed = maybe_upcast_putmask(arr, mask, other)

~/Code/pandas/pandas/core/dtypes/cast.py in maybe_upcast_putmask(result, mask, other)
    242             else:
    243 
--> 244                 if isna(other[mask]).any():
    245                     return changeit()
    246 

IndexError: boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 5

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments, merge master, ping on green.

from pandas.util import testing as tm


def test_upcast_error():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there more cases to consider? e.g. non-ndarray

@jreback jreback added this to the 0.25.0 milestone Mar 19, 2019
@jreback jreback merged commit f2bcb9f into pandas-dev:master Mar 22, 2019
@jreback
Copy link
Contributor

jreback commented Mar 22, 2019

thanks @makbigc keep em coming!

@h-vetinari h-vetinari mentioned this pull request Sep 22, 2019
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG/Internals: maybe_upcast_putmask
3 participants