BUG: inconsistency between replace dict using integers and using strings (#20656) #21477

peterpanmj · 2018-06-14T12:03:13Z

- closes #20656 (DataFrame.replace with dict behaves inconsistently for integers and strings)
- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry

codecov · 2018-06-14T15:13:02Z

Codecov Report

Merging #21477 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21477      +/-   ##
==========================================
+ Coverage   92.07%   92.08%   +<.01%     
==========================================
  Files         169      169              
  Lines       50684    50703      +19     
==========================================
+ Hits        46668    46689      +21     
+ Misses       4016     4014       -2

| Flag | Coverage Δ | |
| --- | --- | --- |
| #multiple | `90.49% <100%> (ø)` | ⬆️ |
| #single | `42.33% <12.12%> (-0.01%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| pandas/core/internals/blocks.py | `94.63% <100%> (+0.17%)` | ⬆️ |
| pandas/core/internals/managers.py | `96.48% <100%> (+0.01%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fb6116f...4729fc5.

gfyoung · 2018-06-15T00:51:20Z

@peterpanmj : Good start. Need a whatsnew entry in 0.23.2.

peterpanmj · 2018-06-15T05:12:11Z

@gfyoung : Which subsection under Bug fixes should I add my entry to?

gfyoung · 2018-06-15T06:23:46Z

@peterpanmj : v0.23.2.txt - Other section

jreback · 2018-06-15T17:13:33Z

pandas/core/internals.py

+                    # result will be ['b', b'] after searching for pattern r'a'
+                    # and then changed to ['a', 'a'] for pattern r'b*'
+                    if regex:
+                        if b.dtype == np.object_:


can you use is_object_dtype here

can you change this

jreback · 2018-06-15T17:14:33Z

pandas/core/internals.py

+                            result = b.replace(s, d, inplace=inplace,
+                                               regex=regex,
+                                               mgr=mgr, convert=convert)
+                            new_rb = _extend_blocks(result, new_rb)


instead of this, can you add a new (private) method on the Block itself (and then override for object dtype). It will be much cleaner code and this part becomes really generic.

jreback · 2018-06-21T00:22:02Z

doc/source/whatsnew/v0.23.2.txt

@@ -79,4 +79,4 @@ Bug Fixes

 **Other**

-
+- Bug in :meth:`Series.replace` and meth:`DataFrame.replace` when dict is used as the `to_replace` value and one key in the dict is is another key's value, the results were inconsistent between using integer key and using string key (:issue:`20656`)  


move to 0.24.0

jreback · 2018-06-21T00:22:44Z

pandas/core/internals.py

@@ -1690,6 +1690,13 @@ def _nanpercentile(values, q, axis, **kw):
                              placement=np.arange(len(result)),
                              ndim=ndim)

+    def _coerce_replace(self, mask=None, dst=None, convert=False):
+        if mask.any():


can you add a doc-string

call this: _replace_coerce

jreback · 2018-06-21T00:23:04Z

pandas/core/internals.py

+        if mask.any():
+            self = self.coerce_to_target_dtype(dst)
+            return self.putmask(mask, dst, inplace=True)
+        else:


you don't need the else (just return self)

jreback · 2018-06-21T00:23:16Z

pandas/core/internals.py

+                block = [b.convert(by_item=True, numeric=False, copy=True)
+                         for b in block]
+            return block
+        else:


jreback · 2018-06-21T00:23:23Z

pandas/core/internals.py

        return block

+    def _coerce_replace(self, mask=None, dst=None, convert=False):
+        if mask.any():


add a doc-string

jreback · 2018-06-21T00:24:14Z

pandas/core/internals.py

+                                               regex=regex,
+                                               mgr=mgr, convert=convert)
+                            new_rb = _extend_blocks(result, new_rb)
+                        else:


is there a reason you are not using the newly defined _replace_coerce here?

I kept the logic in master untouched for regex mode. This means regex replace still behaves incorrectly (just like dict replace does now).
_maybe_compare can only do equality comparison for now, which is the cause. I haven't figured out how to add regex support to _maybe_compare without breaking any existing tests. Once _maybe_compare is fixed, this part can be removed. I think @Licht-T is working on this part (#20656).

well what I would do is add a regex= param to _replace_coerce and push the logic to the block

In addition to a regex= param, it would also need a src= param for the regex pattern to match. I think _replace_coerce should only take a boolean array (mask=) indicating where to put the new value, and regex matching should be done while generating the mask, before it is passed to _replace_coerce. Adding regex logic to it might make it less clear for others, and could lead someone to override _replace_coerce or reuse it for regex replace. What I want is to keep _replace_coerce just like putmask and handle the regex comparison in _maybe_compare(values, s, operator.eq).

I disagree, this is crazy code here and needs simplification. All of the real work should be done in the blocks themselves, e.g. object is different than the other blocks.

Here should be a simple

rb = [b._replace(.....) for b in rb]

you can pass whatever args you want, but the top level logic is just too complicated here
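One way to read the suggestion above is as ordinary method dispatch: the top-level loop stays generic, and the object-dtype block overrides only the part that differs. The sketch below is a toy model in plain Python, not pandas internals. `ToyBlock`, `ToyObjectBlock`, `_matches`, and the `_replace` signature are all hypothetical names chosen for illustration.

```python
import re


class ToyBlock:
    """Hypothetical stand-in for a block: holds a flat list of values."""

    def __init__(self, values):
        self.values = list(values)

    def _matches(self, value, src, regex):
        # Non-object blocks only ever compare by equality.
        return value == src

    def _replace(self, src, dst, regex=False):
        mask = [self._matches(v, src, regex) for v in self.values]
        if not any(mask):
            # Nothing to replace: hand back the same block.
            return self
        new_values = [dst if hit else v for v, hit in zip(self.values, mask)]
        return type(self)(new_values)


class ToyObjectBlock(ToyBlock):
    """Object blocks override matching so strings can be regex-matched."""

    def _matches(self, value, src, regex):
        if regex and isinstance(value, str) and isinstance(src, str):
            return re.search(src, value) is not None
        return value == src


# The "manager" level is now a single generic comprehension.
blocks = [ToyBlock([1, 2, 3]), ToyObjectBlock(["apple", "pear"])]
blocks = [b._replace("^a", "X", regex=True) for b in blocks]
```

With this shape, the caller never branches on dtype or on `regex`; it only forwards arguments, which is the simplification being asked for.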

jreback · 2018-06-22T23:30:13Z

pandas/core/internals.py

@@ -1690,6 +1690,17 @@ def _nanpercentile(values, q, axis, **kw):
                              placement=np.arange(len(result)),
                              ndim=ndim)

+    def _replace_coerce(self, mask=None, dst=None, convert=False):
+        """replace value to dst where mask is true, value is coerce to target


can you add a full Parameters section here

@jreback Do you mean add **kwargs and *args?

@jreback Or do you mean add a full docstring with all parameters?

yes full doc string
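For reference, a full numpydoc-style docstring with a Parameters section might look like the sketch below. It is written against a standalone toy function operating on a list, so `replace_coerce_sketch` and its `values` argument are assumptions for illustration; the descriptions of `dst` and `convert` follow the wording that appears later in this review.

```python
def replace_coerce_sketch(values, mask, dst, convert=False):
    """
    Replace entries of ``values`` with ``dst`` wherever ``mask`` is true.

    Parameters
    ----------
    values : list
        The values to operate on. Hypothetical: a real block method
        would use the block's own values instead.
    mask : list of bool
        True marks the positions to replace.
    dst : object
        The value to be replaced with.
    convert : bool, default False
        If true, try to coerce any object types to better types.
        Ignored in this sketch.

    Returns
    -------
    list
        A new list, or ``values`` itself when nothing matched.
    """
    if not any(mask):
        return values
    return [dst if hit else v for v, hit in zip(values, mask)]
```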

jreback · 2018-06-22T23:30:18Z

pandas/core/internals.py

        return block

+    def _replace_coerce(self, mask=None, dst=None, convert=False):
+        if mask.any():


jreback · 2018-06-22T23:30:30Z

pandas/core/internals.py

+                    # result will be ['b', b'] after searching for pattern r'a'
+                    # and then changed to ['a', 'a'] for pattern r'b*'
+                    if regex:
+                        if b.dtype == np.object_:


can you change this

jreback · 2018-06-22T23:32:07Z

pandas/core/internals.py

+                                               regex=regex,
+                                               mgr=mgr, convert=convert)
+                            new_rb = _extend_blocks(result, new_rb)
+                        else:


well what I would do is add a regex= param to _replace_coerce and push the logic to the block

peterpanmj · 2018-07-26T01:30:04Z

Am I working in the right direction?

jreback · 2018-07-17T19:43:23Z

pandas/core/internals.py

+def _maybe_compare(a, b, regex=False):
+    if not regex:
+        op = lambda x: operator.eq(x, b)
+    else:


why did this change?

I want to add regex support to _maybe_compare. The comparison behavior is decided by the regex param (whether to use regex matching or equality comparison). The result is a mask that is passed on to _replace_coerce. This keeps the result of an earlier round of comparison from being overwritten by later ones, e.g. in {"a": "b", "b": "a"}.
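The overwrite problem described above can be modeled in plain Python. This is a minimal sketch on lists, not the pandas code path; `replace_sequential` and `replace_with_masks` are hypothetical helper names. The first applies each pair in turn, so later pairs see the output of earlier ones; the second computes every mask against the original values before writing anything.

```python
def replace_sequential(values, mapping):
    """Apply each pair in turn; later pairs see earlier results."""
    out = list(values)
    for src, dst in mapping.items():
        out = [dst if v == src else v for v in out]
    return out


def replace_with_masks(values, mapping):
    """Compute every mask against the original values, then write."""
    masks = [[v == src for v in values] for src in mapping]
    out = list(values)
    for mask, dst in zip(masks, mapping.values()):
        out = [dst if hit else v for v, hit in zip(out, mask)]
    return out


swap = {"a": "b", "b": "a"}
print(replace_sequential(["a", "b"], swap))  # ['a', 'a']
print(replace_with_masks(["a", "b"], swap))  # ['b', 'a']
```

Because the masks are fixed up front, a value written by one pair can no longer be picked up and rewritten by a later pair.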

jreback · 2018-07-17T19:43:45Z

pandas/core/internals.py

@@ -5155,9 +5240,8 @@ def _maybe_compare(a, b, op):
    # numpy deprecation warning if comparing numeric vs string-like
    elif is_numeric_v_string_like(a, b):
        result = False
-
    else:


what changed here?

I removed a blank line. Should I keep it?

jreback · 2018-07-26T12:52:02Z

@peterpanmj yes going in good direction! need to rebase as we moved internals around a bit.

jreback · 2018-07-28T14:30:12Z

can you rebase

jreback

@jbrockmendel can you have a look

jreback · 2018-07-29T15:22:02Z

pandas/core/internals/managers.py

+            if hasattr(s, 'asm8'):
+                return _maybe_compare(maybe_convert_objects(values),
+                                      getattr(s, 'asm8'), reg)
+            if reg and is_re_compilable(s):


these are the same?

My mistake. At first, I wanted to raise a ValueError when regex is True but the replacer is not regex-compilable. It might not be a good idea to do that here. I will delete the if condition.
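For context, the validation that was considered and then dropped could be sketched with the standard library alone. This is an illustration of the idea, not code from the PR: `is_re_compilable_sketch` and `check_replacer` are hypothetical names, and catching `re.error` in addition to `TypeError` is a choice made for this sketch.

```python
import re


def is_re_compilable_sketch(obj):
    """Return True if ``re.compile`` accepts ``obj``."""
    try:
        re.compile(obj)
    except (TypeError, re.error):
        return False
    return True


def check_replacer(to_replace, regex):
    """Raise ValueError when regex mode gets a non-compilable pattern."""
    if regex and not is_re_compilable_sketch(to_replace):
        raise ValueError(
            "regex=True but {!r} is not a compilable pattern".format(to_replace)
        )
```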

jreback · 2018-07-29T15:22:41Z

pandas/core/internals/managers.py

@@ -1890,7 +1894,12 @@ def _consolidate(blocks):
    return new_blocks


-def _maybe_compare(a, b, op):
+def _maybe_compare(a, b, regex=False):
+    if not regex:


can you add a doc-string here

jreback · 2018-07-29T15:23:13Z

pandas/core/internals/blocks.py

@@ -2464,7 +2502,7 @@ def replace(self, to_replace, value, inplace=False, filter=None,
                                    regex=regex, mgr=mgr)

    def _replace_single(self, to_replace, value, inplace=False, filter=None,
-                        regex=False, convert=True, mgr=None):
+                        regex=False, convert=True, mgr=None, mask=None):


can you add a doc-string here

jreback · 2018-07-29T15:23:32Z

doc/source/whatsnew/v0.24.0.txt

@@ -573,6 +573,5 @@ Other
 - :meth: `~pandas.io.formats.style.Styler.background_gradient` now takes a ``text_color_threshold`` parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (:issue:`21258`)
 - Require at least 0.28.2 version of ``cython`` to support read-only memoryviews (:issue:`21688`)
 - :meth: `~pandas.io.formats.style.Styler.background_gradient` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` (:issue:`15204`)
-


move to reshaping

jbrockmendel · 2018-07-29T22:37:44Z

Looks like a nice bit of cleanup in Manager. For Block I wonder if it could share code with Index (or an object EA?), but that’s a question for another day.

jreback

lgtm. some cosmetic things

jreback · 2018-07-31T12:48:45Z

pandas/core/internals/blocks.py

+
+        Parameters
+        ----------
+        mask : array_like of bool


can you add optional to all of these args

can you reorder these args to match as much as possible _replace_single

jreback · 2018-07-31T12:49:07Z

pandas/core/internals/blocks.py

+        dst : object
+            The value to be replaced with.
+        convert : bool
+            If true, try to coerce any object types to better types.


add the inplace arg

jreback · 2018-07-31T12:50:03Z

pandas/core/internals/blocks.py

+        value.
+
+        Parameters
+        ----------


same as above (you can make a shared doc-string if you want here)

jreback · 2018-07-31T12:50:26Z

pandas/core/internals/managers.py

@@ -571,12 +573,15 @@ def replace_list(self, src_list, dest_list, inplace=False, regex=False,
        # figure out our mask a-priori to avoid repeated replacements
        values = self.as_array()

-        def comp(s):
+        def comp(s, reg=False):
            if isna(s):


can you add a doc-string, rename reg -> regex

jreback · 2018-07-31T12:52:42Z

pandas/core/internals/managers.py

@@ -1890,7 +1891,28 @@ def _consolidate(blocks):
    return new_blocks


-def _maybe_compare(a, b, op):
+def _maybe_compare(a, b, regex=False):


can you make this slightly more verbose name, can you move to pandas/core/ops.py (cc @jbrockmendel good location)?

Do we anticipate using it elsewhere? If not I'd leave it here at least for now.

Something like _compare_or_regex_match? @jreback
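A helper with that kind of name would presumably switch between the two comparison modes discussed in this review and return a boolean mask. The sketch below is a plain-Python approximation on lists; the function name, the use of `re.search` semantics, and treating non-string values as non-matching in regex mode are all assumptions, not the merged implementation.

```python
import re


def compare_or_regex_match_sketch(values, pattern, regex=False):
    """
    Return a list of bools marking where ``pattern`` matches ``values``.

    Parameters
    ----------
    values : list
        Values to test.
    pattern : object
        Value to compare against, or a regex pattern when ``regex`` is true.
    regex : bool, default False
        If true, use regex search on string values; otherwise use equality.
    """
    if not regex:
        return [v == pattern for v in values]
    compiled = re.compile(pattern)
    # Non-string values never match in regex mode in this sketch.
    return [
        isinstance(v, str) and compiled.search(v) is not None
        for v in values
    ]
```

The returned mask can then be handed to whatever performs the write, so matching and writing stay separate steps.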

jreback · 2018-07-31T12:53:02Z

pandas/tests/series/test_replace.py

@@ -243,6 +243,13 @@ def test_replace_string_with_number(self):
        expected = pd.Series([1, 2, 3])
        tm.assert_series_equal(expected, result)

+    def test_repace_intertwined_key_value_dict(self):
+        # GH 20656


can you add a 1-liner explaining the test in a bit more detail

typo repace --> replace (?)

jreback · 2018-08-09T10:51:59Z

thanks @peterpanmj nice patch!

gfyoung added the Bug, Strings, and Compat labels Jun 15, 2018

jreback requested changes Jun 15, 2018

jreback requested changes Jun 21, 2018

jreback requested changes Jun 22, 2018

jreback requested changes Jul 26, 2018

BUG: align logic between replace dict using integers and using strings (# 20656)

d8f2d70

peterpanmj force-pushed the replace_object branch from 818fec4 to d8f2d70 Compare July 29, 2018 11:20

jreback requested changes Jul 29, 2018

peterpanmj added 3 commits July 31, 2018 11:15

remove unused condition BlockManager

3afd287

add docstring for ObjectBlock._replace_single and BlockManager._maybe_compare

2bafaaa

update whatsnew entry ,move to reshaping

f76b2e2

jreback requested changes Jul 31, 2018

jreback added this to the 0.24.0 milestone Jul 31, 2018

peterpanmj added 4 commits August 3, 2018 23:22

cosmetic changes and doc-string enhancing

6f836b4

pull and update whatsnew

dd916d2

Merge branch 'master' into replace_object

7b1624b

Merge branch 'master' into replace_object

4729fc5

jreback approved these changes Aug 9, 2018

jreback merged commit 720d263 into pandas-dev:master Aug 9, 2018

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

BUG: inconsistency between replace dict using integers and using strings (pandas-dev#20656) (pandas-dev#21477)

04e4a6c

ArtificialQualia mentioned this pull request Mar 10, 2019

pandas.DataFrame.replace seems taking number string as integer and run into overflow error #25616

Closed

peterpanmj deleted the replace_object branch March 6, 2023 06:04
