BUG: Fix error in replace with strings that are large numbers (#25616) #25644
doc/source/whatsnew/v0.24.2.rst
@@ -32,6 +32,7 @@ Fixed Regressions
- Fixed regression in creating a period-dtype array from a read-only NumPy array of period objects. (:issue:`25403`)
- Fixed regression in :class:`Categorical`, where constructing it from a categorical ``Series`` and an explicit ``categories=`` that differed from that in the ``Series`` created an invalid object which could trigger segfaults. (:issue:`25318`)
- Fixed pip installing from source into an environment without NumPy (:issue:`25193`)
- Fixed regression in :func:`replace` where large strings of numbers would be coerced into int, causing an ``OverflowError`` (:issue:`25616`)
Use DataFrame.replace, as what you have won't render. Use double backticks around int.
pandas/tests/series/test_replace.py
@@ -181,6 +181,20 @@ def check_replace(to_rep, val, expected):
tr, v = [3, 4], [3.5, True]
check_replace(tr, v, e)

# GH 25616
Make this a new test.
Hello @ArtificialQualia! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-03-12 20:45:17 UTC
Codecov Report
@@ Coverage Diff @@
## master #25644 +/- ##
===========================================
- Coverage 91.26% 41.71% -49.55%
===========================================
Files 173 173
Lines 52968 52968
===========================================
- Hits 48339 22096 -26243
- Misses 4629 30872 +26243
Codecov Report
@@ Coverage Diff @@
## master #25644 +/- ##
===========================================
+ Coverage 41.73% 91.29% +49.55%
===========================================
Files 173 173
Lines 52967 52961 -6
===========================================
+ Hits 22106 48350 +26244
+ Misses 30861 4611 -26250
…fix-overflow-on-replace
Fixed the merge conflict in the whatsnew if we want to do this for 0.24.2 @jorisvandenbossche (it's not tagged right now).
Fixed merge conflicts again.
Owee, I'm MrMeeseeks, Look at me. There seems to be a conflict; please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! If these instructions are inaccurate, feel free to suggest an improvement.
@ArtificialQualia Thanks a lot!
…-dev#25616) (pandas-dev#25644) (cherry picked from commit 12fd316)
Manually backported in f4e1127
* master: (22 commits)
  Fixturize tests/frame/test_operators.py (pandas-dev#25641)
  Update ValueError message in corr (pandas-dev#25729)
  DOC: fix some grammar and inconsistency issues in the User Guide (pandas-dev#25728)
  ENH: Add public start, stop, and step attributes to RangeIndex (pandas-dev#25720)
  Make Rolling.apply documentation clearer (pandas-dev#25712)
  pandas-dev#25707 - Fixed flakiness in stata write test (pandas-dev#25714)
  Json normalize nan support (pandas-dev#25619)
  TST: resolve issues with test_constructor_dtype_datetime64 (pandas-dev#24868)
  DEPR: Deprecate box kwarg for to_timedelta and to_datetime (pandas-dev#24486)
  BUG: Preserve name in DatetimeIndex.snap (pandas-dev#25585)
  Fix concat not respecting order of OrderedDict (pandas-dev#25224)
  CLN: remove pandas.core.categorical (pandas-dev#25655)
  TST/CLN: Remove more Panel tests (pandas-dev#25675)
  Pinned pycodestyle (pandas-dev#25701)
  DOC: update date of 0.24.2 release notes (pandas-dev#25699)
  BUG: Fix error in replace with strings that are large numbers (pandas-dev#25616) (pandas-dev#25644)
  BUG: fix usage of na_sentinel with sort=True in factorize() (pandas-dev#25592)
  BUG: Fix to_string output when using header (pandas-dev#16718) (pandas-dev#25602)
  CLN: Remove unused test code (pandas-dev#25670)
  CLN: remove Panel from concat error message (pandas-dev#25676)
  ...

# Conflicts:
#	doc/source/whatsnew/v0.25.0.rst
git diff upstream/master -u -- "*.py" | flake8 --diff
See discussion in #25616.
When `.replace` saw a value that looks like an int, it would try to convert it even if it caused an `OverflowError`. This issue is only happening in newer versions of pandas due to the addition of `coerce_to_target_dtype` in `_replace_coerce`. `coerce_to_target_dtype` is required to fix a lot of other issues, so the fix here was to prevent a coercion to an int that would cause an `OverflowError` by catching that exception, allowing the values to remain as objects.

I tried to play around with `coerce_to_target_dtype` as well (moving it until after the replace, only doing it when convert is True, etc.) but this caused various other coercion and replace tests to fail, so I left that untouched.

Tests have been added for both cases where I found `OverflowError` could occur with `replace`.
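
The pattern behind the fix (attempt the integer coercion, catch `OverflowError`, and leave the values untouched if it fails) can be sketched with the standard library alone. This is only an illustration of the idea, not the pandas implementation: `coerce_to_int64` is a hypothetical helper, and `array("q", ...)` stands in for a 64-bit integer column.

```python
from array import array


def coerce_to_int64(values):
    """Hypothetical helper: pack values into a signed 64-bit array if they fit.

    Returns the values unchanged (the "remain as objects" case) when any of
    them is too large for 64 bits.
    """
    try:
        # array("q", ...) stores signed 64-bit integers and raises
        # OverflowError for anything outside that range.
        return array("q", (int(v) for v in values))
    except OverflowError:
        # Keep the original values instead of propagating the error.
        return list(values)


small = coerce_to_int64(["1", "2", "3"])                  # fits: stored as integers
large = coerce_to_int64(["1", "100000000000000000000"])   # too big: left as strings
```

Catching the exception at the coercion step keeps the fast integer path for ordinary values and only falls back when a value genuinely cannot be represented.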