pandas.DataFrame.replace seems to treat numeric strings as integers and runs into an overflow error #25616
Comments
Hmm ok that is strange - investigation and PRs would certainly be welcome
I'm new to working on pandas, so please note that some of the below may be incorrect. I've looked into this a bit, and it looks like the cause is code added in #21477 in This didn't exist in the older version of pandas that the above code works on, and conversion was handled totally by the An easy fix to this is to just add If we can't change that, a much more narrow fix would be to only call I couldn't see a fix we could make to
@ArtificialQualia thanks a lot for your input!
I haven't stepped through the code myself but this seems to be the actual root issue - have you tried debugging this behavior to see if a fix can be applied to prevent this?
I did try both of the fixes I mentioned in my comment. They fix the problem reported in this issue, as well as passing all the tests in If you want my recommendation, I would add
@ArtificialQualia if you'd like to submit a PR, that would probably make it easier to review and give you feedback. Thanks again!
No problem, I'll submit a PR.
…-dev#25616) (pandas-dev#25644) (cherry picked from commit 12fd316)
Code Sample
Problem description
I got an overflow error:
File "/home/grain/ml/lib/python3.6/site-packages/pandas/core/dtypes/cast.py", line 683, in astype_nansafe
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
File "pandas/_libs/lib.pyx", line 546, in pandas._libs.lib.astype_intsafe
OverflowError: Python int too large to convert to C long
This error occurs on my laptop with pandas 0.24.1 and Python 3.6.7. The example code handles the replacement correctly on another computer with an older pandas version, 0.20.3.
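To illustrate why the traceback ends in an `OverflowError`, here is a minimal sketch, independent of pandas internals, showing that the id from this report does not fit in a 64-bit integer. The variable names are only illustrative, and the NumPy cast is a stand-in for the integer conversion in the traceback, not the actual pandas code path.

```python
import numpy as np

user_id = "100000715097692381911"

# Python ints are arbitrary precision, so parsing the string succeeds.
as_int = int(user_id)

# The largest value a signed 64-bit integer can hold.
int64_max = np.iinfo(np.int64).max

print(as_int > int64_max)

# Casting an object array holding this value to int64 cannot succeed.
overflowed = False
try:
    np.array([as_int], dtype=object).astype(np.int64)
except OverflowError as exc:
    overflowed = True
    print(type(exc).__name__, exc)
```

So any code path that tries to convert such a string to a fixed-width integer will fail, even though the value is perfectly valid as a string.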
Expected Output
Since I loaded big id numbers like '100000715097692381911' as strings, the pandas.DataFrame.replace() method should replace each one with the corresponding value in the dictionary. Instead I got an overflow error. It seems pandas treated the id string as an integer?
And if I modify just the first element of the 'user_id' column to 's100000715097692381911', the code no longer runs into the overflow.
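The original code sample is not preserved in this text, so the following is only a hypothetical reconstruction of the scenario described above. The column name `user_id` and the first id come from the report; the other ids and the replacement labels are made up for illustration. On a pandas release that includes the fix, the replacement should go through; per the report, pandas 0.24.1 raised `OverflowError` on code of this kind.

```python
import pandas as pd

# Large ids stored as strings (only the first value is from the report).
df = pd.DataFrame(
    {
        "user_id": [
            "100000715097692381911",
            "100000715097692381912",
            "100000715097692381913",
        ]
    }
)

# Hypothetical mapping from id strings to replacement labels.
mapping = {
    "100000715097692381911": "user_a",
    "100000715097692381912": "user_b",
}

# Nested dict form: {column: {old_value: new_value}}.
result = df.replace({"user_id": mapping})
print(result)
```

Ids that are missing from the mapping (the third row here) are left unchanged.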
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.1
pytest: None
pip: 19.0.3
setuptools: 40.6.3
Cython: None
numpy: 1.16.0
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None