pd.merge() on int+float produces object since 0.20 #18302

Closed
sobayed opened this issue Nov 15, 2017 · 3 comments
Labels: Algos, Dtype Conversions, good first issue, Needs Tests

Comments

@sobayed

sobayed commented Nov 15, 2017

Code Sample, a copy-pastable example if possible

In [2]: df1 = pd.DataFrame({'key': [1, 2], 'value': [0, 1]})

In [3]: df1.dtypes
Out[3]: 
key      int64
value    int64
dtype: object

In [4]: df2 = pd.DataFrame({'key': [1.0, 2.0], 'other_value': ['A', 'B']})

In [5]: df2.dtypes
Out[5]: 
key            float64
other_value     object
dtype: object

In [6]: print(pd.merge(df1, df2, how='left', on='key').dtypes)
key            object
value           int64
other_value    object
dtype: object

Problem description

I expected the merged DataFrame's "key" column either to be upcast from int to float (as pandas does, for example, when missing values appear in an int column) or to keep its int dtype.

I checked and confirmed that the latter was the behaviour in pandas 0.19.2.
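
For reference, the int-to-float upcast mentioned above can be reproduced in isolation. This is only a minimal sketch; the Series and the extra index label are made up for illustration and are not part of the original report.

```python
import pandas as pd

# An int64 Series with no missing values.
s = pd.Series([1, 2, 3])

# Reindexing onto a label that does not exist introduces NaN,
# so pandas upcasts the values from int64 to float64.
reindexed = s.reindex([0, 1, 2, 3])

print(s.dtype)
print(reindexed.dtype)
```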

Expected Output

Merging two DataFrames whose key column is of type int in one frame and of type float in the other should produce a key column of type int or float in the result (not object).
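
On affected versions, one possible workaround is to align the key dtypes explicitly before merging. The sketch below reuses df1 and df2 from the code sample and casts the int key to float64; this is an assumption about what is acceptable for the data, not the fix adopted in pandas.

```python
import pandas as pd

df1 = pd.DataFrame({'key': [1, 2], 'value': [0, 1]})
df2 = pd.DataFrame({'key': [1.0, 2.0], 'other_value': ['A', 'B']})

# Cast the int64 key to float64 so both frames share one key dtype
# before the merge; the original df1 is left unchanged.
merged = pd.merge(df1.astype({'key': 'float64'}), df2, how='left', on='key')

print(merged.dtypes)
```

Casting the float key to int instead would also align the dtypes, but only when the float column has no missing or fractional values.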

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-81-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@gfyoung added the Reshaping, Algos, and Dtype Conversions labels and removed the Reshaping label on Nov 16, 2017
@mroeschke
Member

Looks fixed on master. Could use a test.

In [25]: print(pd.merge(df1, df2, how='left', on='key').dtypes)
key             int64
value           int64
other_value    object
dtype: object

In [26]: pd.__version__
Out[26]: '0.25.0.dev0+145.g9c0f6a8d7'
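
A regression test along these lines might look like the sketch below. The function name is hypothetical and this is not the test that exists in the pandas test suite; it only asserts that the key column is numeric rather than object, since the thread shows int64 on master but the issue treats either int or float as acceptable.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_object_dtype


def check_merge_int_float_key_dtype():
    # Hypothetical helper: same frames as in the original report.
    left = pd.DataFrame({"key": [1, 2], "value": [0, 1]})
    right = pd.DataFrame({"key": [1.0, 2.0], "other_value": ["A", "B"]})

    result = pd.merge(left, right, how="left", on="key")

    # The key column must stay numeric (int or float), never object.
    assert not is_object_dtype(result["key"].dtype)
    assert is_numeric_dtype(result["key"].dtype)
    # Both rows should still find their match in the right frame.
    assert result["other_value"].tolist() == ["A", "B"]
    return result


result = check_merge_int_float_key_dtype()
```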

@mroeschke added the good first issue and Needs Tests labels on Feb 22, 2019
@rbenes
Contributor

rbenes commented Feb 25, 2019

Is this a duplicate of #16572?
That one was fixed in #18352, which also includes tests.

@WillAyd
Member

WillAyd commented Feb 25, 2019

Thanks for finding that @rbenes

@WillAyd WillAyd closed this as completed Feb 25, 2019