Concat coerces ints to floats if empty DataFrame is present #8902

Closed
EVaisman opened this issue Nov 26, 2014 · 5 comments
Labels: Bug; Dtype Conversions (unexpected or buggy dtype conversions); Reshaping (concat, merge/join, stack/unstack, explode)

Comments

@EVaisman

It seems that concat coerces a Series of type int to type float if one of the DataFrames being concatenated is empty:

>>> df1 = pd.DataFrame([1],columns=['a'])
>>> df2 = pd.DataFrame(columns=['a'])
>>> df = pd.concat([df1, df2])
>>> df['a'].dtype
dtype('float64')
>>> df1['a'].dtype
dtype('int64')

While if both columns have ints, no coercion happens:

>>> df1 = pd.DataFrame([1],columns=['a'])
>>> df2 = pd.DataFrame([1],columns=['a'])
>>> df = pd.concat([df1, df2])
>>> df['a'].dtype
dtype('int64')
@jreback
Contributor

jreback commented Nov 26, 2014

cc @immerrr

This is actually somewhat of a grey area. If you had passed a None, it would be excluded, so no coercion would happen. The issue is that you passed a completely empty frame, which has object dtype. Technically this should coerce, though I can see the case for 'ignoring' it, since it doesn't have any valid shape.

In [21]: np.prod(df2.shape)
Out[21]: 0

In [22]: np.prod(df1.shape)
Out[22]: 1

IOW, I could see ignoring these in the pre-computation.
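
To make the idea concrete, here is a minimal user-side sketch of "ignoring" zero-sized frames before concatenating. The helper name `concat_skip_empty` is hypothetical and not part of pandas; it only approximates what such a pre-computation step might do. It also shows the `None` case mentioned above, which `pd.concat` drops silently.

```python
import numpy as np
import pandas as pd


def concat_skip_empty(frames):
    """Hypothetical helper: drop frames with no elements, then concat.

    A frame counts as empty when np.prod(frame.shape) == 0,
    which is the same value as frame.size.
    """
    non_empty = [f for f in frames if f is not None and f.size > 0]
    if not non_empty:
        # Nothing carries data; fall back to a plain concat so the
        # column labels of the empty inputs are still preserved.
        return pd.concat(frames)
    return pd.concat(non_empty)


df1 = pd.DataFrame([1], columns=["a"])
df2 = pd.DataFrame(columns=["a"])  # empty, object dtype

result = concat_skip_empty([df1, df2])
print(result["a"].dtype)  # int64

# Passing None instead of an empty frame: concat excludes it.
none_result = pd.concat([df1, None])
print(none_result["a"].dtype)  # int64
```

The trade-off is that columns present only in the empty frames are lost when at least one non-empty frame exists, so this is a workaround for the dtype question, not a drop-in replacement for `pd.concat`.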

@jreback added the Bug, Dtype Conversions, and Reshaping labels on Nov 26, 2014
@jreback added this to the 0.16.0 milestone on Nov 26, 2014
@immerrr
Contributor

immerrr commented Nov 27, 2014

I agree, it should coerce technically, but not practically.

@immerrr
Contributor

immerrr commented Nov 27, 2014

I have a lot on my plate right now, but if it waits till I'm done with that, I'll fix this.

@jreback modified the milestones: 0.16.0, Next Major Release on Mar 6, 2015
@a-y-khan
Contributor

a-y-khan commented Apr 5, 2020

In a recent pandas version (v1.0.3), concat coerces ints to object instead of floats:

In [1]: import pandas as pd                    

In [2]: pd.__version__                         
Out[2]: '1.0.3'

In [3]: df1 = pd.DataFrame([1],columns=['a'])  

In [4]: df1['a'].dtype                         
Out[4]: dtype('int64')

In [5]: df2 = pd.DataFrame(columns=['a'])      

In [6]: df2['a'].dtype                         
Out[6]: dtype('O')

In [7]: df = pd.concat([df1, df2])             

In [8]: df['a'].dtype                          
Out[8]: dtype('O')

@mroeschke
Member

I think the above behavior was an intentional change a while back (an empty DataFrame now has object dtype); therefore, the common dtype between object and int is object.

Closing as the new expected behavior, but happy to reopen if I am misunderstanding.
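
For readers who want the integer dtype to survive, one possible workaround (a sketch, not something proposed in this thread) is to give the empty frame an explicit dtype, so the common dtype is no longer computed against object. The variable names `typed_empty` and `cast_empty` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame([1], columns=["a"])

# Option 1: build the empty frame with an explicit integer dtype.
typed_empty = pd.DataFrame({"a": pd.Series(dtype="int64")})
result = pd.concat([df1, typed_empty])
print(result["a"].dtype)  # int64

# Option 2: cast an existing empty object-dtype frame to the
# dtypes of the frame it will be concatenated with.
df2 = pd.DataFrame(columns=["a"])
cast_empty = df2.astype(df1.dtypes.to_dict())
result_cast = pd.concat([df1, cast_empty])
print(result_cast["a"].dtype)  # int64
```

Both options assume the empty frame's columns are meant to hold the same types as the non-empty one; if the column sets differ, only the shared columns can be cast this way.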
