pd.concat of Series with int64 column and Series with int64-ExtensionArray yields int64 #21792

Closed
xhochy opened this issue Jul 7, 2018 · 7 comments
Labels
Bug ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@xhochy
Contributor

xhochy commented Jul 7, 2018

Code Sample

import fletcher as fr
import pandas as pd

df_ext = pd.DataFrame({'a': fr.FletcherArray([1, 2])})
df_ext.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 2 entries, 0 to 1
# Data columns (total 1 columns):
# a    2 non-null fletcher[int64]
# dtypes: fletcher[int64](1)
# memory usage: 100.0 bytes

df_normal = pd.DataFrame({'a': [3, 4]})
df_normal.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 2 entries, 0 to 1
# Data columns (total 1 columns):
# a    2 non-null int64
# dtypes: int64(1)
# memory usage: 96.0 bytes

# Works
pd.concat([df_ext, df_normal]).info()
# <class 'pandas.core.frame.DataFrame'>
# Int64Index: 4 entries, 0 to 1
# Data columns (total 1 columns):
# a    4 non-null object
# dtypes: object(1)
# memory usage: 64.0+ bytes

# Yields int64 instead of object
pd.concat([df_ext['a'], df_normal['a']]).dtype
# dtype('int64')

Problem description

This currently causes BaseReshapingTests.test_concat_mixed_dtypes to fail for ExtensionArrays that can be converted to a numeric NumPy dtype.

xhochy added a commit to xhochy/fletcher that referenced this issue Jul 7, 2018
@xhochy xhochy changed the title from "pd.concat of Series with int64 column and DataFrame with int64-ExtensionArray yields int64" to "pd.concat of Series with int64 column and Series with int64-ExtensionArray yields int64" Jul 7, 2018
@jreback
Contributor

jreback commented Jul 7, 2018

Where do you have an Int64 EA array defined?

@xhochy
Contributor Author

xhochy commented Jul 7, 2018

@jreback see my "prototype" package https://github.com/xhochy/fletcher that wraps Arrow arrays of any type into ExtensionArrays. The fr.FletcherArray([1, 2]) gives me a nullable int64 (Arrow-based) ExtensionArray.

@jreback
Contributor

jreback commented Jul 7, 2018

as an FYI: #21160

This is currently not dependent on pyarrow, though it's pushing the EA protocols.

@jreback
Contributor

jreback commented Jul 7, 2018

also xref #21789

@jreback jreback added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jul 7, 2018
@jreback jreback added this to the Contributions Welcome milestone Jul 7, 2018
@jreback jreback added the Bug label Jul 7, 2018
@xhochy
Contributor Author

xhochy commented Jul 15, 2018

Fixing this might also be a breaking change, e.g. the following behaviour is probably wanted:

In [4]: pd.concat([pd.Series([1, 2]), pd.Series([1., 2.])])
Out[4]:
0    1.0
1    2.0
0    1.0
1    2.0
dtype: float64
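
For reference, the upcast above matches NumPy's own promotion rule for these two dtypes. A small sketch, assuming only that NumPy is installed; pandas uses its own internal helpers for this, so it illustrates the rule and not the pandas code path:

```python
import numpy as np

# NumPy's promotion rule for a signed 64-bit integer and a 64-bit float:
# the smallest dtype that can hold both is float64.
common = np.result_type(np.int64, np.float64)
print(common)
```

Any fix for the ExtensionArray case would need to keep this plain NumPy promotion intact.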

@xhochy
Contributor Author

xhochy commented Jul 15, 2018

Narrowed this down to:

if any(extensions) and axis == 1:

Here we only do the special handling in the case of axis == 1, but in the Series case we have axis == 0.

Removing the check for axis == 1 sadly then produces a Series backed by a 2D-ndarray:

In [9]: s.values
Out[9]:
array([[1, 2],
       [1, 2]], dtype=object)
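
A 2D object array of this shape is what NumPy produces when two equal-length blocks are stacked rather than concatenated. The sketch below only illustrates that difference with plain lists; the names `left` and `right` are made up here, and it is not a claim about which pandas call produced the array above:

```python
import numpy as np

left = [1, 2]
right = [1, 2]

# Wrapping both blocks in a single object array stacks them as rows,
# which gives a 2D result.
stacked = np.array([left, right], dtype=object)

# Converting each block to a 1-D object array first and then concatenating
# keeps the result one-dimensional.
flat = np.concatenate(
    [np.asarray(left, dtype=object), np.asarray(right, dtype=object)]
)

print(stacked.shape, flat.shape)
```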

@jorisvandenbossche
Member

@xhochy this has generally been fixed in the meantime. The example now returns an object-dtype result, but Fletcher can override this by implementing _get_common_dtype on the extension dtype to control the casting behaviour in concat (#33607).
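
A minimal sketch of what such a hook could look like, assuming a pandas version that includes the _get_common_dtype hook. `ToyIntDtype` is a hypothetical dtype invented for illustration (it is not Fletcher's implementation and has no array type behind it), so the hook is called directly rather than through concat. The last lines use pandas' own nullable Int64 dtype as a stand-in to show the effect in concat:

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


class ToyIntDtype(ExtensionDtype):
    """Hypothetical dtype, defined only to show the shape of the hook."""

    name = "toyint"
    type = int

    def _get_common_dtype(self, dtypes):
        # Keep the extension dtype when every input is either this dtype
        # or a NumPy integer dtype. Otherwise return None, which lets
        # pandas ask the other dtypes and eventually fall back to object.
        for dtype in dtypes:
            if isinstance(dtype, ToyIntDtype):
                continue
            if isinstance(dtype, np.dtype) and dtype.kind in "iu":
                continue
            return None
        return self


toy = ToyIntDtype()
common_with_int = toy._get_common_dtype([toy, np.dtype("int64")])
common_with_float = toy._get_common_dtype([toy, np.dtype("float64")])

# Stand-in using pandas' built-in nullable integer dtype: concatenating
# with a plain integer Series keeps the extension dtype.
result_dtype = pd.concat(
    [pd.Series([1, 2], dtype="Int64"), pd.Series([3, 4])]
).dtype
print(common_with_int, common_with_float, result_dtype)
```

Returning None instead of raising leaves the decision to the other dtypes involved, so the object fallback described above still applies when nothing matches.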

@jorisvandenbossche jorisvandenbossche modified the milestones: Contributions Welcome, No action Apr 12, 2021