BUG: Series.get() on ExtensionArray series (and Categorical) indexed by integer returns incorrect result #20882

Closed
Dr-Irv opened this issue Apr 30, 2018 · 2 comments · Fixed by #20885
Labels
ExtensionArray: Extending pandas with custom dtypes or arrays.
Indexing: Related to indexing on series/frames, not to indexes themselves.

Dr-Irv commented Apr 30, 2018

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: import decimal

In [3]: from pandas.tests.extension.decimal.array import DecimalArray
   ...:
   ...: a = DecimalArray([decimal.Decimal(str(i)) for i in range(5)])
   ...: sa = pd.Series(a, index=[2*i for i in range(5)])
   ...:

In [4]: sa
Out[4]:
0    0
2    1
4    2
6    3
8    4
dtype: decimal

In [5]: sa.get(4)
Out[5]: Decimal('4')

In [6]: sb = pd.Series([i for i in range(5)], index=sa.index)

In [7]: sb
Out[7]:
0    0
2    1
4    2
6    3
8    4
dtype: int64

In [8]: sb.get(4)
Out[8]: 2

In [14]: cat = pd.Categorical(values=["a", "b", "c", "a", "b", "c"],
    ...: categories=["a", "b", "c"], ordered=True)

In [15]: s = pd.Series(cat, index=[2*i for i in range(6)])

In [16]: s
Out[16]:
0     a
2     b
4     c
6     a
8     b
10    c
dtype: category
Categories (3, object): [a < b < c]

In [18]: s.get(2)
Out[18]: 'c'

In [22]: s2 = pd.Series(list(s.values), index=s.index)

In [23]: s2
Out[23]:
0     a
2     b
4     c
6     a
8     b
10    c
dtype: object

In [24]: s2.get(2)
Out[24]: 'b'

Problem description

In the above, sb is a standard Series, and sb.get(4) correctly returns the element whose index label is 4. But for sa, which is backed by an ExtensionArray, sa.get(4) returns the element at position 4 of the array (the fifth element), ignoring the index labels.

For the series s containing a Categorical, s.get(2) returns the element at position 2 (the third element, 'c') rather than the element labeled 2 (the second element, 'b').
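The report turns on the difference between label-based and positional lookup. The sketch below reproduces both results for the categorical series using only the explicit `.loc` and `.iloc` accessors, whose meaning does not depend on how `.get()` behaves in a given build. The names `by_label` and `by_position` are illustrative and not part of the original report.

```python
import pandas as pd

# Same categorical data as in the report, with even integer labels 0, 2, ..., 10.
cat = pd.Categorical(["a", "b", "c", "a", "b", "c"],
                     categories=["a", "b", "c"], ordered=True)
s = pd.Series(cat, index=[2 * i for i in range(6)])

by_label = s.loc[2]      # row whose index label is 2 (the second row)
by_position = s.iloc[2]  # row at 0-based position 2 (the third row)

print(by_label, by_position)  # prints: b c
```

With an integer index, `.get(2)` is expected to agree with `by_label`; the value shown in the report matches `by_position` instead.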

Expected Output

sa.get(4) should be Decimal('2')

For the Categorical example, s.get(2) should return 'b', similar to the expression s2.get(2).
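One way to express the expected behavior as a check is to compare `.get()` on the categorical series against the object-dtype copy for every index label. This is only a sketch, assuming a pandas build that includes the fix from #20885; the `mismatches` list is a hypothetical helper, not something from the pandas test suite. The DecimalArray case is omitted because that class lives in pandas' own tests rather than in the public API.

```python
import pandas as pd

labels = [2 * i for i in range(6)]
cat = pd.Categorical(["a", "b", "c", "a", "b", "c"],
                     categories=["a", "b", "c"], ordered=True)

s = pd.Series(cat, index=labels)                # categorical-backed series
s2 = pd.Series(list(s.values), index=s.index)   # object-dtype copy, same labels

# Collect every label where the two series disagree under .get().
# On a build with the bug described above, this list would be non-empty.
mismatches = [label for label in labels if s.get(label) != s2.get(label)]

print(mismatches)
```

An empty list means `.get()` treats integer keys as labels for both series, which is the behavior requested here.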

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0.dev0+811.g4afc75638
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.25.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Dr-Irv changed the title from "BUG: Series.get() on ExtensionArray series indexed by integer returns incorrect result" to "BUG: Series.get() on ExtensionArray series (and Categorical) indexed by integer returns incorrect result" on Apr 30, 2018
TomAugspurger added the ExtensionArray label on Apr 30, 2018
jreback added this to the 0.23.0 milestone on May 1, 2018
jreback added the Indexing label on May 1, 2018

jreback commented May 1, 2018

Is this not the same issue as #14865?

jorisvandenbossche commented

@jreback That issue is about a categorical index; this one is about accessing EA/categorical values.
