API: support "unique=True" in MultiIndex.get_level_values() #17896

toobaz · 2017-10-16T17:50:48Z

Code Sample, a copy-pastable example if possible

I often find myself doing

In [2]: df = pd.Series(index=pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']]))

In [3]: df.index.get_level_values(0).unique()
Out[3]: Index(['A', 'B'], dtype='object')

Problem description

The above is very inefficient: first an Index is built that contains a copy of the entire level (possibly using far more memory than the index itself), and only then are duplicates stripped. Other people on Stack Overflow have faced the same problem, and this is also blocking a fix I wrote for #17845.
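To make the cost concrete, here is a minimal sketch (the sizes are arbitrary and chosen only for illustration) that compares the full-length result of `get_level_values(0)` with the much smaller set of distinct values stored in the level itself:

```python
import pandas as pd

# 1000 x 1000 product: one million rows, but only 1000 distinct values per level.
mi = pd.MultiIndex.from_product([range(1000), range(1000)])

full = mi.get_level_values(0)  # full-length Index, one entry per row
uniq = full.unique()           # duplicates are dropped only afterwards

print(len(mi), len(full), len(uniq))
# Bytes held by the intermediate result versus the stored level values.
print(full.nbytes, mi.levels[0].nbytes)
```

The intermediate `full` object has as many entries as the MultiIndex has rows, even though only the distinct values are wanted.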

I'm pushing a simple PR in seconds.

Expected Output

Same as above, but in an efficient way.
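For reference, the pull request linked later in this thread (#17897) adds a `level=` argument to `MultiIndex.unique()`. A minimal sketch, assuming a pandas version that includes that change (the variable names `s` and `result` are only illustrative):

```python
import pandas as pd

s = pd.Series(
    range(4),
    index=pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']]),
)

# Ask the MultiIndex directly for the distinct values of level 0.
result = s.index.unique(level=0)
print(list(result))  # ['A', 'B']
```

This returns the same values as `s.index.get_level_values(0).unique()` for this example.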

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.21.0rc1+19.gb15d92d14
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1


toobaz · 2017-10-16T17:52:09Z

By the way: this is in principle related to #2770, which however is being tackled in a different and complementary way.

jreback · 2017-10-16T17:56:17Z

how is this not just .levels ?

jreback · 2017-10-16T17:56:55Z

#2770 is handled by remove_unused_levels()

toobaz · 2017-10-16T17:58:14Z

> how is this not just .levels ?

.levels includes unused labels (which is why users are often confused by it)
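A small sketch of the difference, using a sliced MultiIndex as an assumed example of how unused values can arise:

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']])
sub = mi[:2]  # keeps only the rows whose first level is 'A'

# .levels still lists 'B', although no row of `sub` uses it.
print(list(sub.levels[0]))                         # ['A', 'B']
# The values actually present in level 0.
print(list(sub.get_level_values(0).unique()))      # ['A']
# remove_unused_levels() rebuilds the levels so that only used values remain.
print(list(sub.remove_unused_levels().levels[0]))  # ['A']
```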


jreback · 2017-10-16T18:03:37Z

OK, you are adding it there, OK!

I am not sure unique is the right word here.

jreback · 2017-10-16T18:04:09Z

.get_level_values(level, used=False), though I am not sure I like this either.


jorisvandenbossche · 2017-10-16T19:13:09Z

I agree it would be nice to have a clean way to get those unique values, but IMO it does not belong in get_level_values. That method returns the actual values of the Index level, with a length equal to the length of the Index, and IMO we should stick to that contract. Having such a keyword would completely alter the return type of this method.
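The contract described above can be checked with a short sketch (the example index is the one from the issue description; `values` is just an illustrative name):

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']])
values = mi.get_level_values(0)

# One entry per row, aligned position by position with the MultiIndex.
assert len(values) == len(mi)
assert all(values[i] == mi[i][0] for i in range(len(mi)))

print(list(values))           # ['A', 'A', 'B', 'B']
print(list(values.unique()))  # ['A', 'B']
```

A `unique=True` keyword would break this alignment, since the result would no longer have one entry per row.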

(I don't have a good alternative to suggest right away, though.)


toobaz added a commit to toobaz/pandas that referenced this issue Oct 16, 2017

API: add "unique=" argument to MultiIndex.get_level_values()

57aa7e6

closes pandas-dev#17896

toobaz mentioned this issue Oct 16, 2017

API: add "level=" argument to MultiIndex.unique() #17897

Merged

4 tasks

jreback added MultiIndex API Design labels Oct 16, 2017

toobaz added a commit to toobaz/pandas that referenced this issue Oct 16, 2017

API: add "unique=" argument to MultiIndex.get_level_values()

8a69543

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Oct 17, 2017

API: add "level=" argument to MultiIndex.unique()

6df918b

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Oct 17, 2017

API: add "level=" argument to MultiIndex.unique()

a99f9ac

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Oct 17, 2017

API: add "level=" argument to MultiIndex.unique()

e5a4635

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Oct 17, 2017

API: add "level=" argument to MultiIndex.unique()

3617e2a

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Oct 29, 2017

API: add "level=" argument to MultiIndex.unique()

a07e2d8

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Oct 30, 2017

API: add "level=" argument to MultiIndex.unique()

f0f7874

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Nov 11, 2017

API: add "level=" argument to MultiIndex.unique()

284649a

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Nov 11, 2017

API: add "level=" argument to MultiIndex.unique()

b1b27dc

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Nov 11, 2017

API: add "level=" argument to MultiIndex.unique()

4d8769e

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Nov 12, 2017

API: add "level=" argument to MultiIndex.unique()

efb1a1b

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Nov 14, 2017

API: add "level=" argument to MultiIndex.unique()

861867c

closes pandas-dev#17896

jreback added this to the 0.22.0 milestone Nov 15, 2017

toobaz added a commit to toobaz/pandas that referenced this issue Nov 18, 2017

API: add "level=" argument to MultiIndex.unique()

e362b9d

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Nov 18, 2017

API: add "level=" argument to MultiIndex.unique()

fbf9eff

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Nov 18, 2017

API: add "level=" argument to MultiIndex.unique()

337e942

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Nov 19, 2017

API: add "level=" argument to MultiIndex.unique()

50f199d

closes pandas-dev#17896

toobaz added a commit to toobaz/pandas that referenced this issue Nov 19, 2017

API: add "level=" argument to MultiIndex.unique()

feb65ed

closes pandas-dev#17896

jorisvandenbossche closed this as completed in #17897 Nov 20, 2017

jorisvandenbossche pushed a commit that referenced this issue Nov 20, 2017

API: add "level=" argument to MultiIndex.unique() (#17897)

3b05a60

closes #17896
