Shorter MultiIndex representation #21145

Closed
wants to merge 1 commit into from

Contributor

@topper-123 topper-123 commented May 20, 2018

MultiIndex can currently have quite long repr output, which may also take a long time to print.

This PR makes the MultiIndex repr similar in style to the other Index reprs. An alternative would be to truncate levels and labels by passing pd.options.display.max_seq_items to ibase.default_pprint, but IMO this approach is better, as it is more consistent with the reprs of the other axis types.

Example with few items in index

>>> pd.MultiIndex(levels=[['a', 'b'], ['A', 'B']], labels=[[0, 0, 1, 1], [0, 0, 0, 1]])
MultiIndex(levels=[['a', 'b'], ['A', 'B']],
           labels=[[0, 0, 1, 1], [0, 0, 0, 1]])

Example with many items in index

>>> idx=range(1000)
>>> pd.MultiIndex.from_arrays([idx, idx])
MultiIndex(levels=[[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
                    ...
                    990, 991, 992, 993, 994, 995, 996, 997, 998, 999],
                   [  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
                    ...
                    990, 991, 992, 993, 994, 995, 996, 997, 998, 999],
                   ],
           labels=[[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
                    ...
                    990, 991, 992, 993, 994, 995, 996, 997, 998, 999],
                   [  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
                    ...
                    990, 991, 992, 993, 994, 995, 996, 997, 998, 999],
                   ])

If this approach is ok, I can write up whatsnew, tests etc. but would appreciate feedback before I do that.

EDIT: tests and whatsnew added.

pep8speaks commented May 20, 2018

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 21, 2018 at 19:59 Hours UTC

codecov bot commented May 20, 2018

Codecov Report

Merging #21145 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21145      +/-   ##
==========================================
- Coverage   91.84%   91.84%   -0.01%     
==========================================
  Files         153      153              
  Lines       49502    49512      +10     
==========================================
+ Hits        45463    45472       +9     
- Misses       4039     4040       +1
Flag         Coverage Δ
#multiple    90.23% <100%> (-0.01%) ⬇️
#single      41.87% <9.09%> (-0.01%) ⬇️

Impacted Files                   Coverage Δ
pandas/core/indexes/multi.py     95.11% <100%> (+0.04%) ⬆️
pandas/io/formats/printing.py    88.49% <0%> (-0.89%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 81358e8...00c2297.

@@ -609,11 +609,28 @@ def _format_attrs(self):
"""
Return a list of tuples of the (attr,formatted_value)
"""
def to_string_helper(obj, attr_name):
Contributor

Instead of creating new functions, can you simply override _format_attrs (or another method, as appropriate)?

Contributor Author

This helper is defined inside _format_attrs, so it isn't visible anywhere else.

This may not be what you meant, but it could be placed at the top level of indexes/base.py as e.g. _index_to_string, so that other code could use it?

Contributor

OK, so move this to pandas.io.formats.printing and make it as generic as possible.
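To make the idea of a generic helper concrete, here is a minimal standard-library sketch of the kind of function that could format any sequence in the style of the reprs shown earlier: items right-justified to a common width, lines wrapped at a display width, and continuation lines aligned under the first item. The name `format_sequence` and its parameters are hypothetical; this is not the helper from this PR, nor the code in pandas.io.formats.printing.

```python
def format_sequence(items, prefix="", width=80):
    """Format items as a bracketed, line-wrapped list (illustrative only).

    `prefix` is the text placed before the opening bracket, e.g. "levels=".
    """
    tokens = [str(x) for x in items]
    if not tokens:
        return prefix + "[]"

    # Right-justify every item to the width of the longest one.
    pad = max(len(t) for t in tokens)
    last = len(tokens) - 1
    pieces = [
        t.rjust(pad) + ("]" if i == last else ",")
        for i, t in enumerate(tokens)
    ]

    # Continuation lines start directly under the first item.
    indent = " " * (len(prefix) + 1)
    lines = []
    current = prefix + "["
    line_has_item = False
    for piece in pieces:
        sep = " " if line_has_item else ""
        if line_has_item and len(current) + len(sep) + len(piece) > width:
            # The piece does not fit: close this line and start a new one.
            lines.append(current)
            current = indent + piece
        else:
            current += sep + piece
        line_has_item = True
    lines.append(current)
    return "\n".join(lines)


print(format_sequence(range(30), prefix="levels=", width=40))
```

Keeping the prefix and width as arguments means the same function could serve levels, labels, or any other attribute, which is one way to read "as generic as possible".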

Contributor

@jreback jreback left a comment

If you are changing the repr and no tests broke, that seems to be a problem.

Contributor

jreback commented May 21, 2018

> output labels and levels according to pd.options.display.max_seq_items, but IMO this is better,

Yes, this option needs to be used to control the length.

@topper-123
Contributor Author

pd.options.display.max_seq_items is being used in Index._format_data, so it also works here:

>>> pd.options.display.max_seq_items
100
>>> idx=range(101)
>>> pd.MultiIndex.from_arrays([idx, idx])  # abbreviates
MultiIndex(levels=[[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
                    ...
                     91,  92,  93,  94,  95,  96,  97,  98,  99, 100],
                   [  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
                    ...
                     91,  92,  93,  94,  95,  96,  97,  98,  99, 100],
                   ],
           labels=[[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
                    ...
                     91,  92,  93,  94,  95,  96,  97,  98,  99, 100],
                   [  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
                    ...
                     91,  92,  93,  94,  95,  96,  97,  98,  99, 100],
                   ])
>>> idx=range(100)
>>> pd.MultiIndex.from_arrays([idx, idx])  # doesn't abbreviate
MultiIndex(levels=[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
                    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
                    30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
                    45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
                    60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
                    75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
                    90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
                   [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
                    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
                    30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
                    45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
                    60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
                    75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
                    90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
                   ],
           labels=[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
                    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
                    30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
                    45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
                    60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
                    75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
                    90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
                   [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
                    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
                    30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
                    45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
                    60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
                    75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
                    90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
                   ])

If pd.options.display.max_seq_items changes, the point of abbreviation changes as well.
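The threshold behaviour above can be sketched in plain Python. This is only an illustration of the rule visible in the output (sequences longer than the limit keep their first and last 10 items around a "..."), not the code in Index._format_data; the function name `abbreviate` and the `n_edge` parameter are assumptions made for the example.

```python
def abbreviate(items, max_seq_items=100, n_edge=10):
    """Return the items as strings, eliding the middle of long sequences.

    Sequences with at most `max_seq_items` entries are returned in full.
    Longer ones keep `n_edge` entries from each end around a "..." marker.
    """
    items = list(items)
    if len(items) <= max_seq_items:
        return [str(x) for x in items]

    # Never keep more edge items than the limit allows, but at least one.
    n = max(1, min(n_edge, max_seq_items // 2))
    head = [str(x) for x in items[:n]]
    tail = [str(x) for x in items[-n:]]
    return head + ["..."] + tail


print(", ".join(abbreviate(range(101))))  # abbreviated
print(", ".join(abbreviate(range(100))))  # printed in full
```

Lowering `max_seq_items` in the call moves the point of abbreviation in the same way that changing the pandas option does in the session above.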

@topper-123 topper-123 force-pushed the multi_index_repr branch 2 times, most recently from 152e057 to 8b70d96 Compare May 21, 2018 18:04
@topper-123
Contributor Author

Is this ok?

@@ -15,6 +15,36 @@ and bug fixes. We recommend that all users upgrade to this version.
New features
~~~~~~~~~~~~

.. _whatsnew_0231.enhancements.new_multi_index_repr_:
Contributor

move to 0.24.0


def test_repr(self):
# GH21145

Contributor

Set the width with a context manager here (so multiple tests are needed for this), e.g. first with a small value and then with a large value, to see how the repr behaves under different settings.
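A rough sketch of what such width-dependent checks might look like, assuming pandas is installed. `pd.option_context` and the `display.width` option are existing pandas APIs; the helper `repr_at_width` is a hypothetical name used only for this example. The exact repr text depends on the pandas version, so the sketch only captures the strings rather than comparing them against fixed expected output.

```python
import pandas as pd


def repr_at_width(index, width):
    """Return repr(index) while display.width is temporarily set."""
    with pd.option_context("display.width", width):
        return repr(index)


values = list(range(50))
idx = pd.MultiIndex.from_arrays([values, values])

default_width = pd.get_option("display.width")
narrow = repr_at_width(idx, 40)   # small width
wide = repr_at_width(idx, 200)    # large width

# The option is restored once the context manager exits.
print(pd.get_option("display.width") == default_width)
print(len(narrow.splitlines()), len(wide.splitlines()))
```

In a real test each width would get its own expected string, so that a change in wrapping at either setting shows up as a failure.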


.. ipython:: python

index1=range(1000)
Contributor

you can make this section a bit shorter overall.

slow to print and make the console output difficult to navigate.

Outputting of ``MultiIndex`` instances now has limits to the number of levels
and labels shown ((:issue:`21145`):
Contributor

extra paren here

Contributor

this should be the issue number

@topper-123
Contributor Author

Closed in favor of #22511.

@topper-123 topper-123 closed this Aug 26, 2018
@topper-123 topper-123 deleted the multi_index_repr branch October 27, 2018 08:16
Labels
Enhancement MultiIndex Output-Formatting __repr__ of pandas objects, to_string
Development

Successfully merging this pull request may close these issues.

Abbreviate MultiIndex representation
4 participants