ENH: better MultiIndex.__repr__ #22511

topper-123 · 2018-08-26T08:00:56Z

- tests added / passed
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry

Proposal to make a new repr for MultiIndex. The MultiIndex will be displayed as vertically stacked tuples, as discussed in #13480, which makes its structure easier to understand.

The proposal provides:

- item formatting according to each level's formatting rules,
- right-justification of each tuple item,
- row-wise truncation according to pd.options.display.max_seq_items,
- column-wise truncation according to pd.options.display.width.
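A two-row example makes the per-level formatting and right-justification easier to see than the large one. This is a sketch assuming pandas 0.25 or later, where this repr was released; exact spacing may vary between versions, and (per the later "don't show useless dtype in repr" commit) no dtype attribute is printed.

```python
import pandas as pd

# Level 0 holds strings of different widths, level 1 integers of different widths.
mi = pd.MultiIndex.from_arrays([["a", "bcd"], [9, 10]], names=["a", "b"])

text = repr(mi)
print(text)
# Each tuple position is right-justified to the widest entry in that position:
# "'a'" is padded to the width of "'bcd'", and 9 to the width of 10.
```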

A large MultiIndex example will now look like this:

>>> n = 1_000_000
>>> ci = pd.CategoricalIndex(['a'] * n + ['bcd'] * n,
...                          categories=['a', 'bcd'], ordered=True)
>>> dti = pd.date_range('2000-01-01', freq='s', periods=2 * n)
>>> mi = pd.MultiIndex.from_arrays([ci, ci.codes + 9, dti, dti, dti],
...                                names=['a', 'b', 'x', 'x2', 'x3'])
>>> mi
MultiIndex([(  'a',  9, '2000-01-01 00:00:00', '2000-01-01 00:00:00', ...),
            (  'a',  9, '2000-01-01 00:00:01', '2000-01-01 00:00:01', ...),
            (  'a',  9, '2000-01-01 00:00:02', '2000-01-01 00:00:02', ...),
            (  'a',  9, '2000-01-01 00:00:03', '2000-01-01 00:00:03', ...),
            (  'a',  9, '2000-01-01 00:00:04', '2000-01-01 00:00:04', ...),
            (  'a',  9, '2000-01-01 00:00:05', '2000-01-01 00:00:05', ...),
            (  'a',  9, '2000-01-01 00:00:06', '2000-01-01 00:00:06', ...),
            (  'a',  9, '2000-01-01 00:00:07', '2000-01-01 00:00:07', ...),
            (  'a',  9, '2000-01-01 00:00:08', '2000-01-01 00:00:08', ...),
            (  'a',  9, '2000-01-01 00:00:09', '2000-01-01 00:00:09', ...),
            ...
            ('bcd', 10, '2000-01-24 03:33:10', '2000-01-24 03:33:10', ...),
            ('bcd', 10, '2000-01-24 03:33:11', '2000-01-24 03:33:11', ...),
            ('bcd', 10, '2000-01-24 03:33:12', '2000-01-24 03:33:12', ...),
            ('bcd', 10, '2000-01-24 03:33:13', '2000-01-24 03:33:13', ...),
            ('bcd', 10, '2000-01-24 03:33:14', '2000-01-24 03:33:14', ...),
            ('bcd', 10, '2000-01-24 03:33:15', '2000-01-24 03:33:15', ...),
            ('bcd', 10, '2000-01-24 03:33:16', '2000-01-24 03:33:16', ...),
            ('bcd', 10, '2000-01-24 03:33:17', '2000-01-24 03:33:17', ...),
            ('bcd', 10, '2000-01-24 03:33:18', '2000-01-24 03:33:18', ...),
            ('bcd', 10, '2000-01-24 03:33:19', '2000-01-24 03:33:19', ...)],
           dtype='object', names=['a', 'b', 'x', 'x2', 'x3'], length=2000000)

For further examples, see the added tests in pandas/tests/indexes/multi/test_format.py.

topper-123 · 2018-08-26T08:05:37Z

pandas/tests/indexes/multi/test_format.py

@@ -57,49 +57,6 @@ def test_repr_with_unicode_data():
        assert "\\u" not in repr(index)  # we don't want unicode-escaped


-def test_repr_roundtrip():
-


Note that the new implementation breaks round-tripping. IMO this is a worthwhile trade-off, as we get better clarity with the new repr.

can you put back a test that assert that this raises on round-trip now though. (just a simple example is enough with a comment)
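A test along the requested lines could look roughly like the sketch below (the test name is illustrative, not necessarily what was committed). It relies on the new repr listing tuples: `eval()` then passes them to `MultiIndex` as `levels` with no `codes`, which the constructor rejects with a `TypeError`.

```python
import pandas as pd
import pytest


def test_repr_roundtrip_raises():
    mi = pd.MultiIndex.from_product([list("ab"), range(3)],
                                    names=["first", "second"])
    # The tuple-based repr is not a valid constructor call any more:
    # the tuples land in ``levels`` and ``codes`` is missing, so this raises.
    with pytest.raises(TypeError):
        eval(repr(mi), {"MultiIndex": pd.MultiIndex})
```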

codecov · 2018-08-26T08:36:27Z

Codecov Report

Merging #22511 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22511      +/-   ##
==========================================
- Coverage   91.73%   91.73%   -0.01%     
==========================================
  Files         178      178              
  Lines       50774    50794      +20     
==========================================
+ Hits        46579    46595      +16     
- Misses       4195     4199       +4

Flag	Coverage Δ
#multiple	`90.32% <100%> (ø)`	⬆️
#single	`41.18% <46.66%> (-0.12%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/strings.py	`98.92% <ø> (ø)`	⬆️
pandas/core/indexes/base.py	`96.71% <ø> (ø)`	⬆️
pandas/core/indexes/multi.py	`95.73% <100%> (+0.06%)`	⬆️
pandas/io/formats/printing.py	`86.72% <100%> (+1.16%)`	⬆️
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`96.88% <0%> (-0.12%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b8ad9da...1d96c98.

jreback · 2018-08-26T11:58:55Z

pandas/io/formats/printing.py

        defaults to the class name of the obj
+    is_multi : bool, default False


this is not going to be acceptable
this cannot know anything about a MultiIndex
you can override the formatters in multi if you really really need

Can it be called line_break_on_values?

format_object_summary needs to know whether the formatter function returns a string or a tuple of strings, as the two are treated (slightly) differently.

An alternative is as you say to make a different function for MultiIndex-likes, but the functions are going to be very similar and you asked for code reuse in #13480.

yes, line_break_on_values is fine. This function just cannot reference any pandas internal things.
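The constraint agreed on here (a generic flag instead of MultiIndex knowledge) can be illustrated with a hypothetical, heavily simplified stand-in for format_object_summary. `summarize` and its exact behaviour are assumptions for illustration only; the point is that the function only checks a boolean flag and the shape of what the formatter returns, never the index type.

```python
from typing import Callable, Sequence, Tuple, Union

Formatted = Union[str, Tuple[str, ...]]


def summarize(values: Sequence, formatter: Callable[[object], Formatted],
              line_break_on_values: bool = False) -> str:
    """Hypothetical, simplified stand-in for format_object_summary."""
    items = [formatter(v) for v in values]
    if line_break_on_values:
        # The formatter returned tuples of strings: render each tuple on its own line.
        rendered = ["(" + ", ".join(item) + ")" for item in items]
        return "[" + ",\n ".join(rendered) + "]"
    # The formatter returned plain strings: keep them on a single line.
    return "[" + ", ".join(items) + "]"


print(summarize([(1, "a"), (2, "b")],
                lambda t: tuple(repr(x) for x in t),
                line_break_on_values=True))
```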

topper-123 · 2018-08-26T15:48:13Z

The failures are unrelated:

travis:

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

CircleCI (py27_compat):

ImportError: libgfortran.so.1: cannot open shared object file: No such file or directory

Will rebase and force push to see if this is an intermittent failure.

jreback · 2018-08-31T10:11:45Z

doc/source/whatsnew/v0.24.0.txt

+
+.. ipython:: python
+
+   index1=range(1000)


formatting here (spaces around =)

jreback · 2018-08-31T10:12:12Z

doc/source/whatsnew/v0.24.0.txt

+.. ipython:: python
+
+   index1=range(1000)
+   index2 = pd.Index(['a'] * 500 + ['abc'] * 500)


can you use a more familiar construction, e.g. .from_product
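A from_product construction of the same size could look like the sketch below. Note it is not identical to the from_arrays version in the diff: there, 'a' and 'abc' each pair with a different half of range(1000), whereas here each label pairs with the same 500 integers and the string level comes first.

```python
import pandas as pd

# 1000 rows: every combination of the two labels with the integers 0..499.
mi = pd.MultiIndex.from_product([["a", "abc"], range(500)])
print(mi)
```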

jreback · 2018-08-31T10:12:29Z

doc/source/whatsnew/v0.24.0.txt

+   index2 = pd.Index(['a'] * 500 + ['abc'] * 500)
+   pd.MultiIndex.from_arrays([index1, index2])
+
+For number of rows smaller than :attr:`options.display.max_seq_items`, all


jreback · 2018-08-31T10:12:41Z

doc/source/whatsnew/v0.24.0.txt

+For number of rows smaller than :attr:`options.display.max_seq_items`, all
+values will be shown (default: 100 items). Horizontally, the output will
+truncate, if it's longer than :attr:`options.display.width` (default: 80 characters).
+This solves the problem with outputting large MultiIndex instances to the console.


don't need the last sentence
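The row-wise truncation described in this whatsnew text can be checked with `pd.option_context`. This sketch assumes pandas 0.25 or later; the exact number of head and tail rows shown when truncating is an implementation detail, so only the presence of the "..." separator is checked.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([["a", "abc"], range(500)])

# Fewer rows than display.max_seq_items: every tuple is printed.
with pd.option_context("display.max_seq_items", 2000):
    full = repr(mi)

# More rows than display.max_seq_items: only the head and tail are printed,
# separated by a "..." line.
with pd.option_context("display.max_seq_items", 10):
    short = repr(mi)

print(short)
```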

jreback · 2018-08-31T10:14:41Z

pandas/core/indexes/multi.py

+        Invoked by unicode(df) in py2 only. Yields a Unicode String in both
+        py2/py3.
+        """
+        klass = self.__class__.__name__


this looks like a dupe of Index.base.unicode

Good catch. Removed.

jreback · 2018-08-31T10:16:48Z

pandas/io/formats/printing.py

-                head = [x.rjust(max_len) for x in head]
-                tail = [x.rjust(max_len) for x in tail]
+            head, tail = _justify(head, tail, display_width, best_len,
+                                  is_truncated, is_multi)


justify seems incompatible with is_multi (well the new option)?

jreback · 2018-08-31T10:17:26Z

pandas/io/formats/printing.py

+    """
+    Justify each item in head and tail, so they align properly.
+    """
+    if is_multi:


this is getting pretty complicated. e.g. the nested calling of this. maybe ban justify / is_multi

is_multi also needs to justify, but per position within each tuple rather than on a flat list of strings.

I see it's a bit complicated, but it's also difficult to make it simpler. I've tried containing the new functionality; hopefully it's better now.
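The per-position justification described above can be sketched as follows. `justify_tuples` is a hypothetical helper, not the pandas function; it assumes every formatted row has the same number of elements and computes widths over head and tail together, so the two halves of a truncated repr still line up.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def justify_tuples(head: Sequence[Row],
                   tail: Sequence[Row]) -> Tuple[List[Row], List[Row]]:
    """Right-justify each position of the formatted tuples to a common width."""
    combined = list(head) + list(tail)
    if not combined:
        return [], []
    # Widest string at each tuple position, across head and tail.
    widths = [max(len(row[i]) for row in combined)
              for i in range(len(combined[0]))]

    def pad(row: Row) -> Row:
        return tuple(item.rjust(w) for item, w in zip(row, widths))

    return [pad(r) for r in head], [pad(r) for r in tail]


print(justify_tuples([("'a'", "9")], [("'bcd'", "10")]))
```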

jreback · 2018-08-31T10:18:30Z

pandas/tests/indexes/multi/test_format.py

@@ -57,49 +57,6 @@ def test_repr_with_unicode_data():
        assert "\\u" not in repr(index)  # we don't want unicode-escaped


-def test_repr_roundtrip():
-


can you put back a test that assert that this raises on round-trip now though. (just a simple example is enough with a comment)

jreback · 2018-08-31T10:18:55Z

pandas/tests/indexes/multi/test_format.py

+@pytest.mark.skipif(PY2, reason="repr output is different for python2")
+class TestRepr(object):
+
+    def setup_class(self):


ugg, pls don't use the old unittest style setup, make fixtures instead
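A fixture-based replacement for the `setup_class` pattern might look like the sketch below; the fixture and test names are illustrative assumptions, not the ones in the PR. The expected repr fragments assume pandas 0.25 or later.

```python
import pandas as pd
import pytest


@pytest.fixture
def narrow_multi_index():
    # Built fresh for each test instead of being stored on the class by setup_class.
    return pd.MultiIndex.from_arrays([[1, 2], ["a", "b"]], names=["x", "y"])


def test_repr_narrow(narrow_multi_index):
    result = repr(narrow_multi_index)
    assert result.startswith("MultiIndex([(1, 'a'),")
    assert "names=['x', 'y']" in result
```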

pandas/tests/indexes/multi/test_format.py

topper-123 · 2018-08-31T14:35:35Z

I think I've adjusted for all the comments.

topper-123 · 2018-09-01T05:51:48Z

The Travis failure was a ResourceWarning, so it's unrelated to this PR.

gfyoung · 2018-09-02T09:37:23Z

@topper-123 : FYI, Anaconda has been having some bad servicing issues, so unfortunately, I don't think CI is going to be very cooperative at this point in time.

topper-123 · 2018-09-02T09:45:56Z

Ok, thanks for notifying me.

w.r.t. the PR, all comments by @jreback should have been addressed. Some further simplifications have also been made: the methods _format_space and _format_attrs have been removed from MultiIndex, which now inherits them instead.

topper-123 · 2019-04-10T17:14:33Z

Ping. I would appreciate a resolution to this. To me it starts feeling like a second brexit (i.e. a decision isn't being made) ;-).

jreback · 2019-04-20T18:51:21Z

@jorisvandenbossche this is better than the current. perfection can be in another PR.

pandas/tests/indexes/multi/test_format.py

topper-123 · 2019-05-31T09:24:58Z

Ping.

jreback · 2019-06-03T00:01:53Z

so this has been outstanding for quite a long time. It's better than the current repr. Any remaining objections?

WillAyd · 2019-06-03T00:03:31Z

No I think this is a good enhancement

topper-123 · 2019-06-11T19:53:45Z

I think this should get a decision now; the PR is almost a year old. If needed, it could be elevated to the BDFL rather than languishing.

Out of optimism, I've just rebased again ;-)

jreback · 2019-06-12T11:17:36Z

let's not let the perfect be the enemy of the good.

unless there's an actionable counter-proposal within 72 hours, I am going to merge.

@jorisvandenbossche

jreback · 2019-06-12T11:18:26Z

cc @pandas-dev/pandas-core

WillAyd

Sorry didn't realize I was still in "Request Changes" - this lgtm!

jreback · 2019-06-19T01:06:02Z

thanks @topper-123 very nice!

I am sure there will be some followups.

topper-123 mentioned this pull request Aug 26, 2018

Shorter MultiIndex representation #21145

Closed

4 tasks

topper-123 commented Aug 26, 2018

topper-123 force-pushed the MultiIndex.__repr__ branch 3 times, most recently from dd81bdd to bbee14e on August 26, 2018 08:36

topper-123 force-pushed the MultiIndex.__repr__ branch 3 times, most recently from 8508304 to 661e3be on August 26, 2018 11:40

jreback requested changes Aug 26, 2018

topper-123 force-pushed the MultiIndex.__repr__ branch from 661e3be to f92bb0d on August 26, 2018 15:49

gfyoung added Enhancement Output-Formatting __repr__ of pandas objects, to_string MultiIndex labels Aug 27, 2018

jreback requested changes Aug 31, 2018

topper-123 force-pushed the MultiIndex.__repr__ branch from 025724e to 30f5f6e on August 31, 2018 14:29

topper-123 force-pushed the MultiIndex.__repr__ branch 4 times, most recently from 1807702 to 906e0f7 on August 31, 2018 22:10

topper-123 force-pushed the MultiIndex.__repr__ branch 4 times, most recently from c3a76d0 to 359b2a3 on September 2, 2018 09:16

topper-123 force-pushed the MultiIndex.__repr__ branch from 359b2a3 to b9f5525 on September 3, 2018 18:14

WillAyd requested changes Apr 21, 2019

pandas/tests/indexes/multi/test_format.py (outdated, resolved)

topper-123 force-pushed the MultiIndex.__repr__ branch from 866a807 to fce0cf3 on April 23, 2019 17:15

topper-123 mentioned this pull request Apr 23, 2019

CLN: Clean-up use of super() in instance methods. #26177

Merged

1 task

topper-123 force-pushed the MultiIndex.__repr__ branch from fce0cf3 to dd32eb4 on April 24, 2019 15:43

topper-123 force-pushed the MultiIndex.__repr__ branch 2 times, most recently from d35aeab to af77040 on May 31, 2019 07:56

topper-123 added 12 commits June 11, 2019 21:42

ENH: better MultiIndex.__repr__

ce46223

changed according to comments

7bc5364

inherit _format_attrs and _format_space

b502d8e

changed according to comments

0590f46

minor update for doc strings

ff0e93b

Update doc string examples and docs

b36da1c

Comment on support for py2

7a8512e

Improve docs

cb2f904

don't show useless dtype in repr

7c84657

adjust for comments

3ed412a

Py2 tests not needed any more

ad4b083

remove inheritance from 'object'

1d96c98

topper-123 force-pushed the MultiIndex.__repr__ branch from af77040 to 1d96c98 on June 11, 2019 19:54

WillAyd approved these changes Jun 12, 2019

jreback merged commit d47947a into pandas-dev:master Jun 19, 2019
