-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: better MultiIndex.__repr__ #22511
Conversation
@@ -57,49 +57,6 @@ def test_repr_with_unicode_data(): | |||
assert "\\u" not in repr(index) # we don't want unicode-escaped | |||
|
|||
|
|||
def test_repr_roundtrip(): | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the new implementation breaks round-tripping. This is a worthwhile trade-off as we better clarity with the new repr IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put back a test that assert that this raises on round-trip now though. (just a simple example is enough with a comment)
dd81bdd
to
bbee14e
Compare
Codecov Report
@@ Coverage Diff @@
## master #22511 +/- ##
==========================================
- Coverage 91.73% 91.73% -0.01%
==========================================
Files 178 178
Lines 50774 50794 +20
==========================================
+ Hits 46579 46595 +16
- Misses 4195 4199 +4
Continue to review full report at Codecov.
|
8508304
to
661e3be
Compare
pandas/io/formats/printing.py
Outdated
defaults to the class name of the obj | ||
is_multi : bool, default False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not going to be acceptable
this cannot know anything about a MultiIndez
you can override the formatters in multi if you really really need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it be called line_break_on_values
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format_object_summary
needs to know if the formatter
func returns a string or a tuple of strings, as the treatment of each is different (but only sligthly different).
An alternative is as you say to make a different function for MultiIndex-likes, but the functions are going to be very similar and you asked for code reuse in #13480.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes line_break_one_values
is fine. This function just cannot reference any pandas internal things.
The failures are unrelated: travis:
circli-ci: py27_compat:
Will rebase and force push to see if this is an intermittent failure. |
661e3be
to
f92bb0d
Compare
doc/source/whatsnew/v0.24.0.txt
Outdated
|
||
.. ipython:: python | ||
|
||
index1=range(1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatting here (spaces around =)
doc/source/whatsnew/v0.24.0.txt
Outdated
.. ipython:: python | ||
|
||
index1=range(1000) | ||
index2 = pd.Index(['a'] * 500 + ['abc'] * 500) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use a more familiar construction, e.g. .from_product
doc/source/whatsnew/v0.24.0.txt
Outdated
index2 = pd.Index(['a'] * 500 + ['abc'] * 500) | ||
pd.MultiIndex.from_arrays([index1, index2]) | ||
|
||
For number of rows smaller than :attr:`options.display.max_seq_items`, all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need a/the
doc/source/whatsnew/v0.24.0.txt
Outdated
For number of rows smaller than :attr:`options.display.max_seq_items`, all | ||
values will be shown (default: 100 items). Horizontally, the output will | ||
truncate, if it's longer than :attr:`options.display.width` (default: 80 characters). | ||
This solves the problem with outputting large MultiIndex instances to the console. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need the last sentence
pandas/core/indexes/multi.py
Outdated
Invoked by unicode(df) in py2 only. Yields a Unicode String in both | ||
py2/py3. | ||
""" | ||
klass = self.__class__.__name__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks looks like a dupe of Index.base.unicode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Removed.
pandas/io/formats/printing.py
Outdated
head = [x.rjust(max_len) for x in head] | ||
tail = [x.rjust(max_len) for x in tail] | ||
head, tail = _justify(head, tail, display_width, best_len, | ||
is_truncated, is_multi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
justify seems incompatible with is_multi (well the new option)?
pandas/io/formats/printing.py
Outdated
""" | ||
Justify each item in head and tail, so they align properly. | ||
""" | ||
if is_multi: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is getting pretty complicated. e.g. the nested calling of this. maybe ban justify / is_multi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_multi also needs to justify, but on each value in the tuple for each value, instead of a flexible list of values.
I see it's a bit complicated, but it's also difficult to make it simpler. I've tried containng the new functionality, maybe it's better.
@@ -57,49 +57,6 @@ def test_repr_with_unicode_data(): | |||
assert "\\u" not in repr(index) # we don't want unicode-escaped | |||
|
|||
|
|||
def test_repr_roundtrip(): | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put back a test that assert that this raises on round-trip now though. (just a simple example is enough with a comment)
@pytest.mark.skipif(PY2, reason="repr output is different for python2") | ||
class TestRepr(object): | ||
|
||
def setup_class(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugg, pls don't use the old unittest style setup, make fixtures instead
025724e
to
30f5f6e
Compare
I think I've adjusted for all the comments. |
1807702
to
906e0f7
Compare
The trvis failure was a ResourceWarning, so unrelated to this PR. |
c3a76d0
to
359b2a3
Compare
@topper-123 : FYI, Anaconda has been having some bad servicing issues, so unfortunately, I don't think CI is going to be very cooperative at this point in time. |
Ok, thanks for notifying me. wrt. the PR, all comments by @jreback should have been addressed. Some further simplifications have also been done: So methods |
359b2a3
to
b9f5525
Compare
Ping. I would appreciate a resolution to this. To me it starts feeling like a second brexit (i.e. a decision isn't being made) ;-). |
@jorisvandenbossche this is better than the current. perfection can be in another PR. |
866a807
to
fce0cf3
Compare
fce0cf3
to
dd32eb4
Compare
d35aeab
to
af77040
Compare
Ping. |
so this has been outstanding for quite a long time. Its better than the current repr. Any remaining objections. |
No I think this is a good enhancement |
I think this should get a decision now, the PR is almost a year old now. If needed it could be elevated to the BDFL, rather than languishing. Out of optimism, I've just rebased again ;-) |
af77040
to
1d96c98
Compare
let's not let the perfect be the enemy of the good. unless an actionable counter-proposal with 72 hours I am going to merge. |
cc @pandas-dev/pandas-core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry didn't realize I was still in "Request Changes" - this lgtm!
thanks @topper-123 very nice! I am sure there will be some followups. |
closes #13480
closes #12423
git diff upstream/master -u -- "*.py" | flake8 --diff
Proposal to make a new repr for MultiIndex. Displaying MultiIndex will be based on displaying vertically stacked tuples, as discussed in #13480. This makes it easier to understand the structure of the MultiIndex.
In the proposal we get:
pd.options.display.max_seq_items
,pd.options.display.width
,A large MultiIndex example will now look like this:
For further examples, see the added tests in pandas/tests/indexes/multi/test_format.py.