Make DataFrame.to_string output full content by default #28052

lshepard · 2019-08-21T03:10:21Z

I modeled this off of #24841. Some alternatives I considered:

Instead of setting the option_context here, we could wind the param into the depths of the formatter. I tried this, actually, and started finding a number of edge cases and bugs. I realized that the issue only occurs in a pretty narrow case - if the user is explicitly calling to_string - because most of the time, when representing a DataFrame, the user will want long strings truncated for readability. So I think the safest way is to do it at the top level without interfering with lower-level formatters.
Series.to_string() could arguably benefit from the same treatment, although that wasn't mentioned in the original issue (and I have never found the need to use it personally) so I didn't bring that in.
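
The "override once at the top level" approach described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `render` is a hypothetical helper, `repr(df)` stands in for the lower-level formatter, and it assumes a pandas version in which `display.max_colwidth` accepts `None` (1.0 or later).

```python
import pandas as pd


def render(df, max_colwidth=None):
    """Hypothetical top-level wrapper (not the pandas implementation)."""
    # A single override here; the lower-level formatting code is left
    # untouched and keeps reading display.max_colwidth as it always has.
    with pd.option_context("display.max_colwidth", max_colwidth):
        return repr(df)  # stand-in for the real formatter call


# Hypothetical data: one cell that is wider than the default limit.
long_value = "x" * 60
df = pd.DataFrame({"name": ["short", long_value]})

before = pd.get_option("display.max_colwidth")
unlimited = render(df)                   # None means no width limit
limited = render(df, max_colwidth=10)    # caller-supplied limit
after = pd.get_option("display.max_colwidth")
```

Because the option is set through a context manager, the user's own display setting is restored as soon as the call returns.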

Here's an example on a real dataset showing long columns preserved in a text file produced by to_string():

Additional manual testing:

Main use case: by default, to_string applies no width limit and ignores the display options, but the limit can still be overridden per call:

>>> print(df.to_string())
        A       B
0     NaN     NaN
1 -1.0000     foo
2 -2.1234   foooo
3  3.0000  fooooo
4  4.0000     bar
>>> with option_context('display.max_colwidth', 5):
...     print(df.to_string())
... 
        A       B
0     NaN     NaN
1 -1.0000     foo
2 -2.1234   foooo
3  3.0000  fooooo
4  4.0000     bar
>>> print(df.to_string(max_colwidth=5))
      A     B
0   NaN   NaN
1 -1...   foo
2 -2...  f...
3  3...  f...
4  4...   bar

The string representation of a DataFrame still uses the display options (so it's only the explicit to_string that doesn't):

>>> with option_context('display.max_colwidth', 5):
...     print(str(df))
... 
      A     B
0   NaN   NaN
1 -1...   foo
2 -2...  f...
3  3...  f...
4  4...   bar
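
The manual checks above can be reproduced with a short self-contained script. The frame below is made-up data (not the one used in the transcripts), and the script assumes a pandas release that includes this change (1.0 or later).

```python
import pandas as pd

# Hypothetical data: one short value and one 60-character value.
long_value = "x" * 60
df = pd.DataFrame({"name": ["short", long_value]})

full = df.to_string()                     # explicit to_string: no limit by default
clipped = df.to_string(max_colwidth=10)   # explicit per-call limit

with pd.option_context("display.max_colwidth", 10):
    still_full = df.to_string()           # the display option is ignored here
    repr_text = str(df)                   # str()/repr() still honour the option

print(long_value in full)        # expected: full value present
print(long_value in clipped)     # expected: value truncated
print(long_value in still_full)  # expected: full value present
print(long_value in repr_text)   # expected: value truncated
```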

The new parameter accepts None and nonnegative ints, and rejects anything else:

>>> print(df.to_string(max_colwidth=-5))
    ...
    raise ValueError(msg)
ValueError: Value must be a nonnegative integer or None
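
The validation rule can be illustrated with a small stand-alone function. `check_max_colwidth` is a hypothetical name and this is not the validator pandas uses; it only mirrors the rule and the error message shown above. Rejecting `bool` is an extra assumption of this sketch.

```python
def check_max_colwidth(value):
    """Hypothetical validator: accept None or an int >= 0, reject the rest."""
    if value is None:
        return  # None means "no limit"
    # bool is a subclass of int; this sketch rejects it (an assumption).
    if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
        return
    raise ValueError("Value must be a nonnegative integer or None")


def is_accepted(value):
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        check_max_colwidth(value)
    except ValueError:
        return False
    return True


results = {repr(v): is_accepted(v) for v in (None, 0, 5, -5, 1.5, "5")}
```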

closes DataFrame.to_string truncates long strings #9784
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

starting to feel a bit uncomfortable about it. The max_colwidth is an important feature for legibility in the vast majority of contexts - and one expects the display config setting to work. It is only when invoked at the highest level as to_string() that it should be unlimited. So even though this is a temp commit, I think I'm about to unwind it and try an approach at the top level only.

pep8speaks · 2019-08-21T03:10:26Z

Hello @lshepard! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-09-14 14:09:02 UTC

lshepard · 2019-08-21T03:13:56Z

pandas/core/frame.py

@@ -707,11 +708,14 @@ def to_string(
        max_cols=None,
        show_dimensions=False,
        decimal=".",
+        max_colwidth=9999999,


Wanted to note - one of the commenters on the issue asks: "With Pandas 0.25.0, setting display.max_colwidth to a large number stops the truncation but when trying to left justify columns with df.to_string(justify='left'), that same display setting somehow pads columns on the left so they are not left aligned. Is there any present way to prevent truncation and get left justified string columns when output to a terminal? I know a pull request is in process but I would like to do this now. Thanks."

I can accomplish this in my testing by setting max_colwidth=0, which switches the padding to left. It is weird, though, that passing justify="left" does not justify it correctly. Seems like maybe a separate bug or one that I could look further into.

TomAugspurger (2019-08-21) replied:

I'm not sure I understand this cause, but I don't think we'll want a workaround like this.

Can you post the output with max_colwidth=0, and how it compares to this?

lshepard replied:

This shows how max_colwidth=0 strangely forces a left justification:

>>> print(df.to_string(max_colwidth=0))
A B
0 NaN NaN
1 -1.0000 foo
2 -2.1234 foooo
3 3.0000 fooooo
4 4.0000 bar

Whereas the behavior in this PR preserves the right justification that is the current default:

>>> print(df.to_string(max_colwidth=99999))
        A       B
0     NaN     NaN
1 -1.0000     foo
2 -2.1234   foooo
3  3.0000  fooooo
4  4.0000     bar

And interestingly, passing justify='left' doesn't have an effect:

>>> print(df.to_string(justify='left'))
A B
0 NaN NaN
1 -1.0000 foo
2 -2.1234 foooo
3 3.0000 fooooo
4 4.0000 bar

This is because of these lines in _make_fixed_width:

def _make_fixed_width(
    ...
    max_len = max(adj.len(x) for x in strings)
    ...
    def just(x):
        if conf_max is not None:
            if (conf_max > 3) & (adj.len(x) > max_len):
                x = x[: max_len - 3] + "..."
        return x

    strings = [just(x) for x in strings]
    result = adj.justify(strings, max_len, mode=justify)
    return result

It checks if conf_max > 3 to apply the dot truncation ... so if it's <= 3 then that isn't called. So it's not just 0 but any of 0, 1, 2, or 3 that causes the justification to line up.
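
The effect can be reproduced with a pure-Python imitation of the quoted lines. This is a sketch, not the pandas function: `fixed_width_sketch` is a hypothetical name, `str.ljust`/`str.rjust` stand in for `adj.justify`, and the step that caps `max_len` at `conf_max` is an assumption about the elided lines.

```python
def fixed_width_sketch(strings, conf_max=None, justify="right"):
    """Imitation of the truncate-then-justify logic quoted above."""
    max_len = max(len(s) for s in strings)
    # Assumption: the elided code caps max_len at the configured width.
    if conf_max is not None and max_len > conf_max:
        max_len = conf_max

    def just(s):
        # Truncation only happens when conf_max > 3, as in the quoted code.
        if conf_max is not None and conf_max > 3 and len(s) > max_len:
            s = s[: max_len - 3] + "..."
        return s

    strings = [just(s) for s in strings]
    if justify == "left":
        return [s.ljust(max_len) for s in strings]
    return [s.rjust(max_len) for s in strings]


cells = ["ab", "abcdefgh"]
print(fixed_width_sketch(cells, conf_max=6))  # ['    ab', 'abc...']
print(fixed_width_sketch(cells, conf_max=0))  # ['ab', 'abcdefgh']
```

With `conf_max=0` nothing is truncated, yet the padding width has collapsed to 0, so every cell is emitted unpadded and the column merely looks left-justified.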

I can spend some more time to better understand why this is happening. I agree that we should not rely on some incidentals of the underlying implementation to determine whether to justify the text.

Ok, I figured out the issue. There are two lines where format_array is called and the justify parameter is not passed all the way through -- so in some places, the justification is being overridden.

Note that the bug where justification doesn't happen if conf_max <= 3 already exists - so I think it can probably be pulled out as a separate PR.

Scratch all that. I re-read the docs and I see that I misinterpreted the justify param, as it only refers to the column headers, not the content. In that regard it is behaving correctly. So I think I'll leave the justification question out of this PR.

TomAugspurger

Thanks.

I think we'll need to deprecate the current behavior, rather than just changing the default. Users will need to be explicit about the max_colwidth for now I think.

lshepard · 2019-08-21T21:15:14Z

I think we'll need to deprecate the current behavior, rather than just changing the default.
Users will need to be explicit about the max_colwidth for now I think.

To make sure I understand, what should happen when the user calls df.to_string() without parameters?

Limits columns to the display.max_colwidth config option (the current default)
No limit to column width (this PR)
Make max_colwidth a required parameter (so vanilla to_string() no longer works)
One of the above, but with a deprecation warning

Sounds like you're suggesting that we continue the current behavior, but with a deprecation warning? Should we only sound the deprecation warning if the data frame contains columns that would have otherwise been truncated? Seems like in most cases, the truncation won't be a difference in behavior and I would hate to make vanilla df.to_string() not work...
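
The conditional-warning idea raised in that question could look roughly like the sketch below. It only illustrates an option under discussion and is not something this PR implements; `warn_if_would_truncate`, its arguments, and the message text are all hypothetical.

```python
import warnings


def warn_if_would_truncate(cells, display_max_colwidth):
    """Warn only when some cell is wider than the display setting."""
    limit = display_max_colwidth
    would_truncate = limit is not None and any(
        len(str(cell)) > limit for cell in cells
    )
    if would_truncate:
        warnings.warn(
            "to_string() will stop truncating long values; "
            "pass max_colwidth explicitly to keep the old output.",
            FutureWarning,
            stacklevel=2,
        )
    return would_truncate


# Short cells stay silent; a 60-character cell triggers the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    quiet = warn_if_would_truncate(["foo", "bar"], 50)
    noisy = warn_if_would_truncate(["foo", "x" * 60], 50)
```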

lshepard · 2019-08-22T21:09:42Z

To update: I chose to punt the justify question as I think that's a separate issue that pre-existed and I may not understand the use case that well anyway.

I changed the behavior of max_colwidth so that None means unlimited, instead of the sentinel value 9999999. I also changed the other two places I had copied that style from (the clipboard formatter and the HTML formatter).

I have not yet added a deprecation warning - happy to do so if you think desired behavior is that the max_colwidth param should be required in the future. That doesn't make as much sense to me.

TomAugspurger · 2019-08-23T17:55:41Z

To make sure I understand, what should happen when the user calls df.to_string() without parameters?

That needs to be decided. It seems that in #24841 we decided that to_html not outputting the entire dataframe was a bug. Presumably we would say the same about this, but I'm not sure.

cc @simonjayhawkins

simonjayhawkins · 2019-08-23T20:30:44Z

To make sure I understand, what should happen when the user calls df.to_string() without parameters?

That needs to be decided. It seems that in #24841 we decided that to_html not outputting the entire dataframe was a bug. Presumably we would say the same about this, but I'm not sure.

cc @simonjayhawkins

We should probably maintain consistency, although long HTML would probably wrap within a cell, and that does not apply to to_string.

lshepard · 2019-08-24T03:00:28Z

I agree. The original issue came from someone calling vanilla to_string and being surprised:

I am calling to_string() without any
parameters and it beautifully fixed-
formatted my dataframe apart from my
very wide filename column, that is being
truncated with "...". How can I avoid that?

I think no truncation by default is the most intuitive approach, and matches to_html behavior. Someone who is surprised can find the parameter to truncate pretty easily in the docs, and it’s unlikely to be a surprise the same way that truncation might be.

TomAugspurger

OK, let's call using the options.display value a bug then.

This will need a release note in doc/source/whatsnew/v1.0.0.rst under bug fixes.

pandas/core/config_init.py

lshepard · 2019-08-27T03:36:27Z

Thanks for the feedback! I added the whatsnew notice, swapped to the correct validator, and did some more manual testing to ensure it worked as expected.

TomAugspurger

Looking good overall. A few small requests.

pandas/core/frame.py

doc/source/whatsnew/v1.0.0.rst

lshepard · 2019-08-30T16:46:09Z

Ready for review, thanks!

TomAugspurger · 2019-09-03T11:44:03Z

@lshepard merge conflict in the release notes. Could you merge master & repush?

lshepard · 2019-09-05T14:00:33Z

Huh, failed with “worker 'gw0' crashed while running 'pandas/tests/test_sorting.py::TestSafeSort::test_labels_out_of_bound[-1]'”. I don’t think that’s related but I’ll take a look.

TomAugspurger · 2019-09-05T15:36:36Z

Could be a random failure. I'd start by repushing an empty commit.

WillAyd

lgtm

lshepard · 2019-09-09T14:34:19Z

Thanks for all the detailed comments on the review!

WillAyd · 2019-09-13T01:57:38Z

@lshepard can you fix the merge conflict on the whatsnew? @TomAugspurger mind taking a look at this one?

TomAugspurger

LGTM too. Just need to fix the merge conflict.

WillAyd · 2019-09-16T02:34:05Z

Nice PR - thanks a lot @lshepard

rswgnu · 2019-09-16T04:40:14Z

Excited to try this out and see it resolved. I hope proper left justification can be resolved soon too. Thanks.

Luke Shepard added 5 commits August 20, 2019 20:29

Added a parameter to pass all the way down to specify max_colwidth. N…

af444f0

…ot sure if I'll keep it.

Ok, I removed all the deep changes and parameter-passing. Instead, we…

1ebf091

… have a quick override at the very top level, and everything else behaves based on that one override.#

For some reason, the truncation switches justification if the max_col…

21bbf64

…width is 0 instead of a large number. So I set it to a large number (like the html diff) to preserve the justification behavior.

Added a test to show that this option exists for to_string

cfde48e

lshepard commented Aug 21, 2019

Luke Shepard added 2 commits August 20, 2019 22:19

Shortened one line (split across two). It's hard to actually shorten …

1abc2fa

…the expected values though because the fixed width will be harder to read if the lines are split, and they kind of have to be long to test the truncation...

Shortened all the lines even in the test to comply with PEP8

a1d3832

TomAugspurger reviewed Aug 21, 2019

Luke Shepard added 5 commits August 22, 2019 10:21

Adding a newline per suggestion from isort

3cf9a6a

Solved the justify problem, and also added some None value for the ma…

762d677

…x_colwidth

Swap out format to be None, ignore justification issues.

d64fcb8

Reformat blac.

6e792f8

Merge branch 'master' into issue9784-to-string-truncate-long-strings

6a5cd97

simonjayhawkins added the API Design and Output-Formatting (__repr__ of pandas objects, to_string) labels Aug 25, 2019

TomAugspurger reviewed Aug 26, 2019

pandas/core/config_init.py

Luke Shepard added 4 commits August 26, 2019 22:09

Merge branch 'master' into issue9784-to-string-truncate-long-strings

889284f

Use the is_nonnegative_int validator for the max_colwidth param. I di…

3116ea1

…dn't realize that this also allowed None until checking the docs, but it does so it's the perfect validator for our new parameter.

Added entry to whatsnew.

840c1a6

Fixed formatting with black.

2c68bbc

TomAugspurger reviewed Aug 27, 2019

pandas/core/frame.py (outdated)

doc/source/whatsnew/v1.0.0.rst (outdated)

Split whatsnew entry, add versionadded, reorder params

4e7fe82

Luke Shepard added 3 commits August 28, 2019 09:08

Remove double line break

90f0ee0

Oops, didn't mean to update this script

c2b8421

Merge branch 'master' into issue9784-to-string-truncate-long-strings

5a8a525

Luke Shepard added 2 commits August 30, 2019 11:47

Correct word in whatsnew.

da031f2

Merge branch 'master' into issue9784-to-string-truncate-long-strings

20d10b5

TomAugspurger reviewed Aug 30, 2019

doc/source/whatsnew/v1.0.0.rst (outdated)

TomAugspurger added this to the 1.0 milestone Aug 30, 2019

Update doc/source/whatsnew/v1.0.0.rst

0f8119e

Co-Authored-By: Tom Augspurger <TomAugspurger@users.noreply.github.com>

Luke Shepard added 2 commits September 5, 2019 07:57

Resolve conflict in whatsnew

732c2f4

Merge branch 'master' into issue9784-to-string-truncate-long-strings

Merge suggested edits

affef02

Merge branch 'issue9784-to-string-truncate-long-strings' of github.com:lshepard/pandas into issue9784-to-string-truncate-long-strings

Merge branch 'master' into issue9784-to-string-truncate-long-strings

3efa1da

WillAyd approved these changes Sep 6, 2019

TomAugspurger approved these changes Sep 13, 2019

TomAugspurger mentioned this pull request Sep 13, 2019

Added version policy #28415

Merged

Luke Shepard added 2 commits September 14, 2019 09:06

Merge branch 'master' into issue9784-to-string-truncate-long-strings

6142150

Resolve conflict correctly.

b23da91

WillAyd merged commit d92b46f into pandas-dev:master Sep 16, 2019

WillAyd mentioned this pull request Sep 30, 2019

Read long text from a cell #28683

Closed

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

Make DataFrame.to_string output full content by default (pandas-dev#2…

c68840e

…8052)

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

Make DataFrame.to_string output full content by default (pandas-dev#2…

357c8a3

…8052)
