Fix DataFrame.to_string() justification #22437

gshiba · 2018-08-21T07:59:21Z

closes #16839 (to_string formatters not as expected when header=False), #13032 (Justification is broken with to_string(index=False))
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Fixes justification with DataFrame.to_string(index=False), but breaks other tests :-(

This PR has two side effects that will likely make it unacceptable, but I wanted to get suggestions if possible.

First, it changes the expected behavior with trailing spaces, as shown below.

import pandas as pd


def wrap_to_string(df, **kwargs):
    # Print each output line between '^' and '$' so that leading and
    # trailing spaces are visible; the suffix is the line number.
    s = df.to_string(**kwargs)
    print(str(kwargs).center(40, '='))
    for i, line in enumerate(s.split('\n')):
        print(f'^{line}$-{i}')
    print('~' * 40)


df = pd.DataFrame({'x': [11, -22], 'y': ['aaa', ' ']})
wrap_to_string(df)
wrap_to_string(df, line_width=1)
wrap_to_string(df, index=False)
wrap_to_string(df, index=False, line_width=1)

Current output with master

==================={}===================  # Looks good
^    x    y$-0
^0  11  aaa$-1
^1 -22     $-2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
==========={'line_width': 1}============  # Looks good (to me anyways) but has trailing spaces
^    x  \$-0
^0  11   $-1
^1 -22   $-2
^$-3
^     y  $-4
^0  aaa  $-5
^1       $-6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
============{'index': False}============  # broken justification (this is what I want to fix)
^x    y$-0
^11  aaa$-1
^-22$-2                                   # the last value (space) disappeared completely
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
==={'index': False, 'line_width': 1}====  # still broken
^x  \$-0
^11   $-1
^-22   $-2
^$-3
^  y  $-4
^aaa$-5                                   # the last value (space) disappeared completely
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Output with this PR

==================={}===================  # No change from master
^    x    y$-0
^0  11  aaa$-1
^1 -22     $-2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
==========={'line_width': 1}============  # No change from master
^    x  \$-0
^0  11   $-1
^1 -22   $-2
^$-3
^     y  $-4
^0  aaa  $-5
^1       $-6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
============{'index': False}============  # Justification is fixed
^  x    y$-0
^ 11  aaa$-1
^-22     $-2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
==={'index': False, 'line_width': 1}====  # Justification is fixed
^  x  \$-0
^ 11   $-1
^-22   $-2
^$-3
^  y  $-4
^aaa  $-5
^     $-6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Second, it breaks a number of other tests that rely on IntArrayFormatter._format_strings, because the formatting changed from '{x: d}', which adds a leading space for non-negative values, to '{x:d}'. My impression is that this PR eliminates an arguably unnecessary extra space for positive integers, but I haven't looked at those tests in much detail (e.g. I've never used to_latex).
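For reference, the difference between the two format specs can be seen in plain Python, independent of pandas. This is only a small illustration of the format mini-language; the variable names are made up for the example.

```python
# Compare the two integer format specs discussed above.
values = [11, -22, 0]

# '{x: d}': the space flag reserves a sign column, so non-negative
# numbers get a leading space and negative numbers get '-'.
with_space = ['{x: d}'.format(x=v) for v in values]

# '{x:d}': no sign column, so only negative numbers carry a sign character.
without_space = ['{x:d}'.format(x=v) for v in values]

print(with_space)     # [' 11', '-22', ' 0']
print(without_space)  # ['11', '-22', '0']
```

With the space flag every formatted value is one character wider than its digits, which is where the extra column in the failing test expectations comes from.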

The failing tests and one example failure are shown below.

pandas/tests/arrays/categorical/test_repr.py::TestCategoricalRepr::test_print_none_width FAILED                                                                                                                  [  4%]
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_east_asian_unicode_frame FAILED                                                                                                            [  8%] 
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_datetimelike_frame FAILED                                                                                                                  [ 12%]
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_to_string_no_header FAILED                                                                                                                 [ 16%]
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_to_string_specified_header FAILED                                                                                                          [ 20%]
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_to_string_index_formatter FAILED                                                                                                           [ 25%]
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_to_string_line_width FAILED                                                                                                                [ 29%] 
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_east_asian_unicode_series FAILED                                                                                                              [ 33%] 
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_period FAILED                                                                                                                                 [ 37%]
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_truncate_ndots FAILED                                                                                                                         [ 41%] 
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_to_string_name FAILED                                                                                                                         [ 45%] 
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_to_string_dtype FAILED                                                                                                                        [ 50%]
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_to_string_length FAILED                                                                                                                       [ 54%]
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_to_string_header FAILED                                                                                                                       [ 58%]
pandas/tests/io/formats/test_to_latex.py::TestToLatex::test_to_latex FAILED                                                                                                                                      [ 62%] 
pandas/tests/io/formats/test_to_latex.py::TestToLatex::test_to_latex_multiindex FAILED                                                                                                                           [ 66%] 
pandas/tests/io/formats/test_to_latex.py::TestToLatex::test_to_latex_multicolumnrow FAILED                                                                                                                       [ 70%] 
pandas/tests/io/formats/test_to_latex.py::TestToLatex::test_to_latex_longtable FAILED                                                                                                                            [ 75%] 
pandas/tests/io/formats/test_to_latex.py::TestToLatex::test_to_latex_no_header FAILED                                                                                                                            [ 79%] 
pandas/tests/series/test_repr.py::TestSeriesRepr::test_multilevel_name_print FAILED                                                                                                                              [ 83%]
pandas/tests/series/test_repr.py::TestCategoricalRepr::test_categorical_repr FAILED                                                                                                                              [ 87%]
pandas/tests/series/test_repr.py::TestCategoricalRepr::test_categorical_series_repr FAILED                                                                                                                       [ 91%] 
pandas/tests/series/test_repr.py::TestCategoricalRepr::test_categorical_series_repr_ordered FAILED                                                                                                               [ 95%]
pandas/tests/sparse/test_format.py::TestSparseSeriesFormatting::test_sparse_int FAILED                                                                                                                           [100%]

______________________________________________________________________________________ TestCategoricalRepr.test_print_none_width _______________________________________________________________________________________
                                                                                                                                                                                                                        
self = <pandas.tests.arrays.categorical.test_repr.TestCategoricalRepr object at 0x12892def0>                                                                                                                            
                                                                                                                                                                                                                        
    def test_print_none_width(self):                                                                                                                                                                                    
        # GH10087                                                                                                                                                                                                       
        a = Series(Categorical([1, 2, 3, 4]))                                                                                                                                                                           
        exp = u("0    1\n1    2\n2    3\n3    4\n" +                                                                                                                                                                    
                "dtype: category\nCategories (4, int64): [1, 2, 3, 4]")                                                                                                                                                 
                                                                                                                                                                                                                        
        with option_context("display.width", None):                                                                                                                                                                    
>           assert exp == repr(a)                                                                                                                                                                                      
E           AssertionError: assert '0    1\n1   ... [1, 2, 3, 4]' == '0   1\n1   2\... [1, 2, 3, 4]'                                                                                                                   
E             - 0    1                                                                                                                                                                                                 
E             ?  -                                                                                                                                                                                                      
E             + 0   1                                                                                                                                                                                                   
E             - 1    2                                                                                                                                                                                                 
E             ?  -                                                                                                                                                                                                      
E             + 1   2                                                                                                                                                                                                   
E             - 2    3                                                                                                                                                                                                 
E             ?  -                                                                                                                                                                                                     
E             + 2   3                                                                                                                                                                                                  
E             - 3    4                                                                                                                                                                                                  
E             ?  -                                                                                                                                                                                                      
E             + 3   4                                                                                                                                                                                                   
E               dtype: category                                                                                                                                                                                         
E               Categories (4, int64): [1, 2, 3, 4]                                                                                                                                                                     
                                                                                                                                                                                                                       
pandas/tests/arrays/categorical/test_repr.py:59: AssertionError

The root issue, I think, is that we have '{x: d}'.format(x=x), but the current tests don't want the leading space for a simple pd.DataFrame({'x': [0, 1], 'y': [2, 3]}).to_string(index=False).
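A rough sketch of the behavior the existing tests seem to expect: format integers without a sign column, then right-justify every cell (header included) to the widest one. The helper `justify_column` is hypothetical and is not how pandas implements this internally.

```python
def justify_column(header, values):
    """Right-justify a header and integer values to a common width.

    Hypothetical helper for illustration only; not pandas code.
    """
    # Format without the space flag, so positive numbers get no padding.
    cells = [header] + ['{x:d}'.format(x=v) for v in values]
    width = max(len(cell) for cell in cells)
    return [cell.rjust(width) for cell in cells]


# Mixed signs: padding appears only where it is needed for alignment.
print(justify_column('x', [11, -22]))  # ['  x', ' 11', '-22']

# All non-negative: no leading space at all.
print(justify_column('x', [0, 1]))     # ['x', '0', '1']
```

Under this scheme a leading space appears only when a wider cell in the same column forces it, rather than being baked into every non-negative number.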

I have another attempt at this issue, which explicitly expects the leading space, but may not be palatable to some...

Thanks.

datapythonista

I don't think we want to add trailing spaces to the output of .to_string(). I'm not sure what the problem is; can't it be fixed? I guess the leading space is OK.

datapythonista · 2018-08-21T12:37:29Z

pandas/tests/io/formats/test_format.py

+            DataFrame({'x': [0.1, 0.2, -0.3], 'y': [4, 5, 6]}),
+            DataFrame({'x': [0.1, 0.2, -0.3], 'y': [0.4, 0.5, 0.6]}),
+            DataFrame({'x': [0.1, 0.2, -0.3], 'y': [0.4, 0.5, -0.6]}),
+        ]


In cases like this, it is preferred to use pytest parametrization.

Will fix if/when general direction of this PR is accepted.
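A minimal sketch of what the suggested parametrization could look like, using the three frames from the diff above. The test name and the assertion are placeholders chosen for illustration; they are not the test from this PR.

```python
import pandas as pd
import pytest


# Each DataFrame becomes its own test case, so a failure report names
# the specific input that broke instead of stopping at the first one.
@pytest.mark.parametrize('df', [
    pd.DataFrame({'x': [0.1, 0.2, -0.3], 'y': [4, 5, 6]}),
    pd.DataFrame({'x': [0.1, 0.2, -0.3], 'y': [0.4, 0.5, 0.6]}),
    pd.DataFrame({'x': [0.1, 0.2, -0.3], 'y': [0.4, 0.5, -0.6]}),
])
def test_to_string_index_false_line_count(df):
    lines = df.to_string(index=False).split('\n')
    # Placeholder check: one header line plus one line per row.
    assert len(lines) == len(df) + 1
```

Compared with looping over a list inside a single test, this keeps the cases independent and makes it easy to add more frames later.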

gshiba · 2018-08-21T15:37:58Z

I updated the original description to include current and proposed behavior.

gshiba · 2018-08-22T05:14:24Z

I just updated one of the test modules that was failing (pandas/tests/io/formats/test_format.py) and there are 4 more modules to go.

It seems a lot of things assume/depend on the % d formatting, which makes me think that my other approach is more sensible... Please let me know what you think.

As an aside, while fixing test_format.py I think I found some (unrelated?) formatting issues...

import pandas as pd
df = pd.DataFrame(123, [0, 10], [0, 11])
print(df.to_string())
print('---')
s = pd.Series([0, 100, 200])
print(s.to_string(max_rows=2))

outputs

     0    11  # '0' is misaligned
0   123  123
10  123  123
---
0      0
    ...   # dots are misaligned
2    200

gshiba · 2018-08-25T14:27:47Z

#22505 is probably a better fix

datapythonista · 2018-09-23T13:27:08Z

Closing in favor of #22505. Will reopen if needed.

gshiba added 2 commits August 20, 2018 23:09

Fix to_string()

04bdc30

Fix flake8 errors

1db02f6

datapythonista reviewed Aug 21, 2018


datapythonista added the Bug, IO Data, and Output-Formatting labels Aug 21, 2018

Updated tests. test_format now passes

28a6ea0

Update tests

580a81c

gshiba mentioned this pull request Aug 25, 2018

Fix DataFrame.to_string() justification (2) #22505

Merged

4 tasks

datapythonista closed this Sep 23, 2018

gshiba mentioned this pull request Jan 29, 2019

BUG: on .to_string(index=False) #25000

Closed

4 tasks
