Fix DataFrame.to_string() justification #22437

Closed
wants to merge 4 commits into from

Contributor

@gshiba gshiba commented Aug 21, 2018

Fixes justification with DataFrame.to_string(index=False), but breaks other tests :-(

There are two side-effects of this PR, which will likely make it unacceptable, but wanted to get suggestions if possible.

  1. It changes the expected behavior with trailing spaces as shown below.
import pandas as pd
def wrap_to_string(df, **kwargs):
    s = df.to_string(**kwargs)
    print(str(kwargs).center(40, '='))
    for i, line in enumerate(s.split('\n')):
        print(f'^{line}$-{i}')
    print('~' * 40)
df = pd.DataFrame({'x': [11, -22], 'y': ['aaa', ' ']})
wrap_to_string(df)
wrap_to_string(df, line_width=1)
wrap_to_string(df, index=False)
wrap_to_string(df, index=False, line_width=1)

Current output with master

==================={}===================  # Looks good
^    x    y$-0
^0  11  aaa$-1
^1 -22     $-2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
==========={'line_width': 1}============  # Looks good (to me anyways) but has trailing spaces
^    x  \$-0
^0  11   $-1
^1 -22   $-2
^$-3
^     y  $-4
^0  aaa  $-5
^1       $-6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
============{'index': False}============  # broken justification (this is what I want to fix)
^x    y$-0
^11  aaa$-1
^-22$-2                                   # the last value (space) disappeared completely
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
==={'index': False, 'line_width': 1}====  # still broken
^x  \$-0
^11   $-1
^-22   $-2
^$-3
^  y  $-4
^aaa$-5                                   # the last value (space) disappeared completely
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Output with this PR

==================={}===================  # No change from master
^    x    y$-0
^0  11  aaa$-1
^1 -22     $-2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
==========={'line_width': 1}============  # No change from master
^    x  \$-0
^0  11   $-1
^1 -22   $-2
^$-3
^     y  $-4
^0  aaa  $-5
^1       $-6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
============{'index': False}============  # Justification is fixed
^  x    y$-0
^ 11  aaa$-1
^-22     $-2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
==={'index': False, 'line_width': 1}====  # Justification is fixed
^  x  \$-0
^ 11   $-1
^-22   $-2
^$-3
^  y  $-4
^aaa  $-5
^     $-6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  2. It breaks a bunch of other tests that use IntArrayFormatter._format_strings, because the format spec changed from '{x: d}', which adds a leading space for non-negative values, to '{x:d}'. My impression is that this PR eliminates an arguably unnecessary extra space for non-negative integers, but I haven't looked at those tests in much detail (e.g. I've never used to_latex).
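
For reference, the difference between the two format specs can be seen with plain str.format, independent of pandas. This is only a small illustration of the space sign flag, not the formatter code itself:

```python
# The ' ' (space) sign flag reserves one column for the sign:
# non-negative numbers get a leading space, negative numbers get '-'.
values = [11, -22]

with_space_flag = ['{x: d}'.format(x=x) for x in values]
without_flag = ['{x:d}'.format(x=x) for x in values]

print(with_space_flag)  # [' 11', '-22']
print(without_flag)     # ['11', '-22']
```

With the flag, every column of non-negative integers carries one extra leading character, which is the space the failing tests expect.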

The tests that break, and an example failure, are shown below.

pandas/tests/arrays/categorical/test_repr.py::TestCategoricalRepr::test_print_none_width FAILED [  4%]
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_east_asian_unicode_frame FAILED [  8%]
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_datetimelike_frame FAILED [ 12%]
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_to_string_no_header FAILED [ 16%]
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_to_string_specified_header FAILED [ 20%]
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_to_string_index_formatter FAILED [ 25%]
pandas/tests/io/formats/test_format.py::TestDataFrameFormatting::test_to_string_line_width FAILED [ 29%]
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_east_asian_unicode_series FAILED [ 33%]
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_period FAILED [ 37%]
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_truncate_ndots FAILED [ 41%]
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_to_string_name FAILED [ 45%]
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_to_string_dtype FAILED [ 50%]
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_to_string_length FAILED [ 54%]
pandas/tests/io/formats/test_format.py::TestSeriesFormatting::test_to_string_header FAILED [ 58%]
pandas/tests/io/formats/test_to_latex.py::TestToLatex::test_to_latex FAILED [ 62%]
pandas/tests/io/formats/test_to_latex.py::TestToLatex::test_to_latex_multiindex FAILED [ 66%]
pandas/tests/io/formats/test_to_latex.py::TestToLatex::test_to_latex_multicolumnrow FAILED [ 70%]
pandas/tests/io/formats/test_to_latex.py::TestToLatex::test_to_latex_longtable FAILED [ 75%]
pandas/tests/io/formats/test_to_latex.py::TestToLatex::test_to_latex_no_header FAILED [ 79%]
pandas/tests/series/test_repr.py::TestSeriesRepr::test_multilevel_name_print FAILED [ 83%]
pandas/tests/series/test_repr.py::TestCategoricalRepr::test_categorical_repr FAILED [ 87%]
pandas/tests/series/test_repr.py::TestCategoricalRepr::test_categorical_series_repr FAILED [ 91%]
pandas/tests/series/test_repr.py::TestCategoricalRepr::test_categorical_series_repr_ordered FAILED [ 95%]
pandas/tests/sparse/test_format.py::TestSparseSeriesFormatting::test_sparse_int FAILED [100%]
______________________________________________________________________________________ TestCategoricalRepr.test_print_none_width _______________________________________________________________________________________
                                                                                                                                                                                                                        
self = <pandas.tests.arrays.categorical.test_repr.TestCategoricalRepr object at 0x12892def0>                                                                                                                            
                                                                                                                                                                                                                        
    def test_print_none_width(self):                                                                                                                                                                                    
        # GH10087                                                                                                                                                                                                       
        a = Series(Categorical([1, 2, 3, 4]))                                                                                                                                                                           
        exp = u("0    1\n1    2\n2    3\n3    4\n" +                                                                                                                                                                    
                "dtype: category\nCategories (4, int64): [1, 2, 3, 4]")                                                                                                                                                 
                                                                                                                                                                                                                        
        with option_context("display.width", None):                                                                                                                                                                    
>           assert exp == repr(a)                                                                                                                                                                                      
E           AssertionError: assert '0    1\n1   ... [1, 2, 3, 4]' == '0   1\n1   2\... [1, 2, 3, 4]'                                                                                                                   
E             - 0    1                                                                                                                                                                                                 
E             ?  -                                                                                                                                                                                                      
E             + 0   1                                                                                                                                                                                                   
E             - 1    2                                                                                                                                                                                                 
E             ?  -                                                                                                                                                                                                      
E             + 1   2                                                                                                                                                                                                   
E             - 2    3                                                                                                                                                                                                 
E             ?  -                                                                                                                                                                                                     
E             + 2   3                                                                                                                                                                                                  
E             - 3    4                                                                                                                                                                                                  
E             ?  -                                                                                                                                                                                                      
E             + 3   4                                                                                                                                                                                                   
E               dtype: category                                                                                                                                                                                         
E               Categories (4, int64): [1, 2, 3, 4]                                                                                                                                                                     
                                                                                                                                                                                                                       
pandas/tests/arrays/categorical/test_repr.py:59: AssertionError

The root issue, I think, is that we have '{x: d}'.format(x=x), but the current tests don't want the leading space for a simple pd.DataFrame({'x': [0, 1], 'y': [2, 3]}).to_string(index=False).
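
To make the tension concrete, here is a minimal sketch of sign-free formatting followed by right-justification to the widest entry. The helper name justify_right is hypothetical and this is not how pandas implements it; it only shows that padding to a common width can align negative values without forcing a leading space onto all-non-negative columns:

```python
def justify_right(values):
    """Format ints without a sign flag, then pad to the widest entry."""
    texts = ['{x:d}'.format(x=x) for x in values]
    width = max(len(t) for t in texts)
    return [t.rjust(width) for t in texts]


print(justify_right([0, 1]))     # ['0', '1']
print(justify_right([11, -22]))  # [' 11', '-22']
```

The first call yields no leading space, which matches what the existing index=False tests want, while the second still lines up the digits under the minus sign.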

I have another attempt at this issue, which explicitly expects the leading space, but may not be palatable to some...

Thanks.

Member

@datapythonista datapythonista left a comment

I don't think we want to add trailing spaces to the output of .to_string(). I'm not sure what the problem is; can't it be fixed? I guess the leading space is OK.

DataFrame({'x': [0.1, 0.2, -0.3], 'y': [4, 5, 6]}),
DataFrame({'x': [0.1, 0.2, -0.3], 'y': [0.4, 0.5, 0.6]}),
DataFrame({'x': [0.1, 0.2, -0.3], 'y': [0.4, 0.5, -0.6]}),
]
Member

In cases like this, it is preferred to use pytest parametrization.
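
A rough sketch of what that could look like, assuming the frames are built from the same data as the list under review. The test name and the assertion are hypothetical placeholders, not the PR's actual test:

```python
import pytest


# Hypothetical test: the data mirrors the frames in the reviewed list;
# the name and the assertion are illustrative only.
@pytest.mark.parametrize('data', [
    {'x': [0.1, 0.2, -0.3], 'y': [4, 5, 6]},
    {'x': [0.1, 0.2, -0.3], 'y': [0.4, 0.5, 0.6]},
    {'x': [0.1, 0.2, -0.3], 'y': [0.4, 0.5, -0.6]},
])
def test_to_string_index_false_line_count(data):
    pd = pytest.importorskip('pandas')  # skip cleanly if pandas is missing
    df = pd.DataFrame(data)
    lines = df.to_string(index=False).split('\n')
    # One header line plus one line per row.
    assert len(lines) == len(df) + 1
```

Each dict becomes its own test case, so a failure report names the exact frame that broke instead of stopping at the first failing item in a loop.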

Contributor Author

Will fix if/when general direction of this PR is accepted.

@datapythonista added the Bug, IO Data, and Output-Formatting labels Aug 21, 2018
Contributor Author

gshiba commented Aug 21, 2018

I updated the original description to include current and proposed behavior.

Contributor Author

gshiba commented Aug 22, 2018

I just updated one of the test modules that was failing (pandas/tests/io/formats/test_format.py) and there are 4 more modules to go.

It seems a lot of things assume/depend on the % d formatting, which makes me think that my other approach is more sensible... Please let me know what you think.


As an aside, while fixing test_format.py I think I found some (unrelated?) formatting issues...

import pandas as pd
df = pd.DataFrame(123, [0, 10], [0, 11])
print(df.to_string())
print('---')
s = pd.Series([0, 100, 200])
print(s.to_string(max_rows=2))

outputs

     0    11  # '0' is misaligned
0   123  123
10  123  123
---
0      0
    ...   # dots are misaligned
2    200

Contributor Author

gshiba commented Aug 25, 2018

#22505 is probably a better fix

@datapythonista
Member

Closing in favor of #22505. Will reopen if needed.

Successfully merging this pull request may close these issues.

to_string formatters not as expected when header=False