Fix DataFrame.to_string() justification (2) #22505

Merged: 7 commits, Sep 25, 2018

@gshiba gshiba commented Aug 25, 2018

'Competes' with #22437, which attempts to revert % d to %d as suggested here: #13032 (comment). That turned out to affect a lot of tests, which in hindsight is expected; the % d has been around since at least 2012 (106fe99).

Instead, this PR reverts parts of #11942 and embraces the leading space even when index=False. df.to_string(index=False) will now print the leading space even when the first column contains only positive values, and will preserve leading/trailing spaces on the first/last lines.
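
For readers unfamiliar with the `% d` conversion discussed above, here is a minimal sketch of the difference using plain Python printf-style formatting. It is not the pandas formatter itself, and the sample values are only illustrative. The space flag reserves one column for the sign, so positive numbers gain a leading space while negative numbers are unchanged:

```python
# Printf-style formatting: the space flag reserves one column for the sign.
values = [3, -4, 555, -888]

with_space = ["% d" % v for v in values]     # space flag, as in '% d'
without_space = ["%d" % v for v in values]   # no flag, as in '%d'

for a, b in zip(with_space, without_space):
    print(repr(a), repr(b))
# ' 3' '3'
# '-4' '-4'
# ' 555' '555'
# '-888' '-888'
```

With the flag, mixed-sign values of the same magnitude have the same width, which is why stripping that leading space afterwards can shift a row out of alignment.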

With the following code:

import pandas as pd
def wrap_to_string(df, **kwargs):
    s = df.to_string(**kwargs)
    print(str(kwargs).center(25, '-'))
    for i, line in enumerate(s.split('\n')):
        print(f'^{line}$-{i}')
    print()
df = pd.DataFrame({'w': [1, 2], 'x': [3, -4], 'y': [555, 666],
                   'z': [777, -888], 'a': ['AAA', '   ']})
cols_ = list(map(list, ['wxyza', 'xyzaw', 'yzawx', 'zawxy', 'awxyz']))
for cols in cols_:
    wrap_to_string(df[cols], index=False)

Output with master:

-----{'index': False}----  # last cell (three spaces) disappeared
^w  x    y    z    a$-0
^1  3  555  777  AAA$-1
^2 -4  666 -888$-2

-----{'index': False}----  # misaligned
^x    y    z    a  w$-0
^3  555  777  AAA  1$-1
^-4  666 -888       2$-2

-----{'index': False}----  # misaligned
^y    z    a  w  x$-0
^555  777  AAA  1  3$-1
^666 -888       2 -4$-2

-----{'index': False}----  # misaligned
^z    a  w  x    y$-0
^777  AAA  1  3  555$-1
^-888       2 -4  666$-2

-----{'index': False}----  # misaligned
^a  w  x    y    z$-0
^AAA  1  3  555  777$-1
^     2 -4  666 -888$-2

Output with this PR:

-----{'index': False}----
^ w  x    y    z    a$-0
^ 1  3  555  777  AAA$-1
^ 2 -4  666 -888     $-2

-----{'index': False}----
^ x    y    z    a  w$-0
^ 3  555  777  AAA  1$-1
^-4  666 -888       2$-2

-----{'index': False}----
^   y    z    a  w  x$-0
^ 555  777  AAA  1  3$-1
^ 666 -888       2 -4$-2

-----{'index': False}----
^   z    a  w  x    y$-0
^ 777  AAA  1  3  555$-1
^-888       2 -4  666$-2

-----{'index': False}----
^   a  w  x    y    z$-0
^ AAA  1  3  555  777$-1
^      2 -4  666 -888$-2

Similar effect on Series as well.

@gfyoung added the Bug, IO Data, and Output-Formatting labels Aug 25, 2018

for df, expected in zip(dfs, exs):
    df_s = df.to_string(index=False)
    assert df_s == expected
Member

Definitely should use pytest.mark.parametrize for this.
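
A rough sketch of what the parametrized form of such a loop could look like follows. To stay independent of the exact `to_string` output, which varies between pandas versions, it tests a hypothetical stand-in helper, `format_ints`, that is not part of pandas and only mimics sign-aware right justification:

```python
import pytest


def format_ints(values):
    """Hypothetical stand-in for a column formatter (not pandas code)."""
    cells = ["% d" % v for v in values]      # space flag reserves a sign column
    width = max(len(c) for c in cells)
    return [c.rjust(width) for c in cells]   # right-justify to a common width


@pytest.mark.parametrize("values, expected", [
    ([1, 2], [" 1", " 2"]),
    ([3, -4], [" 3", "-4"]),
    ([777, -888], [" 777", "-888"]),
])
def test_format_ints(values, expected):
    # Each tuple above is collected and reported as its own test case.
    assert format_ints(values) == expected
```

Compared with a `for ... in zip(...)` loop inside one test, a failure here points at the specific case that broke, and the remaining cases still run.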

Contributor Author

I updated the code style to match other tests in the same file.

Member

gfyoung commented Aug 25, 2018

cc @datapythonista

codecov bot commented Aug 25, 2018

Codecov Report

Merging #22505 into master will decrease coverage by 0.15%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22505      +/-   ##
==========================================
- Coverage   92.18%   92.03%   -0.16%     
==========================================
  Files         169      169              
  Lines       50820    50778      -42     
==========================================
- Hits        46850    46735     -115     
- Misses       3970     4043      +73
Flag        Coverage Δ
#multiple   90.44% <100%> (-0.16%) ⬇️
#single     42.22% <0%> (-0.16%) ⬇️

Impacted Files                        Coverage Δ
pandas/io/formats/format.py           98.35% <100%> (-0.01%) ⬇️
pandas/io/formats/console.py          65.15% <0%> (-10.61%) ⬇️
pandas/errors/__init__.py             92.3% <0%> (-7.7%) ⬇️
pandas/core/dtypes/base.py            92.68% <0%> (-7.32%) ⬇️
pandas/core/arrays/base.py            88% <0%> (-6.25%) ⬇️
pandas/io/html.py                     89.17% <0%> (-2.08%) ⬇️
pandas/io/parquet.py                  71.79% <0%> (-1.94%) ⬇️
pandas/io/formats/html.py             88.81% <0%> (-1.87%) ⬇️
pandas/core/apply.py                  96.75% <0%> (-1.86%) ⬇️
pandas/core/arrays/datetimelike.py    94.02% <0%> (-1.52%) ⬇️
... and 51 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 64b88e8...cc86cd7.

@datapythonista
Member

lgtm, but I think the changelog needs to be moved to 0.24.0.

@jreback can you take a look and see if you're happy with this?

@jreback jreback added this to the 0.24.0 milestone Sep 23, 2018
Contributor

@jreback jreback left a comment

change looks ok

@@ -49,3 +49,5 @@ Bug Fixes
**I/O**

- Bug in :func:`read_csv` that caused it to raise ``OverflowError`` when trying to use 'inf' as ``na_value`` with integer index column (:issue:`17128`)
- Bug in :func:`to_string(index=False)` that broke column alignment (:issue:`16839`, :issue:`13032`)
Contributor

move to 0.24.0

can you make this more explicit, e.g. say what cases it is fixing.

        assert df_s == expected

    def test_to_string_line_width_no_index(self):
        df = DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

        df_s = df.to_string(line_width=1, index=False)
        expected = "x \\\n1 \n2 \n3 \n\ny \n4 \n5 \n6"
Contributor

can you add a comment with the issues that are closed

@pep8speaks
Hello @gshiba! Thanks for updating the PR.

Member

@datapythonista datapythonista left a comment

lgtm, thanks for the fix @gshiba

@jreback jreback merged commit 30b942a into pandas-dev:master Sep 25, 2018
Contributor

jreback commented Sep 25, 2018

thanks @gshiba

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Successfully merging this pull request may close these issues.

to_string formatters not as expected when header=False
Justification is broken with to_string(index=False)
5 participants