Bug report: printed dataframes do not collapse in debugger #889

Closed
Bartdoekemeijer opened this issue Apr 5, 2022 · 7 comments
Labels
bug Something isn't working

Bartdoekemeijer commented Apr 5, 2022

Description of the bug
When I try to print a pandas DataFrame in the debug console with vscode-python v2022.4.0, it prints the first 150 and last 150 rows and all the columns of the DataFrame. This is very slow and makes it practically impossible to actually see the DataFrame contents. It should show the first and last 5 rows and a small number of columns.

How to reproduce this bug

import pandas as pd
import numpy as np

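# 1000 rows x 100 columns of zeros
col_names = ["col_{:03d}".format(i) for i in range(100)]
df = pd.DataFrame(dict(zip(col_names, 100 * [np.zeros(1000)])))
# Start the debugger here (e.g. with a breakpoint on the next line), then evaluate df in the debug console.
print(df.shape)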

[Screenshot: evaluating df in the debug console prints the first and last 150 rows, and all columns.]

Note that outside of the debug console, e.g. with print(df) in the code, everything works fine and the DataFrame is printed collapsed, as expected.

What would I expect
The debugger should collapse the pandas DataFrame, which is what I see when using print(df) in the code. This is also the behavior I see in the debug console with version v2022.2.1924087327.

What have I tried so far

  • Downgrading to debugpy==1.5.1 and debugpy==1.5.0.
  • Different pandas versions.
  • Resetting the Pandas settings for visualizing dataframes, using pd.reset_option('display.max_rows').
  • Creating a new Python virtual environment with a different Python version.
  • Reinstalling VSCode.
  • Reinstalling the Python plugin.

None of this resolved the issue. The only thing that resolved it is downgrading vscode-python to v2022.2.1924087327. Hence, I have posted this bug here rather than in the debugpy repository.

karthiknadig transferred this issue from microsoft/vscode-python on Apr 6, 2022
int19h (Contributor) commented Apr 6, 2022

Note that print(df) is not the same as df: the latter is basically shorthand for print(repr(df)), the same as in the standard Python REPL. Conversely, you can also run print(df) in the Debug Console. Does that make a difference in the output for you?
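A minimal sketch of that distinction, using a hypothetical class Demo whose repr() and str() differ: typing a bare expression at the REPL passes the value to sys.displayhook, which writes repr(value), while print() writes str(value). The sketch calls sys.__displayhook__ (the original, unreplaced hook) directly and captures both outputs.

```python
import contextlib
import io
import sys


class Demo:
    """Hypothetical class whose repr() and str() differ, to make the two paths visible."""

    def __repr__(self):
        return "Demo(repr)"

    def __str__(self):
        return "Demo(str)"


buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    # What the interactive REPL does with a bare expression such as `df`:
    sys.__displayhook__(Demo())
    # What an explicit print() call does:
    print(Demo())

output = buf.getvalue()
print(output, end="")
```

Running it prints Demo(repr) followed by Demo(str).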

Bartdoekemeijer (Author) commented

Thank you @int19h!

In the debug console:

  • print(repr(df)) works as intended: it shows the collapsed DataFrame.
  • print(df) also works as intended: it shows the collapsed DataFrame.
  • df does not work as intended: it shows the expanded DataFrame (all columns, first and last 150 rows).

int19h (Contributor) commented Apr 6, 2022

What happens if you open the standard Python REPL (i.e. just run python without arguments) and type df there? The debug console is supposed to behave in the same manner for consistency, so if the output matches, everything is as intended.

Bartdoekemeijer (Author) commented

That shows me a collapsed DataFrame, as intended:

Python 3.9.5 (default, Nov 23 2021, 15:27:38)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>>
>>> col_names = ["col_{:03d}".format(i) for i in range(100)]
>>> df = pd.DataFrame(dict(zip(col_names, 100 * [np.zeros(1000)])))
>>> df
     col_000  col_001  col_002  col_003  col_004  col_005  ...  col_094  col_095  col_096  col_097  col_098  col_099
0        0.0      0.0      0.0      0.0      0.0      0.0  ...      0.0      0.0      0.0      0.0      0.0      0.0
1        0.0      0.0      0.0      0.0      0.0      0.0  ...      0.0      0.0      0.0      0.0      0.0      0.0
2        0.0      0.0      0.0      0.0      0.0      0.0  ...      0.0      0.0      0.0      0.0      0.0      0.0
3        0.0      0.0      0.0      0.0      0.0      0.0  ...      0.0      0.0      0.0      0.0      0.0      0.0
4        0.0      0.0      0.0      0.0      0.0      0.0  ...      0.0      0.0      0.0      0.0      0.0      0.0
..       ...      ...      ...      ...      ...      ...  ...      ...      ...      ...      ...      ...      ...
995      0.0      0.0      0.0      0.0      0.0      0.0  ...      0.0      0.0      0.0      0.0      0.0      0.0
996      0.0      0.0      0.0      0.0      0.0      0.0  ...      0.0      0.0      0.0      0.0      0.0      0.0
997      0.0      0.0      0.0      0.0      0.0      0.0  ...      0.0      0.0      0.0      0.0      0.0      0.0
998      0.0      0.0      0.0      0.0      0.0      0.0  ...      0.0      0.0      0.0      0.0      0.0      0.0
999      0.0      0.0      0.0      0.0      0.0      0.0  ...      0.0      0.0      0.0      0.0      0.0      0.0

[1000 rows x 100 columns]

int19h (Contributor) commented Apr 6, 2022

Very interesting, thank you! @fabioz, is this a case of the safe-repr code not being on par with the stock pandas repr?

fabioz (Collaborator) commented Apr 15, 2022

What happens is that we now customize pandas ourselves, and our limits are (much) bigger than the pandas defaults. I chose them while checking a case where the user had no limits at all, but it seems I didn't check that the defaults are generally based on the size of the terminal, which is definitely better.

That is, as part of fixing #695, we now customize pandas with:

PANDAS_MAX_ROWS = as_int_in_env('PYDEVD_PANDAS_MAX_ROWS', 300)
PANDAS_MAX_COLS = as_int_in_env('PYDEVD_PANDAS_MAX_COLS', 300)
PANDAS_MAX_COLWIDTH = as_int_in_env('PYDEVD_PANDAS_MAX_COLWIDTH', 80)
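The as_int_in_env helper is internal to pydevd and its body is not shown in this thread. As a rough illustration of the pattern only, assuming it parses an integer from the named environment variable and falls back to the default otherwise, a hypothetical stand-in could look like this (the handling of malformed values is also an assumption):

```python
import os


def as_int_in_env(env_key, default):
    """Hypothetical stand-in for pydevd's helper: int from the environment, else `default`."""
    value = os.environ.get(env_key)
    if value is None:
        return default
    try:
        return int(value)
    except ValueError:
        # Assumption: ignore malformed values instead of raising.
        return default


PANDAS_MAX_ROWS = as_int_in_env('PYDEVD_PANDAS_MAX_ROWS', 300)
PANDAS_MAX_COLS = as_int_in_env('PYDEVD_PANDAS_MAX_COLS', 300)
PANDAS_MAX_COLWIDTH = as_int_in_env('PYDEVD_PANDAS_MAX_COLWIDTH', 80)
```

Under this sketch, setting PYDEVD_PANDAS_MAX_ROWS to "10" in the debuggee's environment makes PANDAS_MAX_ROWS evaluate to 10.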

I guess this should be changed again so that, if the user has configured a lower limit than ours, we use the lower one.
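A minimal sketch of that rule, assuming the user's value is whatever pandas reports (for example via pd.get_option('display.max_rows')) and that None means "no limit", as it does in pandas. The helper name effective_limit is hypothetical and not part of debugpy.

```python
from typing import Optional


def effective_limit(user_value: Optional[int], debugger_limit: int) -> int:
    """Hypothetical helper: honor the user's pandas limit when it is stricter."""
    # pandas uses None to mean "no limit"; fall back to the debugger's cap then.
    if user_value is None:
        return debugger_limit
    # Otherwise use whichever limit is lower.
    return min(user_value, debugger_limit)
```

For example, with pandas' default display.max_rows of 60 and a debugger cap of 300, effective_limit(60, 300) returns 60.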

I'll do the update.

In the meantime, setting environment variables in the launch configuration, such as:

      "env": {
        "PYDEVD_PANDAS_MAX_ROWS": "10",
        "PYDEVD_PANDAS_MAX_COLS": "10",
        "PYDEVD_PANDAS_MAX_COLWIDTH": "50"
      }

should bring the output closer to the default pandas configuration.

fabioz added a commit to fabioz/debugpy that referenced this issue Apr 15, 2022
Bartdoekemeijer (Author) commented

Thank you for the clarification, @fabioz! Your suggestion resolved my issue.

fabioz closed this as completed in 7be5993 on Apr 15, 2022