Debugger starts to paginate memory when getting the representation of big byte arrays #242

Closed
jelling opened this issue May 13, 2020 · 22 comments

@jelling

jelling commented May 13, 2020

Environment data

  • VS Code version: 1.45.0
  • Extension version (available under the Extensions sidebar): 2020.5.78807
  • OS and version: MacOS Catalina 10.15.3
  • Python version (& distribution if applicable, e.g. Anaconda): Python 3.7.7 64 bit
  • Type of virtual environment used (N/A | venv | virtualenv | conda | ...): pyenv
  • Jedi or Language Server? (i.e. what is "python.jediEnabled" set to; for more info, see "How to update the language server to the latest stable version", vscode-python#3977): jedi
  • Value of the python.languageServer setting: Microsoft

Expected behaviour

Show the local variables in debug

Actual behaviour

I can attach to a Flask web server (the debugger stops on the breakpoint line), but no local variables are displayed.

image

Steps to reproduce:

Launch the debugger with either of the configs below; neither works.

{
    "version": "0.2.0",
    "configurations": [
        
        {
            "name": "Flask Debug",
            "type": "python",
            "request": "launch",
            "module": "flask",
            "env": {
                "FLASK_APP": "app.py",
                "FLASK_ENV": "development",
                "FLASK_DEBUG": "0"
            },
            "args": [
                "run",
                "--no-debugger",
                "--no-reload"
            ],
            "jinja": true,
            "cwd": "${workspaceFolder}/server"
        },
        {
            "name": "Foo",
            "type": "python",
            "request": "launch",
            "module": "flask",
            "env": {
                "FLASK_APP": "app",
                "FLASK_DEBUG": "1",
                "FLASK_ENV": "development"
            },
            "args": [
                "run"
            ],
            "subProcess": true,
            "jinja": true,
            "cwd": "${workspaceFolder}/server"
        }
    ]
}
@karthiknadig karthiknadig transferred this issue from microsoft/vscode-python May 13, 2020
@int19h
Contributor

int19h commented May 13, 2020

Which variables would you expect to see at this point? We don't show variables that are unassigned, which is going to be all of the locals (except for the function arguments) if you set a breakpoint on the first line of the function. As you step through the lines, you should be seeing variables show up as they get assigned.
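
To illustrate the point about unassigned locals, here is a minimal sketch in plain Python, independent of the debugger. The function name `handler` and its variables are made up for the example. It snapshots `locals()` on the first line of a function and again after one assignment:

```python
def handler(request_id):
    # Snapshot taken on the first line: only the argument exists so far.
    before = dict(locals())
    payload = b"data"
    # Snapshot taken after `before` and `payload` have been assigned.
    after = dict(locals())
    return before, after


before, after = handler(1)
print(sorted(before))  # ['request_id']
print(sorted(after))   # ['before', 'payload', 'request_id']
```

A breakpoint on the first line of `handler` would therefore be expected to show only `request_id` under Locals, with the other names appearing as you step.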

@jelling
Author

jelling commented May 13, 2020 via email

@fabioz
Collaborator

fabioz commented May 13, 2020

Given that foo is a global variable and not a local variable, it shouldn't be visible at that point, since only local variables are shown. (As a note, the latest version shows globals too, but this is pretty recent and requires at least v1.0.0b8, which is probably not what you're using there.)

@int19h
Contributor

int19h commented May 13, 2020

foo is a global variable, not local.

You're right though, it should be showing up under "Globals". I can't get a trivial repro, though; e.g. this code does work for me, showing z under Globals, and also allowing it to be used in Watch:

z = 123

def foo():
    def bar():
        x = 1  # breakpoint here
        yield x
        y = z
        yield y
    return list(bar())

foo()

The fact that it also doesn't show in Watch might be a clue, though. Normally we just send the whole thing over to Python for evaluation, and if it fails, print the error message from the exception. But "not available" is something else - I've only seen it before when there's no active debug session.

@int19h
Contributor

int19h commented May 13, 2020

@fabioz Extension version 2020.5.78807 ships debugpy 1.0.0b9, so this should be working.

@fabioz
Collaborator

fabioz commented May 13, 2020

In that case, @jelling, can you provide the logs for the run?

i.e.:

  • Open VS Code
  • Select the command Extensions: Open Extensions Folder
  • Locate the Python extension directory, typically of the form ms-python.python-2020..***
  • In that directory ensure you do not have any debug*.log files, if you do, please delete them
  • Go back into VS Code and modify your launch.json to add the setting "logToFile": true, as shown below:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File (Integrated Terminal)",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "stopOnEntry": true,
            "console": "integratedTerminal",
            "logToFile": true
        }
    ]
}
  • Start debugging
  • When done, go back into the extension directory and upload the debug*.log files into this GitHub issue.

@int19h
Contributor

int19h commented May 13, 2020

Just in case, can you also do import debugpy; print(debugpy.__version__), to make sure that it's really picking up the correct version.

@jelling
Author

jelling commented May 13, 2020

Will add logs momentarily, but first here's a screenshot showing that the f variable is not shown by the debugger even after assignment.

image

@jelling
Author

jelling commented May 13, 2020

Debug logs:

Archive.zip

@int19h
Contributor

int19h commented May 13, 2020

@fabioz Note the Call Stack pane - the generator is being iterated on a background thread, and it's the first user code frame on that thread (or possibly even the first Python code frame). Perhaps this has something to do with it?

@int19h
Contributor

int19h commented May 13, 2020

I think I see part of the problem. When it breaks, the first thing that VSCode does is evaluate foo, since it's in Watch. But the server never gets back with the result of that evaluation, which blocks all subsequent messages from getting processed. The IDE sends a request to fetch scopes right after, but it never receives the reply, so it never even gets to querying variables.

@jelling, what is the type of foo, and what happens if you do something like print(repr(foo))?

@jelling
Author

jelling commented May 13, 2020

Python debug version looks correct:

debugpy.__version__ = 1.0.0b9

And f/foo is bytes:

type(f)
<class 'bytes'>

@int19h
Contributor

int19h commented May 13, 2020

We use repr() to produce the values that you're seeing in Variables, hence why I'm curious about that one. Could it just be a very large bytestring, causing repr() to spin for a very long time trying to stringify it?
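
As a small illustration of why `repr()` can be expensive on binary data, the sketch below uses plain CPython with nothing debugger-specific. It shows that non-printable bytes are escaped, so the resulting string can be up to roughly four times as long as the input:

```python
# Four bytes: NUL, 0xFF, newline, and the printable letter 'A'.
data = bytes([0, 255, 10, 65])
text = repr(data)
print(text)                   # b'\x00\xff\nA'
print(len(data), len(text))   # 4 14

# Worst case: every byte becomes a 4-character \xNN escape,
# plus 3 characters for the surrounding b'...'.
worst = repr(b"\x00" * 1000)
print(len(worst))             # 4003
```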

@jelling
Author

jelling commented May 14, 2020

@int19h It's a 5 minute video file so yeah that is likely it.

@fabioz
Collaborator

fabioz commented May 14, 2020

@jelling, can you provide the len(f)?

I thought we already handled such cases, but maybe we don't deal well enough in really extreme cases.

@jelling
Author

jelling commented May 14, 2020

Here it is:

len(f) = 20730470400

@fabioz
Collaborator

fabioz commented May 14, 2020

It seems that's really the issue (I was able to reproduce it here).

I think that when you convert that 20GB byte array on your machine, it gets slow due to the amount of RAM required. On my machine, with 32GB of RAM, doing a repr on a 10GB byte array took 40 seconds, but with the 20GB array I had to stop it when the OS started to page memory out to disk.
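
A rough back-of-the-envelope estimate for the numbers in this thread, as a sketch only: it assumes the worst case where every byte is escaped as `\xNN`, and that CPython stores an ASCII-only string at one byte per character. Real video data mixes printable and non-printable bytes, so the actual figure would be lower.

```python
n = 20_730_470_400             # len(f) as reported in this thread
worst_case_chars = 4 * n + 3   # each byte as \xNN, plus the b'...' wrapper

GIB = 1024 ** 3
print(f"input bytes:       {n / GIB:.1f} GiB")
print(f"repr() upper bound: {worst_case_chars / GIB:.1f} GiB")
```

Under those assumptions the string alone could be several times larger than the 32GB of RAM mentioned above, on top of the original buffer, which is consistent with the machine starting to page.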

So, we need better support in the debugger when really big byte arrays are in memory (I think we usually don't get there because usually users would be using numpy arrays for such extreme use cases, but we should definitely support this use case too).

I'll take a look at it.

@fabioz fabioz self-assigned this May 14, 2020
@fabioz fabioz changed the title from "Debugger doesn't show variables" to "Debugger starts to paginate memory when getting the representation of big byte arrays" May 14, 2020
@int19h
Contributor

int19h commented May 14, 2020

We have some logic to avoid repr() for large strings, but I don't remember if we consistently apply this to bytes and bytearray.

@int19h
Contributor

int19h commented May 14, 2020

@fabioz Ah, I think I see the problem: we don't actually avoid repr() for strings and bytes, we just trim the result afterwards to fit the limit. I think we need some extra checks in SafeRepr._repr_str() to bail out early in this case, unless the raw value is requested, and produce output similar to what we do for overly long iterables, i.e.: <bytes, len() = ...>.
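
The "bail out early" idea could look roughly like the sketch below. This is not debugpy's actual `SafeRepr` code: the function name `safe_repr` and the threshold `MAX_REPR_INPUT_LEN` are hypothetical, chosen only to show the length check happening before `repr()` is ever called.

```python
# Hypothetical threshold; the real limit would be a design decision.
MAX_REPR_INPUT_LEN = 1 << 16


def safe_repr(value, max_len=MAX_REPR_INPUT_LEN):
    """Return repr(value), or a cheap placeholder for very long str/bytes."""
    if isinstance(value, (str, bytes, bytearray)) and len(value) > max_len:
        # len() is O(1), so no work proportional to the data size is done.
        return "<%s, len() = %d>" % (type(value).__name__, len(value))
    return repr(value)


print(safe_repr(b"ab"))                    # b'ab'
print(safe_repr(b"x" * 100, max_len=10))   # <bytes, len() = 100>
```

The key difference from trimming afterwards is that the full escaped string is never materialised for oversized inputs.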

@fabioz
Collaborator

fabioz commented May 14, 2020

Yeap... I'm checking it (or would you like to take a look at that since that code was inherited from ptvsd?)

As for dealing with raw values, I'm not sure of the best way to proceed. I definitely don't want to send a 20GB string over the wire, or even decode the bytes in that case (the server would probably just halt anyway). We may have bigger limits when raw is requested, but I think we should still have some upper limit even then, maybe something like 100MB. What do you think, @int19h?

@int19h
Contributor

int19h commented May 14, 2020

If you're already looking into this, go ahead. I just took a quick peek at the code.

There's some functionality in DAP around memory references that we never tapped into, that might perhaps be relevant here - it essentially provides random access to the client without ever retrieving the complete value. I don't know how this actually looks in UI in VSCode, or whether VS supports it, but we should check. If this is wired up in VS, maybe we can stop reporting them as raw values altogether?
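
To make the "random access without retrieving the complete value" idea concrete, here is a minimal sketch of serving one window of a large buffer. The function `read_memory` is hypothetical and not part of debugpy. The returned dictionary is loosely modelled on the body of a DAP `readMemory` response (`address`, base64 `data`, `unreadableBytes`), and using a plain offset as the address is an assumption made for the example.

```python
import base64


def read_memory(buffer, offset, count):
    """Return up to `count` bytes of `buffer` starting at `offset`, base64-encoded."""
    # memoryview slicing does not copy the underlying (possibly huge) buffer.
    view = memoryview(buffer)[offset:offset + count]
    return {
        "address": str(offset),
        "data": base64.b64encode(view).decode("ascii"),
        "unreadableBytes": count - len(view),
    }


big = bytes(range(256)) * 4           # stand-in for a very large value
chunk = read_memory(big, 256, 16)
print(base64.b64decode(chunk["data"]) == bytes(range(16)))  # True
```

Only the requested window is encoded and sent, so the cost per request depends on `count` rather than on the size of the whole object.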

If not, then I guess we'll have to limit.

@fabioz
Collaborator

fabioz commented May 15, 2020

Note: I'm still working on this. I'm dealing with some corner cases around encodings on Python 2 with the different approach. I think I have it right already, but I'm still in the process of updating the related tests.
