This is a gdb-like debugger focusing on Python bytecode. So far as I know, this is the only debugger available specifically for Python bytecode.
However, to do this, your program has to run under x-python, a Python interpreter written in Python.
This project builds off of a previous Python 3 debugger called trepan3k.
Below we'll try to break down what's going on in the screencast above.
We'll invoke a Greatest Common Divisor program, gcd.py, using our debugger. The source is found in test/example/gcd.py.
In this section we'll show some interesting debugger commands that are not common in Python debuggers:
- stepi: to step a bytecode instruction
- set loglevel: to show x-python's "info"-level log tracing
- info stack: to show the current stack frame's evaluation stack
    $ trepan-xpy test/example/gcd.py 3 5
    Running x-python test/example/gcd.py with ('3', '5')
    (test/example/gcd.py:10): <module>
    -> 2 """
    (trepan-xpy)
Above, we are stopped before we have even run the first instruction. The -> icon before 2 means we are stopped at the call of a new frame.
    (trepan-xpy) step
    (test/example/gcd.py:2): <module>
    -- 2 """Greatest Common Divisor"""
    @ 0: LOAD_CONST 'Greatest Common Divisor'
OK, now we are stopped before the first instruction, LOAD_CONST, which will load a constant onto the evaluation stack. The icon changed from -> 2 to -- 2, which indicates we are on a line-number boundary at line 2.
The Python statement we are about to execute sets the program's docstring. Let's see how that is implemented.
First, note that the variable __doc__, which will eventually hold the docstring, isn't set yet. The first part of setting it is loading this constant onto an evaluation stack.
At this point, to better see the execution progress, we'll issue the command set loglevel, which will show the instructions as we step along.
Like trepan3k, trepan-xpy has extensive, nicely formatted help right in the debugger. Let's get the help for the set loglevel command:
    (trepan-xpy) help set loglevel
    set loglevel [ on | off | debug | info ]

    Show loglevel PyVM logger messages. Initially logtracing is off.

    However running set loglevel will turn it on and set the log level to debug.
    So it's the same thing as set loglevel debug.

    If you want the less verbose messages, use info. And to turn off,
    (except critical errors), use off.

    Examples:

        set loglevel        # turns x-python on info logging messages
        set loglevel info   # same as above
        set loglevel debug  # turn on info and debug logging messages
        set loglevel off    # turn off all logging messages except critical ones
So now let's set that:
    (trepan-xpy) set loglevel
    (trepan-xpy)
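The levels in the help text map naturally onto Python's standard logging module. Below is a minimal sketch of how such a command might work. It assumes the command just sets the level of the logger named xpython.vm, which is the name shown in the INFO:xpython.vm: log lines later in this walkthrough. This is not trepan-xpy's actual code, and set_loglevel and LOGLEVELS are hypothetical names.

```python
import logging

# Hypothetical mapping from "set loglevel" arguments to standard logging levels.
# Following the help text's prose: a bare "set loglevel" (i.e. "on") means debug,
# and "off" keeps only critical messages.
LOGLEVELS = {
    "on": logging.DEBUG,
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "off": logging.CRITICAL,
}


def set_loglevel(arg: str = "on", logger_name: str = "xpython.vm") -> int:
    """Set the VM logger's level as the help text describes (a sketch)."""
    level = LOGLEVELS[arg.strip().lower()]
    logging.getLogger(logger_name).setLevel(level)
    return level


if __name__ == "__main__":
    # This format matches the "INFO:xpython.vm:..." lines seen in the session output.
    logging.basicConfig(format="%(levelname)s:%(name)s:%(message)s")
    set_loglevel("info")
    log = logging.getLogger("xpython.vm")
    log.info("L. 10 @ 4: LOAD_CONST 0")  # emitted at info level
    log.debug("extra detail")            # suppressed at info level
```

The help text itself is slightly inconsistent. Its first example says a bare set loglevel gives info messages, while its prose says debug. The sketch follows the prose.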
A rather unique command that you won't find in most Python debuggers, but that is found in low-level debuggers, is stepi, which steps one instruction.
Let's use that:
    (trepan-xpy) stepi
    (test/example/gcd.py:2 @2): <module>
    .. 2 """Greatest Common Divisor"""
    @ 2: STORE_NAME ('Greatest Common Divisor') __doc__
The .. at the beginning indicates that we are on an instruction that is in between lines.
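The -- versus .. distinction can be approximated with the standard dis module. An instruction gets -- if it starts a source line and .. otherwise. This is a sketch run on a host CPython, not how trepan-xpy computes its markers. The helper line_markers is hypothetical. Offsets and extra instructions such as RESUME vary between Python versions.

```python
import dis

# A docstring followed by an import, mirroring the start of test/example/gcd.py.
SOURCE = '"""Greatest Common Divisor"""\nimport sys\n'


def line_markers(source: str) -> list[tuple[str, int, str]]:
    """Tag each instruction "--" if it starts a source line, else ".."."""
    code = compile(source, "<gcd-snippet>", "exec")
    # dis.findlinestarts yields (offset, line) pairs for offsets that begin a line.
    line_starts = {offset for offset, _line in dis.findlinestarts(code)}
    return [
        ("--" if instr.offset in line_starts else "..", instr.offset, instr.opname)
        for instr in dis.get_instructions(code)
    ]


if __name__ == "__main__":
    for marker, offset, opname in line_markers(SOURCE):
        print(f"{marker} @ {offset}: {opname}")
```

The docstring's LOAD_CONST begins line 1, so it is marked --. The STORE_NAME __doc__ that follows is on the same line, so it is marked ..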
We've now loaded the docstring onto the evaluation stack with LOAD_CONST. Let's see the evaluation stack with info stack:
    (trepan-xpy) info stack
    0: <class 'str'> 'Greatest Common Divisor'
Here we have pushed the docstring for the program but haven't yet stored it in __doc__. To see this, we can use the auto-eval feature of trepan-xpy: it automatically evaluates strings it doesn't recognize as a debugger command:
    (trepan-xpy) __doc__ is None
    True
Let's step the remaining instruction, STORE_NAME, to complete the instructions making up line 2.
    (trepan-xpy) stepi
    INFO:xpython.vm:L. 10 @ 4: LOAD_CONST 0
    (test/example/gcd.py:10 @4): <module>
    -- 10 import sys
    @ 4: LOAD_CONST 0
The leading -- before 10 import ... indicates we are on a line boundary now. Let's see the stack now that we have run STORE_NAME:
    (trepan-xpy) info stack
    Evaluation stack is empty
And to see that we've stored this in __doc__, we can run eval to see its value:
    (trepan-xpy) eval __doc__
    "Greatest Common Divisor"
(Entering just __doc__ is the same thing as eval __doc__ when auto-evaluation is on.)
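The effect of the two instructions on the evaluation stack can be mimicked with a toy frame. LOAD_CONST pushes the docstring, then STORE_NAME pops it and binds it to __doc__, leaving the stack empty. This is only a sketch. ToyFrame and its methods are hypothetical and far simpler than x-python's actual frame and VM classes.

```python
from typing import Any


class ToyFrame:
    """Hypothetical, greatly simplified frame: an evaluation stack plus a namespace."""

    def __init__(self) -> None:
        self.stack: list[Any] = []
        self.names: dict[str, Any] = {}

    def load_const(self, value: Any) -> None:
        # LOAD_CONST pushes a constant onto the evaluation stack.
        self.stack.append(value)

    def store_name(self, name: str) -> None:
        # STORE_NAME pops the top of the stack and binds it to a name.
        self.names[name] = self.stack.pop()


frame = ToyFrame()
frame.load_const("Greatest Common Divisor")
print(frame.stack)            # roughly what "info stack" reports after LOAD_CONST
frame.store_name("__doc__")
print(frame.stack)            # empty again, as "info stack" reported above
print(frame.names["__doc__"])
```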
Now let's step a whole statement (rather than a single instruction) to see how a module becomes visible.
    (trepan-xpy) step
    INFO:xpython.vm: @ 6: LOAD_CONST None
    INFO:xpython.vm: @ 8: IMPORT_NAME (0, None) sys
    INFO:xpython.vm: @ 10: STORE_NAME (<module 'sys' (built-in)>)
    INFO:xpython.vm:L. 12 @ 12: LOAD_CONST <code object check_args at 0x7f2a0a286f60, file "test/example/gcd.py", line 12>
    (test/example/gcd.py:12 @12): <module>
    -- 12 def check_args():
    @ 12: LOAD_CONST <code object check_args at 0...est/example/gcd.py", line 12>
The INFO lines are emitted by the VM interpreter. Running set loglevel made the interpreter's logger more verbose. This in turn triggers a callback to a formatting routine provided by the debugger, which nicely colorizes the information. That is why parts of this output are colorized in a terminal session. In x-python you can get the same information, just not colorized.
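A formatting hook like this can be sketched with the standard logging module. The actual callback mechanism between x-python and trepan-xpy isn't described here, so this uses a logging.Formatter subclass as a stand-in. It highlights the opcode name in each log line with ANSI escape codes. ColorizingFormatter and install_colorizer are hypothetical names. The logger name xpython.vm comes from the log output above.

```python
import logging
import re

# ANSI escape sequences for bold text and reset (assumes an ANSI-capable terminal).
BOLD = "\x1b[1m"
RESET = "\x1b[0m"

# Matches "@ <offset>: OPNAME" as it appears in the VM's log messages.
OPCODE_RE = re.compile(r"(@\s*\d+: )([A-Z_]+)")


class ColorizingFormatter(logging.Formatter):
    """Hypothetical formatter that highlights opcode names in VM log lines."""

    def format(self, record: logging.LogRecord) -> str:
        text = super().format(record)  # plain "LEVEL:name:message" text
        return OPCODE_RE.sub(lambda m: f"{m.group(1)}{BOLD}{m.group(2)}{RESET}", text)


def install_colorizer(logger_name: str = "xpython.vm") -> logging.Handler:
    """Attach the colorizing formatter to the VM logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(ColorizingFormatter("%(levelname)s:%(name)s:%(message)s"))
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    logger.propagate = False  # avoid a second, uncolored copy via the root logger
    logger.setLevel(logging.INFO)
    return handler


if __name__ == "__main__":
    install_colorizer()
    logging.getLogger("xpython.vm").info(" @ 6: LOAD_CONST None")
```

Without such a formatter, the same records print uncolored, which matches plain x-python's behavior as described above.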
One thing to note is the value in parentheses after the opcode, as after STORE_NAME. Compare that line with what you'll see from a static disassembler such as Python's dis or xdis:
    10 STORE_NAME 1 (sys)
In a static disassembly, the "1" is the index into the code object's table of names (co_names). The value in parentheses is the name at that index, namely sys.
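Here is a small check of the static operand using the standard dis module. It runs on a two-line snippet mirroring the start of test/example/gcd.py: a docstring, then import sys. The helper store_name_operands is hypothetical. Because __doc__ is stored first, sys should get name index 1, matching the disassembly line above.

```python
import dis

# A docstring followed by an import, mirroring the start of test/example/gcd.py.
SOURCE = '"""Greatest Common Divisor"""\nimport sys\n'


def store_name_operands(source: str) -> list[tuple[int, str]]:
    """Return (name index, resolved name) for every module-level STORE_NAME."""
    code = compile(source, "<gcd-snippet>", "exec")
    operands = []
    for instr in dis.get_instructions(code):
        if instr.opname == "STORE_NAME":
            # The static operand is an index into co_names; dis resolves it as argval.
            assert code.co_names[instr.arg] == instr.argval
            operands.append((instr.arg, instr.argval))
    return operands


for index, name in store_name_operands(SOURCE):
    print(f"STORE_NAME {index} ({name})")
```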
In trepan-xpy and x-python, however, we omit the name index, 1, since it isn't of much interest. Instead we show the dynamic stack entries, or operands, that STORE_NAME is going to work on. In particular, the object that is going to be stored in the variable sys is the built-in module sys.
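The dynamic view can be sketched by formatting an instruction together with the current top of the evaluation stack, dropping the static name index. Here the stack is simulated by hand. Just before STORE_NAME runs, IMPORT_NAME will have left the module object there. The helper dynamic_store_name is hypothetical and only imitates the shape of the x-python log line shown earlier.

```python
import dis
import sys


def dynamic_store_name(instr: dis.Instruction, stack: list) -> str:
    """Format STORE_NAME with the value it will store, then the target name.

    Hypothetical helper: it omits the static name index (instr.arg).
    """
    return f"{instr.opname} ({stack[-1]!r}) {instr.argval}"


code = compile("import sys\n", "<gcd-snippet>", "exec")
store = next(i for i in dis.get_instructions(code) if i.opname == "STORE_NAME")

# Simulated evaluation stack just before STORE_NAME: the imported module object.
print(dynamic_store_name(store, [sys]))
```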
Now let's step another statement to see how a function becomes available:
    (trepan-xpy) step
    INFO:xpython.vm: @ 14: LOAD_CONST 'check_args'
    INFO:xpython.vm: @ 16: MAKE_FUNCTION (check_args) Neither defaults, keyword-only args, annotations, nor closures
    INFO:xpython.vm: @ 18: STORE_NAME (<Function check_args at 0x7fdb1d4d49f0>) check_args
    INFO:xpython.vm:L. 25 @ 20: LOAD_CONST <code object gcd at 0x7fdb1d55fed0, file "test/example/gcd.py", line 25>
    (test/example/gcd.py:25 @20): <module>
    -- 25 def gcd(a,b):
    @ 20: LOAD_CONST <code object gcd at 0x7fdb1d...est/example/gcd.py", line 25>
One difference between a dynamic language like Python and a statically compiled language like C or Java is that there is no linking step in the compilation; modules and functions are imported or created, and linked, as part of executing the code.
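The point about linking can be seen in plain CPython. After compilation, a def exists only as a code object among the module's constants. The function object and its name binding appear only when the module code runs. The body of check_args below is a hypothetical stub, not the one in test/example/gcd.py.

```python
import types

SOURCE = '''
def check_args():
    return True
'''

code = compile(SOURCE, "<gcd-snippet>", "exec")

# At compile time the function body exists only as a code object constant.
body = next(c for c in code.co_consts if isinstance(c, types.CodeType))
print(body.co_name)

# The function object and its binding are created only when the module code
# executes (MAKE_FUNCTION, then STORE_NAME).
namespace: dict = {}
print("check_args" in namespace)
exec(code, namespace)
print(type(namespace["check_args"]).__name__, namespace["check_args"]())
```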
Notice again what's in the parentheses after the opcode and how that differs from a static disassembly. For comparison, here is what the 2nd and 3rd instructions look like in pydisasm:
    16 MAKE_FUNCTION 0 (Neither defaults, keyword-only args, annotations, nor closures)
    18 STORE_NAME 2 (check_args)
Again, indices into a name table are dropped, and in their place are the evaluation stack items. For MAKE_FUNCTION, the name of the function being created is shown; for STORE_NAME, as before, the item that gets stored (a function object) is shown.
The rest of the screencast shows that, in addition to the step (step into) and stepi (step instruction) debugger commands, there is a next (step over) debugger command and a slightly buggy finish (step out) command.
I don't have breakpoints hooked in yet.
But unlike any other Python debugger I know of, we can cause an immediate return with a value, as shown in the screencast.
We've only shown a few of the many debugger features.
Here are some interesting commands not typically found in Python debuggers like pdb:
- info blocks: lets you see the block stack
- set pc <offset>: lets you set the program counter within the frame
- set autopc: runs info pc to show the debugged program's program counter each time before the debugger's command-loop REPL runs
- set autostack: runs info stack to show the debugged program's evaluation stack each time before the debugger's command-loop REPL runs
- vmstack {peek | push, pop}: inspects or modifies the evaluation stack
See also:
- xpython: CPython written in Python
- trepan3k: trepan debugger for Python 3.x and its extensive documentation.