Replies: 3 comments 5 replies
-
I do understand that you want to create a system where active debugging session of a single fiber would not slow down whole system, bu what about an idea [for PoC] to enable debugging mode in "single-user" mode. i.e. when I debug a script which is populating Tarantool database with the generated TPC-H data? It's single interactive session, we could even disable JIT at the moment of activating debugger/setting breakpoint, and nobody would pay any attention (because majority of time would be spent at the client side while developer would think and look around:) )
If I'd get this simplistic, single-mode scenario working today, I'd be more than happy. Yes, eventually we need to have ability to debug particular fiber of many running on live systems, but it would require great additional efforts for tuning, but we could make something useful even today. Please! |
Beta Was this translation helpful? Give feedback.
-
I suggest supporting some well-known debugger specifications for this, like gdb, if possible. This will make integration with popular IDEs easier. We have special conditions, where a yield is not allowed, like inside a transaction. How the DebugHook will deal with it? |
Beta Was this translation helpful? Give feedback.
-
Seems to be rather good. One more thing should be clarified: the creation of new fiber from within a debugged fiber. |
Beta Was this translation helpful? Give feedback.
-
Inter-fiber Debugger for Tarantool
Igor Munkin @igormunkin imun@tarantool.org
Rationale
To make Tarantool platform developer-friendly we should provide a set of basic developer tools. One of such tools is debugger. There are number of debuggers available for the Lua environments, although all of them are missing the
critical feature needed for the Tarantool platform: they should not cause a full-stop of the debugged program during the debug session.
In this RFC I propose to overcome the problem with a solution that will stop only the fiber to be debugged. It will allow developers to debug their application, while Tarantool can keep processing requests, perform replication and so on.
Approach
To do not reinvent the debugger techniques we may borrow the already existent Lua debugger, put the rules about fiber use, data manipulation tweaks and so on.
Every fiber can be considered as a "debuggee" or a regular fiber, switching from one state to the other. To control the status we can either patch fiber machinery - which seems excessive as fibers can serve pure C tasks - or tweak
the breakpoint hook to employ the fiber yield. The fiber will appear in a state it waits for commands from the debugger and set the LuaJIT machinery hooks to be prepared for the next fiber to be scheduled.
Debug techniques
Regular debuggers provide interruption for all threads at once hence they don't distinguish breakpoints appearance across the threads - they just stop execution. For our case we have to introduce some specifics so that debugger
will align with the fiber nature of the server behavior. Let's consider some techniques we can propose to the user.
1) Break first fiber met
User puts a breakpoint that triggers once, stopping the first fiber the break happens in. After breakpoint is met the fiber reports its status to the debugger server, put itself in a wait state, clears the breakpoint and yields. As soon as server issue a command, the debuggee will reset the breakpoint, handle the command and proceed with execution or yield again.
2) Regular breakpoint
This mode will start the same way as previous mode, but keep the breakpoint before yield, so that the breakpoint still can trigger in another fiber. As the server may deliver huge number of fibers during its performance, we have to set
up a user-configurable limit for the number of debuggee fibers can be set at once. As soon as limit is reached the debuggee fiber starts behave exactly as in previous mode, clearing the breakpoint before the yield from the debuggee.
3) Run a function under debug session
This is the most straightforward way to debug a function: perform a call through the debug interface. A new fiber will be created and break will appear at the function entrance. The limit of debuggee fibers should be increased and the fiber will behave similar to the modes above.
4) Attach debugger to a fiber by ID
Every fiber has its numerical ID, so debugger can provide interface to start debugging for a particular fiber. The fiber will be put in a wait state as soon as it starts execution after the debugger is attached.
Basic mechanism
The Tarantool side of the debugger will consist of a dedicated fiber named
DebugSRV
that will handle requests from the developer and make bookkeeping of debuggee fibers and their breakpoints and a Lua functionDebugHook
is set as a hook in Lua debug library. Users should not use this hook for the period of debugging to avoid interference. The external interface can be organized over arbitrary protocol, be it a socket connection, console or even IPROTO (usingIPROTO_CALL
).Debuggee fiber will be controlled by a debug hook function named
DebugHook
. It is responsibility of theDebugHook
to set the debuggee fiber status, check the breakpoints appearance, its condition including the ignore count and updatehit_count
. As soon as breakpoint is met, theDebugHook
has to put its state to pending and wait for command from theDebugSRV
.Communication between
DebugSRV
and the debuggee fiber can be done viafiber.channel
mechanism. It will simplify the wait-for semantics.All interfaces for debugging activities are enclosed in
tdb
built-in module.Data structure
Every debuggee fiber is present in the corresponding table in the
DebugSRV
fiber. The table has the following format:As
DebugSRV
receives commands it updates the structure of the debuggees and forces the fiber wakeup to reset it's hook state. The state of the debuggee is one of the following:'operation'
: the fiber is already in thedebuggees.fibers
list, but it issued yield without any breakpoint met'pending'
:DebugHook
waits for a new command from the channel in thedebuggees.fibers
of its own IDDebugHook behavior
For the techniques 3) and 4) fiber appears in the list of
debuggees.fibers
first, with its status set as'operation'
with a list of breakpoints set.For the techniques 1) and 2) there is a list of
global_breakpoints
that should be checked by every fiber.In case a fiber receives control from the debug machinery and its ID matches
DebugSRV
one it should leave the hook immediately. Otherwise it should check if it is present indebuggees.fibers
. If it is - it should check if itscurrent position meets any breakpoint from the
debuggees.fibers[<fiber_id>].breakpoints
ordebuggees.global_breakponts
. If breakpoint is met, the fiber sets its state into'pending'
and waits for acommand from the
debuggees.fibers[<fiber_id>].channel
.In case a fiber is neither
DebugSRV
one nor present in thedebuggees.fibers
it should check that the number of fibers entries in thedebuggees
structure is less thanmax_debuggees
. In such a case it checks if it met any of theglobal_breakpoints
and put itself into the fibers list, updating the array size. Also it should open a channel to theDebugSRV
and put itself into the'pending'
state.DebugSRV behavior
DebugSRV handles the input from the user and supports the following list of commands (as mentioned, it can be used from any interface, so commands are function calls for general case):
tdb.break.info([<fiber_id>])
- list all breakpoints with counts and conditions, limits output for the fiber with<fiber_id>
tdb.break.cond(<breakpoint_id>, <condition>)
- set a condition for the breakpoint, condition should be Lua code evaluating into a boolean valuetdb.break.ignore(<breakpoint_id>, <count>)
- ignore the number of breakpoint executionstdb.break.delete(<breakpoint_id>)
- removes a breakpointtdb.step(<fiber_id>)
- continue execution, stepping into the calltdb.stepover(<fiber_id>)
- continue execution until the next source line, skip callstdb.stepout(<fiber_id>)
- continue execution until return from the current functiontdb.continue(<fiber_id>)
- continue execution until next breakpoint is met or debug session is stoppedThe functions above are common for many debuggers, just some tweaks to adopt fibers. Functions below are more specific, so let's get into some details:
tdb.set_max_debuggees(<number>)
- set the number of fibers can be debugged simultaneously. It modifies thedebuggees.max_debuggees
so that new fibers will respect the amount of debuggees. For example, if at some point ofdebugging there were 5 debuggee fibers user can set this value to 3 - it will not cause any problem, just a new fiber will not become a debuggee if it meet some global breakpoint.
tdb.eval(<fiber_id>, <code>)
- allows to evaluate the code in the context of the debuggee fiber if it is in'pending'
mode. User can issue adebug_eval(113, function() return fiber.id() end)
to receive 113 as a resulttdb.break.create(<breakpoint description>, [<fiber_id>])
- add a new breakpoint in the fiber's breakpoint list or in the global list if no fiber ID providedtdb.start()
- starts debug session: creates debuggees structure, preserve current debug hook indebuggees.preserved_hook
and setsDebugHook
as the current hooktdb.stop()
- quits debug session: resets the debug hook, clears debuggees structureBeta Was this translation helpful? Give feedback.
All reactions