Adding an SQL Operator
The current architecture of Peloton contains a mix of both PostgreSQL components and our own components. In particular, we use the SQL parser and planner of Postgres, and then we use our own execution engine to execute these generated plans. We also use our own storage engine to store the databases. More information on the tile-based architecture of our execution and storage engines is available here.
Before adding an operator, you might want to look at the existing Peloton operators in src/executor. In particular, the limit operator is fairly straightforward.
All the operators inherit from the abstract operator class. In particular, each operator has Init and Execute functions. These functions initialize (or reinitialize) and execute the respective operator.
When a parent operator invokes the Execute function of a child operator, the child operator returns false if and only if it has already returned all the logical tiles it has produced to the parent operator. It will never return an empty logical tile. Otherwise, the child operator returns true, and the parent operator can use GetOutput to obtain the logical tile produced by the child operator. A parent operator can, therefore, repeatedly invoke the Execute function of a child operator to obtain all the logical tiles produced by the child operator.
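The pull-based protocol described above can be sketched as follows. This is a minimal illustration, not Peloton's actual code: the classes AbstractExecutor, ScanExecutor and LimitExecutor, the LogicalTile alias (here just a vector of integers), and the Drain helper are all simplified, hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a logical tile: a batch of row values.
using LogicalTile = std::vector<int>;

// Simplified abstract operator (not Peloton's real class).
class AbstractExecutor {
 public:
  virtual ~AbstractExecutor() = default;
  virtual bool Init() = 0;
  // Returns false once every tile has been handed to the parent.
  virtual bool Execute() = 0;
  // Meaningful only right after Execute() returned true.
  LogicalTile GetOutput() { return std::move(output_); }

 protected:
  void SetOutput(LogicalTile tile) { output_ = std::move(tile); }

 private:
  LogicalTile output_;
};

// Leaf operator that emits a fixed list of tiles (assumed non-empty).
class ScanExecutor : public AbstractExecutor {
 public:
  explicit ScanExecutor(std::vector<LogicalTile> tiles)
      : tiles_(std::move(tiles)) {}
  bool Init() override { next_ = 0; return true; }
  bool Execute() override {
    if (next_ >= tiles_.size()) return false;
    SetOutput(tiles_[next_++]);
    return true;
  }

 private:
  std::vector<LogicalTile> tiles_;
  std::size_t next_ = 0;
};

// Limit operator: forwards at most `limit` rows from its child.
class LimitExecutor : public AbstractExecutor {
 public:
  LimitExecutor(std::unique_ptr<AbstractExecutor> child, std::size_t limit)
      : child_(std::move(child)), limit_(limit) {}
  bool Init() override { returned_ = 0; return child_->Init(); }
  bool Execute() override {
    if (returned_ >= limit_) return false;   // limit already reached
    if (!child_->Execute()) return false;    // child is exhausted
    LogicalTile tile = child_->GetOutput();
    const std::size_t remaining = limit_ - returned_;
    if (tile.size() > remaining) tile.resize(remaining);  // truncate last tile
    returned_ += tile.size();
    SetOutput(std::move(tile));
    return true;
  }

 private:
  std::unique_ptr<AbstractExecutor> child_;
  std::size_t limit_;
  std::size_t returned_ = 0;
};

// Parent-side loop: call Execute() until it returns false.
std::vector<int> Drain(AbstractExecutor &root) {
  std::vector<int> rows;
  if (!root.Init()) return rows;
  while (root.Execute()) {
    LogicalTile tile = root.GetOutput();
    rows.insert(rows.end(), tile.begin(), tile.end());
  }
  return rows;
}
```

In this sketch the limit operator never emits an empty tile, because it stops before calling its child once the limit is reached.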
We intercept the plan trees generated by the Postgres planner component and use our own executor component. Now, we will get into the specifics on our side of things.
Queries are classified into data definition language (DDL) queries and data manipulation language (DML) queries.
These two categories of queries take two different processing paths both within the Postgres frontend and Peloton.
In Postgres, DML queries are executed in four stages. Take a look at the entry point of the Postgres executor module here.
ExecutorStart() performs some initialization that sets up the dynamic plan state tree from the static plan tree.
ExecutorRun() invokes the plan state tree.
ExecutorFinish() and ExecutorEnd() take care of cleaning things up, but they are not relevant to us. Peloton takes over query execution when queries reach ExecutorRun(), and we therefore only make use of ExecutorStart() in our system.
In the case of DDL queries, Peloton intercepts them in the ProcessUtility function here.
Peloton cannot directly execute the Postgres plan state tree as our executors can only understand our own Peloton query plan tree. So, we need to transform the Postgres plan state tree into a Peloton plan tree before execution.
We refer to this process as plan mapping or plan transformation. After mapping the plan, Peloton executes the plan tree
by recursively executing the plan tree nodes. We obtain Peloton tuples after query processing. We then transform them back into Postgres tuples before sending them back to the client via the Postgres frontend.
After taking over from Postgres, DDL queries are handled by peloton_ddl(), whereas DML queries are processed by peloton_dml(). These functions are located here within the peloton module.
Plan mapping is done only for DML queries, since DDL queries do not require any planning. The high-level idea is to map each plan node in the Postgres plan state tree recursively into a corresponding plan node in the Peloton plan tree. The plan mapper module preprocesses the plan state tree, and extracts the critical information from each Postgres plan node. This preprocessing is performed by functions in the peloton::bridge::DMLUtils namespace. The main PlanTransformer then transforms the preprocessed plan by recursively invoking sub-transformers based on the type of node in the tree. An entry point for this module is peloton::bridge::PlanTransformer::TransformPlan().
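The recursive mapping idea can be illustrated with a small sketch. Everything here is a hypothetical simplification: the PgPlanState and PelotonPlanNode structs, the two node tags, and this free-standing TransformPlan function are invented for illustration and do not mirror the real Postgres or Peloton data structures.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified Postgres-side plan state node.
enum class PgNodeTag { SeqScanState, LimitState };

struct PgPlanState {
  PgNodeTag tag = PgNodeTag::SeqScanState;
  std::string table;  // used by scan nodes
  long limit = 0;     // used by limit nodes
  std::vector<std::unique_ptr<PgPlanState>> children;
};

// Hypothetical Peloton-side plan node.
enum class PlanNodeType { SeqScan, Limit };

struct PelotonPlanNode {
  PlanNodeType type = PlanNodeType::SeqScan;
  std::string description;
  std::vector<std::unique_ptr<PelotonPlanNode>> children;
};

// Maps one node by dispatching on its tag, then recurses into the children.
std::unique_ptr<PelotonPlanNode> TransformPlan(const PgPlanState &state) {
  auto node = std::make_unique<PelotonPlanNode>();
  switch (state.tag) {
    case PgNodeTag::SeqScanState:  // sub-transformer for sequential scans
      node->type = PlanNodeType::SeqScan;
      node->description = "scan " + state.table;
      break;
    case PgNodeTag::LimitState:    // sub-transformer for limit nodes
      node->type = PlanNodeType::Limit;
      node->description = "limit " + std::to_string(state.limit);
      break;
    default:
      throw std::invalid_argument("unsupported plan node");
  }
  for (const auto &child : state.children) {
    node->children.push_back(TransformPlan(*child));
  }
  return node;
}
```

Supporting a new plan node type in a design like this amounts to adding one more case, which is why the switch dispatches on the node tag.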
Peloton then builds an executor tree based on the Peloton query plan tree. It then runs the executor tree recursively.
Execution context is the state associated with an instance of the plan execution, such as parameters and transaction information. By separating the execution context from the query plan, we can support prepared statements. A planned and then mapped query plan can be reused with different execution contexts. This saves the time otherwise spent on query planning and mapping.
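To make the separation concrete, here is a minimal sketch in which one immutable plan is executed twice with different contexts. The FilterPlan and ExecutorContext structs and the ExecutePlan function are hypothetical names chosen for this example; the plan stands for something like a prepared "WHERE col > $1" query.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plan: keep values greater than a parameter. The plan only
// records WHICH parameter to use, never its value.
struct FilterPlan {
  std::size_t param_index;
};

// Hypothetical per-execution state: transaction id and bound parameters.
struct ExecutorContext {
  long txn_id;
  std::vector<int> params;
};

// Runs the plan against one column, reading the threshold from the context.
std::vector<int> ExecutePlan(const FilterPlan &plan,
                             const ExecutorContext &context,
                             const std::vector<int> &column) {
  const int threshold = context.params.at(plan.param_index);
  std::vector<int> result;
  for (int value : column) {
    if (value > threshold) result.push_back(value);
  }
  return result;
}
```

Because the plan holds no per-execution state, the same FilterPlan object can be passed to ExecutePlan with as many different ExecutorContext instances as needed.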
After that, query execution consists of two stages. The execution tree has to be initialized (DInit()), and then it is executed (DExecute()). An entry point for this module is here.
We have our own expression system in Peloton. We transform the Postgres expressions into Peloton expressions, and evaluate them. All the expressions are based on the abstract expression class.
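As a rough picture of what an expression tree built on an abstract expression class looks like, consider the sketch below. It is an assumption-laden simplification: the class names, the integer-only Tuple alias, and the Evaluate signature are hypothetical and are not taken from Peloton's source.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical tuple: a row of integer columns.
using Tuple = std::vector<int>;

// Simplified abstract expression: every node evaluates against a tuple.
class AbstractExpression {
 public:
  virtual ~AbstractExpression() = default;
  virtual int Evaluate(const Tuple &tuple) const = 0;
};

// Leaf node: a constant.
class ConstantExpression : public AbstractExpression {
 public:
  explicit ConstantExpression(int value) : value_(value) {}
  int Evaluate(const Tuple &) const override { return value_; }

 private:
  int value_;
};

// Leaf node: the value of one column of the tuple.
class TupleValueExpression : public AbstractExpression {
 public:
  explicit TupleValueExpression(std::size_t column) : column_(column) {}
  int Evaluate(const Tuple &tuple) const override { return tuple.at(column_); }

 private:
  std::size_t column_;
};

// Inner node: returns 1 if left > right, otherwise 0.
class GreaterThanExpression : public AbstractExpression {
 public:
  GreaterThanExpression(std::unique_ptr<AbstractExpression> left,
                        std::unique_ptr<AbstractExpression> right)
      : left_(std::move(left)), right_(std::move(right)) {}
  int Evaluate(const Tuple &tuple) const override {
    return left_->Evaluate(tuple) > right_->Evaluate(tuple) ? 1 : 0;
  }

 private:
  std::unique_ptr<AbstractExpression> left_;
  std::unique_ptr<AbstractExpression> right_;
};
```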
The code related to our expression system is located under src/backend/expression. There are several files containing utility functions, such as this one, which contains date-related functions.
The expression system is tightly coupled with our type system. The type system is based on an abstract data type called Value. The associated code is located here.
Adding an operator will involve working with our plan mapper, execution engine, and the expression system.
It will probably involve changes in the operator to plan transformer, the plan executor, and the execution engine.
In this section, we show how to support the like operator in Peloton by slightly modifying the source code. Peloton reuses Postgres' expression system and transforms Postgres expressions into Peloton expressions. The related functions are implemented in expr_transformer.cpp.
In particular, TransformExpr() is responsible for performing the expression transformation.
As the plan node associated with the like operator is tagged with T_OpExpr, TransformExpr() invokes TransformOp() to search for the correct expression type in Peloton using pg_func_id, which is the operator's unique ID in Postgres. The mapping information is recorded in an unordered map called kPgFuncMap, defined in pg_func_map.cpp. The value of pg_func_id for the like operator can be either 850 (char type) or 1631 (varchar type and text type), so we add the following two lines in pg_func_map.cpp:
{850, {EXPRESSION_TYPE_COMPARE_LIKE, 2}},
{1631, {EXPRESSION_TYPE_COMPARE_LIKE, 2}},
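For context, the two entries above could sit in a map shaped roughly like the one below. This is only a sketch: the FuncMetaInfo struct, the kFuncMapSketch name, the LookupFunc helper, and the enum values are hypothetical, and the real kPgFuncMap declaration may differ. The second field is read here as the number of operands, which is an assumption.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical subset of the expression type enum.
enum ExpressionType {
  EXPRESSION_TYPE_INVALID = 0,
  EXPRESSION_TYPE_COMPARE_LIKE
};

// Hypothetical value type: target expression type plus operand count.
struct FuncMetaInfo {
  ExpressionType exprtype;
  int nargs;
};

// Sketch of the mapping from Postgres function IDs to Peloton expressions.
const std::unordered_map<unsigned int, FuncMetaInfo> kFuncMapSketch = {
    {850, {EXPRESSION_TYPE_COMPARE_LIKE, 2}},   // like on char
    {1631, {EXPRESSION_TYPE_COMPARE_LIKE, 2}},  // like on varchar and text
};

// Returns true and fills `out` if the function ID is mapped.
bool LookupFunc(unsigned int pg_func_id, FuncMetaInfo &out) {
  auto it = kFuncMapSketch.find(pg_func_id);
  if (it == kFuncMapSketch.end()) return false;
  out = it->second;
  return true;
}
```

A transformer in this style would reject, or fall back on, any function ID that is missing from the map, which is why unmapped operators cannot be executed until an entry is added.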
EXPRESSION_TYPE_COMPARE_LIKE is the unique expression identifier for the like operator in Peloton, which must be defined in types.h.
Now we have successfully mapped the like operator from a Postgres expression to a Peloton expression. The next step is to tell Peloton how to execute like. Since like is a comparison operator, ExpressionUtil::ComparisonFactory() in expression_util.cpp will be called to perform the execution logic implemented for like. Just like other comparison operators, the only thing we need to do to support the execution of like is to add EXPRESSION_TYPE_COMPARE_LIKE to the switch-case statement in ExpressionUtil::ComparisonFactory(); the detailed execution logic has already been implemented in the class CmpLike.
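The factory pattern described above can be sketched in isolation as follows. This is not the code in expression_util.cpp: the Comparator alias, the string-only comparators, and the LikeMatch helper are hypothetical, and the matcher handles only the % and _ wildcards without escape characters.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical subset of the expression type enum.
enum ExpressionType {
  EXPRESSION_TYPE_COMPARE_EQUAL,
  EXPRESSION_TYPE_COMPARE_LIKE
};

// Minimal LIKE matcher: '%' matches any run of characters (including none),
// '_' matches exactly one character. Escape sequences are not handled.
bool LikeMatch(const std::string &text, const std::string &pattern) {
  std::size_t t = 0, p = 0;
  std::size_t last_percent = std::string::npos, mark = 0;
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '%') {
      last_percent = p++;  // remember the wildcard and the text position
      mark = t;
    } else if (p < pattern.size() &&
               (pattern[p] == '_' || pattern[p] == text[t])) {
      ++p;
      ++t;
    } else if (last_percent != std::string::npos) {
      p = last_percent + 1;  // backtrack: let '%' absorb one more character
      t = ++mark;
    } else {
      return false;
    }
  }
  while (p < pattern.size() && pattern[p] == '%') ++p;  // trailing wildcards
  return p == pattern.size();
}

using Comparator = std::function<bool(const std::string &, const std::string &)>;

struct CmpEq {
  bool operator()(const std::string &l, const std::string &r) const {
    return l == r;
  }
};

struct CmpLike {
  bool operator()(const std::string &l, const std::string &r) const {
    return LikeMatch(l, r);
  }
};

// Picks the comparison logic for an expression type; supporting a new
// comparison operator means adding one more case here.
Comparator ComparisonFactory(ExpressionType type) {
  switch (type) {
    case EXPRESSION_TYPE_COMPARE_EQUAL:
      return CmpEq();
    case EXPRESSION_TYPE_COMPARE_LIKE:
      return CmpLike();
  }
  throw std::invalid_argument("unsupported comparison type");
}
```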
Following the instructions described above, you can also implement any other operators on your own by adding a few lines of code to Peloton.