DBSim is an extensible database simulator in pure Python for fast prototyping in-database algorithms. It provides a simulated RDBMS environment, related tools and high extensibility & flexibility for data scientists to fast develop, verify and analyze the prototypes of in-database analytic algorithms.
Implementing any new in-database algorithm directly in real database kernels without pre-verification may not be a good idea and may cause waste of time. If you agree, try DBSim! Spending a little time on prototyping your in-db algorithms in DBSim to convince yourself before formally making them into real-world DBMS.
- Including all the major components of a general RDBMS query engine: SQL parser, relational operators, logical and physical plan generator, query optimizer, etc.
- High extensibility and flexibility: DBSim provides APIs to extend almost any of the DB components, including but not limited to adding new keywords to query syntax, implementing custom operators, writing custom optimization rules, extending physical plan executors, etc.
- Debugging and analyzing tools: query plan visualizer, GUI, etc.
- Low learning cost for users: writing in Python makes it much easier to learn comparing with the real-world DBMS developed in C/C++/Java.
- Add more optimization rules that are commonly applied by general RDBMS
- Implement Volcano-style optimizer (currently DBSim only supports heuristic (rule-based) optimizer)
- Develop rule definition language to support simpler claims of custom optimization rules
- Integrate more performance analyzing and visualization tools
DBSim requires Python 3.6+ . To install necessary dependencies, run the following command:
pip install -r requirements.txt
python demo.py
We use some basic examples to briefly present DBsim here. Please see dbsim/examples/
and dbsim/tests/
for more detailed examples.
-
Execute a query end-to-end
(1) without query optimizer
from dbsim import dataset as ds from dbsim.tests.fixtures.demo_adapter import DemoAdapter from dbsim.query_parser import parse_statement from dbsim.query import Query dataset = ds.DataSet() dataset.add_adapter(DemoAdapter()) sql = """ SELECT musical.title, musical.year FROM (SELECT * FROM (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) WHERE animation.mid < 3000 ) WHERE musical.year > 1960 """ for row in Query(dataset, parse_statement(sql)): print(row)
(2) with query optimizer
from dbsim import dataset as ds from dbsim.tests.fixtures.demo_adapter import DemoAdapter from dbsim.query_parser import parse_statement from dbsim.query import Query from dbsim.planners import rules from dbsim.planners.heuristic.heuristic_planner import HeuristicPlanner dataset = ds.DataSet() dataset.add_adapter(DemoAdapter()) planner = HeuristicPlanner(max_limit = float('Inf')) planner.addRule(rules.FilterMergeRule()) planner.addRule(rules.FilterPushDownRule()) planner.addRule(rules.Selection_SimSelection_Swap_Rule()) sql = """ SELECT musical.title, musical.year FROM (SELECT * FROM (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) WHERE animation.mid < 3000 ) WHERE musical.year > 1960 """ for row in Query(dataset, parse_statement(sql), optimizer=planner): print(row)
-
Parse a query into abstract syntax tree(AST) and visualize the AST
from dbsim.query_parser import parse_statement from dbsim.utils.visualizer import LogicalPlanViz sql = """ SELECT musical.title, musical.year FROM musical WHERE musical.year > 1960 """ ast = parse_statement(sql) LogicalPlanViz.show(ast, view=True)
-
Manually optimize a logical plan and visualize the resulting plan
from dbsim import dataset as ds from dbsim.tests.fixtures.demo_adapter import DemoAdapter from dbsim.query_parser import parse_statement from dbsim.query import Query from dbsim.planners import rules from dbsim.planners.heuristic.heuristic_planner import HeuristicPlanner from dbsim.utils.visualizer import LogicalPlanViz dataset = ds.DataSet() dataset.add_adapter(DemoAdapter()) planner = HeuristicPlanner(max_limit = float('Inf')) planner.addRule(rules.FilterMergeRule()) planner.addRule(rules.FilterPushDownRule()) planner.addRule(rules.Selection_SimSelection_Swap_Rule()) sql = """ SELECT musical.title, musical.year FROM (SELECT * FROM (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) WHERE animation.mid < 3000 ) WHERE musical.year > 1960 """ plan = Query(dataset, parse_statement(sql)).getPlan() best_plan = planner.findBestPlan(plan) LogicalPlanViz.show(best_plan, view=True)
-
**Note: ** We already extended the standard SQL syntax in DBSim as examples to show its extensibility. Please see the documentation for further tutorials. But at this step you can quickly try such an extended-SQL-syntax query in any of the code snippets above:
sql = """ SELECT musical.title, musical.year FROM (SELECT * FROM (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) WHERE animation.embedding to [1,2,3,4] < 10 ) WHERE musical.year > 1960 """
-
More examples:
To try more examples, run the following command (DO NOT cd to
dbsim/
ordbsim/examples
, just run the command here, i.e., in the root path of this repo):python dbsim/examples/<example_filename>.py
To run the tests in
dbsim/tests/
, you can use Pytest or manually call each testing function.
In the repo root path, run following command to start the GUI:
python dbsim/gui/backend/server.py
Then open your browser and go to the link localhost:8888
. Please read our paper Extensible Database Simulator for Fast Prototyping In-Database Algorithms for more details on how to use the GUI. The GUI is only tested on Microsoft Edge, and currently we do not guarantee its compatibility with other browsers.
Note:
(1) To run a query, you need to let the query input box lose focus, like clicking anywhere outside the box.
(2) We provide example datasets and query in the GUI when it is started. You can try with them or use your own datasets and queries.
See the Wiki of this repo.
DBSim is developed based on Splicer (trivio/splicer: Splicer - adds relation querying (SQL) to any python project (github.com)). We also borrowed some ideas from Apache Calcite.
If you use this codebase, or otherwise found our work valuable, please cite:
@misc{https://doi.org/10.48550/arxiv.2204.09819,
doi = {10.48550/ARXIV.2204.09819},
url = {https://arxiv.org/abs/2204.09819},
author = {Wang, Yifan and Wang, Daisy Zhe},
title = {Extensible Database Simulator for Fast Prototyping In-Database Algorithms},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}