Balsa is a learned query optimizer. It learns to optimize SQL queries by trial-and-error using deep reinforcement learning and sim-to-real learning.
Notably, Balsa is the first end-to-end learned optimizer that does not rely on learning from an existing expert optimizer's plans, while being able to surpass the performance of expert plans, sometimes by a sizable margin.
For technical details, see the SIGMOD 2022 paper, Balsa: Learning a Query Optimizer Without Expert Demonstrations [bibtex].
Setup | Quickstart | Experiment configs | Metrics and artifacts | Cluster mode | Q&A | Citation
To quickly get started, run the following on one machine which will run both the agent neural network and query execution.
-
Clone and install Balsa.
Details
git clone https://github.com/balsa-project/balsa.git ~/balsa cd ~/balsa # Recommended: run inside a Conda environment. # All commands that follow are run under this conda env. conda create -n balsa python=3.7 -y conda activate balsa pip install -r requirements.txt pip install -e . pip install -e pg_executor
-
Install Postgres v12.5.
Details
This can be done in several ways. For example, installing from source:cd ~/ wget https://ftp.postgresql.org/pub/source/v12.5/postgresql-12.5.tar.gz tar xzvf postgresql-12.5.tar.gz cd postgresql-12.5 ./configure --prefix=/data/postgresql-12.5 --without-readline sudo make -j sudo make install echo 'export PATH=/data/postgresql-12.5/bin:$PATH' >> ~/.bashrc source ~/.bashrc
-
Install the
pg_hint_plan
extension v1.3.7.Details
cd ~/ git clone https://github.com/ossc-db/pg_hint_plan.git -b REL12_1_3_7 cd pg_hint_plan # Modify Makefile: change line # PG_CONFIG = pg_config # to # PG_CONFIG = /data/postgresql-12.5/bin/pg_config vim Makefile make sudo make install
-
Load data into Postgres & start it with the correct configuration.
Details
For example, load the Join Order Benchmark (JOB) tables:cd ~/ mkdir -p datasets/job && pushd datasets/job wget -c http://homepages.cwi.nl/~boncz/job/imdb.tgz && tar -xvzf imdb.tgz && popd # Prepend headers to CSV files python3 ~/balsa/scripts/prepend_imdb_headers.py # Create and start the DB pg_ctl -D ~/imdb initdb # Copy custom PostgreSQL configuration. cp ~/balsa/conf/balsa-postgresql.conf ~/imdb/postgresql.conf # Start the server pg_ctl -D ~/imdb start -l logfile # Load data + run analyze (can take several minutes) cd ~/balsa bash load-postgres/load_job_postgres.sh ~/datasets/job
Perform basic checks:
psql imdbload # Check that both primary and foreign key indexes are built: imdbload=# \d title [...] # Check that data count is correct: imdbload=# select count(*) from title; count --------- 2528312 (1 row)
NOTE: Using one machine is only for quickly trying out Balsa. To cleanly reproduce results, use Cluster mode which automates the above setup on a cloud and separates the training machine from the query execution machines.
First, run the baseline PostgreSQL plans (the expert) on the Join Order Benchmark:
python run.py --run Baseline --local
This will prompt you to log into Weights & Biases to track experiments and visualize metrics easily, which we highly recommend. (You can disable it by prepending the env var WANDB_MODE=disabled
or offline
).
The first run may take a while due to warming up. After finishing, you can see messages like:
latency_expert/workload (seconds): 156.62 (113 queries)
...
wandb: latency_expert/workload 156.61727
Next, to launch a Balsa experiment:
python run.py --run Balsa_JOBRandSplit --local
The first time this is run, simulation data is collected for all training queures, which will finish in ~5 minutes (cached for future runs):
...
I0323 04:15:48.212239 140382943663744 sim.py:738] Collection done, stats:
I0323 04:15:48.212319 140382943663744 sim.py:742] num_queries=94 num_collected_queries=77 num_points=516379 latency_s=309.3
I0323 04:16:39.296590 140382943663744 sim.py:666] Saved simulation data (len 516379) to: data/sim-data-88bd801a.pkl
Balsa's simulation-to-reality approach requires first training an agent in simulation, then in real execution. To speed up the simulation phase, we have provided pretrained checkpoints for the simulation agent:
...
I0323 04:18:26.856739 140382943663744 sim.py:985] Loaded pretrained checkpoint: checkpoints/sim-MinCardCost-rand52split-680secs.ckpt
Then, the agent will start the first iteration of real-execution learning: planning all training queries, sending them off for execution, and waiting for these plans to finish. Periodically, test queries are planned and executed for logging.
Handy commands:
- To kill the experiment(s):
pkill -f run.py
- To monitor a machine:
dstat -mcdn
All experiments and their hyperparameters are declared in experiments.py
.
To run an experiment with the local Postgres execution engine:
# <name> is a config registered in experiments.py.
python run.py --run <name> --local
Main Balsa agent:
Benchmark | Config |
---|---|
JOB (Random Split) | Balsa_JOBRandSplit |
JOB Slow (Slow Split) | Balsa_JOBSlowSplit |
Ablation: impact of the simulator (Figure 10):
Variant | Config |
---|---|
Balsa Sim | (main agent) Balsa_JOBRandSplit |
Expert Sim | JOBRandSplit_PostgresSim |
No Sim | JOBRandSplit_NoSim |
NOTE: Running
JOBRandSplit_PostgresSim
for the first time will be slow (1.1 hours) due to simulation data being collected fromEXPLAIN
. This data is cached indata/
for future runs.
Ablation: impact of the timeout mechanism (Figure 11):
Variant | Config |
---|---|
Balsa (safe execution) | (main agent) Balsa_JOBRandSplit |
no timeout | JOBRandSplit_NoTimeout |
Ablation: impact of exploration schemes (Figure 12):
Variant | Config |
---|---|
Balsa (safe exploration) | (main agent) Balsa_JOBRandSplit |
epsilon-greedy | JOBRandSplit_EpsGreedy |
no exploration | JOBRandSplit_NoExplore |
Ablation: impact of training schemes (Figure 13):
Variant | Config |
---|---|
Balsa (on-policy) | (main agent) Balsa_JOBRandSplit |
retrain | JOBRandSplit_RetrainScheme |
Comparision with learning from expert demonstrations (Neo-impl) (Figure 15):
Variant | Config |
---|---|
Balsa | (main agent) Balsa_JOBRandSplit |
Neo-impl | NeoImpl_JOBRandSplit |
Diversified experiences (Figure 16):
Variant | Config |
---|---|
Balsa | (Main agents) JOB Balsa_JOBRandSplit ; JOB Slow Balsa_JOBSlowSplit |
Balsa-8x (uses 8 main agents' data) | JOB Balsa_JOBRandSplitReplay ; JOB Slow Balsa_JOBSlowSplitReplay |
Generalizing to highly distinct join templates, Ext-JOB (Figure 17):
Variant | Config |
---|---|
Balsa, data collection agent | Balsa_TrainJOB_TestExtJOB |
Balsa-1x (uses 1 base agent's data) | Balsa1x_TrainJOB_TestExtJOB |
Balsa-8x (uses 8 base agents' data) | Balsa8x_TrainJOB_TestExtJOB |
NOTE: When running a Ext-JOB config for the first time, you may see the error
Missing nodes in init_experience
. This meansdata/initial_policy_data.pkl
contains the expert latencies of all 113 JOB queries for printing (assuming you ran the previous configs first) but lacks the new Ext-JOB queries. To fix, rename the previous.pkl
file and rerun the new Ext-JOB config, which will automatically run the expert plans of the new query set to regenerate this file (as well as gathering the simulation data).
To specify a new experiment, subclass an existing config (give the subclass a descriptive name), change the values of some hyperparameters, and register the new subclass.
Each run's metrics and artifacts are logged to its log dir (path of the form ./wandb/run-20220323_051830-1dq64dx5/
), managed by W&B. We recommend creating an account and running wandb login
, so that these are automatically logged to their UI for visualization.
- Expert performance
latency_expert/workload
: total time of expert (PosgreSQL optimizer) plans on training queries, in secondslatency_expert_test/workload
: for test queries
- Agent performance
latency/workload
: total time of Balsa's plans on training queries, in secondslatency_test/workload
: for test queries
- Agent progress
curr_value_iter
: which iteration the agent is innum_query_execs
: total number of unique query plans executed, for training queriescurr_iter_skipped_queries
: for the current iteration, number of training plans executed before and thus cached- Wallclock duration is an x-axis choice in W&B:
Relative Time (Wall)
curr_timeout
: current iteration's timeout in seconds for any training plan; for Balsa's safe execution
- Learning efficiency: plot
latency/workload
vs. wallclock duration - Data efficiency: plot
latency/workload
vs.num_query_execs
Other (and less important) metrics are also available, such as latency(_expert)(_test)/q<query number>
for per-query latencies.
Example visualization grouping by config cls
:
Tracking metrics is sufficient for running and monitoring experiments. For advanced use cases, we provide a few artifacts (saved locally and automatically uploaded to W&B).
Agent checkpoints / the experience buffer so far are periodically (every 5 iters) saved to:
Saved iter=124 checkpoint to: <Balsa dir>/wandb/run-20220323_051830-1dq64dx5/files/checkpoint.pt
Saved Experience to: <Balsa dir>/data/replay-Balsa_JOBSlowSplit-9409execs-11844nodes-37s-124iters-1dq64dx5.pkl
The experience buffers are useful for training on diversified experiences (see paper for details).
Best training plans so far: under <logdir>/files/best_plans/
*.sql
: a hinted-version of each training query, where the hint is the best plan found so far; this can be piped intopsql
for re-executionall.sql
: concatenates all hinted training queries
latencies.txt
: best latency so far of training queries; the last row, "all", sums up all training queires
This means Balsa can be used to find the best possible plans for a set of high-value queries.
Hyperparameters of each run: <logdir>/files/params.txt
. Hparams are also (1) printed out at the beginning of each run; and (2) captured in W&B's Config table. Typically, referring to each hparam config by its class name, such as Balsa_JOBSlowSplit
, is sufficient, but if any change is made in run.py#Main()
, the above would capture it.
We recommend launching a multi-node Postgres cluster to support multiple agent runs in parallel. This is optional, but highly recommended: Agent runs can have some variance and thus it's valuable to measure the median and variance of agent performance across several runs of the same config.
Instructions below use Ray to launch a cluster on AWS; refer to the Ray documentation for credentials setup. See the same docs for launching on other clouds or an on-premise cluster.
NOTE: The AWS instance types used below may not have exactly the same performance as the Azure cluster reported in our paper. The overall magnitude/trends, however, should be the same. See Cluster hardware used in paper.
Launch the cluster from your local machine and Ray will handle installing all necessary dependencies and configuring both the GPU driver node (cluster head) and the Postgres execution nodes (cluster workers).
On your laptop, edit the cluster configuration file such that the number of nodes and file mounts are correctly configured; see all NOTE
comments. By default it uses a g4dn.12xlarge
head node (48 vCPUs, 192GB RAM, 4 T4 GPUs) and 20 r5a.2xlarge
worker nodes (8 vCPUs, 64GB RAM, 256GB SSD per node).
cd ~/balsa
vim balsa/cluster/cluster.yml
Start the cluster.
ray up balsa/cluster/cluster.yml
Log in to the head node.
ray attach balsa/cluster/cluster.yml
On the head node, check if the workers are ready. It can take 10-20 minutes while the workers are getting set up.
# Run this on the head node
conda activate balsa
cd ~/balsa/balsa/cluster/
python check_cluster.py
NOTE: To monitor the detailed status of worker launching and setup, run on your laptop:
ray exec cluster.yml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
.
Before warming up the Postgres worker nodes, you'd need to log onto AWS to add one rule in the security group.
1. Click any worker node's "Instance ID" on AWS EC2 web portal
2. Click "Security"
3. Click the security group "sg-xxxx (ray-autoscaler-<cluster name>)"
4. Click "Edit inbound rules"
5. Add rule -> "All TCP" and "Anywhere-IPv4"
6. Click "Save rules"
Warm up the Postgres worker nodes:
# Run this on the head node
cd ~/balsa/balsa/cluster
python warmup_pg_servers.py
Log in to the head node and run:
ray attach balsa/cluster/cluster.yml
wandb login
Before launching parallel runs, launch a single run first:
python run.py --run Baseline
python run.py --run <name>
This ensures that the appropriate simulation data is generated and cached and that data/initial_policy_data.pkl
is correct for the query split. After you've observed it to run successfully to the first iteration of the real-execution stage (Planning took ...
), safely kill it with pkill -f run.py
. Then, launch multiple runs:
# Run this on the head node
# Usage: bash scripts/launch.sh <name> <N>
bash scripts/launch.sh Balsa_JOBRandSplit 8
# To monitor:
# (1) W&B
# (2) tail -f Balsa_JOBRandSplit-1.log
You can edit Balsa code on your laptop under the working directory path specified under file_mounts
in the balsa/cluster/cluster.yml
file. Push updated code to the cluster with the following (this will interrupt ongoing runs):
ray up balsa/cluster/cluster.yml --restart-only
Now you can log in to the head node and launch new experiments with the updated changes.
For experiments reported in our paper, we ran 8 parallel agent runs on the head node sending queries to 20 Postgres nodes (i.e., 2.5 concurrent queires per agent run). We used Ubuntu 18.04 virtual machines on Azure:
- head node (training/inference): a
Standard_NV24_Promo
VM- 24 vCPUs, 224GB memory, 4 M60 GPUs; each agent run uses 0.5 GPU
- query execution nodes (the target hardware environment to optimize for): 20
Standard_E8as_v4
VMs- 8 vCPUs, 64GB memory, 128GB SSD (Premium SSD LRS) per VM
This is different from what is included in the cluster template above (AWS). To exactly reproduce, change the cluster template to launch the above or manually use the Azure portal to launch a VMSS.
If your cluster config differs in size or hardware, the numbers obtained may be different and the provided Postgres config may need to be adjusted. The overall trends and magnitudes should be similar.
How to train a Balsa agent with diversified experiences
Once you have 8 main agent runs finish, their experience buffers will be saved to, say, ./data/replay-Balsa_JOBRandSplit-*-499iters-*.pkl
.
Then, launch Balsa_JOBRandSplitReplay
to load all experience buffers that match replay-Balsa_JOBRandSplit-*
(specified by the glob pattern p.prev_replay_buffers_glob
in experiments.py
) to train a Balsa-8x agent.
python run.py --run Balsa_JOBRandSplitReplay
# Or:
bash scripts/launch.sh Balsa_JOBRandSplitReplay 8
If you have more buffers that match the glob pattern, be sure to change p.prev_replay_buffers_glob
to specify the desired buffers.
We also recommend using one experience buffer as a hold-out validation set by specifying p.prev_replay_buffers_glob_val
for more stable test performance.
Note that in this case the buffers used for training and validation need to be put in seperate directories and the glob patterns also need to be changed accordingly.
Other diversified experience configs are: Balsa_JOBSlowSplitReplay
, Balsa1x_TrainJOB_TestExtJOB
, and Balsa8x_TrainJOB_TestExtJOB
.
My experiment failed with a "Hint not respected" error
This occasionally happens when Postgres fails to respect the query plan hint generated by Balsa. This is a fundamental limitation with the pg_hint_plan
package. We suggest relaunching new runs to replace the failed runs.
@inproceedings{balsa,
author = {Yang, Zongheng and Chiang, Wei-Lin and Luan, Sifei and Mittal,
Gautam and Luo, Michael and Stoica, Ion},
title = {Balsa: Learning a Query Optimizer Without Expert
Demonstrations},
booktitle = {Proceedings of the 2022 International Conference on Management
of Data},
year = 2022,
pages = {931--944},
doi = {10.1145/3514221.3517885},
url = {https://doi.org/10.1145/3514221.3517885},
address = {New York, NY, USA},
isbn = 9781450392495,
keywords = {machine learning for systems, learned query optimization},
location = {Philadelphia, PA, USA},
numpages = 14,
publisher = {Association for Computing Machinery},
series = {SIGMOD/PODS '22},
}
@misc{balsa_github,
howpublished = {\url{https://github.com/balsa-project/balsa}},
title = "Balsa source code",
year = 2022,
}