Currently, RisingLight supports importing TPC-H data and run a subset of TPC-H queries. You may import the TPC-H data and run some simple queries in RisingLight.
You may use the make recipe to download and generate TPC-H data (about 1GB in tbl format).
make tpch
If you want to run a larger query:
make tpch-10gb
The generated data will be placed under tpch-dbgen/tbl
folder.
First, you should use git to clone the tpch-dbgen repo:
git clone https://github.com/electrum/tpch-dbgen.git
This repo contains the program for generating TPC-H data.
Then, enter the tpch-dbgen directory and type make all
, and it will generate some executable binaries such as dbgen
and qgen
. We will show you how to generate TPC-H data by using one line of command in the following sections. Meanwhile, you can read this README for more details.
Finally, type the following command and wait for several seconds:
./dbgen
This command will generate the data we want, which contains a table called LINEITEM
with a size of 700MB.
You will need to build RisingLight in release mode, so as to import data faster.
cargo build --release
Then, use our test scripts to create tables.
cargo run --release -- -f tests/sql/tpch/create.sql
We can use \dt
to ensure that all tables have been imported.
cargo run --release
# Inside SQL shell
\dt
+---+----------+---+----------+---+----------+
| 0 | postgres | 0 | postgres | 3 | supplier |
| 0 | postgres | 0 | postgres | 1 | region |
| 0 | postgres | 0 | postgres | 0 | nation |
| 0 | postgres | 0 | postgres | 4 | partsupp |
| 0 | postgres | 0 | postgres | 7 | lineitem |
| 0 | postgres | 0 | postgres | 6 | orders |
| 0 | postgres | 0 | postgres | 2 | part |
| 0 | postgres | 0 | postgres | 5 | customer |
+---+----------+---+----------+---+----------+
Then, we may use the import.sql
to import data, which calls COPY FROM
SQL statements internally:
cargo run --release -- -f tests/sql/tpch/import.sql
Generally, you can finish this process within several seconds.
Now, we can run simple queries on this table.
cargo run --release
select sum(L_LINENUMBER) from LINEITEM;
select count(L_ORDERKEY), sum(L_LINENUMBER) from LINEITEM where L_ORDERKEY > 2135527;
Or run real TPC-H queries:
cargo run --release -- -f tests/sql/tpch/q1.sql
cargo run --release -- -f tests/sql/tpch/q3.sql
cargo run --release -- -f tests/sql/tpch/q5.sql
cargo run --release -- -f tests/sql/tpch/q6.sql
cargo run --release -- -f tests/sql/tpch/q10.sql
All data of RisingLight is stored in risinglight.secondary.db
folder. Simply remove it
if you want to clean all data.
In tests
, we have two kinds of TPC-H tests:
- Run TPC-H query on small dataset (used for unit tests).
tpch
folder. - Run TPC-H query on ~1GB dataset generated by tpch-gen.
tpch-full
folder.
Everytime we add a new TPC-H query, we should add both kinds of tests, in sqllogictest format. For example, developers should:
- create
tests/sql/tpch/_qXX.slt
- create
tests/sql/tpch-full/_qXX.slt
- create
tests/sql/tpch/qXX.sql
- add
include _qXX.slt
intests/sql/tpch/tpch.slt
- add
include _qXX.slt
intests/sql/tpch-full/tpch.slt
By using output-format
parameter, we can easily get the output format required by sqllogictest:
cargo run --release -- -f tests/sql/tpch/q5.sql --output-format text
psql -d tpch -f tests/sql/tpch/q5.sql -A -t -F " "
... which yields
ALGERIA 55756674.2813
MOZAMBIQUE 54883960.1257
MOROCCO 50463646.0237
ETHIOPIA 49934541.2268
KENYA 48858086.8222
Developers should check the output against Postgres in order to ensure the output is correct. The full 1GB TPC-H test suite can be verified by running:
cargo run --release -- -f tests/sql/tpch-full/_tpch_full.slt --output-format text
And if you want to start fresh from empty database, you may run:
./tests/tpch-full.sh