Skip to content

Latest commit

 

History

History
156 lines (109 loc) · 4.14 KB

01-tpch.md

File metadata and controls

156 lines (109 loc) · 4.14 KB

Running TPC-H Queries

Currently, RisingLight supports importing TPC-H data and run a subset of TPC-H queries. You may import the TPC-H data and run some simple queries in RisingLight.

Generate TPC-H Data

Use Make Recipe

You may use the make recipe to download and generate TPC-H data (about 1GB in tbl format).

make tpch

If you want to run a larger query:

make tpch-10gb

The generated data will be placed under tpch-dbgen/tbl folder.

Manual Generation

First, you should use git to clone the tpch-dbgen repo:

git clone https://github.com/electrum/tpch-dbgen.git

This repo contains the program for generating TPC-H data.

Then, enter the tpch-dbgen directory and type make all, and it will generate some executable binaries such as dbgen and qgen. We will show you how to generate TPC-H data by using one line of command in the following sections. Meanwhile, you can read this README for more details.

Finally, type the following command and wait for several seconds:

./dbgen

This command will generate the data we want, which contains a table called LINEITEM with a size of 700MB.

Create Table and Import Data

You will need to build RisingLight in release mode, so as to import data faster.

cargo build --release

Then, use our test scripts to create tables.

cargo run --release -- -f tests/sql/tpch/create.sql

We can use \dt to ensure that all tables have been imported.

cargo run --release
# Inside SQL shell
\dt
+---+----------+---+----------+---+----------+
| 0 | postgres | 0 | postgres | 3 | supplier |
| 0 | postgres | 0 | postgres | 1 | region   |
| 0 | postgres | 0 | postgres | 0 | nation   |
| 0 | postgres | 0 | postgres | 4 | partsupp |
| 0 | postgres | 0 | postgres | 7 | lineitem |
| 0 | postgres | 0 | postgres | 6 | orders   |
| 0 | postgres | 0 | postgres | 2 | part     |
| 0 | postgres | 0 | postgres | 5 | customer |
+---+----------+---+----------+---+----------+

Then, we may use the import.sql to import data, which calls COPY FROM SQL statements internally:

cargo run --release -- -f tests/sql/tpch/import.sql

Generally, you can finish this process within several seconds.

Run TPC-H

Now, we can run simple queries on this table.

cargo run --release
select sum(L_LINENUMBER) from LINEITEM;
select count(L_ORDERKEY), sum(L_LINENUMBER) from LINEITEM where L_ORDERKEY > 2135527;

Or run real TPC-H queries:

cargo run --release -- -f tests/sql/tpch/q1.sql
cargo run --release -- -f tests/sql/tpch/q3.sql
cargo run --release -- -f tests/sql/tpch/q5.sql
cargo run --release -- -f tests/sql/tpch/q6.sql
cargo run --release -- -f tests/sql/tpch/q10.sql

Clean Data

All data of RisingLight is stored in risinglight.secondary.db folder. Simply remove it if you want to clean all data.

Developers: Add new TPC-H tests

In tests, we have two kinds of TPC-H tests:

  • Run TPC-H query on small dataset (used for unit tests). tpch folder.
  • Run TPC-H query on ~1GB dataset generated by tpch-gen. tpch-full folder.

Everytime we add a new TPC-H query, we should add both kinds of tests, in sqllogictest format. For example, developers should:

  • create tests/sql/tpch/_qXX.slt
  • create tests/sql/tpch-full/_qXX.slt
  • create tests/sql/tpch/qXX.sql
  • add include _qXX.slt in tests/sql/tpch/tpch.slt
  • add include _qXX.slt in tests/sql/tpch-full/tpch.slt

By using output-format parameter, we can easily get the output format required by sqllogictest:

cargo run --release -- -f tests/sql/tpch/q5.sql --output-format text
psql -d tpch -f tests/sql/tpch/q5.sql -A -t -F " "

... which yields

ALGERIA 55756674.2813
MOZAMBIQUE 54883960.1257
MOROCCO 50463646.0237
ETHIOPIA 49934541.2268
KENYA 48858086.8222

Developers should check the output against Postgres in order to ensure the output is correct. The full 1GB TPC-H test suite can be verified by running:

cargo run --release -- -f tests/sql/tpch-full/_tpch_full.slt --output-format text

And if you want to start fresh from empty database, you may run:

./tests/tpch-full.sh